# BMC Boot Ready

Author: Andrew Geissler (geissonator)

Other contributors:

Created: May 12, 2022

## Problem Description

Some services that run on the BMC are required for the BIOS (host firmware) to
power on and boot the system. The goal of this design is to define a mechanism
that ensures these dependencies are met before a power on or boot is started.

For example, on some systems, you cannot power on the chassis until the BMC has
collected the VPD from the VRMs to determine their characteristics. On other
systems, the BIOS service is needed so the host firmware can look for any
overrides.

Currently, OpenBMC behavior in this area is undefined. If a particular BMC has a
large time gap between when the webserver becomes available and when all BMC
services have finished starting, there is a window in which a user could request
a power on via the webserver before all needed services are running.

## Background and References

The mailing list discussion can be found [here][1]. The BMC currently has three
major [state][2] management interfaces in a system: the BMC, Chassis, and Host.
Within each state interface, the current state and requested state are tracked.

The [BMC][3] state object is considered `Ready` once the systemd
`multi-user.target` has successfully started all of its services.

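As a rough illustration of what "the BMC is `Ready`" means to a consumer of the
state interface, the check below compares a reported state string against the
`Ready` value. This is a minimal sketch: the enumeration prefix is assumed from
the naming pattern of the BMC state interface and should be verified against
the interface YAML, and `is_bmc_ready` is a hypothetical helper name.

```python
# Assumed enumeration prefix, following the naming pattern of the
# xyz.openbmc_project.State.BMC interface; verify against the interface YAML.
BMC_STATE_PREFIX = "xyz.openbmc_project.State.BMC.BMCState."


def is_bmc_ready(current_bmc_state: str) -> bool:
    """Return True only when the reported BMC state is exactly `Ready`."""
    return current_bmc_state == BMC_STATE_PREFIX + "Ready"
```

Any value other than the `Ready` string, including an empty or unknown one, is
treated as not ready in this sketch.
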
There are three options that have been discussed to solve this issue:

1. D-Bus objects don't exist until the backend is prepared to handle them.
2. If a user tries something that the system is not in the proper state to
   handle, then return an error code.
3. If a user tries something that the system is not in the proper state to
   handle, then queue it up.

Option 1 is challenging because D-Bus interfaces provided by OpenBMC state
applications have a mix of read-only properties (like current state) and
writeable properties that are used to request state changes. Not showing any
until everything is available could have unknown consequences. This also has
similar issues to option 2 in that applications and clients must have proper
code to handle missing interfaces.

Option 2 is challenging because Redfish clients and internal applications like
the op-panel code would then need to properly handle error codes like this. You
can argue that they already should, but that is definitely not the case with a
lot of OpenBMC applications and clients.

Option 3 is the most user-friendly option. No client or OpenBMC application
changes would be needed. One concern is that having a system somewhat randomly
power on at some later point in time could be unexpected. The general consensus
in this review, though, has been that this is the preferred option.

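To make the cost of option 2 concrete, the sketch below shows the kind of retry
logic every client would need if the BMC rejected early requests. It is
illustrative only: `NotReadyError`, `request_with_retry`, and the retry
parameters are hypothetical names and values, not part of any OpenBMC or
Redfish API.

```python
import time
from typing import Callable


class NotReadyError(Exception):
    """Hypothetical error a BMC could return under option 2."""


def request_with_retry(
    send_request: Callable[[], None],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a request until it is accepted or the attempts run out."""
    for attempt in range(attempts):
        try:
            send_request()
            return True
        except NotReadyError:
            # Wait before the next attempt, but not after the final one.
            if attempt + 1 < attempts:
                sleep(delay_s)
    return False
```

Under option 3, a client would call `send_request` once and the BMC would hold
the request, so none of this retry code would be required on the client side.
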
[1]: https://lists.ozlabs.org/pipermail/openbmc/2022-April/030175.html
[2]:
  https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/State
[3]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/State/BMC.interface.yaml

## Requirements

- Queue up chassis and host requested state changes until the BMC is in the
  proper state to allow the request
  - What the "proper state" is will be implementation-specific, but by default
    phosphor-state-manager will queue all requests until the BMC state has
    reached `Ready`

## Proposed Design

If a power on or boot request is made to the Chassis or Host state objects and
the BMC is not at `Ready`, then the request will be queued and the state
management code will begin monitoring for BMC `Ready`. Once `Ready` is reached,
the queued operation will be executed automatically.

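The queue-then-execute behavior described above could be sketched as follows.
This is not the phosphor-state-manager implementation; it is a minimal model
under stated assumptions. `StateRequestQueue`, its method names, and the
string transition values are hypothetical, and running queued requests in
arrival order is an assumption that this design does not specify. The
readiness check is passed in as a callable so that the "proper state" can
remain implementation-specific.

```python
from typing import Callable, List


class StateRequestQueue:
    """Hold chassis/host requests until the BMC is allowed to act on them."""

    def __init__(
        self,
        bmc_allows_requests: Callable[[], bool],
        execute: Callable[[str], None],
    ) -> None:
        self._bmc_allows_requests = bmc_allows_requests
        self._execute = execute
        self._pending: List[str] = []

    def request(self, transition: str) -> bool:
        """Execute now if allowed, else queue. True means executed now."""
        if self._bmc_allows_requests():
            self._execute(transition)
            return True
        self._pending.append(transition)
        return False

    def on_bmc_state_changed(self) -> None:
        """Call on each BMC state change; drains the queue once allowed."""
        if not self._bmc_allows_requests():
            return
        # Assumption: queued requests run in the order they arrived.
        pending, self._pending = self._pending, []
        for transition in pending:
            self._execute(transition)
```

In a real service, `on_bmc_state_changed` would be driven by whatever
notification the state manager uses to monitor the BMC state, and
`bmc_allows_requests` would default to a check for `Ready`.
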
## Alternatives Considered

The "Background and References" section covered some alternative options and the
complexity behind them.

Another option is to code the dependencies directly into the services. For
example, if the power on service requires the VRM VPD collection service, encode
that dependency in the systemd files. This is easy to say but in practice has
been challenging. Some OpenBMC services have built-in assumptions that the
`multi-user.target` and all of its dependent services have completed prior to a
power on being started. The general consensus within IBM was that it's much
easier and safer to just have a global wait-for-bmc-ready function as proposed
in this design.

## Impacts

Users will need to understand that their request to power on the system may be
delayed by an undefined amount of time. In general, a BMC gets to `Ready` state
within a couple of minutes.

### Organizational

This function will be implemented within the existing phosphor-state-manager
repository. x86-power-control, an alternative to phosphor-state-manager, could
also implement this logic.

## Testing

- Ensure a power on request is properly queued and executed when it is made
  prior to the BMC being `Ready`.