# BMC Boot Ready

Author: Andrew Geissler (geissonator)

Other contributors:

Created: May 12, 2022

## Problem Description

There are services that run on the BMC which the BIOS (host firmware) requires
in order to power on and boot the system. The goal of this design is to define
a mechanism that ensures these dependencies are met before a power on or boot
is started.

For example, on some systems, you cannot power on the chassis until the BMC has
collected the VPD from the VRMs to determine their characteristics. On other
systems, the BIOS service is needed so the host firmware can look for any
overrides.

Currently, OpenBMC behavior in this area is undefined. If a particular BMC has
a large time gap between when the webserver becomes available and when all BMC
services have finished starting, there is a window in which a user could
request a power on via the webserver before all needed services are running.

## Background and References

The mailing list discussion can be found [here][1]. The BMC currently has
three major [state][2] management interfaces in a system: the BMC, Chassis, and
Host. Within each state interface, the current state and requested state are
tracked.

The [BMC][3] state object is considered `Ready` once the systemd
`multi-user.target` has successfully started all of its services.

Three options have been discussed to solve this issue:

1. D-Bus objects don't exist until the backend is prepared to handle them.
2. If a user tries something that the system is not in the proper state to
   handle, then return an error code.
3. If a user tries something that the system is not in the proper state to
   handle, then queue it up.

Option 1 is challenging because D-Bus interfaces provided by OpenBMC state
applications have a mix of read-only properties (like current state) and
writeable properties that are used to request state changes. Not showing any
of them until everything is available could have unknown consequences. This
option also shares an issue with option 2: applications and clients must have
proper code to handle missing interfaces.

Option 2 is challenging because Redfish clients and internal applications like
the op-panel code would need to properly handle error codes like this. You can
argue that they already should, but that is definitely not the case with a lot
of OpenBMC applications and clients.

Option 3 is the most user-friendly option. No client or OpenBMC application
changes would be needed. One concern is that having a system somewhat randomly
power on at some later point in time could be unexpected. The general consensus
in this review, though, has been that this is the preferred option.

[1]: https://lists.ozlabs.org/pipermail/openbmc/2022-April/030175.html
[2]: https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/State
[3]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/State/BMC.interface.yaml

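The difference between options 2 and 3 from a client's point of view can be
sketched as follows. This is only an illustrative model, not code from any
OpenBMC repository: `NotReadyError`, `power_on_option2`, `power_on_option3`,
and `client_option2` are hypothetical names, and BMC readiness is reduced to a
boolean.

```python
class NotReadyError(Exception):
    """Hypothetical error a state object could return under option 2."""


def power_on_option2(bmc_ready):
    """Option 2: reject the request while the BMC is not Ready."""
    if not bmc_ready:
        raise NotReadyError("BMC is not Ready")
    return "started"


def power_on_option3(bmc_ready, queue):
    """Option 3: accept the request and queue it if the BMC is not Ready."""
    if not bmc_ready:
        queue.append("power-on")
        return "queued"
    return "started"


def client_option2(ready_polls):
    """Under option 2, every client needs its own retry loop.

    ready_polls is a list of booleans standing in for the BMC readiness
    observed on each attempt.
    """
    for attempt, bmc_ready in enumerate(ready_polls, start=1):
        try:
            return power_on_option2(bmc_ready), attempt
        except NotReadyError:
            continue  # a real client would wait before retrying
    return "gave up", len(ready_polls)
```

Under option 2 the retry logic has to live in every client, whereas under
option 3 a single call is enough and the waiting is handled on the BMC side.
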
## Requirements

- Queue up chassis and host requested state changes until the BMC is in the
  proper state to allow the request
  - What the "proper state" is will be implementation specific, but by default
    phosphor-state-manager will queue all requests until the BMC state has
    reached `Ready`

## Proposed Design

If a power on or boot request is made to the Chassis or Host state objects and
the BMC is not at `Ready`, then the request will be queued and the state
management code will begin monitoring for BMC `Ready`. Once `Ready` is reached,
the requested operation will be automatically executed.

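A minimal sketch of this queue-until-`Ready` behavior is shown below. It is an
in-memory model under stated assumptions, not the phosphor-state-manager
implementation: the class and method names are hypothetical, requests are
replayed in FIFO order (the design does not specify ordering), and a real
implementation would watch the BMC state over D-Bus and start systemd targets
rather than append to a list.

```python
from collections import deque
from enum import Enum


class BMCState(Enum):
    # Simplified, hypothetical subset of BMC states for illustration.
    NOT_READY = "NotReady"
    READY = "Ready"


class StateManagerSketch:
    """Hypothetical model: hold transitions until the BMC reports Ready."""

    def __init__(self):
        self.bmc_state = BMCState.NOT_READY
        self.pending = deque()  # requests received before Ready
        self.executed = []      # requests actually started, in order

    def request_transition(self, target, transition):
        """Handle a requested state change for a chassis or host object."""
        if self.bmc_state is BMCState.READY:
            self._execute(target, transition)
        else:
            # Do not reject the request; keep it until the BMC is Ready.
            self.pending.append((target, transition))

    def on_bmc_state_changed(self, new_state):
        """Callback for a change in the monitored BMC state."""
        self.bmc_state = new_state
        if new_state is BMCState.READY:
            # Assumption: replay queued requests in arrival order.
            while self.pending:
                self._execute(*self.pending.popleft())

    def _execute(self, target, transition):
        # Stand-in for starting the real power on or boot operation.
        self.executed.append((target, transition))
```

For example, calling `request_transition("chassis", "On")` before the BMC is
`Ready` leaves `executed` empty, and a later
`on_bmc_state_changed(BMCState.READY)` runs the queued request.
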
## Alternatives Considered

The "Background and References" section covered some alternative options
and the complexity behind them.

Another option is to code the dependencies directly into the services. For
example, if the power on service requires the VRM VPD collection service,
encode that dependency in the systemd files. This is easy to say but in practice
has been challenging. Some OpenBMC services have built-in assumptions that
the `multi-user.target` and all of its dependent services have completed prior
to a power on being started. The general consensus within IBM was that it's
much easier and safer to just have a global wait-for-bmc-ready function as
proposed in this design.

## Impacts

Users will need to understand that their request to power on the system may
be delayed by an undefined amount of time. In general, a BMC gets to the
`Ready` state within a couple of minutes.

### Organizational

This function will be implemented within the existing phosphor-state-manager
repository. x86-power-control, an alternative to phosphor-state-manager, could
also implement this logic.

## Testing

- Ensure a power on request is properly queued and executed when it is made
  prior to the BMC being `Ready`.
