# Code Update Design

Author: Jagpal Singh Gill <paligill@gmail.com>

Created: 4th August 2023

## Problem Description

This section covers the limitations discovered with
[phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt):

1. The current code update flow is complex, as it involves three different
   daemons: Image Manager, Image Updater and Update Service.
2. The update invocation flow has no explicit interface but rather depends upon
   the discovery of a new file in /tmp/images by Image Manager.
3. Images POSTed via Redfish are downloaded by BMCWeb to /tmp/images, which
   requires write access to the filesystem. This poses a security risk.
4. The current design doesn't support parallel upgrades for different firmware
   ([Issue](https://github.com/openbmc/bmcweb/issues/257)).

## Background and References

- [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)
- [Software DBus Interface](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/Software)
- [Code Update Design](https://github.com/openbmc/docs/tree/master/architecture/code-update)

## Requirements

1. Able to start an update, given a firmware image and update settings.
   - Update settings shall be able to specify when to apply the image, for
     example immediately, on device reset, or on demand.
2. Able to retrieve the update progress and status.
3. Able to produce an interface compliant with
   [Redfish UpdateService](https://redfish.dmtf.org/schemas/v1/UpdateService.v1_11_3.json)
4. Unprivileged daemons with access to DBus should be able to accept and perform
   a firmware update.
5. Update request shall respond back immediately, so the client can query the
   status while the update is in progress.
6. All errors shall propagate back to the client.
7. Able to support update for different types of hardware components such as
   CPLD, NIC, BIOS, BIC, PCIe switches, etc.
8. Design shall impose no restriction to choose any specific image format.
9. Able to update multiple hardware components of the same type running
   different firmware images, for example, two instances of CPLDx residing on
   the board, one performing functionX and the other performing functionY and
   hence running different firmware images.
10. Able to update multiple components in parallel.
11. Able to restrict critical system actions, such as reboot, for the entity
    under update while the code update is in flight.

## Proposed Design

### Proposed End to End Flow

```mermaid
sequenceDiagram;
participant CL as Client
participant BMCW as BMCWeb
participant CU as <deviceX>CodeUpdater<br> ServiceName: xyz.openbmc_project.Software.<deviceX>

% Bootstrap Action for CodeUpdater
note over CU: Get device access info from<br> /xyz/openbmc_project/inventory/system/... path
note over CU: VersionId = Version Read from <deviceX> + Salt
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at /xyz/openbmc_project/Software/<deviceX>/<VersionId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at /xyz/openbmc_project/Software/<deviceX>/<VersionId>
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at /xyz/openbmc_project/Software/<deviceX>/<VersionId> <br> with Status = Active
CU ->> CU: Create functional association <br> from Version to Inventory Item

CL ->> BMCW: HTTP POST: /redfish/v1/UpdateService/update <br> (Image, settings, RedfishTargetURIArray)

loop For every RedfishTargetURI
  note over BMCW: Map RedfishTargetURI to<br> System Inventory Item
  note over BMCW: Get object path (i.e. /xyz/openbmc_project/Software/<deviceX>/<VersionId>)<br>for associated Version interface to System Inventory Item
  note over BMCW: Get serviceName corresponding to the object path <br>from mapper.
  BMCW ->> CU: StartUpdate(Image, ApplyTime)

  note over CU: Verify Image
  break Image Verification FAILED
      CU -->> BMCW: {NULL, Update.Error}
      BMCW -->> CL: Return Error
  end
  note over CU: VersionId = Version from Image + Salt
  note over CU: ObjectPath = /xyz/openbmc_project/Software/<deviceX>/<VersionId>
  CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at ObjectPath
  CU -->> BMCW: {ObjectPath, Success}
  CU ->> CU: << Delegate Update for asynchronous processing >>

  par BMCWeb Processing
      BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.Activation,<br> ObjectPath)
      BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.ActivationProgress,<br> ObjectPath)
      BMCW ->> BMCW: Create Task<br> to handle matcher notifications
      BMCW -->> CL: <TaskNum>
      loop
          BMCW --) BMCW: Process notifications<br> and update Task attributes
          CL ->> BMCW: /redfish/v1/TaskMonitor/<TaskNum>
          BMCW -->>CL: TaskStatus
      end
  and << Asynchronous Update in Progress >>
      CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at ObjectPath with Status = Ready
      CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.ActivationProgress<br> at ObjectPath
      CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition<br> at ObjectPath
      note over CU: Start Update
      loop
          CU --) BMCW: Notify ActivationProgress.Progress change
      end
      note over CU: Finish Update
      CU ->> CU: Activation.Status = Active
      CU --) BMCW: Notify Activation.Status change
      CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition
      CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationProgress
      alt ApplyTime == Immediate
          note over CU: Reset Device and<br> update functional association to System Inventory Item
      else
          note over CU: Create active association to System Inventory Item
      end
  end
end
```

- Each upgradable hardware type may have a separate daemon (\<deviceX\> as per
  above flow) handling its update process and would need to implement the
  proposed interfaces in the next section. This satisfies
  [Requirement# 7](#requirements).
- Since there would be a single daemon handling the update (as compared to
  three), less handshaking would be involved, which addresses
  [Issue# 1](#problem-description) and [Requirement# 4](#requirements).

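As a rough illustration of the bootstrap and `StartUpdate` steps in the flow
above, the following Python sketch models a \<deviceX>CodeUpdater with a plain
dictionary instead of a real D-Bus object tree. The `CodeUpdater` class, the
`VERSION=` image header, the hash used to combine version and salt, and the
`Status` property name (taken from the diagram) are assumptions made for
illustration, not part of the proposed interfaces.

```python
import hashlib
import io

SW = "xyz.openbmc_project.Software."


class UpdateError(Exception):
    """Synchronous failure reported by start_update (image verification)."""


class CodeUpdater:
    """In-memory stand-in for a <deviceX>CodeUpdater; no real D-Bus involved."""

    def __init__(self, device, salt="example-salt"):
        self.device = device
        self.salt = salt
        self.objects = {}  # object path -> {interface name: properties}
        self.apply_times = {}  # object path -> requested apply time

    def _object_path(self, version):
        # Assumption: VersionId is a short hash of version + salt.
        digest = hashlib.sha256((version + self.salt).encode()).hexdigest()[:8]
        return f"/xyz/openbmc_project/Software/{self.device}/{digest}"

    def start_update(self, image, apply_time):
        """Verify the image, publish Version, and return the new object path."""
        header = image.readline().decode(errors="replace").strip()
        # Hypothetical image format: the first line reads "VERSION=<string>".
        prefix, _, version = header.partition("=")
        if prefix != "VERSION" or not version:
            raise UpdateError("image verification failed")
        path = self._object_path(version)
        self.objects[path] = {SW + "Version": {"Version": version}}
        self.apply_times[path] = apply_time
        return path  # a real daemon would now schedule run_update asynchronously

    def run_update(self, path, steps=(25, 50, 75, 100)):
        """Asynchronous part of the flow, run inline here for simplicity."""
        obj = self.objects[path]
        obj[SW + "Activation"] = {"Status": "Ready"}
        obj[SW + "ActivationProgress"] = {"Progress": 0}
        obj[SW + "ActivationBlocksTransition"] = {}
        for percent in steps:  # stands in for writing the image to the device
            obj[SW + "ActivationProgress"]["Progress"] = percent
        obj[SW + "Activation"]["Status"] = "Active"
        del obj[SW + "ActivationBlocksTransition"]
        del obj[SW + "ActivationProgress"]
```

Returning the object path before `run_update` starts is what lets a client begin
tracking status immediately; verification failures surface as an exception from
`start_update` rather than as a later status change.
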
### Proposed D-Bus Interface

The D-Bus interface for code update will consist of the following:

| Interface Name                                                                                                                                                                                         | Existing/New |                               Purpose                               |
| :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------: | :-----------------------------------------------------------------: |
| [xyz.openbmc_project.Software.Update](https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/65738)                                                                                           |     New      |                       Provides update method                        |
| [xyz.openbmc_project.Software.Version](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Version.interface.yaml)                                       |   Existing   |                        Provides version info                        |
| [xyz.openbmc_project.Software.Activation](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Activation.interface.yaml)                                 |   Existing   |                     Provides activation status                      |
| [xyz.openbmc_project.Software.ActivationProgress](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationProgress.interface.yaml)                 |   Existing   |               Provides activation progress percentage               |
| [xyz.openbmc_project.Software.ActivationBlocksTransition](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationBlocksTransition.interface.yaml) |   Existing   | Signifies barrier for state transitions while update is in progress |
| [xyz.openbmc_project.Software.RedundancyPriority](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/RedundancyPriority.interface.yaml)                 |   Existing   |     Provides the redundancy priority for the version interface      |

The introduction of the xyz.openbmc_project.Software.Update interface
streamlines the update invocation flow and hence addresses
[Issue# 2](#problem-description) and [Requirement# 1 & 2](#requirements).

#### Association

`running` : A `running` association from xyz.openbmc_project.Inventory.Item to
xyz.openbmc_project.Software.Version represents the current functional or
running software version for the associated inventory item. The `ran_on` would
be the corresponding reverse association.

`activating` : An `activating` association from
xyz.openbmc_project.Inventory.Item to xyz.openbmc_project.Software.Version
represents the activated (but not yet run) software version for the associated
inventory item. There could be more than one active version for an inventory
item, for example, in case of A/B redundancy models there are 2 associated
flash-banks and the xyz.openbmc_project.Software.RedundancyPriority interface
defines the priority for each one.

For an A/B redundancy model with staging support, the
xyz.openbmc_project.Software.Activation.Activations.Staged will help to define
which software version is currently staged.

The `activated_on` would be the corresponding reverse association.

### Keep images in memory

Images will be kept in memory and passed to \<deviceX>CodeUpdater using a file
descriptor rather than a file path. The implementation needs to monitor
appropriate memory limits to prevent parallel updates from running the BMC out
of memory.

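One way to hold an image in memory and hand over a file descriptor is an
anonymous memory-backed file. The sketch below uses Python's `os.memfd_create`
(Linux only) with a temporary-file fallback. `image_to_fd` and `MemoryBudget`
are hypothetical helper names, and how the limit is chosen is left open.

```python
import os
import tempfile
import threading


def image_to_fd(data):
    """Copy image bytes into an anonymous file and return its descriptor."""
    if hasattr(os, "memfd_create"):
        fd = os.memfd_create("fw-image")  # memory-backed, no filesystem path
    else:
        # Fallback where memfd_create is unavailable: unnamed temporary file.
        with tempfile.TemporaryFile() as tmp:
            fd = os.dup(tmp.fileno())
    view = memoryview(data)
    while view:  # os.write may write fewer bytes than requested
        view = view[os.write(fd, view):]
    os.lseek(fd, 0, os.SEEK_SET)  # rewind so the receiver reads from the start
    return fd


class MemoryBudget:
    """Tracks bytes held by in-flight images against a fixed limit."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0
        self._lock = threading.Lock()

    def reserve(self, size):
        with self._lock:
            if self.used + size > self.limit:
                return False  # caller should reject or queue the update
            self.used += size
            return True

    def release(self, size):
        with self._lock:
            self.used = max(0, self.used - size)
```

A server would call `reserve` before accepting an upload and `release` once the
updater has consumed the descriptor, so that several parallel uploads cannot
exceed the configured limit together.
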
### Propagate errors to client

The xyz.openbmc_project.Software.Update.StartUpdate return value will propagate
any errors related to initial setup and image metadata/header parsing back to
the user. Any asynchronous errors which happen during the update process will be
reported via a failed activation status, which maps to a failed task associated
with the update. Also, a phosphor-logging event will be created and sent back to
the client via
[Redfish Log Service](https://redfish.dmtf.org/schemas/v1/LogService.v1_4_0.json).

Another alternative could be to use
[Redfish Event Services](https://redfish.dmtf.org/schemas/v1/EventService.v1_10_0.json).

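The mapping from activation status to task state could look like the following
sketch. The status names are those of the existing `Activation` interface's
`Activations` enumeration and the task states come from the Redfish `TaskState`
enumeration, but the specific mapping, the `Task` class, the `Status` property
name (as used in the flow diagram), and the `on_properties_changed` handler are
illustrative assumptions.

```python
from dataclasses import dataclass, field

SW = "xyz.openbmc_project.Software."

# Assumed mapping; a real server may choose differently.
TASK_STATE = {
    "NotReady": "Starting",
    "Ready": "Starting",
    "Activating": "Running",
    "Staged": "Completed",
    "Active": "Completed",
    "Invalid": "Exception",
    "Failed": "Exception",
}


@dataclass
class Task:
    state: str = "New"
    percent: int = 0
    messages: list = field(default_factory=list)


def on_properties_changed(task, interface, changed):
    """Apply one PropertiesChanged notification to the task."""
    if interface == SW + "ActivationProgress" and "Progress" in changed:
        task.percent = int(changed["Progress"])
    elif interface == SW + "Activation" and "Status" in changed:
        status = changed["Status"]
        # Unknown statuses are treated as failures rather than silently ignored.
        task.state = TASK_STATE.get(status, "Exception")
        if task.state == "Exception":
            task.messages.append(f"update failed with activation status {status}")
        elif task.state == "Completed":
            task.percent = 100
    return task
```
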
### Firmware Image Format

Image parsing will be performed in \<deviceX>CodeUpdater and since
\<deviceX>CodeUpdater may be a device specific daemon, the vendor may choose any
image format for the firmware image. This fulfills
[Requirement# 8](#requirements).

### Multi part Images

A multi part image has multiple component images as part of one image package.
A PLDM image is one such example of a multi part image format. Sometimes, for
multi part devices, there is no concrete physical firmware device; instead the
firmware device consists of multiple physical components, each of which may
have its own component image. In such a scenario, \<deviceX>CodeUpdater can
create a logical inventory item for the firmware device. While performing the
firmware device update, the client may target the logical firmware device,
which in turn knows how to update the corresponding child components for the
supplied component images. The user can also update a specific component by
providing the image package with that component as the head node. The
\<deviceX>CodeUpdater can implement the required logic to verify whether the
supplied image is targeted for itself (and its child components) or not.

### Update multiple devices of same type

- For same type devices, extend the D-Bus path to specify the device instance,
  for example,
  /xyz/openbmc_project/Software/\<deviceX>/\<InstanceNum>/\<VersionId>. All the
  corresponding interfaces can reside on this path and the same path will be
  returned from xyz.openbmc_project.Software.Update.StartUpdate.

This fulfills [Requirement# 9](#requirements).

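A small sketch of how such a path might be assembled is shown below. The
`software_path` helper is hypothetical. The character check reflects the D-Bus
rule that each object path element may contain only `[A-Za-z0-9_]`.

```python
import re

SOFTWARE_ROOT = "/xyz/openbmc_project/Software"
_ELEMENT = re.compile(r"[A-Za-z0-9_]+")  # characters allowed in a path element


def software_path(device, version_id, instance=None):
    """Build .../<deviceX>/<VersionId> or .../<deviceX>/<InstanceNum>/<VersionId>."""
    elements = [device]
    if instance is not None:
        elements.append(str(instance))
    elements.append(version_id)
    for element in elements:
        if not _ELEMENT.fullmatch(element):
            raise ValueError(f"invalid D-Bus path element: {element!r}")
    return "/".join([SOFTWARE_ROOT, *elements])
```

For example, `software_path("CPLD", "abc123", instance=1)` yields
`/xyz/openbmc_project/Software/CPLD/1/abc123`, while omitting `instance` keeps
the single-device layout used in the end to end flow.
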
### Parallel Upgrade

- Different type hardware components:

  Upgrade for different type hardware components can be handled either by
  different \<deviceX>CodeUpdater daemons or by a single daemon for hardware
  components with common features, for example, PLDMd may handle update for
  devices using the PLDM specification. Such updates can be invoked in parallel
  from BMCWeb and tracked via different tasks.

- Similar type hardware component:

  BMCWeb will trigger xyz.openbmc_project.Software.Update.StartUpdate on
  different D-Bus paths pertaining to each hardware instance. For more details
  on D-Bus paths refer to
  [Update multiple devices of same type](#update-multiple-devices-of-same-type).

This fulfills [Requirement# 10](#requirements).

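The fan-out to several D-Bus paths can be pictured with the sketch below, which
uses a thread pool purely for illustration (BMCWeb itself is asynchronous C++).
`update_in_parallel` and `fake_start_update` are hypothetical names, and
`fake_start_update` merely stands in for the `StartUpdate` D-Bus call.

```python
from concurrent.futures import ThreadPoolExecutor


def update_in_parallel(start_update, targets, max_workers=4):
    """Call start_update(path, image) for every target concurrently.

    Returns {path: ("ok", result)} or {path: ("error", message)} per target,
    so one failing device does not hide the outcome of the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            path: pool.submit(start_update, path, image)
            for path, image in targets.items()
        }
        for path, future in futures.items():
            try:
                results[path] = ("ok", future.result())
            except Exception as exc:  # report the failure for this target only
                results[path] = ("error", str(exc))
    return results


def fake_start_update(path, image):
    # Hypothetical stand-in for the D-Bus call; rejects empty images.
    if not image:
        raise ValueError("empty image")
    return path
```

Each entry in the result would correspond to one task tracked by the server, so
a failure on one instance is reported without cancelling the others.
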
### Uninterrupted Updates

The `ActivationBlocksTransition` interface will be created on the specific D-Bus
path for a version update, which will help to block any interruptions from
critical system actions such as reboots. This interface can in turn start and
stop services such as Boot Guard Service to prevent such interruptions.

Moreover, when a device is being upgraded, the sensor scanning for that device
might need to be disabled. To achieve this, the sensor scanning flow can check
for the existence of the `ActivationBlocksTransition` interface on the
associated `Version` D-Bus path for the inventory item. If such an interface
exists, the sensor scanning for that device can be skipped by returning a
relevant error (such as `EBUSY`) to the client. Another alternative is to check
for the existence of the `ActivationBlocksTransition` interface only if sensor
scanning times out. This won't impact average case performance for sensor
scanning but only the worst case scenario when the device is busy, for example,
due to an update in progress.

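Both sensor scanning variants can be sketched as follows. The function names
and the two lookup tables (inventory item to `Version` paths, and object path
to hosted interfaces) are assumptions standing in for association and mapper
queries.

```python
import errno

BLOCKS = "xyz.openbmc_project.Software.ActivationBlocksTransition"


def update_in_progress(item, versions_for_item, interfaces_at):
    """True if any Version path tied to the item hosts the blocking interface."""
    return any(
        BLOCKS in interfaces_at.get(path, ())
        for path in versions_for_item.get(item, ())
    )


def scan_sensor(item, versions_for_item, interfaces_at, read_hw):
    """Eager variant: check for an update before touching the device."""
    if update_in_progress(item, versions_for_item, interfaces_at):
        raise OSError(errno.EBUSY, "update in progress", item)
    return read_hw(item)


def scan_sensor_lazy(item, versions_for_item, interfaces_at, read_hw):
    """Lazy variant: only check for an update after the read times out."""
    try:
        return read_hw(item)
    except TimeoutError:
        if update_in_progress(item, versions_for_item, interfaces_at):
            raise OSError(errno.EBUSY, "update in progress", item) from None
        raise
```

The lazy variant adds no lookup to a successful read, which is the average case
benefit described above; the eager variant avoids waiting for the timeout at the
cost of one lookup per scan.
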
## Alternatives Considered

### Centralized Design with Global Software Manager

A single Software Manager communicates with BMCWeb and hosts all the interfaces
such as Version, Activation, Progress for all hardware components within the
system on different DBus paths. Software Manager keeps a list of the various
hardware update services within the system and starts them based on the update
request. These on-demand services update the hardware and the interfaces hosted
by Software Manager, and then exit.

#### Pros

- Most of the DBus interfaces get implemented by Software Manager and vendors
  would need to write minimal code to change properties for these interfaces
  based on status and progress.
- Under normal operating conditions (no update in flight), only Software Manager
  will be running.

#### Cons

- Imposes the need for a common image format, as Software Manager needs to
  parse and verify the image for creating interfaces.
- Limitation in the design, as there is a need to get the current running
  version from the hardware at system bring up. So, Software Manager would need
  to start each update daemon at system startup to get the running version.

### Pull model for Status and Progress

The proposed solution uses a push model where status and progress updates are
asynchronously pushed to BMCWeb. Another alternative would be to use a pull
model where the Update interface can have get methods for status and progress
(for example, getActivationStatus and getActivationProgress).

#### Pros

- Server doesn't have to maintain a D-Bus matcher
  ([Issue](https://github.com/openbmc/bmcweb/issues/202)).
- Easier implementation in Server as no asynchronous handlers would be required.

#### Cons

- Server would still need to maintain some info so it can map the client's task
  status request to the D-Bus path for /xyz/openbmc_project/Software/\<deviceX>
  for calling getActivationStatus and getActivationProgress.
- The aforementioned [issue](https://github.com/openbmc/bmcweb/issues/202) is
  more of an implementation problem which can be resolved through
  implementation changes.
- Currently, activation and progress interfaces are being used in
  [a lot of Servers](#organizational). In future, harmonizing the flow to a
  single one will involve changing the push to pull model in all those places.
  With the current proposal, the only change will be in the update invocation
  flow.

## Impacts

The introduction of the new DBus API will temporarily create two invocation
flows from Server. Servers (BMCWeb, IPMI, etc.) can initially support both code
stacks. As all the code update daemons get moved to the new flow, Servers would
be changed to only support the new API stack. There is no user-api impact, as
the design adheres to Redfish UpdateService.

## Organizational

### Does this design require a new repository?

Yes. There will be device transport level repositories, and multiple
\<deviceX>CodeUpdater daemons using a similar transport layer can reside in the
same repository. For example, all devices using PMBus could have a common
repository.

### Who will be the initial maintainer(s) of this repository?

Meta will propose repositories for the following devices, and
`Jagpal Singh Gill` & `Patrick Williams` will be the maintainers for them.

- VR Update
- CPLD Update

### Which repositories are expected to be modified to execute this design?

Changes are required in the following repositories to incorporate the new
interface for update invocation:

| Repository                                                                      | Modification Owner |
| :------------------------------------------------------------------------------ | :----------------- |
| [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)     | Jagpal Singh Gill  |
| [BMCWeb](https://github.com/openbmc/bmcweb)                                     | Jagpal Singh Gill  |
| [phosphor-host-ipmid](https://github.com/openbmc/phosphor-host-ipmid)           | Jagpal Singh Gill  |
| [pldm](https://github.com/openbmc/pldm/tree/master/fw-update)                   | Jagpal Singh Gill  |
| [openpower-pnor-code-mgmt](https://github.com/openbmc/openpower-pnor-code-mgmt) | Adriana Kobylak    |
| [openbmc-test-automation](https://github.com/openbmc/openbmc-test-automation)   | Adriana Kobylak    |

NOTE: The
[phosphor-psu-code-mgmt](https://github.com/openbmc/phosphor-psu-code-mgmt) code
seems unused, so it is not tracked for change.

## Testing

### Unit Testing

All the functional testing of the reference implementation will be performed
using GTest.

### Integration Testing

The end to end integration testing involving Servers (for example BMCWeb) will
be covered using openbmc-test-automation.
