# Code Update Design

Author: Jagpal Singh Gill <paligill@gmail.com>

Created: 4th August 2023

## Problem Description

This section covers the limitations discovered with the current code update
implementation in
[phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt):

1. The current code update flow is complex, as it involves 3 different daemons:
   Image Manager, Image Updater and Update Service.
2. The update invocation flow has no explicit interface. Instead, it depends
   upon Image Manager discovering a new file in /tmp/images.
3. Images POSTed via Redfish are downloaded by BMCWeb to /tmp/images, which
   requires write access to the filesystem. This poses a security risk.
4. The current design doesn't support parallel upgrades for different firmware
   ([Issue](https://github.com/openbmc/bmcweb/issues/257)).

## Background and References

- [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)
- [Software DBus Interface](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/Software)
- [Code Update Design](https://github.com/openbmc/docs/tree/master/architecture/code-update)

## Requirements

1. Able to start an update, given a firmware image and update settings.
   - Update settings shall be able to specify when to apply the image, for
     example, immediately, on device reset, or on demand.
2. Able to retrieve the update progress and status.
3. Able to produce an interface compliant with
   [Redfish UpdateService](https://redfish.dmtf.org/schemas/v1/UpdateService.v1_11_3.json).
4. Unprivileged daemons with access to DBus should be able to accept and
   perform a firmware update.
5. The update request shall return immediately, so the client can query the
   status while the update is in progress.
6. All errors shall propagate back to the client.
7. Able to support updates for different types of hardware components, such as
   CPLD, NIC, BIOS, BIC, PCIe switches, etc.
8. The design shall not restrict the choice of image format.
9. Able to update multiple hardware components of the same type running
   different firmware images. For example, two instances of CPLDx may reside on
   the board, one performing functionX and the other performing functionY, and
   hence running different firmware images.
10. Able to update multiple components in parallel.
11. Able to restrict critical system actions, such as a reboot of the entity
    under update, while the code update is in flight.

## Proposed Design

### Proposed End to End Flow

```mermaid
sequenceDiagram
participant CL as Client
participant BMCW as BMCWeb
participant CU as <deviceX>CodeUpdater<br> ServiceName: xyz.openbmc_project.Software.<deviceX>

%% Bootstrap Action for CodeUpdater
note over CU: Get device access info from<br> /xyz/openbmc_project/inventory/system/... path
note over CU: SwId = <deviceX>_<RandomId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at /xyz/openbmc_project/Software/<SwId> <br> with Status = Active
CU ->> CU: Create functional association <br> from Version to Inventory Item

CL ->> BMCW: HTTP POST: /redfish/v1/UpdateService/update <br> (Image, settings, RedfishTargetURIArray)

loop For every RedfishTargetURI
  note over BMCW: Map RedfishTargetURI /redfish/v1/UpdateService/FirmwareInventory/<SwId> to<br> Object path /xyz/openbmc_project/software/<SwId>
  note over BMCW: Get serviceName corresponding to the object path <br>from mapper.
  BMCW ->> CU: StartUpdate(Image, ApplyTime)

  note over CU: SwId = <deviceX>_<RandomId>
  note over CU: ObjectPath = /xyz/openbmc_project/Software/<SwId>
  CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at ObjectPath with Status = NotReady
  CU -->> BMCW: {ObjectPath, Success}
  CU ->> CU: << Delegate Update for asynchronous processing >>

  par BMCWeb Processing
      BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.Activation,<br> ObjectPath)
      BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.ActivationProgress,<br> ObjectPath)
      BMCW ->> BMCW: Create Task<br> to handle matcher notifications
      BMCW -->> CL: <TaskNum>
      loop
          BMCW --) BMCW: Process notifications<br> and update Task attributes
          CL ->> BMCW: /redfish/v1/TaskMonitor/<TaskNum>
          BMCW -->>CL: TaskStatus
      end
  and << Asynchronous Update in Progress >>
      note over CU: Verify Image
      break Image Verification FAILED
        CU ->> CU: Activation.Status = Invalid
        CU --) BMCW: Notify Activation.Status change
      end
      CU ->> CU: Activation.Status = Ready
      CU --) BMCW: Notify Activation.Status change

      CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at ObjectPath
      CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.ActivationProgress<br> at ObjectPath
      CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition<br> at ObjectPath
      CU ->> CU: Activation.Status = Activating
      CU --) BMCW: Notify Activation.Status change
      note over CU: Start Update
      loop
          CU --) BMCW: Notify ActivationProgress.Progress change
      end
      note over CU: Finish Update
      CU ->> CU: Activation.Status = Active
      CU --) BMCW: Notify Activation.Status change
      CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition
      CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationProgress
      alt ApplyTime == Immediate
          note over CU: Reset Device
          CU ->> CU: Update functional association to System Inventory Item
          CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at ObjectPath
          note over CU: Delete all interfaces on previous ObjectPath
      else
          note over CU: Create activating association to System Inventory Item
      end
  end
end
```

- Each upgradable hardware type may have a separate daemon (\<deviceX\> in the
  flow above) that handles its update process and implements the interfaces
  proposed in the next section. This satisfies
  [Requirement# 7](#requirements).
- Since a single daemon handles the update (instead of three), less
  handshaking is involved. This addresses [Issue# 1](#problem-description) and
  [Requirement# 4](#requirements).
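
Here is a minimal sketch of the BMCWeb mapping step in the flow above. It is written in Python for brevity, although BMCWeb itself is C++. The sketch assumes that the `<SwId>` segment carries over unchanged from the FirmwareInventory URI to the object path, as the diagram shows. The helper names and prefix constants are hypothetical, and the mapper lookup of the owning service is not modeled.

```python
import string
from typing import List
from urllib.parse import urlsplit

# Prefixes taken from the sequence diagram; illustrative constants.
REDFISH_FW_PREFIX = "/redfish/v1/UpdateService/FirmwareInventory/"
DBUS_SW_PREFIX = "/xyz/openbmc_project/software/"
# D-Bus object path elements may only contain [A-Za-z0-9_].
_PATH_CHARS = set(string.ascii_letters + string.digits + "_")


def target_uri_to_object_path(target_uri: str) -> str:
    """Map .../FirmwareInventory/<SwId> to /xyz/openbmc_project/software/<SwId>."""
    path = urlsplit(target_uri).path.rstrip("/")
    if not path.startswith(REDFISH_FW_PREFIX):
        raise ValueError(f"not a FirmwareInventory URI: {target_uri}")
    sw_id = path[len(REDFISH_FW_PREFIX):]
    if not sw_id or not set(sw_id) <= _PATH_CHARS:
        raise ValueError(f"invalid SwId in {target_uri}")
    return DBUS_SW_PREFIX + sw_id


def map_targets(target_uris: List[str]) -> List[str]:
    """Resolve every RedfishTargetURI before any StartUpdate call is made."""
    return [target_uri_to_object_path(uri) for uri in target_uris]
```

Resolving every target before calling `StartUpdate` means a malformed URI is rejected up front, before any device has started updating.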

### Proposed D-Bus Interface

The DBus interface for code update will consist of the following:

| Interface Name                                                                                                                                                                                         | Existing/New |                               Purpose                               |
| :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------: | :-----------------------------------------------------------------: |
| [xyz.openbmc_project.Software.Update](https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/65738)                                                                                           |     New      |                       Provides update method                        |
| [xyz.openbmc_project.Software.Version](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Version.interface.yaml)                                       |   Existing   |                        Provides version info                        |
| [xyz.openbmc_project.Software.Activation](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Activation.interface.yaml)                                 |   Existing   |                     Provides activation status                      |
| [xyz.openbmc_project.Software.ActivationProgress](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationProgress.interface.yaml)                 |   Existing   |               Provides activation progress percentage               |
| [xyz.openbmc_project.Software.ActivationBlocksTransition](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationBlocksTransition.interface.yaml) |   Existing   | Signifies barrier for state transitions while update is in progress |
| [xyz.openbmc_project.Software.RedundancyPriority](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/RedundancyPriority.interface.yaml)                 |   Existing   |     Provides the redundancy priority for the version interface      |

The xyz.openbmc_project.Software.Update interface gives update invocation an
explicit entry point. This streamlines the flow, addressing
[Issue# 2](#problem-description) and [Requirement# 1 & 2](#requirements).
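
After `StartUpdate` returns, a \<deviceX>CodeUpdater moves through a sequence of `Activation.Status` values, which can be modeled as a small state machine. The sketch below is a hedged illustration only. The enum names mirror `xyz.openbmc_project.Software.Activation.Activations`, but the transition table is inferred from the sequence diagram (plus `Failed` for errors during the update) rather than taken from phosphor-dbus-interfaces. `UpdateObject` and `run_update` are hypothetical.

```python
from enum import Enum


class Activations(Enum):
    # Subset of xyz.openbmc_project.Software.Activation.Activations values.
    NotReady = "NotReady"
    Invalid = "Invalid"
    Ready = "Ready"
    Activating = "Activating"
    Active = "Active"
    Failed = "Failed"


# Assumed transitions, following the sequence diagram.
ALLOWED = {
    Activations.NotReady: {Activations.Ready, Activations.Invalid},
    Activations.Ready: {Activations.Activating},
    Activations.Activating: {Activations.Active, Activations.Failed},
    Activations.Invalid: set(),
    Activations.Active: set(),
    Activations.Failed: set(),
}


class UpdateObject:
    """Hypothetical stand-in for the object created at ObjectPath."""

    def __init__(self, object_path: str):
        self.object_path = object_path
        self.status = Activations.NotReady  # state when StartUpdate returns
        self.history = [self.status]

    def set_status(self, new: Activations) -> None:
        if new not in ALLOWED[self.status]:
            raise RuntimeError(f"{self.status.value} -> {new.value} not allowed")
        self.status = new
        # A real daemon would emit PropertiesChanged here.
        self.history.append(new)


def run_update(obj: UpdateObject, image_ok: bool, flash_ok: bool) -> None:
    """Drive the asynchronous half of the flow after StartUpdate returned."""
    if not image_ok:
        obj.set_status(Activations.Invalid)
        return
    obj.set_status(Activations.Ready)
    obj.set_status(Activations.Activating)
    obj.set_status(Activations.Active if flash_ok else Activations.Failed)
```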

#### Association

`running`: A `running` association from xyz.openbmc_project.Inventory.Item to
xyz.openbmc_project.Software.Version represents the current functional or
running software version for the associated inventory item. The `ran_on`
association would be the corresponding reverse association.

`activating`: An `activating` association from
xyz.openbmc_project.Inventory.Item to xyz.openbmc_project.Software.Version
represents the activated (but not yet running) software version for the
associated inventory item. An inventory item could have more than one activated
version. For example, in A/B redundancy models there are 2 associated flash
banks, and the xyz.openbmc_project.Software.RedundancyPriority interface defines
the priority for each one.

For the A/B redundancy model with staging support, the
xyz.openbmc_project.Software.Activation.Activations.Staged state indicates which
software version is currently staged.

The `activated_on` association would be the reverse of the `activating`
association.
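
The sketch below shows how an inventory item could publish these associations. It assumes the `(forward, reverse, endpoint)` tuples used by the `Associations` property on `xyz.openbmc_project.Association.Definitions`. It also assumes, per the `RedundancyPriority` interface, that a lower `Priority` value means higher priority. The helper functions and object paths are hypothetical.

```python
from typing import Dict, List, Tuple

# (forward, reverse, endpoint): the element type of the Associations
# property on xyz.openbmc_project.Association.Definitions.
Association = Tuple[str, str, str]


def inventory_associations(running: str, activated: Dict[str, int]) -> List[Association]:
    """Associations an inventory item could host (hypothetical helper).

    running:   object path of the currently running Version.
    activated: object path of each activated Version -> its Priority.
    """
    assocs: List[Association] = [("running", "ran_on", running)]
    # Highest priority (lowest Priority value) first.
    for path in sorted(activated, key=activated.__getitem__):
        assocs.append(("activating", "activated_on", path))
    return assocs


def preferred_version(activated: Dict[str, int]) -> str:
    """Activated version with the highest redundancy priority."""
    return min(activated, key=activated.__getitem__)
```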

### Keep images in memory

Images will be kept in memory and passed to \<deviceX>CodeUpdater using a file
descriptor rather than a file path. The implementation needs to enforce
appropriate memory limits so that parallel updates don't run the BMC out of
memory.
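
The following minimal sketch assumes Linux and covers both pieces described above. First, the image is copied into an anonymous `memfd`, whose descriptor could then be passed over D-Bus (D-Bus carries file descriptors as the `h` type). Second, a hypothetical `ImageMemoryBudget` rejects new images once the in-flight images would exceed a configured limit. The fallback branch uses an unlinked temporary file and is not memory-backed.

```python
import os
import tempfile
import threading


class ImageMemoryBudget:
    """Hypothetical guard capping the bytes held by in-flight images."""

    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.in_use = 0
        self._lock = threading.Lock()

    def reserve(self, size: int) -> bool:
        with self._lock:
            if self.in_use + size > self.limit:
                return False  # caller should reject the update request
            self.in_use += size
            return True

    def release(self, size: int) -> None:
        with self._lock:
            self.in_use -= size


def image_to_fd(image: bytes) -> int:
    """Copy an image into an anonymous file and return a readable fd."""
    if hasattr(os, "memfd_create"):  # Linux, Python 3.8+
        fd = os.memfd_create("fw-image", os.MFD_CLOEXEC)
    else:
        # Fallback: unlinked temporary file (disk-backed, not in memory).
        with tempfile.TemporaryFile() as tmp:
            fd = os.dup(tmp.fileno())
    view = memoryview(image)
    while view:
        written = os.write(fd, view)
        view = view[written:]
    os.lseek(fd, 0, os.SEEK_SET)  # the receiver reads from the start
    return fd
```

A caller would `reserve(len(image))` before accepting an image and `release` the same amount once the CodeUpdater is done with the descriptor.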

### Propagate errors to client

The return value of xyz.openbmc_project.Software.Update.StartUpdate will
propagate any errors related to initial setup and image metadata/header parsing
back to the user. Any asynchronous errors that happen during the update process
will be reported through a failed activation status, which maps to a failed
task for the update. Also, a phosphor-logging event will be created and sent
back to the client via the
[Redfish Log Service](https://redfish.dmtf.org/schemas/v1/LogService.v1_4_0.json).

Another alternative could be to use
[Redfish Event Services](https://redfish.dmtf.org/schemas/v1/EventService.v1_10_0.json).
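
The following sketch illustrates the split between synchronous and asynchronous errors. The header check in `start_update` uses a hypothetical magic value purely for illustration. The mapping from `Activation.Status` to Redfish `TaskState` values (`Running`, `Completed`, `Exception`) is an assumption, not BMCWeb's actual table.

```python
class StartUpdateError(Exception):
    """Hypothetical error returned synchronously by StartUpdate."""


IMAGE_MAGIC = b"IMG0"  # hypothetical header magic, for illustration only


def start_update(image: bytes) -> str:
    """Synchronous part of StartUpdate: fail fast on setup/header errors."""
    if not image.startswith(IMAGE_MAGIC):
        raise StartUpdateError("unrecognized image header")
    # Later failures are reported asynchronously via Activation.Status.
    return "/xyz/openbmc_project/software/deviceX_0001"  # hypothetical path


# Assumed mapping from Activation.Status to Redfish TaskState.
TASK_STATE = {
    "NotReady": "Running",
    "Ready": "Running",
    "Activating": "Running",
    "Active": "Completed",
    "Invalid": "Exception",
    "Failed": "Exception",
}


def task_state_for(activation_status: str) -> str:
    """Translate an Activation.Status change into the task's state."""
    return TASK_STATE.get(activation_status, "Running")
```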

### Firmware Image Format

Image parsing will be performed in \<deviceX>CodeUpdater. Since
\<deviceX>CodeUpdater may be a device-specific daemon, vendors may choose any
format for the firmware image. This fulfills
[Requirement# 8](#requirements).

#### PLDM Image Packaging

The PLDM for
[Firmware Update Specification](https://www.dmtf.org/sites/default/files/standards/documents/DSP0267_1.3.0.pdf)
provides a standardized packaging format for images, incorporating both standard
and user-defined descriptors. This format can also be used to package firmware
update images for non-PLDM devices. For such devices, the CodeUpdater will parse
the Entity Manager configuration to identify applicable PLDM descriptors. These
can include, but are not limited to, the following:

| PLDM Package Descriptor | Descriptor Type |                                                                                           Description                                                                                            |
| :---------------------: | :-------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|   IANA Enterprise ID    |    Standard     |                                       [IANA Enterprise Id](https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers) of the hardware vendor                                        |
|       ASCII Model       |    Standard     | Compatible hardware name (com.\<vendor\>.Hardware.\<XXX\>) specified by hardware vendor in [phosphor-dbus-interfaces](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/com). |
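
The sketch below shows descriptor matching. It assumes the CodeUpdater has already decoded each firmware device ID record of the PLDM package into a name-to-value dictionary, keyed by the descriptor names in the table above (the decoding itself is not shown). A record applies to a device only if every descriptor the device requires is present with the same value. Both helpers are hypothetical.

```python
from typing import Dict, List

# Descriptor name -> decoded value (e.g. an int or a str); assumed layout.
Descriptors = Dict[str, object]


def descriptors_match(record: Descriptors, device: Descriptors) -> bool:
    """True if every descriptor the device requires appears with equal value."""
    return all(
        name in record and record[name] == value for name, value in device.items()
    )


def applicable_records(records: List[Descriptors], device: Descriptors) -> List[int]:
    """Indices of firmware device ID records that target this device."""
    return [i for i, rec in enumerate(records) if descriptors_match(rec, device)]
```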

#### Entity Manager Configuration

The Entity Manager configuration can provide firmware-related information as
part of board configurations. This information can be used for firmware
validation and for modeling device access details. Entity Manager exposes this
configuration as D-Bus objects, which the CodeUpdater service can then consume
to manage updates for the relevant firmware entities.

For common firmware info definition

The following example shows one instance of this definition, for the firmware
of an I2C CPLD device.

```json
"Exposes": [
  ...
  {
    "Name": "MB_LCMX02_2000HC",
    "Type": "CPLDFirmware",
    ...
    "FirmwareInfo" :
    {
      "VendorIANA": 40981,
      "CompatibleHardware": "com.meta.Hardware.Yosemite4.MedusaBoard.CPLD.LCMX02_2000HC"
    }
    ...
  },
  ...
]
```

- `Name`: The name of the firmware entity instance.
- `Type`: This field is used by the CodeUpdater service to determine which
  firmware EM configurations it should process.
- `VendorIANA`: This field maps to the `IANA Enterprise ID` descriptor in the
  PLDM package header.
- `CompatibleHardware`: This field maps to the `ASCII Model` descriptor in the
  PLDM package header.
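
The sketch below illustrates how the fields above feed into descriptor matching. It reads a board configuration, keeps only the entries of the requested `Type`, and turns `FirmwareInfo` into descriptor values. For simplicity it parses the JSON directly. As described above, a real CodeUpdater would instead consume the D-Bus objects that Entity Manager publishes. `firmware_entities` is a hypothetical helper.

```python
import json
from typing import Dict, List


def firmware_entities(board_config: str, wanted_type: str) -> List[Dict[str, object]]:
    """Select the Exposes entries this CodeUpdater handles.

    board_config is a JSON object with an "Exposes" array, i.e. the fragment
    above with real values in place of the "..." placeholders.
    """
    exposes = json.loads(board_config).get("Exposes", [])
    entities = []
    for entry in exposes:
        if entry.get("Type") != wanted_type:
            continue  # left for a different CodeUpdater
        info = entry.get("FirmwareInfo", {})
        entities.append({
            "Name": entry.get("Name"),
            # VendorIANA -> IANA Enterprise ID descriptor
            "IANA Enterprise ID": info.get("VendorIANA"),
            # CompatibleHardware -> ASCII Model descriptor
            "ASCII Model": info.get("CompatibleHardware"),
        })
    return entities
```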

### Multi part Images

A multi-part image has multiple component images as part of one image package.
A PLDM image is one example of a multi-part image format. Sometimes a multi-part
device has no single physical firmware device. Instead, the firmware device
consists of multiple physical components, each of which may have its own
component image. In such a scenario, \<deviceX>CodeUpdater can create a logical
inventory item for the firmware device. When updating the firmware device, the
client may target the logical firmware device, which in turn knows how to
update the corresponding child components with the supplied component images.
The user can also update a specific component by providing an image package
with that component as the head node. The \<deviceX>CodeUpdater can implement
the required logic to verify whether the supplied image is targeted for itself
(and its child components).
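
Here is a hedged sketch of the logical device described above. The hypothetical `LogicalFirmwareDevice` accepts a package whose head node is either the device itself or one of its children, and dispatches the component images accordingly. Flashing is replaced by simply recording the image.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    name: str
    flashed: List[bytes] = field(default_factory=list)

    def update(self, image: bytes) -> None:
        self.flashed.append(image)  # stand-in for the real flashing step


@dataclass
class LogicalFirmwareDevice:
    """Hypothetical logical inventory item grouping physical components."""

    name: str
    children: Dict[str, Component]

    def accepts(self, head_node: str) -> bool:
        return head_node == self.name or head_node in self.children

    def update(self, head_node: str, component_images: Dict[str, bytes]) -> List[str]:
        if not self.accepts(head_node):
            raise ValueError(f"package targets {head_node}, not {self.name}")
        if head_node != self.name:
            # Package addresses a single child: only that child's image applies.
            if head_node not in component_images:
                raise ValueError(f"no image supplied for {head_node}")
            component_images = {head_node: component_images[head_node]}
        unknown = set(component_images) - set(self.children)
        if unknown:
            raise ValueError(f"no such component(s): {sorted(unknown)}")
        for comp_name, image in component_images.items():
            self.children[comp_name].update(image)
        return sorted(component_images)
```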

### Update multiple devices of same type

- For devices of the same type, extend the D-Bus path to specify the device
  instance, for example,
  /xyz/openbmc_project/Software/\<deviceX>\_\<InstanceNum>\_\<SwId>. All the
  corresponding interfaces can reside on this path, and the same path will be
  returned from xyz.openbmc_project.Software.Update.StartUpdate.

This fulfills [Requirement# 9](#requirements).
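
The following sketch builds such an instance-qualified path. It assumes that the `<SwId>` part is a random hex string when none is supplied. Each element is checked against the characters D-Bus allows in object path elements (`[A-Za-z0-9_]`). `instance_object_path` is a hypothetical helper.

```python
import secrets
import string

_ALLOWED = set(string.ascii_letters + string.digits + "_")
SW_ROOT = "/xyz/openbmc_project/Software"


def instance_object_path(device: str, instance: int, sw_id: str = "") -> str:
    """Build /xyz/openbmc_project/Software/<deviceX>_<InstanceNum>_<SwId>."""
    sw_id = sw_id or secrets.token_hex(4)  # 8 hex chars, as a <RandomId>
    element = f"{device}_{instance}_{sw_id}"
    if not set(element) <= _ALLOWED:
        raise ValueError(f"invalid D-Bus path element: {element}")
    return f"{SW_ROOT}/{element}"
```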

### Parallel Upgrade

- Different types of hardware components:

  Updates for different types of hardware components can be handled either by
  different \<deviceX>CodeUpdater daemons or by a single daemon for hardware
  components with common features. For example, PLDMd may handle updates for
  devices using the PLDM specification. Such updates can be invoked in parallel
  from BMCWeb and tracked via different tasks.

- Same type of hardware components:

  BMCWeb will trigger xyz.openbmc_project.Software.Update.StartUpdate on
  different D-Bus paths pertaining to each hardware instance. For more details
  on D-Bus paths, refer to
  [Update multiple devices of same type](#update-multiple-devices-of-same-type).

This fulfills [Requirement# 10](#requirements).
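
The sketch below shows the parallel case from the caller's point of view. It assumes each update can be represented by a blocking callable keyed by its D-Bus path. BMCWeb itself is asynchronous and tracks each update as a Redfish task. A failure on one device is reported for that path only and does not stop the other updates.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def update_in_parallel(targets: Dict[str, Callable[[], None]],
                       max_workers: int = 4) -> Dict[str, str]:
    """Run one update per D-Bus path concurrently; report each separately."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(fn) for path, fn in targets.items()}
    # Leaving the with-block waits until every submitted update finishes.
    results = {}
    for path, fut in futures.items():
        exc = fut.exception()
        results[path] = "Completed" if exc is None else f"Exception: {exc}"
    return results
```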

### Uninterrupted Updates

The `ActivationBlocksTransition` interface will be created on the D-Bus path of
the version being updated. This helps block interruptions from critical system
actions such as reboots. The interface can in turn start and stop services,
such as the Boot Guard Service, to prevent such interruptions.

Moreover, while a device is being upgraded, sensor scanning for that device
might need to be disabled. To achieve this, the sensor scanning flow can check
whether the `ActivationBlocksTransition` interface exists on the `Version`
D-Bus path associated with the inventory item. If it does, sensor scanning for
that device can be skipped by returning a relevant error (such as `EBUSY`) to
the client. Alternatively, the flow can check for the
`ActivationBlocksTransition` interface only when sensor scanning times out.
This leaves average-case sensor scanning performance unaffected and adds cost
only in the worst case, when the device is busy (for example, due to an update
in progress).
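
Both sensor-scanning strategies above can be sketched as one hypothetical `SensorScanner`. Two assumptions are involved: `interfaces_on` stands in for a mapper query returning the interfaces on an object path, and `version_path_of` stands in for the association lookup from inventory item to `Version` path. With `check_first=False`, the interface is checked only after a read times out.

```python
import errno
from typing import Callable, Dict, Optional, Set, Tuple

BLOCKS = "xyz.openbmc_project.Software.ActivationBlocksTransition"


class SensorScanner:
    """Hypothetical scanner that backs off while an update is in progress."""

    def __init__(self, interfaces_on: Callable[[str], Set[str]],
                 version_path_of: Dict[str, str]):
        self.interfaces_on = interfaces_on  # stand-in for a mapper query
        self.version_path_of = version_path_of  # inventory item -> Version path

    def update_in_progress(self, item: str) -> bool:
        path = self.version_path_of.get(item)
        return path is not None and BLOCKS in self.interfaces_on(path)

    def scan(self, item: str, read: Callable[[], float],
             check_first: bool = True) -> Tuple[int, Optional[float]]:
        """Return (errno, value); 0 means the read succeeded."""
        if check_first and self.update_in_progress(item):
            return errno.EBUSY, None
        try:
            return 0, read()
        except TimeoutError:
            # Lazy variant: only pay for the lookup on the slow path.
            if self.update_in_progress(item):
                return errno.EBUSY, None
            return errno.ETIMEDOUT, None
```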

## Alternatives Considered

### Centralized Design with Global Software Manager

A single Software Manager communicates with BMCWeb and hosts all the interfaces,
such as Version, Activation and Progress, for all hardware components within the
system on different DBus paths. The Software Manager keeps a list of the
hardware update services within the system and starts them based on update
requests. These on-demand services update the hardware and the interfaces hosted
by the Software Manager, and then exit.

#### Pros

- Most of the DBus interfaces get implemented by the Software Manager, so
  vendors would need to write minimal code to change properties on these
  interfaces based on status and progress.
- Under normal operating conditions (no update in flight), only the Software
  Manager will be running.

#### Cons

- Imposes the need for a common image format, as the Software Manager needs to
  parse and verify the image to create the interfaces.
- The current running version must be read from the hardware at system bring-up.
  As a result, the Software Manager would need to start each update daemon at
  system startup just to get the running version.

### Pull model for Status and Progress

The proposed solution uses a push model, where status and progress updates are
asynchronously pushed to BMCWeb. An alternative would be a pull model, where the
Update interface has get methods for status and progress (for example,
getActivationStatus and getActivationProgress).
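
For comparison, a pull-model client would look roughly like the loop below. `get_status` and `get_progress` stand in for the hypothetical getActivationStatus and getActivationProgress methods. The polling interval and timeout are arbitrary illustrative defaults.

```python
import time
from typing import Callable, Tuple

TERMINAL = {"Active", "Failed", "Invalid"}


def poll_until_done(get_status: Callable[[], str],
                    get_progress: Callable[[], int],
                    interval_s: float = 1.0,
                    timeout_s: float = 600.0) -> Tuple[str, int]:
    """Pull-model sketch: the server polls instead of subscribing to signals."""
    deadline = time.monotonic() + timeout_s
    while True:
        status, progress = get_status(), get_progress()
        if status in TERMINAL:
            return status, progress
        if time.monotonic() >= deadline:
            raise TimeoutError(f"update still {status} at {progress}%")
        time.sleep(interval_s)
```

Every poll costs a D-Bus round trip per in-flight update. This is the trade-off to weigh against keeping a matcher in the push model.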

#### Pros

- The Server doesn't have to maintain a DBus matcher
  ([Issue](https://github.com/openbmc/bmcweb/issues/202)).
- Easier implementation in the Server, as no asynchronous handlers would be
  required.

#### Cons

- The Server would still need to maintain some state so that it can map a
  client's task status request to the DBus path
  /xyz/openbmc_project/Software/\<deviceX> for calling getActivationStatus and
  getActivationProgress.
- The aforementioned [issue](https://github.com/openbmc/bmcweb/issues/202) is
  more of an implementation problem, which can be resolved through
  implementation changes.
- The activation and progress interfaces are currently used in
  [many Servers](#organizational). In the future, harmonizing on a single flow
  would involve switching from the push to the pull model in all those places.
  With the current proposal, the only change will be in the update invocation
  flow.

## Impacts

The introduction of the new DBus API will temporarily create two invocation
flows from Servers. Servers (BMCWeb, IPMI, etc.) can initially support both code
stacks. Once all the code update daemons have moved to the new flow, Servers
will be changed to support only the new API stack. There is no user API impact,
as the design adheres to Redfish UpdateService.

## Organizational

### Does this design require a new repository?

Yes. There will be device-transport-level repositories, and multiple
\<deviceX>CodeUpdater daemons using a similar transport layer can reside in the
same repository. For example, all devices using PMBus could have a common
repository.

### Who will be the initial maintainer(s) of this repository?

Meta will propose repositories for the following devices, and
`Jagpal Singh Gill` & `Patrick Williams` will be the maintainers for them.

- VR Update
- CPLD Update

### Which repositories are expected to be modified to execute this design?

This design requires changes in the following repositories to incorporate the
new interface for update invocation:

| Repository                                                                      | Modification Owner |
| :------------------------------------------------------------------------------ | :----------------- |
| [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)     | Jagpal Singh Gill  |
| [BMCWeb](https://github.com/openbmc/bmcweb)                                     | Jagpal Singh Gill  |
| [phosphor-host-ipmid](https://github.com/openbmc/phosphor-host-ipmid)           | Jagpal Singh Gill  |
| [pldm](https://github.com/openbmc/pldm/tree/master/fw-update)                   | Jagpal Singh Gill  |
| [openpower-pnor-code-mgmt](https://github.com/openbmc/openpower-pnor-code-mgmt) | Adriana Kobylak    |
| [openbmc-test-automation](https://github.com/openbmc/openbmc-test-automation)   | Adriana Kobylak    |

NOTE: The
[phosphor-psu-code-mgmt](https://github.com/openbmc/phosphor-psu-code-mgmt) code
seems to be unused, so it is not tracked for changes.

## Testing

### Unit Testing

All the functional testing of the reference implementation will be performed
using GTest.

### Integration Testing

End-to-end integration testing involving Servers (for example, BMCWeb) will be
covered using openbmc-test-automation.