# Code Update Design

Author: Jagpal Singh Gill <paligill@gmail.com>

Created: 4th August 2023

## Problem Description

This section covers the limitations discovered with
[phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt):

1. Current code update flow is complex as it involves 3 different daemons -
   Image Manager, Image Updater and Update Service.
2. Update invocation flow has no explicit interface but rather depends upon the
   discovery of a new file in /tmp/images by Image Manager.
3. Images POSTed via Redfish are downloaded by BMCWeb to /tmp/images, which
   requires write access to the filesystem. This poses a security risk.
4. Current design doesn't support parallel upgrades for different firmware
   ([Issue](https://github.com/openbmc/bmcweb/issues/257)).

## Background and References

- [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)
- [Software DBus Interface](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/Software)
- [Code Update Design](https://github.com/openbmc/docs/tree/master/architecture/code-update)

## Requirements

1. Able to start an update, given a firmware image and update settings.
   - Update settings shall be able to specify when to apply the image, for
     example immediately, on device reset or on-demand.
2. Able to retrieve the update progress and status.
3. Able to produce an interface compliant with
   [Redfish UpdateService](https://redfish.dmtf.org/schemas/v1/UpdateService.v1_11_3.json)
4. Unprivileged daemons with access to D-Bus should be able to accept and
   perform a firmware update.
5. Update request shall respond back immediately, so the client can query the
   status while the update is in progress.
6. All errors shall propagate back to the client.
7. Able to support update for different types of hardware components such as
   CPLD, NIC, BIOS, BIC, PCIe switches, etc.
8. Design shall impose no restriction to choose any specific image format.
9. Able to update multiple hardware components of the same type running
   different firmware images, for example, two instances of CPLDx residing on
   the board, one performing functionX and the other performing functionY and
   hence running different firmware images.
10. Able to update multiple components in parallel.
11. Able to restrict critical system actions, such as reboot for the entity
    under update, while the code update is in flight.

## Proposed Design

### Proposed End to End Flow

```mermaid
sequenceDiagram;
participant CL as Client
participant BMCW as BMCWeb
participant CU as <deviceX>CodeUpdater<br> ServiceName: xyz.openbmc_project.Software.<deviceX>

%% Bootstrap Action for CodeUpdater
note over CU: Get device access info from<br> /xyz/openbmc_project/inventory/system/... path
note over CU: SwId = <DeviceX>_<RandomId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at /xyz/openbmc_project/Software/<SwId> <br> with Status = Active
CU ->> CU: Create functional association <br> from Version to Inventory Item

CL ->> BMCW: HTTP POST: /redfish/v1/UpdateService/update <br> (Image, settings, RedfishTargetURIArray)

loop For every RedfishTargetURI
  note over BMCW: Map RedfishTargetURI /redfish/v1/UpdateService/FirmwareInventory/<SwId> to<br> Object path /xyz/openbmc_project/software/<SwId>
  note over BMCW: Get serviceName corresponding to the object path <br>from mapper.
  BMCW ->> CU: StartUpdate(Image, ApplyTime)

  note over CU: Verify Image
  break Image Verification FAILED
    CU -->> BMCW: {NULL, Update.Error}
    BMCW -->> CL: Return Error
  end
  note over CU: SwId = <DeviceX>_<RandomId>
  note over CU: ObjectPath = /xyz/openbmc_project/Software/<SwId>
  CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at ObjectPath
  CU -->> BMCW: {ObjectPath, Success}
  CU ->> CU: << Delegate Update for asynchronous processing >>

  par BMCWeb Processing
    BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.Activation,<br> ObjectPath)
    BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.ActivationProgress,<br> ObjectPath)
    BMCW ->> BMCW: Create Task<br> to handle matcher notifications
    BMCW -->> CL: <TaskNum>
    loop
      BMCW --) BMCW: Process notifications<br> and update Task attributes
      CL ->> BMCW: /redfish/v1/TaskMonitor/<TaskNum>
      BMCW -->> CL: TaskStatus
    end
  and << Asynchronous Update in Progress >>
    CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at ObjectPath with Status = Ready
    CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.ActivationProgress<br> at ObjectPath
    CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition<br> at ObjectPath
    note over CU: Start Update
    loop
      CU --) BMCW: Notify ActivationProgress.Progress change
    end
    note over CU: Finish Update
    CU ->> CU: Activation.Status = Active
    CU --) BMCW: Notify Activation.Status change
    CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition
    CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationProgress
    alt ApplyTime == Immediate
      note over CU: Reset Device and<br> update functional association to System Inventory Item
    else
      note over CU: Create active association to System Inventory Item
    end
  end
end
```

- Each upgradable hardware type may have a separate daemon (\<deviceX\> as per
  the above flow) handling its update process, and it would need to implement
  the interfaces proposed in the next section. This satisfies
  [Requirement# 7](#requirements).
- Since a single daemon handles the update (as compared to three), less
  handshaking is involved, which addresses [Issue# 1](#problem-description) and
  [Requirement# 4](#requirements).

### Proposed D-Bus Interface

The D-Bus interface for code update will consist of the following:

| Interface Name | Existing/New | Purpose |
| :------------- | :----------: | :-----: |
| [xyz.openbmc_project.Software.Update](https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/65738) | New | Provides update method |
| [xyz.openbmc_project.Software.Version](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Version.interface.yaml) | Existing | Provides version info |
| [xyz.openbmc_project.Software.Activation](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Activation.interface.yaml) | Existing | Provides activation status |
| [xyz.openbmc_project.Software.ActivationProgress](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationProgress.interface.yaml) | Existing | Provides activation progress percentage |
| [xyz.openbmc_project.Software.ActivationBlocksTransition](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationBlocksTransition.interface.yaml) | Existing | Signifies barrier for state transitions while update is in progress |
| [xyz.openbmc_project.Software.RedundancyPriority](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/RedundancyPriority.interface.yaml) | Existing | Provides the redundancy priority for the version interface |

Introduction of the xyz.openbmc_project.Software.Update interface streamlines
the update invocation flow and hence addresses [Issue# 2](#problem-description)
and [Requirement# 1 & 2](#requirements).

#### Association

`running` : A `running` association from xyz.openbmc_project.Inventory.Item to
xyz.openbmc_project.Software.Version represents the current functional or
running software version for the associated inventory item. The `ran_on` would
be the corresponding reverse association.

`activating` : An `activating` association from
xyz.openbmc_project.Inventory.Item to xyz.openbmc_project.Software.Version
represents the activated (but not yet run) software version for the associated
inventory item. There could be more than one active version for an inventory
item, for example, in case of A/B redundancy models there are 2 associated
flash-banks and the xyz.openbmc_project.Software.RedundancyPriority interface
defines the priority for each one.

For an A/B redundancy model with staging support, the
xyz.openbmc_project.Software.Activation.Activations.Staged value will help to
define which software version is currently staged.

The `activated_on` would be the corresponding reverse association.

### Keep images in memory

Images will be kept in memory and passed to \<deviceX>CodeUpdater using a file
descriptor rather than a file path. The implementation needs to enforce
appropriate memory limits to prevent parallel updates from running the BMC out
of memory.
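
The file-descriptor hand-off described above can be sketched roughly as
follows. This is a minimal illustration, not the proposed implementation: the
names `ImageBudget`, `stage_image` and `read_image` are hypothetical, the byte
budget is an assumed way of bounding memory use across parallel updates, and
the D-Bus transfer of the descriptor is not shown.

```python
import os
import tempfile


class ImageBudget:
    """Hypothetical guard capping the total bytes of in-flight images."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.in_use = 0

    def reserve(self, size: int) -> None:
        if self.in_use + size > self.limit_bytes:
            raise MemoryError("image would exceed the in-memory budget")
        self.in_use += size

    def release(self, size: int) -> None:
        self.in_use = max(0, self.in_use - size)


def stage_image(image: bytes, budget: ImageBudget) -> int:
    """Copy the image into an anonymous file and return its descriptor."""
    budget.reserve(len(image))
    fd = -1
    try:
        if hasattr(os, "memfd_create"):
            # Linux: anonymous, memory-backed file with no filesystem path.
            fd = os.memfd_create("fw-image")
        else:
            # Assumption for non-Linux hosts only: an unnamed temp file
            # (disk-backed, so it does not keep the image in memory).
            with tempfile.TemporaryFile() as tmp:
                fd = os.dup(tmp.fileno())
        view = memoryview(image)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.lseek(fd, 0, os.SEEK_SET)
        return fd
    except BaseException:
        if fd >= 0:
            os.close(fd)
        budget.release(len(image))
        raise


def read_image(fd: int) -> bytes:
    """What a code updater might do with the descriptor it receives."""
    with os.fdopen(os.dup(fd), "rb") as stream:
        stream.seek(0)
        return stream.read()
```

The caller owns the returned descriptor and is expected to close it and call
`release` once the update finishes, so that the budget reflects only the
images that are still in flight.
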

### Propagate errors to client

The xyz.openbmc_project.Software.Update.StartUpdate return value will propagate
any errors related to initial setup and image metadata/header parsing back to
the user. Any asynchronous errors which happen during the update process will
be reported via a failed activation status, which maps to a failed task
associated with the update. Also, a phosphor-logging event will be created and
sent back to the client via
[Redfish Log Service](https://redfish.dmtf.org/schemas/v1/LogService.v1_4_0.json).

Another alternative could be to use
[Redfish Event Services](https://redfish.dmtf.org/schemas/v1/EventService.v1_10_0.json).

### Firmware Image Format

Image parsing will be performed in \<deviceX>CodeUpdater, and since
\<deviceX>CodeUpdater may be a device specific daemon, the vendor may choose
any image format for the firmware image. This fulfills
[Requirement# 8](#requirements).

### Multi part Images

A multi part image has multiple component images as part of one image package.
A PLDM image is one such example of a multi part image format. Sometimes, for
multi part devices there is no concrete physical firmware device; instead, the
firmware device itself consists of multiple physical components, each of which
may have its own component image. In such a scenario, \<deviceX>CodeUpdater can
create a logical inventory item for the firmware device. While performing the
firmware device update, the client may target the logical firmware device,
which in turn knows how to update the corresponding child components for the
supplied component images. The user can also update a specific component by
providing the image package with that component as the head node. The
\<deviceX>CodeUpdater can implement the required logic to verify whether the
supplied image is targeted for itself (and child components) or not.
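
The target check in the last sentence above could look roughly like the sketch
below. It assumes a hypothetical package layout, a mapping from component name
to component image, and a hypothetical `Component` tree for the logical
firmware device; neither is defined by this design or by the PLDM format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical node in a logical firmware device tree."""

    name: str
    children: list[Component] = field(default_factory=list)

    def subtree_names(self) -> set[str]:
        """Names of this component and all of its descendants."""
        names = {self.name}
        for child in self.children:
            names |= child.subtree_names()
        return names


def package_targets(device: Component, package: dict[str, bytes]) -> bool:
    """True if every component image belongs to `device` or its children."""
    return bool(package) and set(package) <= device.subtree_names()
```

With a logical `board` device that has `cpld` and `vr` children, a package
carrying images for both children would be accepted by `board`, while a
package carrying a `nic` image would be rejected.
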

### Update multiple devices of same type

- For same type devices, extend the D-Bus path to specify the device instance,
  for example, /xyz/openbmc_project/Software/\<deviceX>\_\<InstanceNum>\_\<SwId>.
  All the corresponding interfaces can reside on this path and the same path
  will be returned from xyz.openbmc_project.Software.Update.StartUpdate.

This fulfills [Requirement# 9](#requirements).

### Parallel Upgrade

- Different type hardware components:

  Upgrade for different type hardware components can be handled either by
  different \<deviceX>CodeUpdater daemons or by a single daemon for hardware
  components with common features, for example, PLDMd may handle update for
  devices using the PLDM specification. Such updates can be invoked in parallel
  from BMCWeb and tracked via different tasks.

- Similar type hardware component:

  BMCWeb will trigger xyz.openbmc_project.Software.Update.StartUpdate on
  different D-Bus paths pertaining to each hardware instance. For more details
  on D-Bus paths refer to
  [Update multiple devices of same type](#update-multiple-devices-of-same-type).

This fulfills [Requirement# 10](#requirements).

### Uninterrupted Updates

The `ActivationBlocksTransition` interface will be created on the specific
D-Bus path for a version update, which will help to block any interruptions
from critical system actions such as reboots. This interface can in turn start
and stop services such as Boot Guard Service to prevent such interruptions.

Moreover, when a device is being upgraded the sensor scanning for that device
might need to be disabled. To achieve this, the sensor scanning flow can check
for existence of the `ActivationBlocksTransition` interface on the associated
`Version` D-Bus path for the inventory item.
If such an interface exists, the sensor scanning for that device can be skipped
by returning a relevant error (such as `EBUSY`) to the client. Another
alternative is to check for existence of the `ActivationBlocksTransition`
interface only if sensor scanning times out. This won't impact average case
performance for sensor scanning but only the worst case scenario when the
device is busy, for example, due to an update in progress.

## Alternatives Considered

### Centralized Design with Global Software Manager

A single Software Manager communicates with BMCWeb and hosts all the interfaces
such as Version, Activation and Progress for all hardware components within the
system on different D-Bus paths. Software Manager keeps a list of the various
hardware update services within the system and starts them based on the update
request. These on-demand services update the hardware and the interfaces hosted
by Software Manager, and then exit.

#### Pros

- Most of the D-Bus interfaces get implemented by Software Manager and vendors
  would need to write minimal code to change properties for these interfaces
  based on status and progress.
- Under normal operating conditions (no update in flight), only Software Manager
  will be running.

#### Cons

- Imposes the need of a common image format as Software Manager needs to parse
  and verify the image for creating interfaces.
- Limitation in the design, as there is a need to get the current running
  version from the hardware at system bring up. So, Software Manager would need
  to start each update daemon at system startup to get the running version.

### Pull model for Status and Progress

The proposed solution uses a push model where status and progress updates are
asynchronously pushed to BMCWeb.
Another alternative would be to use a pull model where the Update interface can
have get methods for status and progress (for example, getActivationStatus and
getActivationProgress).

#### Pros

- Server doesn't have to maintain a D-Bus matcher
  ([Issue](https://github.com/openbmc/bmcweb/issues/202)).
- Easier implementation in Server as no asynchronous handlers would be required.

#### Cons

- Server would still need to maintain some info so it can map the client's task
  status request to the D-Bus path /xyz/openbmc_project/Software/\<deviceX> for
  calling getActivationStatus and getActivationProgress.
- The aforementioned [issue](https://github.com/openbmc/bmcweb/issues/202) is
  more of an implementation problem which can be resolved through
  implementation changes.
- Currently, the activation and progress interfaces are being used in a
  [lot of Servers](#organizational). In future, harmonizing the flow to a
  single one will involve changing the push to pull model in all those places.
  With the current proposal, the only change will be in the update invocation
  flow.

## Impacts

The introduction of the new D-Bus API will temporarily create two invocation
flows from Server. Servers (BMCWeb, IPMI, etc) can initially support both code
stacks. As all the code update daemons get moved to the new flow, Servers would
be changed to only support the new API stack. There is no user-API impact, as
the design adheres to Redfish UpdateService.

## Organizational

### Does this design require a new repository?

Yes. There will be device transport level repositories, and multiple
\<deviceX>CodeUpdater daemons using a similar transport layer can reside in the
same repository. For example, all devices using PMBus could have a common
repository.

### Who will be the initial maintainer(s) of this repository?

Meta will propose repositories for the following devices, and
`Jagpal Singh Gill` & `Patrick Williams` will be the maintainers for them.

- VR Update
- CPLD Update

### Which repositories are expected to be modified to execute this design?

Changes are required in the following repositories to incorporate the new
interface for update invocation:

| Repository | Modification Owner |
| :--------- | :----------------- |
| [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt) | Jagpal Singh Gill |
| [BMCWeb](https://github.com/openbmc/bmcweb) | Jagpal Singh Gill |
| [phosphor-host-ipmid](https://github.com/openbmc/phosphor-host-ipmid) | Jagpal Singh Gill |
| [pldm](https://github.com/openbmc/pldm/tree/master/fw-update) | Jagpal Singh Gill |
| [openpower-pnor-code-mgmt](https://github.com/openbmc/openpower-pnor-code-mgmt) | Adriana Kobylak |
| [openbmc-test-automation](https://github.com/openbmc/openbmc-test-automation) | Adriana Kobylak |

NOTE: The
[phosphor-psu-code-mgmt](https://github.com/openbmc/phosphor-psu-code-mgmt) code
seems unused, so it is not tracked for change.

## Testing

### Unit Testing

All the functional testing of the reference implementation will be performed
using GTest.

### Integration Testing

The end to end integration testing involving Servers (for example BMCWeb) will
be covered using openbmc-test-automation.