# Code Update Design

Author: Jagpal Singh Gill <paligill@gmail.com>

Created: 4th August 2023

## Problem Description

This section covers the limitations discovered with
[phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt):

1. The current code update flow is complex because it involves 3 different
   daemons: Image Manager, Image Updater and Update Service.
2. The update invocation flow has no explicit interface; it depends on Image
   Manager discovering a new file in /tmp/images.
3. Images POSTed via Redfish are downloaded by BMCWeb to /tmp/images, which
   requires write access to the filesystem. This poses a security risk.
4. The current design doesn't support parallel upgrades for different firmware
   ([Issue](https://github.com/openbmc/bmcweb/issues/257)).

## Background and References

- [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)
- [Software D-Bus Interface](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/Software)
- [Code Update Design](https://github.com/openbmc/docs/tree/master/architecture/code-update)

## Requirements

1. Able to start an update, given a firmware image and update settings.
   - Update settings shall be able to specify when to apply the image, for
     example immediately, on device reset, or on demand.
2. Able to retrieve the update progress and status.
3. Able to produce an interface compliant with
   [Redfish UpdateService](https://redfish.dmtf.org/schemas/v1/UpdateService.v1_11_3.json).
4. Unprivileged daemons with access to D-Bus should be able to accept and
   perform a firmware update.
5. An update request shall respond immediately, so the client can query the
   status while the update is in progress.
6. All errors shall propagate back to the client.
7. Able to support updates for different types of hardware components such as
   CPLD, NIC, BIOS, BIC, PCIe switches, etc.
8. The design shall impose no restriction on the choice of image format.
9. Able to update multiple hardware components of the same type running
   different firmware images, for example, two instances of CPLDx residing on
   the board, one performing functionX and the other performing functionY and
   hence running different firmware images.
10. Able to update multiple components in parallel.
11. Able to restrict critical system actions, such as reboot, for the entity
    under update while the code update is in flight.

## Proposed Design

### Proposed End to End Flow

```mermaid
sequenceDiagram;
participant CL as Client
participant BMCW as BMCWeb
participant CU as <deviceX>CodeUpdater<br> ServiceName: xyz.openbmc_project.Software.<deviceX>

%% Bootstrap Action for CodeUpdater
note over CU: Get device access info from<br> /xyz/openbmc_project/inventory/system/... path
note over CU: VersionId = Version Read from <deviceX> + Salt
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at /xyz/openbmc_project/Software/<deviceX>/<VersionId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at /xyz/openbmc_project/Software/<deviceX>/<VersionId>
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at /xyz/openbmc_project/Software/<deviceX>/<VersionId> <br> with Status = Active
CU ->> CU: Create functional association <br> from Version to Inventory Item

CL ->> BMCW: HTTP POST: /redfish/v1/UpdateService/update <br> (Image, settings, RedfishTargetURIArray)

loop For every RedfishTargetURI
  note over BMCW: Map RedfishTargetURI to<br> System Inventory Item
  note over BMCW: Get object path (i.e. /xyz/openbmc_project/Software/<deviceX>/<VersionId>)<br>for associated Version interface to System Inventory Item
  note over BMCW: Get serviceName corresponding to the object path <br>from mapper.
  BMCW ->> CU: StartUpdate(Image, ApplyTime)

  note over CU: Verify Image
  break Image Verification FAILED
    CU -->> BMCW: {NULL, Update.Error}
    BMCW -->> CL: Return Error
  end
  note over CU: VersionId = Version from Image + Salt
  note over CU: ObjectPath = /xyz/openbmc_project/Software/<deviceX>/<VersionId>
  CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at ObjectPath
  CU -->> BMCW: {ObjectPath, Success}
  CU ->> CU: << Delegate Update for asynchronous processing >>

  par BMCWeb Processing
    BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.Activation,<br> ObjectPath)
    BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.ActivationProgress,<br> ObjectPath)
    BMCW ->> BMCW: Create Task<br> to handle matcher notifications
    BMCW -->> CL: <TaskNum>
    loop
      BMCW --) BMCW: Process notifications<br> and update Task attributes
      CL ->> BMCW: /redfish/v1/TaskMonitor/<TaskNum>
      BMCW -->>CL: TaskStatus
    end
  and << Asynchronous Update in Progress >>
    CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at ObjectPath with Status = Ready
    CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.ActivationProgress<br> at ObjectPath
    CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition<br> at ObjectPath
    note over CU: Start Update
    loop
      CU --) BMCW: Notify ActivationProgress.Progress change
    end
    note over CU: Finish Update
    CU ->> CU: Activation.Status = Active
    CU --) BMCW: Notify Activation.Status change
    CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition
    CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationProgress
    alt ApplyTime == Immediate
      note over CU: Reset Device and<br> update functional association to System Inventory Item
    else
      note over CU: Create active association to System Inventory Item
    end
  end
end
```

- Each upgradable hardware type may have a separate daemon (\<deviceX\> as per
  the above flow) handling its update process, which would need to implement
  the interfaces proposed in the next section. This satisfies
  [Requirement# 7](#requirements).
- Since a single daemon handles the update (as compared to three), less
  handshaking is involved, which addresses [Issue# 1](#problem-description) and
  [Requirement# 4](#requirements).

### Proposed D-Bus Interface

The D-Bus interface for code update consists of the following:

| Interface Name | Existing/New | Purpose |
| :--- | :---: | :---: |
| [xyz.openbmc_project.Software.Update](https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/65738) | New | Provides update method |
| [xyz.openbmc_project.Software.Version](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Version.interface.yaml) | Existing | Provides version info |
| [xyz.openbmc_project.Software.Activation](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Activation.interface.yaml) | Existing | Provides activation status |
| [xyz.openbmc_project.Software.ActivationProgress](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationProgress.interface.yaml) | Existing | Provides activation progress percentage |
| [xyz.openbmc_project.Software.ActivationBlocksTransition](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationBlocksTransition.interface.yaml) | Existing | Signifies barrier for state transitions while update is in progress |
| [xyz.openbmc_project.Software.RedundancyPriority](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/RedundancyPriority.interface.yaml) | Existing | Provides the redundancy priority for the version interface |

The introduction of the xyz.openbmc_project.Software.Update interface
streamlines the update invocation flow and hence addresses
[Issue# 2](#problem-description) and [Requirement# 1 & 2](#requirements).

#### Association

`running` : A `running` association from xyz.openbmc_project.Inventory.Item to
xyz.openbmc_project.Software.Version represents the current functional or
running software version for the associated inventory item. The `ran_on` would
be the corresponding reverse association.

`activating` : An `activating` association from
xyz.openbmc_project.Inventory.Item to xyz.openbmc_project.Software.Version
represents the activated (but not yet run) software version for the associated
inventory item. There could be more than one active version for an inventory
item. For example, in A/B redundancy models there are 2 associated flash banks,
and the xyz.openbmc_project.Software.RedundancyPriority interface defines the
priority for each one.

For an A/B redundancy model with staging support, the
xyz.openbmc_project.Software.Activation.Activations.Staged value helps to
define which software version is currently staged.

The `activated_on` would be the corresponding reverse association.

### Keep images in memory

Images will be kept in memory and passed to \<deviceX>CodeUpdater using a file
descriptor rather than a file path. The implementation needs to monitor
appropriate memory limits to prevent parallel updates from running the BMC out
of memory.
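
The descriptor hand-off and the memory limit described above can be sketched as follows. This is only an illustration, not the reference implementation: `ImageBudget`, `image_to_fd` and `read_image` are hypothetical names, the limit value is arbitrary, and a real updater would receive the descriptor over D-Bus rather than within one process. `os.memfd_create` is Linux-only, so the sketch falls back to an unlinked temporary file elsewhere.

```python
import os
import tempfile
import threading


class ImageBudget:
    """Hypothetical helper: caps the bytes held by in-flight images."""

    def __init__(self, limit_bytes: int) -> None:
        self._limit = limit_bytes
        self._used = 0
        self._lock = threading.Lock()

    def reserve(self, size: int) -> bool:
        """Reserve room for an image; refuse if the limit would be exceeded."""
        with self._lock:
            if self._used + size > self._limit:
                return False
            self._used += size
            return True

    def release(self, size: int) -> None:
        """Give the reservation back once the update has consumed the image."""
        with self._lock:
            self._used = max(0, self._used - size)


def image_to_fd(image: bytes) -> int:
    """Store the image without a named file path and return a descriptor."""
    if hasattr(os, "memfd_create"):
        fd = os.memfd_create("fw-image")  # anonymous, memory-backed file
    else:
        # Fallback for non-Linux experimentation: an already-unlinked file.
        tmp = tempfile.TemporaryFile()
        fd = os.dup(tmp.fileno())
        tmp.close()
    view = memoryview(image)
    while view:  # os.write may write fewer bytes than requested
        written = os.write(fd, view)
        view = view[written:]
    os.lseek(fd, 0, os.SEEK_SET)  # rewind so the receiver reads from the start
    return fd


def read_image(fd: int) -> bytes:
    """Receiver side: read the whole image and close the descriptor."""
    with os.fdopen(fd, "rb") as stream:
        return stream.read()
```

In this sketch the sender calls `reserve()` before buffering an image and `release()` once the updater has finished with it, so a second parallel update is rejected up front instead of exhausting memory.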

### Propagate errors to client

The return value of xyz.openbmc_project.Software.Update.StartUpdate will
propagate any errors related to initial setup and image metadata/header parsing
back to the user. Any asynchronous errors that happen during the update process
will be reported via a failed activation status, which maps to a failed task
associated with the update. In addition, a phosphor-logging event will be
created and sent back to the client via the
[Redfish Log Service](https://redfish.dmtf.org/schemas/v1/LogService.v1_4_0.json).

Another alternative could be to use
[Redfish Event Services](https://redfish.dmtf.org/schemas/v1/EventService.v1_10_0.json).

### Firmware Image Format

Image parsing will be performed in \<deviceX>CodeUpdater, and since
\<deviceX>CodeUpdater may be a device specific daemon, a vendor may choose any
image format for the firmware image. This fulfills
[Requirement# 8](#requirements).

### Multi part Images

A multi part image has multiple component images as part of one image package.
A PLDM image is one example of a multi part image format. Sometimes, for multi
part devices, there is no concrete physical firmware device; instead, the
firmware device itself consists of multiple physical components, each of which
may have its own component image. In such a scenario, \<deviceX>CodeUpdater can
create a logical inventory item for the firmware device. While performing the
firmware device update, the client may target the logical firmware device,
which in turn knows how to update the corresponding child components for the
supplied component images. The user can also update a specific component by
providing an image package with that component as the head node. The
\<deviceX>CodeUpdater can implement the required logic to verify whether the
supplied image is targeted at itself (and its child components) or not.
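
The target check described above could look roughly like the following sketch. The design does not define a package layout, so `Component`, `ImagePackage` and `plan_update` are hypothetical names, and the "head node plus per-component images" structure is an assumption made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """A node in a hypothetical (logical or physical) firmware device tree."""

    name: str
    children: List["Component"] = field(default_factory=list)

    def subtree(self) -> Dict[str, "Component"]:
        """Map every component name in this subtree, self included, to its node."""
        found = {self.name: self}
        for child in self.children:
            found.update(child.subtree())
        return found


@dataclass
class ImagePackage:
    """Assumed layout: the head node name plus one image per component."""

    head: str
    images: Dict[str, bytes]


def plan_update(target: Component, package: ImagePackage) -> Dict[str, bytes]:
    """Return the images to apply, or raise if the package is not for target."""
    if package.head != target.name:
        raise ValueError(f"package is for {package.head!r}, not {target.name!r}")
    unknown = sorted(set(package.images) - set(target.subtree()))
    if unknown:
        raise ValueError(f"components outside {target.name!r}: {unknown}")
    return dict(package.images)
```

A package headed by the logical device is accepted only when every component image belongs to that device's subtree, while a package headed by a single child component can be applied to that child directly.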

### Update multiple devices of same type

- For devices of the same type, extend the D-Bus path to specify the device
  instance, for example,
  /xyz/openbmc_project/Software/\<deviceX>/\<InstanceNum>/\<VersionId>. All the
  corresponding interfaces can reside on this path, and the same path will be
  returned from xyz.openbmc_project.Software.Update.StartUpdate.

This fulfills [Requirement# 9](#requirements).

### Parallel Upgrade

- Hardware components of different types:

  Upgrades for hardware components of different types can be handled either by
  different \<deviceX>CodeUpdater daemons or by a single daemon for hardware
  components with common features. For example, PLDMd may handle updates for
  devices using the PLDM specification. Such updates can be invoked in parallel
  from BMCWeb and tracked via different tasks.

- Hardware components of the same type:

  BMCWeb will trigger xyz.openbmc_project.Software.Update.StartUpdate on the
  different D-Bus paths pertaining to each hardware instance. For more details
  on D-Bus paths, refer to
  [Update multiple devices of same type](#update-multiple-devices-of-same-type).

This fulfills [Requirement# 10](#requirements).

### Uninterrupted Updates

The `ActivationBlocksTransition` interface will be created on the specific
D-Bus path for a version update, which helps to block interruptions from
critical system actions such as reboots. This interface can in turn start and
stop services such as Boot Guard Service to prevent such interruptions.

Moreover, while a device is being upgraded, sensor scanning for that device
might need to be disabled. To achieve this, the sensor scanning flow can check
for the existence of the `ActivationBlocksTransition` interface on the
`Version` D-Bus path associated with the inventory item.
If such an interface exists, sensor scanning for that device can be skipped by
returning a relevant error (such as `EBUSY`) to the client. Another alternative
is to check for the existence of the `ActivationBlocksTransition` interface
only if sensor scanning times out. This won't impact the average case
performance of sensor scanning, only the worst case scenario when the device is
busy, for example, due to an update in progress.

## Alternatives Considered

### Centralized Design with Global Software Manager

A single Software Manager communicates with BMCWeb and hosts all the
interfaces, such as Version, Activation and Progress, for all hardware
components within the system on different D-Bus paths. Software Manager keeps a
list of the various hardware update services within the system and starts them
based on the update request. These on-demand services update the hardware and
the interfaces hosted by Software Manager, and then exit.

#### Pros

- Most of the D-Bus interfaces get implemented by Software Manager, and vendors
  would need to write minimal code to change properties for these interfaces
  based on status and progress.
- Under normal operating conditions (no update in flight), only Software Manager
  will be running.

#### Cons

- Imposes the need for a common image format, as Software Manager needs to
  parse and verify the image for creating interfaces.
- The current running version has to be read from the hardware at system bring
  up, so Software Manager would need to start each update daemon at system
  startup to get the running version.

### Pull model for Status and Progress

The proposed solution uses a push model where status and progress updates are
asynchronously pushed to BMCWeb.
Another alternative would be to use a pull model where the Update interface has
get methods for status and progress (for example, getActivationStatus and
getActivationProgress).

#### Pros

- The Server doesn't have to maintain a D-Bus matcher
  ([Issue](https://github.com/openbmc/bmcweb/issues/202)).
- Easier implementation in the Server, as no asynchronous handlers would be
  required.

#### Cons

- The Server would still need to maintain some info so it can map a client's
  task status request to the D-Bus path under
  /xyz/openbmc_project/Software/\<deviceX> for calling getActivationStatus and
  getActivationProgress.
- The aforementioned [issue](https://github.com/openbmc/bmcweb/issues/202) is
  more of an implementation problem, which can be resolved through
  implementation changes.
- Currently, the activation and progress interfaces are used in
  [a lot of Servers](#organizational). In the future, harmonizing the flow into
  a single one would involve changing the push model to a pull model in all
  those places. With the current proposal, the only change will be in the update
  invocation flow.

## Impacts

The introduction of the new D-Bus API will temporarily create two invocation
flows from the Server. Servers (BMCWeb, IPMI, etc.) can initially support both
code stacks. As all the code update daemons get moved to the new flow, Servers
would be changed to support only the new API stack. There is no user API
impact, as the design adheres to Redfish UpdateService.

## Organizational

### Does this design require a new repository?

Yes. There will be device transport level repositories, and multiple
\<deviceX>CodeUpdater daemons using a similar transport layer can reside in the
same repository. For example, all devices using PMBus could have a common
repository.

### Who will be the initial maintainer(s) of this repository?

Meta will propose repositories for the following devices, and
`Jagpal Singh Gill` & `Patrick Williams` will be the maintainers for them.

- VR Update
- CPLD Update

### Which repositories are expected to be modified to execute this design?

The following repositories require changes to incorporate the new interface for
update invocation:

| Repository | Modification Owner |
| :--- | :--- |
| [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt) | Jagpal Singh Gill |
| [BMCWeb](https://github.com/openbmc/bmcweb) | Jagpal Singh Gill |
| [phosphor-host-ipmid](https://github.com/openbmc/phosphor-host-ipmid) | Jagpal Singh Gill |
| [pldm](https://github.com/openbmc/pldm/tree/master/fw-update) | Jagpal Singh Gill |
| [openpower-pnor-code-mgmt](https://github.com/openbmc/openpower-pnor-code-mgmt) | Adriana Kobylak |
| [openbmc-test-automation](https://github.com/openbmc/openbmc-test-automation) | Adriana Kobylak |

NOTE: The
[phosphor-psu-code-mgmt](https://github.com/openbmc/phosphor-psu-code-mgmt) code
seems unused, so it is not tracked for change.

## Testing

### Unit Testing

All the functional testing of the reference implementation will be performed
using GTest.

### Integration Testing

The end to end integration testing involving Servers (for example BMCWeb) will
be covered using openbmc-test-automation.