# Code Update Design

Author: Jagpal Singh Gill <paligill@gmail.com>

Created: 4th August 2023

## Problem Description

This section covers the limitations discovered with
[phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt):

1. The current code update flow is complex as it involves 3 different daemons:
   Image Manager, Image Updater and Update Service.
2. The update invocation flow has no explicit interface but rather depends upon
   the discovery of a new file in /tmp/images by Image Manager.
3. Images POSTed via Redfish are downloaded by BMCWeb to /tmp/images, which
   requires write access to the filesystem. This poses a security risk.
4. The current design doesn't support parallel upgrades for different firmware
   ([Issue](https://github.com/openbmc/bmcweb/issues/257)).

## Background and References

- [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt)
- [Software D-Bus Interface](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/xyz/openbmc_project/Software)
- [Code Update Design](https://github.com/openbmc/docs/tree/master/architecture/code-update)

## Requirements

1. Able to start an update, given a firmware image and update settings.
   - Update settings shall be able to specify when to apply the image, for
     example immediately, on device reset, or on demand.
2. Able to retrieve the update progress and status.
3. Able to produce an interface compliant with
   [Redfish UpdateService](https://redfish.dmtf.org/schemas/v1/UpdateService.v1_11_3.json)
4. Unprivileged daemons with access to D-Bus should be able to accept and
   perform a firmware update.
5. An update request shall respond immediately, so the client can query the
   status while the update is in progress.
6. All errors shall propagate back to the client.
7. Able to support updates for different types of hardware components such as
   CPLD, NIC, BIOS, BIC, PCIe switches, etc.
8. The design shall impose no restriction to choose any specific image format.
9. Able to update multiple hardware components of the same type running
   different firmware images, for example, two instances of CPLDx residing on
   the board, one performing functionX and the other performing functionY and
   hence running different firmware images.
10. Able to update multiple components in parallel.
11. Able to restrict critical system actions, such as reboot, for the entity
    under update while the code update is in flight.

## Proposed Design

### Proposed End to End Flow

```mermaid
sequenceDiagram;
participant CL as Client
participant BMCW as BMCWeb
participant CU as <deviceX>CodeUpdater<br> ServiceName: xyz.openbmc_project.Software.<deviceX>

%% Bootstrap Action for CodeUpdater
note over CU: Get device access info from<br> /xyz/openbmc_project/inventory/system/... path
note over CU: SwId = <DeviceX>_<RandomId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at /xyz/openbmc_project/Software/<SwId>
CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at /xyz/openbmc_project/Software/<SwId> <br> with Status = Active
CU ->> CU: Create functional association <br> from Version to Inventory Item

CL ->> BMCW: HTTP POST: /redfish/v1/UpdateService/update <br> (Image, settings, RedfishTargetURIArray)

loop For every RedfishTargetURI
  note over BMCW: Map RedfishTargetURI /redfish/v1/UpdateService/FirmwareInventory/<SwId> to<br> Object path /xyz/openbmc_project/software/<SwId>
  note over BMCW: Get serviceName corresponding to the object path <br>from mapper.
  BMCW ->> CU: StartUpdate(Image, ApplyTime)

  note over CU: SwId = <DeviceX>_<RandomId>
  note over CU: ObjectPath = /xyz/openbmc_project/Software/<SwId>
  CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.Activation<br> at ObjectPath with Status = NotReady
  CU -->> BMCW: {ObjectPath, Success}
  CU ->> CU: << Delegate Update for asynchronous processing >>

  par BMCWeb Processing
    BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.Activation,<br> ObjectPath)
    BMCW ->> BMCW: Create Matcher<br>(PropertiesChanged,<br> xyz.openbmc_project.Software.ActivationProgress,<br> ObjectPath)
    BMCW ->> BMCW: Create Task<br> to handle matcher notifications
    BMCW -->> CL: <TaskNum>
    loop
      BMCW --) BMCW: Process notifications<br> and update Task attributes
      CL ->> BMCW: /redfish/v1/TaskMonitor/<TaskNum>
      BMCW -->>CL: TaskStatus
    end
  and << Asynchronous Update in Progress >>
    note over CU: Verify Image
    break Image Verification FAILED
      CU ->> CU: Activation.Status = Invalid
      CU --) BMCW: Notify Activation.Status change
    end
    CU ->> CU: Activation.Status = Ready
    CU --) BMCW: Notify Activation.Status change

    CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Version<br> at ObjectPath
    CU ->> CU: Create Interface<br>xyz.openbmc_project.Software.ActivationProgress<br> at ObjectPath
    CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition<br> at ObjectPath
    CU ->> CU: Activation.Status = Activating
    CU --) BMCW: Notify Activation.Status change
    note over CU: Start Update
    loop
      CU --) BMCW: Notify ActivationProgress.Progress change
    end
    note over CU: Finish Update
    CU ->> CU: Activation.Status = Active
    CU --) BMCW: Notify Activation.Status change
    CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationBlocksTransition
    CU ->> CU: Delete Interface<br> xyz.openbmc_project.Software.ActivationProgress
    alt ApplyTime == Immediate
      note over CU: Reset Device
      CU ->> CU: Update functional association to System Inventory Item
      CU ->> CU: Create Interface<br> xyz.openbmc_project.Software.Update<br> at ObjectPath
      note over CU: Delete all interfaces on previous ObjectPath
    else
      note over CU: Create active association to System Inventory Item
    end
  end
end
```

- Each upgradable hardware type may have a separate daemon (\<deviceX\> as per
  the above flow) handling its update process, and that daemon would need to
  implement the interfaces proposed in the next section. This satisfies
  [Requirement# 7](#requirements).
- Since there would be a single daemon handling the update (as compared to
  three), less handshaking would be involved, which addresses
  [Issue# 1](#problem-description) and [Requirement# 4](#requirements).

### Proposed D-Bus Interface

The D-Bus interface for code update will consist of the following:

| Interface Name | Existing/New | Purpose |
| :------------- | :----------: | :-----: |
| [xyz.openbmc_project.Software.Update](https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/65738) | New | Provides update method |
| [xyz.openbmc_project.Software.Version](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Version.interface.yaml) | Existing | Provides version info |
| [xyz.openbmc_project.Software.Activation](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/Activation.interface.yaml) | Existing | Provides activation status |
| [xyz.openbmc_project.Software.ActivationProgress](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationProgress.interface.yaml) | Existing | Provides activation progress percentage |
| [xyz.openbmc_project.Software.ActivationBlocksTransition](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/ActivationBlocksTransition.interface.yaml) | Existing | Signifies barrier for state transitions while update is in progress |
| [xyz.openbmc_project.Software.RedundancyPriority](https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Software/RedundancyPriority.interface.yaml) | Existing | Provides the redundancy priority for the version interface |

Introduction of the xyz.openbmc_project.Software.Update interface streamlines
the update invocation flow and hence addresses
[Issue# 2](#problem-description) and [Requirement# 1 & 2](#requirements).

#### Association

`running` : A `running` association from xyz.openbmc_project.Inventory.Item to
xyz.openbmc_project.Software.Version represents the current functional or
running software version for the associated inventory item. The `ran_on` would
be the corresponding reverse association.

`activating` : An `activating` association from
xyz.openbmc_project.Inventory.Item to xyz.openbmc_project.Software.Version
represents the activated (but not yet run) software version for the associated
inventory item. There could be more than one active version for an inventory
item, for example, in the case of A/B redundancy models there are 2 associated
flash banks and the xyz.openbmc_project.Software.RedundancyPriority interface
defines the priority for each one.

For the A/B redundancy model with staging support, the
xyz.openbmc_project.Software.Activation.Activations.Staged will help to define
which software version is currently staged.

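
To make the association model concrete, here is a minimal Python sketch of the entries an inventory item might carry towards its Version objects. It assumes the usual (forward, reverse, endpoint) triple shape of `xyz.openbmc_project.Association.Definitions`. The helper `inventory_associations` and the object paths are hypothetical and only illustrate the naming; the reverse names are the ones this section gives for each association.

```python
from typing import NamedTuple


class Association(NamedTuple):
    """One (forward, reverse, endpoint) association entry."""
    forward: str
    reverse: str
    endpoint: str


# Forward -> reverse association names used by this design.
REVERSE_NAME = {"running": "ran_on", "activating": "activated_on"}


def inventory_associations(running: str, activating: list[str]) -> list[Association]:
    """Hypothetical helper: associations from an inventory item to Version paths."""
    result = [Association("running", REVERSE_NAME["running"], running)]
    result += [
        Association("activating", REVERSE_NAME["activating"], path)
        for path in activating
    ]
    return result


# Hypothetical object paths for a CPLD with two (A/B) flash banks.
assocs = inventory_associations(
    running="/xyz/openbmc_project/Software/cpld_1234",
    activating=[
        "/xyz/openbmc_project/Software/cpld_1234",
        "/xyz/openbmc_project/Software/cpld_5678",
    ],
)
for entry in assocs:
    print(entry.forward, entry.reverse, entry.endpoint)
```

With A/B redundancy, both banks appear as `activating` endpoints while only one is `running`; which bank wins on the next boot would still be decided by RedundancyPriority, which this sketch does not model.
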
The `activated_on` would be the corresponding reverse association.

### Keep images in memory

Images will be kept in memory and passed to \<deviceX>CodeUpdater using a file
descriptor rather than a file path. The implementation needs to monitor
appropriate memory limits to prevent parallel updates from running the BMC out
of memory.

### Propagate errors to client

The xyz.openbmc_project.Software.Update.StartUpdate return value will propagate
any errors related to initial setup and image metadata/header parsing back to
the user. Any asynchronous errors which happen during the update process will
be reported via a failed activation status, which maps to a failed task
associated with the update. Also, a phosphor-logging event will be created and
sent back to the client via
[Redfish Log Service](https://redfish.dmtf.org/schemas/v1/LogService.v1_4_0.json).

Another alternative could be to use
[Redfish Event Services](https://redfish.dmtf.org/schemas/v1/EventService.v1_10_0.json).

### Firmware Image Format

Image parsing will be performed in \<deviceX>CodeUpdater, and since
\<deviceX>CodeUpdater may be a device specific daemon, the vendor may choose
any image format for the firmware image. This fulfills
[Requirement# 8](#requirements).

#### PLDM Image Packaging

The PLDM for
[Firmware Update Specification](https://www.dmtf.org/sites/default/files/standards/documents/DSP0267_1.3.0.pdf)
provides a standardized packaging format for images, incorporating both standard
and user-defined descriptors. This format can be utilized to package firmware
update images for non-PLDM devices as well. For such devices, the CodeUpdater
will parse the entity manager configuration to identify applicable PLDM
descriptors, which can include but are not limited to the following:

| PLDM Package Descriptor | Descriptor Type | Description |
| :---------------------: | :-------------: | :---------: |
| IANA Enterprise ID | Standard | [IANA Enterprise Id](https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers) of the hardware vendor |
| ASCII Model | Standard | Compatible hardware name (com.\<vendor\>.Hardware.\<XXX\>) specified by hardware vendor in [phosphor-dbus-interfaces](https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/yaml/com). |

#### Entity Manager Configuration

The entity manager configuration can provide firmware-related information as
part of board configurations, which can be utilized for firmware validation and
modeling device access details. These D-Bus objects can then be consumed by the
CodeUpdater service to manage updates for the relevant firmware entities.

For the common firmware info definition, refer to the
[entity-manager change](https://gerrit.openbmc.org/c/openbmc/entity-manager/+/75947).

The following example is one such instance of this definition for an i2c CPLD
device firmware.

```json
"Exposes": [
  ...
  {
    "Name": "MB_LCMX02_2000HC",
    "Type": "CPLDFirmware",
    ...
    "FirmwareInfo" :
    {
      "VendorIANA": 0000A015,
      "CompatibleHardware": "com.meta.Hardware.Yosemite4.MedusaBoard.CPLD.LCMX02_2000HC"
    }
    ...
  },
  ...
]
```

- `Name`: The name of the firmware entity instance.
- `Type`: This field is used by the CodeUpdater service to determine which
  firmware EM configurations it should process.
- `VendorIANA`: This field maps to the `IANA Enterprise ID` descriptor in the
  PLDM package header.
- `CompatibleHardware`: This field maps to the `ASCII Model` descriptor in the
  PLDM package header.

### Multi part Images

A multi part image has multiple component images as part of one image package.
A PLDM image is one example of a multi part image format. Sometimes, for multi
part devices, there is no concrete physical firmware device; instead, the
firmware device itself consists of multiple physical components, each of which
may have its own component image. In such a scenario, \<deviceX>CodeUpdater can
create a logical inventory item for the firmware device. While performing the
firmware device update, the client may target the logical firmware device,
which in turn knows how to update the corresponding child components for the
supplied component images. The user can also update a specific component by
providing the image package with that component as the head node. The
\<deviceX>CodeUpdater can implement the required logic to verify whether the
supplied image is targeted for itself (and its child components) or not.

### Update multiple devices of same type

- For devices of the same type, extend the D-Bus path to specify the device
  instance, for example,
  /xyz/openbmc_project/Software/\<deviceX>\_\<InstanceNum>\_\<SwId>. All the
  corresponding interfaces can reside on this path, and the same path will be
  returned from xyz.openbmc_project.Software.Update.StartUpdate.

This fulfills [Requirement# 9](#requirements).

### Parallel Upgrade

- Hardware components of different types:

  Upgrades for hardware components of different types can be handled either by
  different \<deviceX>CodeUpdater daemons or by a single daemon for hardware
  components with common features, for example, PLDMd may handle updates for
  devices using the PLDM specification. Such updates can be invoked in parallel
  from BMCWeb and tracked via different tasks.

- Hardware components of the same type:

  BMCWeb will trigger xyz.openbmc_project.Software.Update.StartUpdate on
  different D-Bus paths pertaining to each hardware instance. For more details
  on D-Bus paths refer to
  [Update multiple devices of same type](#update-multiple-devices-of-same-type).

This fulfills [Requirement# 10](#requirements).

### Uninterrupted Updates

The `ActivationBlocksTransition` interface will be created on the specific
D-Bus path for a version update, which will help to block any interruptions
from critical system actions such as reboots. This interface can in turn start
and stop services such as Boot Guard Service to prevent such interruptions.

Moreover, when a device is being upgraded, the sensor scanning for that device
might need to be disabled. To achieve this, the sensor scanning flow can check
for the existence of the `ActivationBlocksTransition` interface on the
associated `Version` D-Bus path for the inventory item. If such an interface
exists, the sensor scanning for that device can be skipped by returning a
relevant error (such as `EBUSY`) to the client. Another alternative is to check
for the existence of the `ActivationBlocksTransition` interface only if sensor
scanning times out. This won't impact average case performance for sensor
scanning but only the worst case scenario when the device is busy, for example,
due to an update in progress.

## Alternatives Considered

### Centralized Design with Global Software Manager

A single Software Manager communicates with BMCWeb and hosts all the interfaces
such as Version, Activation and Progress for all hardware components within the
system on different D-Bus paths. Software Manager keeps a list of the various
hardware update services within the system and starts them based on the update
request. These on-demand services update the hardware and the interfaces hosted
by Software Manager, and then exit.

#### Pros

- Most of the D-Bus interfaces get implemented by Software Manager, and vendors
  would need to write minimal code to change properties for these interfaces
  based on status and progress.
- Under normal operating conditions (no update in flight), only Software Manager
  will be running.

#### Cons

- Imposes the need for a common image format, as Software Manager needs to
  parse and verify the image for creating interfaces.
- Limitation in the design, as there is a need to get the current running
  version from the hardware at system bring up. So, Software Manager would need
  to start each update daemon at system startup to get the running version.

### Pull model for Status and Progress

The proposed solution uses a push model where status and progress updates are
asynchronously pushed to BMCWeb. Another alternative would be to use a pull
model where the Update interface can have get methods for status and progress
(for example, getActivationStatus and getActivationProgress).

#### Pros

- The Server doesn't have to maintain a D-Bus matcher
  ([Issue](https://github.com/openbmc/bmcweb/issues/202)).
- Easier implementation in the Server as no asynchronous handlers would be
  required.

#### Cons

- The Server would still need to maintain some info so it can map the client's
  task status request to the D-Bus path
  /xyz/openbmc_project/Software/\<deviceX> for calling getActivationStatus and
  getActivationProgress.
- The aforementioned [issue](https://github.com/openbmc/bmcweb/issues/202) is
  more of an implementation problem which can be resolved through
  implementation changes.
- Currently, activation and progress interfaces are being used in
  [a lot of Servers](#organizational). In future, harmonizing the flow to a
  single one will involve changing the push model to a pull model in all those
  places. With the current proposal, the only change will be in the update
  invocation flow.

## Impacts

The introduction of the new D-Bus API will temporarily create two invocation
flows from the Server. Servers (BMCWeb, IPMI, etc) can initially support both
code stacks. As all the code update daemons get moved to the new flow, Servers
would be changed to support only the new API stack. There is no user API impact
as the design adheres to Redfish UpdateService.

## Organizational

### Does this design require a new repository?

Yes. There will be device transport level repositories, and multiple
\<deviceX>CodeUpdater daemons using a similar transport layer can reside in the
same repository. For example, all devices using PMBus could have a common
repository.

### Who will be the initial maintainer(s) of this repository?

Meta will propose repositories for the following devices, and
`Jagpal Singh Gill` & `Patrick Williams` will be the maintainers for them.

- VR Update
- CPLD Update

### Which repositories are expected to be modified to execute this design?

Changes are required in the following repositories to incorporate the new
interface for update invocation:

| Repository | Modification Owner |
| :--------- | :----------------- |
| [phosphor-bmc-code-mgmt](https://github.com/openbmc/phosphor-bmc-code-mgmt) | Jagpal Singh Gill |
| [BMCWeb](https://github.com/openbmc/bmcweb) | Jagpal Singh Gill |
| [phosphor-host-ipmid](https://github.com/openbmc/phosphor-host-ipmid) | Jagpal Singh Gill |
| [pldm](https://github.com/openbmc/pldm/tree/master/fw-update) | Jagpal Singh Gill |
| [openpower-pnor-code-mgmt](https://github.com/openbmc/openpower-pnor-code-mgmt) | Adriana Kobylak |
| [openbmc-test-automation](https://github.com/openbmc/openbmc-test-automation) | Adriana Kobylak |

NOTE: The
[phosphor-psu-code-mgmt](https://github.com/openbmc/phosphor-psu-code-mgmt) code
seems unused, so it is not being tracked for change.

## Testing

### Unit Testing

All the functional testing of the reference implementation will be performed
using GTest.

### Integration Testing

The end to end integration testing involving Servers (for example BMCWeb) will
be covered using openbmc-test-automation.
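
As an illustration of what such an end to end check might assert, the Python sketch below models the Activation status sequence from the proposed flow (NotReady, then Ready or Invalid, then Activating, then Active). The transition table is one reading of the sequence diagram rather than the Activation interface definition. The function names and the mapping to Redfish task states are assumptions made for the example.

```python
# Status names come from the sequence diagram; the allowed transitions are an
# illustrative reading of that flow, not a normative state machine.
ALLOWED = {
    "NotReady": {"Ready", "Invalid"},
    "Ready": {"Activating"},
    "Activating": {"Active"},
    "Invalid": set(),
    "Active": set(),
}


def is_valid_sequence(statuses: list[str]) -> bool:
    """True if the sequence starts at NotReady and every step is allowed."""
    if not statuses or statuses[0] != "NotReady":
        return False
    return all(
        nxt in ALLOWED.get(cur, set())
        for cur, nxt in zip(statuses, statuses[1:])
    )


def task_state(statuses: list[str]) -> str:
    """Assumed mapping from the last observed status to a Redfish TaskState."""
    last = statuses[-1]
    if last == "Active":
        return "Completed"
    if last == "Invalid":
        return "Exception"
    return "Running"


# A successful update and a failed image verification, as seen by a test
# that records each Activation.Status notification.
happy_path = ["NotReady", "Ready", "Activating", "Active"]
bad_image = ["NotReady", "Invalid"]
print(is_valid_sequence(happy_path), task_state(happy_path))
print(is_valid_sequence(bad_image), task_state(bad_image))
```

A real test would collect the statuses from PropertiesChanged signals or from polling the task monitor; the sketch only covers the checking step.
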