# Error and Event Logging

Author: [Patrick Williams][patrick-email] `<stwcx>`

[patrick-email]: mailto:patrick@stwcx.xyz

Other contributors:

Created: May 16, 2024

## Problem Description

There is currently no consistent end-to-end error and event reporting design
for the OpenBMC code stack. There are two different implementations, one
primarily using phosphor-logging and one using rsyslog, both of which have gaps
that a complete solution should address. This proposal is intended to be an
end-to-end design handling both errors and tracing events which facilitate
external management of the system in an automated and maintainable manner.

## Background and References

### Redfish LogEntry and Message Registry

In Redfish, the [`LogEntry` schema][LogEntry] is used for a range of items that
could be considered "logs", but one such use within OpenBMC is for an equivalent
of the IPMI "System Event Log (SEL)".

The IPMI SEL is the location where the BMC can collect errors and events,
sometimes coming from other entities, such as the BIOS. Examples of these might
be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
These SEL records are exposed as human readable strings, either natively by an
OEM SEL design or by tools such as `ipmitool`. The strings are typically unique
to each system or manufacturer and could hypothetically change with a BMC or
firmware update, which makes it difficult to create automated tooling around
them. Two different vendors might use different strings to represent a critical
temperature threshold exceeded: ["temperature threshold exceeded"][HPE-Example]
and ["Temperature #0x30 Upper Critical going high"][Oracle-Example]. There is
also no mechanism with IPMI to ask the machine "what are all of the SELs you
might create".

In order to solve two aspects of this problem, listing of possible events and
versioning, Redfish has Message Registries.
A message registry is a versioned
collection of all of the error events that a system could generate and hints as
to how they might be parsed and displayed to a user. An [informative
reference][Registry-Example] from the DMTF gives this example:

```json
{
  "@odata.type": "#MessageRegistry.v1_0_0.MessageRegistry",
  "Id": "Alert.1.0.0",
  "RegistryPrefix": "Alert",
  "RegistryVersion": "1.0.0",
  "Messages": {
    "LanDisconnect": {
      "Description": "A LAN Disconnect on %1 was detected on system %2.",
      "Message": "A LAN Disconnect on %1 was detected on system %2.",
      "Severity": "Warning",
      "NumberOfArgs": 2,
      "Resolution": "None"
    }
  }
}
```

This example defines an event, `Alert.1.0.LanDisconnect`, which can record the
disconnect state of a network device and contains placeholders for the affected
device and system. When this event occurs, there might be a `LogEntry` recorded
containing something like:

```json
{
  "Message": "A LAN Disconnect on EthernetInterface 1 was detected on system /redfish/v1/Systems/1.",
  "MessageId": "Alert.1.0.LanDisconnect",
  "MessageArgs": ["EthernetInterface 1", "/redfish/v1/Systems/1"]
}
```

The `Message` contains a human readable string which was created by applying the
`MessageArgs` to the placeholders from the `Message` field in the registry.
System management software can rely on the message registry (referenced from the
`MessageId` field in the `LogEntry`) and `MessageArgs` to avoid needing to
perform string processing for reacting to the event.

Within OpenBMC, there is currently a [limited design][existing-design] for this
Redfish feature and it requires inserting specially formed Redfish-specific
logging messages into any application that wants to record these events, tightly
coupling all applications to the Redfish implementation.
It has also been
observed that these [strings][app-example], when used, are often out of date
with the [message registry][obmc-registry-example] advertised by `bmcweb`. Some
maintainers have rejected adding new Redfish-specific logging messages to their
applications.

[LogEntry]:
  https://github.com/openbmc/bmcweb/blob/de0c960c4262169ea92a4b852dd5ebbe3810bf00/redfish-core/schema/dmtf/json-schema/LogEntry.v1_16_0.json
[HPE-Example]:
  https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002092en_us&docLocale=en_US&page=GUID-D7147C7F-2016-0901-06CE-000000000422.html
[Oracle-Example]:
  https://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html#50602039_63068
[Registry-Example]:
  https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events_0.pdf
[existing-design]:
  https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
[app-example]:
  https://github.com/openbmc/phosphor-post-code-manager/blob/f2da78deb3a105c7270f74d9d747c77f0feaae2c/src/post_code.cpp#L143
[obmc-registry-example]:
  https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/include/registries/openbmc.json#L5

### Existing phosphor-logging implementation

**Note**: While the word 'exception' is used in this section, the existing (and
proposed) types can be used by applications and execution contexts with
exceptions disabled. They are 'exceptions' because they do inherit from
`std::exception` and there is support in the `sdbusplus` bindings for them to be
used in exception handling.

The `sdbusplus` bindings have the capability to define new C++ exception types
which can be thrown by a DBus server and turned into an error response to the
client. `phosphor-logging` extended this to also add metadata associated to the
log type. See the following example error definitions and usages.

`sdbusplus` error binding definition (in
`xyz/openbmc_project/Certs.errors.yaml`):

```yaml
- name: InvalidCertificate
  description: Invalid certificate file.
```

`phosphor-logging` metadata definition (in
`xyz/openbmc_project/Certs.metadata.yaml`):

```yaml
- name: InvalidCertificate
  meta:
    - str: "REASON=%s"
      type: string
```

Application code reporting an error:

```cpp
elog<InvalidCertificate>(Reason("Invalid certificate file format"));
// or
report<InvalidCertificate>(Reason("Existing certificate file is corrupted"));
```

In this sample, an error named
`xyz.openbmc_project.Certs.Error.InvalidCertificate` has been defined, which can
be sent between applications as a DBus response. The `InvalidCertificate` is
expected to have additional metadata `REASON` which is a string. The two APIs
`elog` and `report` have slightly different behaviors: `elog` throws an
exception which can either result in an error DBus result or be handled
elsewhere in the application, while `report` sends the event directly to
`phosphor-logging`'s daemon for recording. As a side-effect of both calls, the
metadata is inserted into the `systemd` journal.

When an error is sent to the `phosphor-logging` daemon, it will:

1. Search back through the journal for recorded metadata associated with the
   event (this is a relatively slow operation).
2. Create an [`xyz.openbmc_project.Logging.Entry`][Logging-Entry] DBus object
   with the associated data extracted from the journal.
3. Persist a serialized version of the object.

Within `bmcweb` there is support for translating
`xyz.openbmc_project.Logging.Entry` objects advertised by `phosphor-logging`
into Redfish `LogEntries`, but this support does not reference a Message
Registry.
This makes the events of limited utility for consumption by system
management software, as it cannot know all of the event types and is left to
perform (hand-coded) regular-expressions to extract any information from the
`Message` field of the `LogEntry`. Furthermore, these regular-expressions are
likely to become outdated over time as internal OpenBMC error reporting
structure, metadata, or message strings evolve.

[Logging-Entry]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/9012243e543abdc5851b7e878c17c991b2a2a8b7/yaml/xyz/openbmc_project/Logging/Entry.interface.yaml#L1

### Issues with the Status Quo

- There are two different implementations of error logging, neither of which is
  both complete and fully accepted by maintainers. These implementations also do
  not cover tracing events.

- The `REDFISH_MESSAGE_ID` log approach leads to differences between the Redfish
  Message Registry and the reporting application. It also requires every
  application to be "Redfish aware" which limits decoupling between applications
  and external management interfaces. This also leaves gaps for reporting errors
  in different management interfaces, such as inband IPMI and PLDM. The approach
  also does not provide compile-time assurance of appropriate metadata
  collection, which can lead to producing code being out-of-date with the
  message registry definitions.

- The `phosphor-logging` approach does not provide compile-time assurance of
  appropriate metadata collection and requires expensive daemon processing of
  the `systemd` journal on each error report, which limits scalability.

- The `sdbusplus` bindings for error reporting do not currently handle lossless
  transmission of errors between DBus servers and clients.

- Similar applications can result in different Redfish `LogEntry` for the same
  error scenario.
  This has been observed in sensor threshold exceeded events
  between `dbus-sensors`, `phosphor-hwmon`, `phosphor-virtual-sensor`, and
  `phosphor-health-monitor`. One cause of this is two different error reporting
  approaches and disagreements amongst maintainers as to the preferred approach.

## Requirements

- Applications running on the BMC must be able to report errors and failures
  which are persisted and available for external system management through
  standards such as Redfish.
  - These errors must be structured, versioned, and the complete set of errors
    able to be created by the BMC should be available at build-time of a BMC
    image.
  - The set of errors, able to be created by the BMC, must be able to be
    transformed into relevant data sets, such as Redfish Message Registries.
    - For Redfish, the transformation must comply with the Redfish standard
      requirements, such as conforming to semantic versioning expectations.
    - For Redfish, the transformation should allow mapping internally defined
      events to pre-existing Redfish Message Registries for broader
      compatibility.
    - For Redfish, the implementation must also support the EventService
      mechanics for push-reporting.
  - Errors reported by the BMC should contain sufficient information to allow
    service of the system for these failures, either by humans or automation
    (depending on the individual system requirements).

- Applications running on the BMC should be able to report important tracing
  events relevant to system management and/or debug, such as the system
  successfully reaching a running state.
  - All requirements relevant to errors are also applicable to tracing events.
  - The implementation must have a mechanism for vendors to be able to disable
    specific tracing events to conform to their own system design requirements.

- Applications running on the BMC should be able to determine when a previously
  reported error is no longer relevant and mark it as "resolved", while
  maintaining the persistent record for future usages such as debug.

- The BMC should provide a mechanism for managed entities within the server to
  report their own errors and events. Examples of managed entities would be
  firmware, such as the BIOS, and satellite management controllers.

- The implementation on the BMC should scale to a minimum of
  [10,000][error-discussion] errors and events without impacting the BMC or
  managed system performance.

- The implementation should provide a mechanism to allow OEM or vendor
  extensions to the error and event definitions (and generated artifacts such as
  the Redfish Message Registry) for usage in closed-source or non-upstreamed
  code. These extensions must be clearly identified, in all interfaces, as
  vendor-specific and not be tied to the OpenBMC project.

- APIs to implement error and event reporting should have good ergonomics. These
  APIs must provide compile-time identification, for applicable programming
  languages, of call sites which do not conform to the BMC error and event
  specifications.
  - The generated error classes and APIs should not require exceptions but
    should also integrate with the `sdbusplus` client and server bindings, which
    do leverage exceptions.

[error-discussion]:
  https://discord.com/channels/775381525260664832/855566794994221117/867794201897992213

## Proposed Design

The proposed design has a few high-level design elements:

- Consolidate the `sdbusplus` and `phosphor-logging` implementation of error
  reporting; expand it to cover tracing events; improve the ergonomics of the
  associated APIs and add compile-time checking of missing metadata.

- Add APIs to `phosphor-logging` to enable daemons to easily look up their own
  previously reported events (for marking as resolved).

- Add to `phosphor-logging` a compile-time mechanism to disable recording of
  specific tracing events for vendor-level customization.

- Generate a Redfish Message Registry for all errors and events defined in
  `phosphor-dbus-interfaces`, using binding generators from `sdbusplus`. Enhance
  `bmcweb` implementation of the `Logging.Entry` to `LogEntry` transformation to
  cover the Redfish Message Registry and `phosphor-logging` enhancements;
  leverage the Redfish `LogEntry.DiagnosticData` field to provide a
  Base64-encoded JSON representation of the entire `Logging.Entry` for
  additional diagnostics [[does this need to be optional?]]. Add support to the
  `bmcweb` EventService implementation to support `phosphor-logging`-hosted
  events.

### `sdbusplus`

The `Foo.errors.yaml` content will be combined with the content formerly in the
`Foo.metadata.yaml` files specified by `phosphor-logging` and specified by a new
file type `Foo.events.yaml`. This `Foo.events.yaml` format will cover both the
current `error` and `metadata` information as well as augment with additional
information necessary to generate external facing datasets, such as Redfish
Message Registries. The current `Foo.errors.yaml` and `Foo.metadata.yaml` files
will be deprecated as their usage is replaced by the new format.

The `sdbusplus` library will be enhanced to provide the following:

- JSON serialization and de-serialization of generated exception types with
  their assigned metadata; assignment of the JSON serialization to the `message`
  field of `sd_bus_error_set` calls when errors are returned from DBus server
  calls.

- A facility to register exception types, at library load time, with the
  `sdbusplus` library for automatic conversion back to C++ exception types in
  DBus clients.

The binding generator(s) will be expanded to do the following:

- Generate complete C++ exception types, with compile-time checking of missing
  metadata and JSON serialization, for errors and events. Metadata can be of one
  of the following types:
  - size-type and signed integer
  - floating-point number
  - string
  - DBus object path

- Generate a format that `bmcweb` can use to create and populate a Redfish
  Message Registry, and translate from `phosphor-logging` to Redfish `LogEntry`
  for a set of errors and events.

For general users of `sdbusplus` these changes should have no impact, except for
the availability of new generated exception types and that specialized instances
of `sdbusplus::exception::generated_exception` will become available in DBus
clients.

### `phosphor-dbus-interfaces`

Refactoring will be done to migrate existing `Foo.metadata.yaml` and
`Foo.errors.yaml` content to the `Foo.events.yaml` as migration is done by
applications. Minor changes will take place to utilize the new binding
generators from `sdbusplus`. A small library enhancement will be done to
register all generated exception types with `sdbusplus`. Future contributors
will be able to contribute new error and tracing event definitions.

### `phosphor-logging`

> TODO: Should a tracing event be a `Logging.Entry` with severity of
> `Informational` or should they be a new type, such as `Logging.Event` and
> managed separately? The `phosphor-logging` default `meson.options` have
> `error_cap=200` and `error_info_cap=10`. If we increase the total number of
> events allowed to 10K, the majority of them are likely going to be information
> / tracing events.

The `Logging.Entry` interface's `AdditionalData` property should change to
`dict[string, variant[string,int64_t,size_t,object_path]]`.

The `Logging.Create` interface will have a new method added:

```yaml
- name: CreateEntry
  parameters:
    - name: Message
      type: string
    - name: Severity
      type: enum[Logging.Entry.Level]
    - name: AdditionalData
      type: dict[string, variant[string,int64_t,size_t,object_path]]
    - name: Hint
      type: string
      default: ""
  returns:
    - name: Entry
      type: object_path
```

The `Hint` parameter is used for daemons to be able to query for their
previously recorded error, for marking as resolved. These strings need to be
globally unique and are suggested to be of the format `"<service_name>:<key>"`.

A `Logging.SearchHint` interface will be created, which will be recorded at the
same object path as a `Logging.Entry` when the `Hint` parameter was not an empty
string:

```yaml
- property: Hint
  type: string
```

The `Logging.Manager` interface will be added with a single method:

```yaml
- name: FindEntry
  parameters:
    - name: Hint
      type: string
  returns:
    - name: Entry
      type: object_path
  errors:
    - xyz.openbmc_project.Common.ResourceNotFound
```

A `lg2::commit` API will be added to support the new `sdbusplus` generated
exception types, calling the new `Logging.Create.CreateEntry` method proposed
earlier. This new API will support `sdbusplus::bus_t` for synchronous DBus
operations and both `sdbusplus::async::context_t` and
`sdbusplus::asio::connection` for asynchronous DBus operations.

There are outstanding performance concerns with the `phosphor-logging`
implementation that may impact the ability for scaling to 10,000 event records.
This issue is expected to be self-contained within `phosphor-logging`, except
for potential future changes to the log-retrieval interfaces used by `bmcweb`.
In order to decouple the transition to this design, by callers of the logging
APIs, from the experimentation and improvements in `phosphor-logging`, we will
add a compile option and Yocto `DISTRO_FEATURE` that can turn `lg2::commit`
behavior into an `OPENBMC_MESSAGE_ID` record in the journal, along the same
approach as the previous `REDFISH_MESSAGE_ID`, and corresponding `rsyslog`
configuration and `bmcweb` support to use these directly. This will allow
systems which knowingly scale to a large number of event records, using
`rsyslog` mechanics, to retain the same level of performance. One caveat of this
support is that the hint and resolution behavior will not exist when that option
is enabled.

### `bmcweb`

`bmcweb` already has support for build-time conversion from a Redfish Message
Registry, codified in JSON, to header files it uses to serve the registry; this
will be expanded to support Redfish Message Registries generated by `sdbusplus`.
`bmcweb` will add a Meson option for additional message registries, provided
by bitbake from `phosphor-dbus-interfaces` and vendor-specific event
definitions as a path to a directory of Message Registry JSONs. Support will
also be added for adding `phosphor-dbus-interfaces` as a Meson subproject for
stand-alone testing.

It is desirable for `sdbusplus` to generate a Redfish Message Registry directly,
leveraging the existing scripts for integration with `bmcweb`. As part of this
we would like to support mapping a `Logging.Entry` event to an existing
standardized Redfish event (such as those in the Base registry).
The generated
information must contain the `Logging.Entry::Message` identifier, the
`AdditionalData` to `MessageArgs` mapping, and the translation from the
`Message` identifier to the Redfish Message ID (when the Message ID is not from
"this" registry). In order to facilitate this, we will need to add OEM fields to
the Redfish Message Registry JSON, which are only used by the `bmcweb`
processing scripts, to generate the information necessary for this additional
mapping.

The `xyz.openbmc_project.Logging.Entry` to `LogEntry` conversion needs to be
enhanced, to utilize these Message Registries, in four ways:

1. A Base64-encoded JSON representation of the `Logging.Entry` will be assigned
   to the `DiagnosticData` property.

2. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageId` property will be set to the corresponding
   Redfish Message ID. Otherwise, the `Logging.Entry::Message` will be used
   directly with no further transformation (as is done today).

3. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageArgs` property will be filled in by obtaining the
   corresponding values from the `AdditionalData` dictionary and the `Message`
   field will be generated from combining these values with the `Message` string
   from the Registry.

4. A mechanism should be implemented to translate DBus `object_path` references
   to Redfish Resource URIs. When an `object_path` cannot be translated,
   `bmcweb` will use a prefix such as `object_path:` in the `MessageArgs` value.

The implementation of `EventService` should be enhanced to support
`phosphor-logging` hosted events. The implementation of `LogService` should be
enhanced to support log paging for `phosphor-logging` hosted events.
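
The `MessageArgs` and `Message` generation described in the conversion steps
above could look roughly like the following. This is only a minimal sketch and
not `bmcweb` code: `RegistryEntry`, `collectArgs`, and `fillMessage` are
hypothetical names, the `AdditionalData` values are simplified to plain
strings, and the ordered `argKeys` list stands in for whatever
`AdditionalData` to `MessageArgs` mapping the generated registry data ends up
carrying.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical view of one registry message: the template string and the
// AdditionalData keys that feed %1, %2, ... in that order.
struct RegistryEntry
{
    std::string message;
    std::vector<std::string> argKeys;
};

// Build MessageArgs by looking up each key in AdditionalData.
// A missing key yields an empty string so argument positions stay stable.
std::vector<std::string>
    collectArgs(const RegistryEntry& entry,
                const std::map<std::string, std::string>& additionalData)
{
    std::vector<std::string> args;
    for (const auto& key : entry.argKeys)
    {
        auto it = additionalData.find(key);
        args.push_back(it != additionalData.end() ? it->second
                                                  : std::string{});
    }
    return args;
}

// Substitute 1-based %N placeholders with the matching argument.
// A '%' that is not followed by a valid argument index is copied verbatim.
std::string fillMessage(const std::string& tmpl,
                        const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() &&
            std::isdigit(static_cast<unsigned char>(tmpl[i + 1])))
        {
            std::size_t j = i + 1;
            std::size_t index = 0;
            while (j < tmpl.size() &&
                   std::isdigit(static_cast<unsigned char>(tmpl[j])))
            {
                index = index * 10 + static_cast<std::size_t>(tmpl[j] - '0');
                ++j;
            }
            if (index >= 1 && index <= args.size())
            {
                out += args[index - 1];
                i = j - 1; // skip the digits that were consumed
                continue;
            }
        }
        out += tmpl[i];
    }
    return out;
}
```

With a template of `A failure occurred updating %1 on %2.` and `argKeys` of
`TARGET` and `CALLOUT_HARDWARE`, any other `AdditionalData` entries (such as
`ERRNO`) are simply not placed into `MessageArgs`; they would still be
available through the `DiagnosticData` representation.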

### `phosphor-sel-logger`

The `phosphor-sel-logger` has a meson option `send-to-logger` which toggles
between using `phosphor-logging` or the [`REDFISH_MESSAGE_ID`
mechanism][existing-design]. The `phosphor-logging`-utilizing paths will be
updated to utilize `phosphor-dbus-interfaces` specified errors and events.

### YAML format

Consider an example file in `phosphor-dbus-interfaces` as
`yaml/xyz/openbmc_project/Software/Update.events.yaml` with hypothetical errors
and events:

```yaml
version: 1.3.1

errors:
  - name: UpdateFailure
    severity: critical
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: ERRNO
        type: int64
      - name: CALLOUT_HARDWARE
        type: object_path
        primary: true
    en:
      description: While updating the firmware on a device, the update failed.
      message: A failure occurred updating {TARGET} on {CALLOUT_HARDWARE}.
      resolution: Retry update.

  - name: BMCUpdateFailure
    severity: critical
    deprecated: 1.0.0
    en:
      description: Failed to update the BMC
    redfish-mapping: OpenBMC.FirmwareUpdateFailed

events:
  - name: UpdateProgress
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: COMPLETION
        type: double
        primary: true
    en:
      description: An update is in progress and has reached a checkpoint.
      message: Updating of {TARGET} is {COMPLETION}% complete.
```

Each `foo.events.yaml` file would be used to generate both the C++ classes (via
`sdbusplus`) for exception handling and event reporting, as well as a versioned
Redfish Message Registry for the errors and events. The [YAML
schema][yaml-schema] is contained in the sdbusplus repository.

The above example YAML would generate C++ classes similar to:

```cpp
namespace sdbusplus::errors::xyz::openbmc_project::software::update
{

class UpdateFailure
{
  public:
    // Accepts the metadata as alternating name/value arguments.
    template <typename... Args>
    UpdateFailure(Args&&... args);
};

}

namespace sdbusplus::events::xyz::openbmc_project::software::update
{

class UpdateProgress
{
  public:
    // Accepts the metadata as alternating name/value arguments.
    template <typename... Args>
    UpdateProgress(Args&&... args);
};

}
```

The constructors here are variadic templates because the generated constructor
implementation will provide compile-time assurance that all of the metadata
fields have been populated (in any order). To raise an `UpdateFailure` a
developer might do something like:

```cpp
// Immediately report the event:
lg2::commit(UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE", bmc_object_path));
// or send it in a dbus response (when using sdbusplus generated binding):
throw UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE", bmc_object_path);
```

If one of the fields, such as `ERRNO`, were omitted, a compile failure would be
raised indicating the first missing field.

[yaml-schema]:
  https://github.com/openbmc/sdbusplus/blob/master/tools/sdbusplus/schemas/events.schema.yaml

### Versioning Policy

Assume the version follows semantic versioning `MAJOR.MINOR.PATCH` convention.

- Adjusting a description or message should result in a `PATCH` increment.
- Adding a new error or event, or adding metadata to an existing error or event,
  should result in a `MINOR` increment.
- Deprecating an error or event should result in a `MAJOR` increment.

There is [guidance on maintenance][registry-guidance] of the OpenBMC Message
Registry. We will incorporate that guidance into the equivalent
`phosphor-dbus-interfaces` policy.
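
The three versioning rules above can be expressed as a small helper. This is
only an illustrative sketch: `Change`, `Version`, `bump`, and `toString` are
hypothetical names that are not part of `sdbusplus` or
`phosphor-dbus-interfaces`, and resetting the lower fields to zero on a
`MAJOR` or `MINOR` increment is an assumption taken from the general semantic
versioning convention rather than from this proposal.

```cpp
#include <string>

// Kinds of change to a foo.events.yaml file, per the policy above.
enum class Change
{
    TextOnly,   // description or message adjusted
    Addition,   // new error/event, or new metadata on an existing one
    Deprecation // an error or event was deprecated
};

struct Version
{
    unsigned majorNum;
    unsigned minorNum;
    unsigned patchNum;
};

// Apply the policy: PATCH for text, MINOR for additions, MAJOR for
// deprecations. Lower fields reset to 0 (semantic versioning convention).
constexpr Version bump(Version v, Change c)
{
    switch (c)
    {
        case Change::Deprecation:
            return {v.majorNum + 1, 0, 0};
        case Change::Addition:
            return {v.majorNum, v.minorNum + 1, 0};
        case Change::TextOnly:
            return {v.majorNum, v.minorNum, v.patchNum + 1};
    }
    return v;
}

// Render as MAJOR.MINOR.PATCH, matching the `version:` field in the YAML.
std::string toString(const Version& v)
{
    return std::to_string(v.majorNum) + "." + std::to_string(v.minorNum) +
           "." + std::to_string(v.patchNum);
}
```

Starting from the `1.3.1` used in the example YAML, rewording a message would
give `1.3.2`, adding a new event would give `1.4.0`, and deprecating an error
would give `2.0.0`.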

[registry-guidance]:
  https://github.com/openbmc/bmcweb/blob/master/redfish-core/include/registries/openbmc_message_registry.readmefirst.md

### Generated Redfish Message Registry

[DSP0266][dsp0266], the Redfish specification, gives requirements for Redfish
Message Registries and dictates guidelines for identifiers.

The hypothetical events defined above would create a message registry similar
to:

```json
{
  "Id": "OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.1",
  "Language": "en",
  "Messages": {
    "UpdateFailure": {
      "Description": "While updating the firmware on a device, the update failed.",
      "Message": "A failure occurred updating %1 on %2.",
      "Resolution": "Retry update.",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "string"],
      "Severity": "Critical"
    },
    "UpdateProgress": {
      "Description": "An update is in progress and has reached a checkpoint.",
      "Message": "Updating of %1 is %2% complete.",
      "Resolution": "None",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "number"],
      "Severity": "OK"
    }
  }
}
```

The prefix `OpenBMC_Base` shall be exclusively reserved for use by events from
`phosphor-logging`. Events defined in other repositories will be expected to use
some other prefix. Vendor-defined repositories should use a vendor-owned prefix
as directed by [DSP0266][dsp0266].

[dsp0266]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.20.0.pdf

### Vendor implications

As specified above, vendors must use their own identifiers in order to conform
with the Redfish specification (see [DSP0266][dsp0266] for requirements on
identifier naming).
The `sdbusplus` (and `phosphor-logging` and `bmcweb`)
implementation(s) will enable vendors to create their own events for downstream
code and Registries for integration with Redfish, by creating downstream
repositories of error definitions. Vendors are responsible for ensuring their
own versioning and identifiers conform to the expectations in the [Redfish
specification][dsp0266].

One potential bad behavior on the part of vendors would be forking and modifying
`phosphor-dbus-interfaces` defined events. Vendors must not add their own events
to `phosphor-dbus-interfaces` in downstream implementations, because their
implementation would then advertise support for a message in an OpenBMC-owned
Registry that does not actually contain it. Instead, they should add such events
to their own repositories with a separate identifier. Similarly, if a vendor
were to _backport_ upstream changes into their fork, they would need to ensure
that the `foo.events.yaml` file for that version matches identically with the
upstream implementation.

## Alternatives Considered

Many alternatives have been explored and referenced through earlier work. Within
this proposal there are many minor alternatives that have been assessed.

### Exception inheritance

The original `phosphor-logging` error descriptions allowed inheritance between
two errors. This is not supported by the proposal for two reasons:

- This introduces complexity in the Redfish Message Registry versioning because
  a change in one file should induce version changes in all dependent files.

- It makes it difficult for a developer to clearly identify all of the fields
  they are expected to populate without traversing multiple files.

### sdbusplus Exception APIs

There are a few possible syntaxes I came up with for constructing the generated
exception types.
It is important that these have good ergonomics, are easy to
understand, and can provide compile-time awareness of missing metadata fields.

```cpp
using Example = sdbusplus::error::xyz::openbmc_project::Example;

// 1)
throw Example().fru("Motherboard").value(42);

// 2)
throw Example(Example::fru_{}, "Motherboard", Example::value_{}, 42);

// 3)
throw Example("FRU", "Motherboard", "VALUE", 42);

// 4)
throw Example([](auto e) { return e.fru("Motherboard").value(42); });

// 5)
throw Example({.fru = "Motherboard", .value = 42});
```

**Note**: These examples are all shown using `throw` syntax, but could also be
saved in local variables, returned from functions, or immediately passed to
`lg2::commit`.

1. This would be my preference for ergonomics and clarity, as it would allow
   LSP-enabled editors to give completions for the metadata fields but
   unfortunately there is no mechanism in C++ to define a type which can be
   constructed but not thrown, which means we cannot get compile-time checking
   of all metadata fields.

2. This syntax uses tag-dispatch to enable compile-time checking of all
   metadata fields and potential LSP-completion of the tag-types, but is more
   verbose than option 3.

3. This syntax is less verbose than (2) and follows conventions already used in
   `phosphor-logging`'s `lg2` API, but does not allow LSP-completion of the
   metadata tags.

4. This syntax is similar to option (1) but uses an indirection of a lambda to
   enable compile-time checking that all metadata fields have been populated by
   the lambda. The LSP-completion is likely not as strong as option (1), due to
   the use of `auto`, and the lambda necessity will likely be a hang-up for
   unfamiliar developers.

5. This syntax has similar characteristics as option (1) but similarly does not
   provide compile-time confirmation that all fields have been populated.

The proposal therefore suggests option (3) is most suitable.

### Redfish Translation Support

The proposed YAML format allows future addition of translation but it is not
enabled at this time. Future development could enable the Redfish Message
Registry to be generated in multiple languages if the `message:language` exists
for those languages.

### Redfish Registry Versioning

The Redfish Message Registries are required to be versioned and have three
numeric fields (i.e. `XX.YY.ZZ`), but only the first two are supposed to be used
in the Message ID. Rather than using the manually specified version we could
take a few other approaches:

- Use a date code (ex. `2024.17.x`) representing the ISO 8601 week when the
  registry was built.
  - This does not cover vendors that may choose to branch for stabilization
    purposes, so we can end up with two machines having the same
    OpenBMC-versioned message registry with different content.

- Use the most recent `openbmc/openbmc` tag as the version.
  - This does not cover vendors that build off HEAD and may deploy multiple
    images between two OpenBMC releases.

- Generate the version based on the git-history.
  - This requires `phosphor-dbus-interfaces` to be built from a git repository,
    which may not always be true for Yocto source mirrors, and requires
    non-trivial processing that continues to scale over time.

### Existing OpenBMC Redfish Registry

There are currently 191 messages defined in the existing Redfish Message
Registry at version `OpenBMC.0.4.0`. Of those, not a single one in the codebase
is emitted with the correct version.
96 of those are only emitted by Intel-specific code that is not pulled into any
upstreamed machine, 39 are emitted by potentially common code, and 56 are not
even referenced in the codebase outside of the `bmcweb` registry. Of the 39
common messages, half have an equivalent in one of the standard registries,
which should be leveraged instead, and many of the others do not have
attributes that would facilitate a multi-host configuration, so the registry at
a minimum needs to be updated. The current implementation has no capability to
handle Redfish Resource URIs.

The proposal therefore is to deprecate the existing registry and replace it with
the new generated registries. For repositories that currently emit events in the
existing format, we can maintain those call-sites for a period of 1-2 years.

If this aspect of the proposal is rejected, the YAML format allows mapping from
`phosphor-dbus-interfaces` defined events to the current `OpenBMC.0.4.0`
registry `MessageIds`.
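
To illustrate the version mismatch noted above: a `MessageId` is expected to
carry the registry prefix, the first two fields of the registry version, and the
message key (as in `Alert.1.0.LanDisconnect` for registry version `1.0.0`). The
following is a minimal sketch of that rule, not code taken from `bmcweb` or any
other repository; the helper names `makeMessageId` and `matchesRegistry` are
hypothetical and exist only for illustration.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: build a MessageId from a registry prefix, a
// three-field registry version ("XX.YY.ZZ"), and a message key. Only the
// first two version fields are kept. Returns std::nullopt if the version
// does not contain two '.' separators.
std::optional<std::string> makeMessageId(std::string_view prefix,
                                         std::string_view version,
                                         std::string_view key)
{
    const auto firstDot = version.find('.');
    if (firstDot == std::string_view::npos)
    {
        return std::nullopt;
    }
    const auto secondDot = version.find('.', firstDot + 1);
    if (secondDot == std::string_view::npos)
    {
        return std::nullopt;
    }

    std::string id(prefix);
    id += '.';
    id += version.substr(0, secondDot); // "XX.YY"
    id += '.';
    id += key;
    return id;
}

// Hypothetical helper: check whether an emitted MessageId is the one the
// registry (prefix + version) would produce for the given key.
bool matchesRegistry(std::string_view emitted, std::string_view prefix,
                     std::string_view version, std::string_view key)
{
    const auto expected = makeMessageId(prefix, version, key);
    return expected.has_value() && *expected == emitted;
}
```

Under these assumptions, `makeMessageId("OpenBMC", "0.4.0", "PowerButtonPressed")`
yields `OpenBMC.0.4.PowerButtonPressed`, so a call-site that emits the same key
under a different version, for example a hypothetical
`OpenBMC.0.1.PowerButtonPressed`, would fail the `matchesRegistry` check.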

Potentially common:

- phosphor-post-code-manager
  - BIOSPOSTCode (unique)
- dbus-sensors
  - ChassisIntrusionDetected (unique)
  - ChassisIntrusionReset (unique)
  - FanInserted
  - FanRedundancyLost (unique)
  - FanRedundancyRegained (unique)
  - FanRemoved
  - LanLost
  - LanRegained
  - PowerSupplyConfigurationError (unique)
  - PowerSupplyConfigurationErrorRecovered (unique)
  - PowerSupplyFailed
  - PowerSupplyFailurePredicted (unique)
  - PowerSupplyFanFailed
  - PowerSupplyFanRecovered
  - PowerSupplyPowerLost
  - PowerSupplyPowerRestored
  - PowerSupplyPredictedFailureRecovered (unique)
  - PowerSupplyRecovered
- phosphor-sel-logger
  - IPMIWatchdog (unique)
  - `SensorThreshold*` : 8 different events
- phosphor-net-ipmid
  - InvalidLoginAttempted (unique)
- entity-manager
  - InventoryAdded (unique)
  - InventoryRemoved (unique)
- estoraged
  - ServiceStarted
- x86-power-control
  - NMIButtonPressed (unique)
  - NMIDiagnosticInterrupt (unique)
  - PowerButtonPressed (unique)
  - PowerRestorePolicyApplied (unique)
  - PowerSupplyPowerGoodFailed (unique)
  - ResetButtonPressed (unique)
  - SystemPowerGoodFailed (unique)

Intel-only implementations:

- intel-ipmi-oem
  - ADDDCCorrectable
  - BIOSPostERROR
  - BIOSRecoveryComplete
  - BIOSRecoveryStart
  - FirmwareUpdateCompleted
  - IntelUPILinkWidthReducedToHalf
  - IntelUPILinkWidthReducedToQuarter
  - LegacyPCIPERR
  - LegacyPCISERR
  - `ME*` : 29 different events
  - `Memory*` : 9 different events
  - MirroringRedundancyDegraded
  - MirroringRedundancyFull
  - `PCIeCorrectable*`, `PCIeFatal` : 29 different events
  - SELEntryAdded
  - SparingRedundancyDegraded
- pfr-manager
  - BIOSFirmwareRecoveryReason
  - BIOSFirmwarePanicReason
  - BMCFirmwarePanicReason
  - BMCFirmwareRecoveryReason
  - BMCFirmwareResiliencyError
  - CPLDFirmwarePanicReason
  - CPLDFirmwareResiliencyError
  - FirmwareResiliencyError
- host-error-monitor
  - CPUError
  - CPUMismatch
  - CPUThermalTrip
  - ComponentOverTemperature
  - SsbThermalTrip
  - VoltageRegulatorOverheated
- s2600wf-misc
  - DriveError
  - InventoryAdded

## Impacts

- New APIs are defined for error and event logging. This will deprecate the
  existing `phosphor-logging` APIs for error reporting, with time allowed for
  migration.

- The design should improve performance by eliminating the regular parsing of
  the `systemd` journal. The design may also decrease performance by allowing
  the number of error and event logs to increase dramatically, which has an
  impact on file system utilization and potential DBus impacts on some services
  such as `ObjectMapper`.

- Backwards compatibility and documentation should be improved by the automatic
  generation of the Redfish Message Registry corresponding to all error and
  event reports.

### Organizational

- **Does this proposal require a new repository?**
  - No
- **Who will be the initial maintainer(s) of this repository?**
  - N/A
- **Which repositories are expected to be modified to execute this design?**
  - `sdbusplus`
  - `phosphor-dbus-interfaces`
  - `phosphor-logging`
  - `bmcweb`
  - Any repository creating an error or event.

## Testing

- Unit tests will be written in `sdbusplus` and `phosphor-logging` for error
  and event generation and the creation APIs, and to provide coverage of any
  changes to the `Logging.Entry` object management.

- Unit tests will be written for `bmcweb` for basic `Logging.Entry`
  transformation and Message Registry generation.

- Integration tests should be leveraged (and enhanced as necessary) from
  `openbmc-test-automation` to cover the end-to-end error creation and Redfish
  reporting.
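
As one illustration of what a `bmcweb` unit test for the `Logging.Entry`
transformation might exercise, the sketch below expands `MessageArgs` into the
`%1`, `%2`, ... placeholders of a registry `Message`. It is a simplified
example under stated assumptions: `fillMessageArgs` is a hypothetical helper,
not an existing `bmcweb` function, and it only handles single-digit
placeholders (`%1` through `%9`).

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: substitute "%1".."%9" in a registry Message template
// with the corresponding entry of MessageArgs. Placeholders without a
// matching argument are left unchanged.
std::string fillMessageArgs(std::string_view tmpl,
                            const std::vector<std::string>& args)
{
    std::string out;
    out.reserve(tmpl.size());
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        const bool isPlaceholder = tmpl[i] == '%' && i + 1 < tmpl.size() &&
                                   tmpl[i + 1] >= '1' && tmpl[i + 1] <= '9';
        if (isPlaceholder)
        {
            const auto index = static_cast<std::size_t>(tmpl[i + 1] - '1');
            if (index < args.size())
            {
                out += args[index];
                ++i; // skip the digit that followed '%'
                continue;
            }
        }
        out += tmpl[i];
    }
    return out;
}
```

Applied to the template `A LAN Disconnect on %1 was detected on system %2.`
with the arguments `EthernetInterface 1` and `/redfish/v1/Systems/1`, this
yields `A LAN Disconnect on EthernetInterface 1 was detected on system
/redfish/v1/Systems/1.`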