# Error and Event Logging

Author: [Patrick Williams][patrick-email] `<stwcx>`

[patrick-email]: mailto:patrick@stwcx.xyz

Other contributors:

Created: May 16, 2024

## Problem Description

There is currently not a consistent end-to-end error and event reporting design
for the OpenBMC code stack. There are two different implementations, one
primarily using phosphor-logging and one using rsyslog, both of which have gaps
that a complete solution should address. This proposal is intended to be an
end-to-end design handling both errors and tracing events which facilitate
external management of the system in an automated and maintainable manner.

## Background and References

### Redfish LogEntry and Message Registry

In Redfish, the [`LogEntry` schema][LogEntry] is used for a range of items that
could be considered "logs", but one such use within OpenBMC is for an equivalent
of the IPMI "System Event Log (SEL)".

The IPMI SEL is the location where the BMC can collect errors and events,
sometimes coming from other entities, such as the BIOS. Examples of these might
be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
These SEL records are exposed as human readable strings, either natively by an
OEM SEL design or by tools such as `ipmitool`, which are typically unique to
each system or manufacturer, and could hypothetically change with a BMC or
firmware update, and are thus difficult to create automated tooling around. Two
different vendors might use different strings to represent a critical
temperature threshold exceeded: ["temperature threshold exceeded"][HPE-Example]
and ["Temperature #0x30 Upper Critical going high"][Oracle-Example]. There is
also no mechanism with IPMI to ask the machine "what are all of the SELs you
might create".

In order to solve two aspects of this problem, listing of possible events and
versioning, Redfish has Message Registries.
A message registry is a versioned
collection of all of the error events that a system could generate and hints as
to how they might be parsed and displayed to a user. An [informative
reference][Registry-Example] from the DMTF gives this example:

```json
{
  "@odata.type": "#MessageRegistry.v1_0_0.MessageRegistry",
  "Id": "Alert.1.0.0",
  "RegistryPrefix": "Alert",
  "RegistryVersion": "1.0.0",
  "Messages": {
    "LanDisconnect": {
      "Description": "A LAN Disconnect on %1 was detected on system %2.",
      "Message": "A LAN Disconnect on %1 was detected on system %2.",
      "Severity": "Warning",
      "NumberOfArgs": 2,
      "Resolution": "None"
    }
  }
}
```

This example defines an event, `Alert.1.0.LanDisconnect`, which can record the
disconnect state of a network device and contains placeholders for the affected
device and system. When this event occurs, there might be a `LogEntry` recorded
containing something like:

```json
{
  "Message": "A LAN Disconnect on EthernetInterface 1 was detected on system /redfish/v1/Systems/1.",
  "MessageId": "Alert.1.0.LanDisconnect",
  "MessageArgs": ["EthernetInterface 1", "/redfish/v1/Systems/1"]
}
```

The `Message` contains a human readable string which was created by applying the
`MessageArgs` to the placeholders from the `Message` field in the registry.
System management software can rely on the message registry (referenced from the
`MessageId` field in the `LogEntry`) and `MessageArgs` to avoid needing to
perform string processing for reacting to the event.

Within OpenBMC, there is currently a [limited design][existing-design] for this
Redfish feature and it requires inserting specially formed Redfish-specific
logging messages into any application that wants to record these events, tightly
coupling all applications to the Redfish implementation.
It has also been
observed that these [strings][app-example], when used, are often out of date
with the [message registry][openbmc-registry-example] advertised by `bmcweb`.
Some maintainers have rejected adding new Redfish-specific logging messages to
their applications.

[LogEntry]:
  https://github.com/openbmc/bmcweb/blob/de0c960c4262169ea92a4b852dd5ebbe3810bf00/redfish-core/schema/dmtf/json-schema/LogEntry.v1_16_0.json
[HPE-Example]:
  https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002092en_us&docLocale=en_US&page=GUID-D7147C7F-2016-0901-06CE-000000000422.html
[Oracle-Example]:
  https://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html#50602039_63068
[Registry-Example]:
  https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events_0.pdf
[existing-design]:
  https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
[app-example]:
  https://github.com/openbmc/phosphor-post-code-manager/blob/f2da78deb3a105c7270f74d9d747c77f0feaae2c/src/post_code.cpp#L143
[openbmc-registry-example]:
  https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/include/registries/openbmc.json#L5

### Existing phosphor-logging implementation

**Note**: While the word 'exception' is used in this section, the existing (and
proposed) types can be used by applications and execution contexts with
exceptions disabled. They are 'exceptions' because they do inherit from
`std::exception` and there is support in the `sdbusplus` bindings for them to be
used in exception handling.

The `sdbusplus` bindings have the capability to define new C++ exception types
which can be thrown by a DBus server and turned into an error response to the
client. `phosphor-logging` extended this to also add metadata associated with
the log type. See the following example error definitions and usages.

`sdbusplus` error binding definition (in
`xyz/openbmc_project/Certs.errors.yaml`):

```yaml
- name: InvalidCertificate
  description: Invalid certificate file.
```

`phosphor-logging` metadata definition (in
`xyz/openbmc_project/Certs.metadata.yaml`):

```yaml
- name: InvalidCertificate
  meta:
    - str: "REASON=%s"
      type: string
```

Application code reporting an error:

```cpp
elog<InvalidCertificate>(Reason("Invalid certificate file format"));
// or
report<InvalidCertificate>(Reason("Existing certificate file is corrupted"));
```

In this sample, an error named
`xyz.openbmc_project.Certs.Error.InvalidCertificate` has been defined, which can
be sent between applications as a DBus response. The `InvalidCertificate` is
expected to have additional metadata `REASON` which is a string. The two APIs
`elog` and `report` have slightly different behaviors: `elog` throws an
exception which can either result in an error DBus result or be handled
elsewhere in the application, while `report` sends the event directly to
`phosphor-logging`'s daemon for recording. As a side-effect of both calls, the
metadata is inserted into the `systemd` journal.

When an error is sent to the `phosphor-logging` daemon, it will:

1. Search back through the journal for recorded metadata associated with the
   event (this is a relatively slow operation).
2. Create an [`xyz.openbmc_project.Logging.Entry`][Logging-Entry] DBus object
   with the associated data extracted from the journal.
3. Persist a serialized version of the object.

Within `bmcweb` there is support for translating
`xyz.openbmc_project.Logging.Entry` objects advertised by `phosphor-logging`
into Redfish `LogEntries`, but this support does not reference a Message
Registry.
This makes the events of limited utility for consumption by system
management software, as it cannot know all of the event types and is left to
apply hand-coded regular expressions to extract any information from the
`Message` field of the `LogEntry`. Furthermore, these regular expressions are
likely to become outdated over time as internal OpenBMC error reporting
structure, metadata, or message strings evolve.

[Logging-Entry]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/9012243e543abdc5851b7e878c17c991b2a2a8b7/yaml/xyz/openbmc_project/Logging/Entry.interface.yaml#L1

### Issues with the Status Quo

- There are two different implementations of error logging, neither of which is
  both complete and fully accepted by maintainers. These implementations also do
  not cover tracing events.

- The `REDFISH_MESSAGE_ID` log approach leads to differences between the Redfish
  Message Registry and the reporting application. It also requires every
  application to be "Redfish aware" which limits decoupling between applications
  and external management interfaces. This also leaves gaps for reporting errors
  in different management interfaces, such as inband IPMI and PLDM. The approach
  also does not provide compile-time assurance of appropriate metadata
  collection, which can lead to producing code being out-of-date with the
  message registry definitions.

- The `phosphor-logging` approach does not provide compile-time assurance of
  appropriate metadata collection and requires expensive daemon processing of
  the `systemd` journal on each error report, which limits scalability.

- The `sdbusplus` bindings for error reporting do not currently handle lossless
  transmission of errors between DBus servers and clients.

- Similar applications can result in different Redfish `LogEntry` for the same
  error scenario.
  This has been observed in sensor threshold exceeded events
  between `dbus-sensors`, `phosphor-hwmon`, `phosphor-virtual-sensor`, and
  `phosphor-health-monitor`. One cause of this is two different error reporting
  approaches and disagreements amongst maintainers as to the preferred approach.

## Requirements

- Applications running on the BMC must be able to report errors and failures
  which are persisted and available for external system management through
  standards such as Redfish.

  - These errors must be structured, versioned, and the complete set of errors
    able to be created by the BMC should be available at build-time of a BMC
    image.
  - The set of errors, able to be created by the BMC, must be able to be
    transformed into relevant data sets, such as Redfish Message Registries.
    - For Redfish, the transformation must comply with the Redfish standard
      requirements, such as conforming to semantic versioning expectations.
    - For Redfish, the transformation should allow mapping internally defined
      events to pre-existing Redfish Message Registries for broader
      compatibility.
    - For Redfish, the implementation must also support the EventService
      mechanics for push-reporting.
  - Errors reported by the BMC should contain sufficient information to allow
    service of the system for these failures, either by humans or automation
    (depending on the individual system requirements).

- Applications running on the BMC should be able to report important tracing
  events relevant to system management and/or debug, such as the system
  successfully reaching a running state.

  - All requirements relevant to errors are also applicable to tracing events.
  - The implementation must have a mechanism for vendors to be able to disable
    specific tracing events to conform to their own system design requirements.

- Applications running on the BMC should be able to determine when a previously
  reported error is no longer relevant and mark it as "resolved", while
  maintaining the persistent record for future usages such as debug.

- The BMC should provide a mechanism for managed entities within the server to
  report their own errors and events. Examples of managed entities would be
  firmware, such as the BIOS, and satellite management controllers.

- The implementation on the BMC should scale to a minimum of
  [10,000][error-discussion] errors and events without impacting the BMC or
  managed system performance.

- The implementation should provide a mechanism to allow OEM or vendor
  extensions to the error and event definitions (and generated artifacts such as
  the Redfish Message Registry) for usage in closed-source or non-upstreamed
  code. These extensions must be clearly identified, in all interfaces, as
  vendor-specific and not be tied to the OpenBMC project.

- APIs to implement error and event reporting should have good ergonomics. These
  APIs must provide compile-time identification, for applicable programming
  languages, of call sites which do not conform to the BMC error and event
  specifications.

  - The generated error classes and APIs should not require exceptions but
    should also integrate with the `sdbusplus` client and server bindings, which
    do leverage exceptions.

[error-discussion]:
  https://discord.com/channels/775381525260664832/855566794994221117/867794201897992213

## Proposed Design

The proposed design has a few high-level design elements:

- Consolidate the `sdbusplus` and `phosphor-logging` implementation of error
  reporting; expand it to cover tracing events; improve the ergonomics of the
  associated APIs and add compile-time checking of missing metadata.

- Add APIs to `phosphor-logging` to enable daemons to easily look up their own
  previously reported events (for marking as resolved).

- Add to `phosphor-logging` a compile-time mechanism to disable recording of
  specific tracing events for vendor-level customization.

- Generate a Redfish Message Registry for all errors and events defined in
  `phosphor-dbus-interfaces`, using binding generators from `sdbusplus`. Enhance
  `bmcweb` implementation of the `Logging.Entry` to `LogEntry` transformation to
  cover the Redfish Message Registry and `phosphor-logging` enhancements;
  leverage the Redfish `LogEntry.DiagnosticData` field to provide a
  Base64-encoded JSON representation of the entire `Logging.Entry` for
  additional diagnostics [[does this need to be optional?]]. Add support to the
  `bmcweb` EventService implementation to support `phosphor-logging`-hosted
  events.

### `sdbusplus`

The `Foo.errors.yaml` content will be combined with the content formerly in the
`Foo.metadata.yaml` files specified by `phosphor-logging` and specified by a new
file type `Foo.events.yaml`. This `Foo.events.yaml` format will cover both the
current `error` and `metadata` information as well as augment with additional
information necessary to generate external facing datasets, such as Redfish
Message Registries. The current `Foo.errors.yaml` and `Foo.metadata.yaml` files
will be deprecated as their usage is replaced by the new format.

The `sdbusplus` library will be enhanced to provide the following:

- JSON serialization and de-serialization of generated exception types with
  their assigned metadata; assignment of the JSON serialization to the `message`
  field of `sd_bus_error_set` calls when errors are returned from DBus server
  calls.

- A facility to register exception types, at library load time, with the
  `sdbusplus` library for automatic conversion back to C++ exception types in
  DBus clients.

The binding generator(s) will be expanded to do the following:

- Generate complete C++ exception types, with compile-time checking of missing
  metadata and JSON serialization, for errors and events. Metadata can be of one
  of the following types:

  - size-type and signed integer
  - floating-point number
  - string
  - DBus object path

- Generate a format that `bmcweb` can use to create and populate a Redfish
  Message Registry, and translate from `phosphor-logging` to Redfish `LogEntry`
  for a set of errors and events.

For general users of `sdbusplus` these changes should have no impact, except for
the availability of new generated exception types and that specialized instances
of `sdbusplus::exception::generated_exception` will become available in DBus
clients.

### `phosphor-dbus-interfaces`

Refactoring will be done to migrate existing `Foo.metadata.yaml` and
`Foo.errors.yaml` content to the `Foo.events.yaml` as migration is done by
applications. Minor changes will take place to utilize the new binding
generators from `sdbusplus`. A small library enhancement will be done to
register all generated exception types with `sdbusplus`. Future contributors
will be able to contribute new error and tracing event definitions.

### `phosphor-logging`

> TODO: Should a tracing event be a `Logging.Entry` with severity of
> `Informational` or should they be a new type, such as `Logging.Event` and
> managed separately. The `phosphor-logging` default `meson.options` have
> `error_cap=200` and `error_info_cap=10`. If we increase the total number of
> events allowed to 10K, the majority of them are likely going to be information
> / tracing events.

The `Logging.Entry` interface's `AdditionalData` property should change to
`dict[string, variant[string,int64_t,size_t,object_path]]`.

The `Logging.Create` interface will have a new method added:

```yaml
- name: CreateEntry
  parameters:
    - name: Message
      type: string
    - name: Severity
      type: enum[Logging.Entry.Level]
    - name: AdditionalData
      type: dict[string, variant[string,int64_t,size_t,object_path]]
    - name: Hint
      type: string
      default: ""
  returns:
    - name: Entry
      type: object_path
```

The `Hint` parameter is used for daemons to be able to query for their
previously recorded error, for marking as resolved. These strings need to be
globally unique and are suggested to be of the format `"<service_name>:<key>"`.

A `Logging.SearchHint` interface will be created, which will be recorded at the
same object path as a `Logging.Entry` when the `Hint` parameter was not an empty
string:

```yaml
- property: Hint
  type: string
```

The `Logging.Manager` interface will be added with a single method:

```yaml
- name: FindEntry
  parameters:
    - name: Hint
      type: string
  returns:
    - name: Entry
      type: object_path
  errors:
    - xyz.openbmc_project.Common.ResourceNotFound
```

A `lg2::commit` API will be added to support the new `sdbusplus` generated
exception types, calling the new `Logging.Create.CreateEntry` method proposed
earlier. This new API will support `sdbusplus::bus_t` for synchronous DBus
operations and both `sdbusplus::async::context_t` and
`sdbusplus::asio::connection` for asynchronous DBus operations.

There are outstanding performance concerns with the `phosphor-logging`
implementation that may impact the ability for scaling to 10,000 event records.
This issue is expected to be self-contained within `phosphor-logging`, except
for potential future changes to the log-retrieval interfaces used by `bmcweb`.
In order to decouple the transition to this design, by callers of the logging
APIs, from the experimentation and improvements in `phosphor-logging`, we will
add a compile option and Yocto `DISTRO_FEATURE` that can turn `lg2::commit`
behavior into an `OPENBMC_MESSAGE_ID` record in the journal, along the same
approach as the previous `REDFISH_MESSAGE_ID`, and corresponding `rsyslog`
configuration and `bmcweb` support to use these directly. This will give
systems which knowingly scale to a large number of event records, using
`rsyslog` mechanics, the same level of performance. One caveat of this support
is that the hint and resolution behavior will not exist when that option is
enabled.

### `bmcweb`

`bmcweb` already has support for build-time conversion from a Redfish Message
Registry, codified in JSON, to header files it uses to serve the registry; this
will be expanded to support Redfish Message Registries generated by `sdbusplus`.
`bmcweb` will add a Meson option for additional message registries, provided
from bitbake from `phosphor-dbus-interfaces` and vendor-specific event
definitions as a path to a directory of Message Registry JSONs. Support will
also be added for adding `phosphor-dbus-interfaces` as a Meson subproject for
stand-alone testing.

It is desirable for `sdbusplus` to generate a Redfish Message Registry directly,
leveraging the existing scripts for integration with `bmcweb`. As part of this
we would like to support mapping a `Logging.Entry` event to an existing
standardized Redfish event (such as those in the Base registry).
The generated
information must contain the `Logging.Entry::Message` identifier, the
`AdditionalData` to `MessageArgs` mapping, and the translation from the
`Message` identifier to the Redfish Message ID (when the Message ID is not from
"this" registry). In order to facilitate this, we will need to add OEM fields to
the Redfish Message Registry JSON, which are only used by the `bmcweb`
processing scripts, to generate the information necessary for this additional
mapping.

The `xyz.openbmc_project.Logging.Entry` to `LogEntry` conversion needs to be
enhanced, to utilize these Message Registries, in four ways:

1. A Base64-encoded JSON representation of the `Logging.Entry` will be assigned
   to the `DiagnosticData` property.

2. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageId` property will be set to the corresponding
   Redfish Message ID. Otherwise, the `Logging.Entry::Message` will be used
   directly with no further transformation (as is done today).

3. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageArgs` property will be filled in by obtaining the
   corresponding values from the `AdditionalData` dictionary and the `Message`
   field will be generated from combining these values with the `Message` string
   from the Registry.

4. A mechanism should be implemented to translate DBus `object_path` references
   to Redfish Resource URIs. When an `object_path` cannot be translated,
   `bmcweb` will use a prefix such as `object_path:` in the `MessageArgs` value.

The implementation of `EventService` should be enhanced to support
`phosphor-logging` hosted events. The implementation of `LogService` should be
enhanced to support log paging for `phosphor-logging` hosted events.
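
Steps 2 and 3 of the conversion above can be sketched in a few lines of C++.
This is a minimal illustration under stated assumptions, not `bmcweb` code:
`RegistryEntry`, `RedfishFields`, `fillTemplate`, and `translate` are
hypothetical names, `AdditionalData` is simplified to string values, and the
ordered `argKeys` list stands in for the generated `AdditionalData` to
`MessageArgs` mapping.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical registry record (not a bmcweb type).
struct RegistryEntry
{
    std::string messageId;            // Redfish MessageId to report
    std::string messageTemplate;      // Registry "Message" with %1, %2, ...
    std::vector<std::string> argKeys; // AdditionalData keys, in %N order
};

// Hypothetical subset of the LogEntry fields filled by the conversion.
struct RedfishFields
{
    std::string messageId;
    std::vector<std::string> messageArgs;
    std::string message;
};

// Substitute each %N placeholder with args[N-1]; everything else, including
// a '%' not followed by a valid index, is copied through unchanged.
inline std::string fillTemplate(const std::string& tmpl,
                                const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        std::size_t j = i + 1;
        std::size_t n = 0;
        while (tmpl[i] == '%' && j < tmpl.size() &&
               std::isdigit(static_cast<unsigned char>(tmpl[j])))
        {
            n = n * 10 + static_cast<std::size_t>(tmpl[j] - '0');
            ++j;
        }
        if (n >= 1 && n <= args.size())
        {
            out += args[n - 1];
            i = j - 1; // skip the digits just consumed
        }
        else
        {
            out += tmpl[i];
        }
    }
    return out;
}

// Build MessageArgs from AdditionalData, then render the Message string.
// A missing key becomes an empty argument in this sketch.
inline RedfishFields translate(
    const RegistryEntry& entry,
    const std::map<std::string, std::string>& additionalData)
{
    RedfishFields result;
    result.messageId = entry.messageId;
    for (const auto& key : entry.argKeys)
    {
        auto it = additionalData.find(key);
        result.messageArgs.push_back(
            it != additionalData.end() ? it->second : std::string{});
    }
    result.message = fillTemplate(entry.messageTemplate, result.messageArgs);
    return result;
}
```

Metadata that is not listed in `argKeys` (for example an `ERRNO` value) does not
appear in `MessageArgs` here; in the proposal it would still be reachable
through the `DiagnosticData` property from step 1.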

### `phosphor-sel-logger`

The `phosphor-sel-logger` has a meson option `send-to-logger` which toggles
between using `phosphor-logging` or the [`REDFISH_MESSAGE_ID`
mechanism][existing-design]. The `phosphor-logging`-utilizing paths will be
updated to utilize `phosphor-dbus-interfaces` specified errors and events.

### YAML format

Consider an example file in `phosphor-dbus-interfaces` as
`yaml/xyz/openbmc_project/Software/Update.events.yaml` with hypothetical errors
and events:

```yaml
version: 1.3.1

errors:
  - name: UpdateFailure
    severity: critical
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: ERRNO
        type: int64
      - name: CALLOUT_HARDWARE
        type: object_path
        primary: true
    en:
      description: While updating the firmware on a device, the update failed.
      message: A failure occurred updating {TARGET} on {CALLOUT_HARDWARE}.
      resolution: Retry update.

  - name: BMCUpdateFailure
    severity: critical
    deprecated: 1.0.0
    en:
      description: Failed to update the BMC
    redfish-mapping: OpenBMC.FirmwareUpdateFailed

events:
  - name: UpdateProgress
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: COMPLETION
        type: double
        primary: true
    en:
      description: An update is in progress and has reached a checkpoint.
      message: Updating of {TARGET} is {COMPLETION}% complete.
```

Each `foo.events.yaml` file would be used to generate both the C++ classes (via
`sdbusplus`) for exception handling and event reporting, as well as a versioned
Redfish Message Registry for the errors and events. The [YAML
schema][yaml-schema] is contained in the sdbusplus repository.

The above example YAML would generate C++ classes similar to:

```cpp
namespace sdbusplus::errors::xyz::openbmc_project::software::update
{

class UpdateFailure
{
  public:
    template <typename... Args>
    UpdateFailure(Args&&... args);
};

}

namespace sdbusplus::events::xyz::openbmc_project::software::update
{

class UpdateProgress
{
  public:
    template <typename... Args>
    UpdateProgress(Args&&... args);
};

}
```

The constructors here are variadic templates because the generated constructor
implementation will provide compile-time assurance that all of the metadata
fields have been populated (in any order). To raise an `UpdateFailure` a
developer might do something like:

```cpp
// Immediately report the event:
lg2::commit(UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE", bmc_object_path));
// or send it in a dbus response (when using sdbusplus generated binding):
throw UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE", bmc_object_path);
```

If one of the fields, such as `ERRNO`, were omitted, a compile failure will be
raised indicating the first missing field.

[yaml-schema]:
  https://github.com/openbmc/sdbusplus/blob/master/tools/sdbusplus/schemas/events.schema.yaml

### Versioning Policy

Assume the version follows semantic versioning `MAJOR.MINOR.PATCH` convention.

- Adjusting a description or message should result in a `PATCH` increment.
- Adding a new error or event, or adding metadata to an existing error or event,
  should result in a `MINOR` increment.
- Deprecating an error or event should result in a `MAJOR` increment.

There is [guidance on maintenance][registry-guidance] of the OpenBMC Message
Registry. We will incorporate that guidance into the equivalent
`phosphor-dbus-interfaces` policy.
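
The three rules above can be read as a small decision function. The following is
a sketch only: `Change`, `Version`, and `bump` are hypothetical names, not part
of `sdbusplus` or `phosphor-dbus-interfaces`, and resetting the lower fields to
zero on a `MINOR` or `MAJOR` increment is the usual semantic-versioning
convention rather than something this proposal states.

```cpp
// Hypothetical classification of an edit to a Foo.events.yaml file.
enum class Change
{
    DescriptionOrMessage, // wording-only change
    NewErrorOrEvent,      // a new entry was added
    NewMetadata,          // metadata added to an existing entry
    Deprecation,          // an entry was marked deprecated
};

// MAJOR.MINOR.PATCH, with suffixed names to avoid clashing with the
// major()/minor() macros some C libraries expose.
struct Version
{
    unsigned majorNum;
    unsigned minorNum;
    unsigned patchNum;
};

// Apply the policy: PATCH for wording, MINOR for additions, MAJOR for
// deprecations. Lower fields reset to zero (assumed semver convention).
inline Version bump(Version v, Change change)
{
    switch (change)
    {
        case Change::Deprecation:
            return {v.majorNum + 1, 0, 0};
        case Change::NewErrorOrEvent:
        case Change::NewMetadata:
            return {v.majorNum, v.minorNum + 1, 0};
        case Change::DescriptionOrMessage:
            return {v.majorNum, v.minorNum, v.patchNum + 1};
    }
    return v; // unreachable for valid enum values
}
```

For example, starting from the `1.3.1` used in the sample YAML, rewording a
message would give `1.3.2`, adding a new event would give `1.4.0`, and
deprecating an error would give `2.0.0`.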

[registry-guidance]:
  https://github.com/openbmc/bmcweb/blob/master/redfish-core/include/registries/openbmc_message_registry.readmefirst.md

### Generated Redfish Message Registry

[DSP0266][dsp0266], the Redfish specification, gives requirements for Redfish
Message Registries and dictates guidelines for identifiers.

The hypothetical events defined above would create a message registry similar
to:

```json
{
  "Id": "OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.1",
  "Language": "en",
  "Messages": {
    "UpdateFailure": {
      "Description": "While updating the firmware on a device, the update failed.",
      "Message": "A failure occurred updating %1 on %2.",
      "Resolution": "Retry update.",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "string"],
      "Severity": "Critical"
    },
    "UpdateProgress": {
      "Description": "An update is in progress and has reached a checkpoint.",
      "Message": "Updating of %1 is %2% complete.",
      "Resolution": "None",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "number"],
      "Severity": "OK"
    }
  }
}
```

The prefix `OpenBMC_Base` shall be exclusively reserved for use by events from
`phosphor-logging`. Events defined in other repositories will be expected to use
some other prefix. Vendor-defined repositories should use a vendor-owned prefix
as directed by [DSP0266][dsp0266].

[dsp0266]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.20.0.pdf

### Vendor implications

As specified above, vendors must use their own identifiers in order to conform
with the Redfish specification (see [DSP0266][dsp0266] for requirements on
identifier naming).
The `sdbusplus` (and `phosphor-logging` and `bmcweb`)
implementation(s) will enable vendors to create their own events for downstream
code and Registries for integration with Redfish, by creating downstream
repositories of error definitions. Vendors are responsible for ensuring their
own versioning and identifiers conform to the expectations in the [Redfish
specification][dsp0266].

One potential bad behavior on the part of vendors would be forking and modifying
`phosphor-dbus-interfaces` defined events. Vendors must not add their own events
to `phosphor-dbus-interfaces` in downstream implementations, because doing so
would make their implementation advertise a message as part of an OpenBMC-owned
Registry that does not actually contain it. They should instead add such events
to their own repositories with a separate identifier. Similarly, if a vendor
were to _backport_ upstream changes into their fork, they would need to ensure
that the `foo.events.yaml` file for that version matches identically with the
upstream implementation.

## Alternatives Considered

Many alternatives have been explored and referenced through earlier work. Within
this proposal there are many minor alternatives that have been assessed.

### Exception inheritance

The original `phosphor-logging` error descriptions allowed inheritance between
two errors. This is not supported by the proposal for two reasons:

- This introduces complexity in the Redfish Message Registry versioning because
  a change in one file should induce version changes in all dependent files.

- It makes it difficult for a developer to clearly identify all of the fields
  they are expected to populate without traversing multiple files.

### sdbusplus Exception APIs

There are a few possible syntaxes I came up with for constructing the generated
exception types.
It is important that these have good ergonomics, are easy to
understand, and can provide compile-time awareness of missing metadata fields.

```cpp
    using Example = sdbusplus::error::xyz::openbmc_project::Example;

    // 1)
    throw Example().fru("Motherboard").value(42);

    // 2)
    throw Example(Example::fru_{}, "Motherboard", Example::value_{}, 42);

    // 3)
    throw Example("FRU", "Motherboard", "VALUE", 42);

    // 4)
    throw Example([](auto e) { return e.fru("Motherboard").value(42); });

    // 5)
    throw Example({.fru = "Motherboard", .value = 42});
```

**Note**: These examples are all shown using `throw` syntax, but could also be
saved in local variables, returned from functions, or immediately passed to
`lg2::commit`.

1. This would be my preference for ergonomics and clarity, as it would allow
   LSP-enabled editors to give completions for the metadata fields but
   unfortunately there is no mechanism in C++ to define a type which can be
   constructed but not thrown, which means we cannot get compile-time checking
   of all metadata fields.

2. This syntax uses tag-dispatch to enable compile-time checking of all
   metadata fields and potential LSP-completion of the tag-types, but is more
   verbose than option 3.

3. This syntax is less verbose than (2) and follows conventions already used in
   `phosphor-logging`'s `lg2` API, but does not allow LSP-completion of the
   metadata tags.

4. This syntax is similar to option (1) but uses an indirection of a lambda to
   enable compile-time checking that all metadata fields have been populated by
   the lambda. The LSP-completion is likely not as strong as option (1), due to
   the use of `auto`, and the lambda necessity will likely be a hang-up for
   unfamiliar developers.

5. This syntax has similar characteristics as option (1) but similarly does not
   provide compile-time confirmation that all fields have been populated.

The proposal therefore suggests option (3) is most suitable.

### Redfish Translation Support

The proposed YAML format allows future addition of translation but it is not
enabled at this time. Future development could enable the Redfish Message
Registry to be generated in multiple languages if the `message:language` exists
for those languages.

### Redfish Registry Versioning

The Redfish Message Registries are required to be versioned and have three
numeric fields (i.e. `XX.YY.ZZ`), but only the first two are supposed to be used
in the Message ID. Rather than using the manually specified version we could
take a few other approaches:

- Use a date code (e.g. `2024.17.x`) representing the ISO 8601 week when the
  registry was built.

  - This does not cover vendors that may choose to branch for stabilization
    purposes, so we can end up with two machines having the same
    OpenBMC-versioned message registry with different content.

- Use the most recent `openbmc/openbmc` tag as the version.

  - This does not cover vendors that build off HEAD and may deploy multiple
    images between two OpenBMC releases.

- Generate the version based on the git-history.

  - This requires `phosphor-dbus-interfaces` to be built from a git repository,
    which may not always be true for Yocto source mirrors, and requires
    non-trivial processing that continues to scale over time.

### Existing OpenBMC Redfish Registry

There are currently 191 messages defined in the existing Redfish Message
Registry at version `OpenBMC.0.4.0`. Of those, not a single one in the codebase
is emitted with the correct version.
96 of those are only emitted by
Intel-specific code that is not pulled into any upstreamed machine, 39 are
emitted by potentially common code, and 56 are not even referenced in the
codebase outside of the bmcweb registry. Of the 39 common messages, half have an
equivalent in one of the standard registries that should be leveraged, and many
of the others do not have attributes that would facilitate a multi-host
configuration, so the registry at a minimum needs to be updated. No part of the
current implementation is able to handle Redfish Resource URIs.

The proposal therefore is to deprecate the existing registry and replace it with
the new generated registries. For repositories that currently emit events in the
existing format, we can maintain those call-sites for a time period of 1-2
years.

If this aspect of the proposal is rejected, the YAML format allows mapping from
`phosphor-dbus-interfaces` defined events to the current `OpenBMC.0.4.0`
registry `MessageIds`.

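
To make the relationship between a registry version and the `MessageId` emitted
for it concrete, the following is a minimal sketch of how such an identifier
could be assembled, keeping only the first two version fields. The helper name
`makeMessageId` is hypothetical and is not part of `sdbusplus`,
`phosphor-logging`, or `bmcweb`; it only illustrates the naming rule.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only: build a Redfish MessageId from a
// registry prefix, a three-field registry version ("XX.YY.ZZ"), and a message
// key. Only the first two version fields are kept. Returns std::nullopt when
// the version does not consist of exactly three non-empty, dot-separated
// fields. The fields are not checked for being numeric.
std::optional<std::string> makeMessageId(std::string_view prefix,
                                         std::string_view version,
                                         std::string_view key)
{
    assert(!prefix.empty() && !key.empty());

    const auto firstDot = version.find('.');
    if (firstDot == std::string_view::npos || firstDot == 0)
    {
        return std::nullopt;
    }

    const auto secondDot = version.find('.', firstDot + 1);
    if (secondDot == std::string_view::npos || secondDot == firstDot + 1)
    {
        return std::nullopt;
    }

    // The third field must be present and must not contain another dot.
    const auto third = version.substr(secondDot + 1);
    if (third.empty() || third.find('.') != std::string_view::npos)
    {
        return std::nullopt;
    }

    std::string id{prefix};
    id += '.';
    id.append(version.substr(0, secondDot)); // "XX.YY"
    id += '.';
    id.append(key);
    return id;
}
```

With the existing registry version, `makeMessageId("OpenBMC", "0.4.0",
"FanInserted")` yields `OpenBMC.0.4.FanInserted`.
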

Potentially common:

- phosphor-post-code-manager
  - BIOSPOSTCode (unique)
- dbus-sensors
  - ChassisIntrusionDetected (unique)
  - ChassisIntrusionReset (unique)
  - FanInserted
  - FanRedundancyLost (unique)
  - FanRedudancyRegained (unique)
  - FanRemoved
  - LanLost
  - LanRegained
  - PowerSupplyConfigurationError (unique)
  - PowerSupplyConfigurationErrorRecovered (unique)
  - PowerSupplyFailed
  - PowerSupplyFailurePredicted (unique)
  - PowerSupplyFanFailed
  - PowerSupplyFanRecovered
  - PowerSupplyPowerLost
  - PowerSupplyPowerRestored
  - PowerSupplyPredictiedFailureRecovered (unique)
  - PowerSupplyRecovered
- phosphor-sel-logger
  - IPMIWatchdog (unique)
  - `SensorThreshold*` : 8 different events
- phosphor-net-ipmid
  - InvalidLoginAttempted (unique)
- entity-manager
  - InventoryAdded (unique)
  - InventoryRemoved (unique)
- estoraged
  - ServiceStarted
- x86-power-control
  - NMIButtonPressed (unique)
  - NMIDiagnosticInterrupt (unique)
  - PowerButtonPressed (unique)
  - PowerRestorePolicyApplied (unique)
  - PowerSupplyPowerGoodFailed (unique)
  - ResetButtonPressed (unique)
  - SystemPowerGoodFailed (unique)

Intel-only implementations:

- intel-ipmi-oem
  - ADDDCCorrectable
  - BIOSPostERROR
  - BIOSRecoveryComplete
  - BIOSRecoveryStart
  - FirmwareUpdateCompleted
  - IntelUPILinkWidthReducedToHalf
  - IntelUPILinkWidthReducedToQuarter
  - LegacyPCIPERR
  - LegacyPCISERR
  - `ME*` : 29 different events
  - `Memory*` : 9 different events
  - MirroringRedundancyDegraded
  - MirroringRedundancyFull
  - `PCIeCorrectable*`, `PCIeFatal` : 29 different events
  - SELEntryAdded
  - SparingRedundancyDegraded
- pfr-manager
  - BIOSFirmwareRecoveryReason
  - BIOSFirmwarePanicReason
  - BMCFirmwarePanicReason
  - BMCFirmwareRecoveryReason
  - BMCFirmwareResiliencyError
  - CPLDFirmwarePanicReason
  - CPLDFirmwareResilencyError
  - FirmwareResiliencyError
- host-error-monitor
  - CPUError
  - CPUMismatch
  - CPUThermalTrip
  - ComponentOverTemperature
  - SsbThermalTrip
  - VoltageRegulatorOverheated
- s2600wf-misc
  - DriveError
  - InventoryAdded

## Impacts

- New APIs are defined for error and event logging. This will deprecate existing
  `phosphor-logging` APIs, with a time to migrate, for error reporting.

- The design should improve performance by eliminating the regular parsing of
  the `systemd` journal. The design may decrease performance by allowing the
  number of error and event logs to be dramatically increased, which has an
  impact on file system utilization and potential DBus impacts on some services
  such as `ObjectMapper`.

- Backwards compatibility and documentation should be improved by the automatic
  generation of the Redfish Message Registry corresponding to all error and
  event reports.

### Organizational

- **Does this proposal require a new repository?**
  - No
- **Who will be the initial maintainer(s) of this repository?**
  - N/A
- **Which repositories are expected to be modified to execute this design?**
  - `sdbusplus`
  - `phosphor-dbus-interfaces`
  - `phosphor-logging`
  - `bmcweb`
  - Any repository creating an error or event.

## Testing

- Unit tests will be written in `sdbusplus` and `phosphor-logging` for the error
  and event generation, creation APIs, and to provide coverage on any changes to
  the `Logging.Entry` object management.

- Unit tests will be written for `bmcweb` for basic `Logging.Entry`
  transformation and Message Registry generation.

- Integration tests should be leveraged (and enhanced as necessary) from
  `openbmc-test-automation` to cover the end-to-end error creation and Redfish
  reporting.