# Error and Event Logging

Author: [Patrick Williams][patrick-email] `<stwcx>`

[patrick-email]: mailto:patrick@stwcx.xyz

Other contributors:

Created: May 16, 2024

## Problem Description

There is currently not a consistent end-to-end error and event reporting design
for the OpenBMC code stack. There are two different implementations, one
primarily using phosphor-logging and one using rsyslog, both of which have gaps
that a complete solution should address. This proposal is intended to be an
end-to-end design handling both errors and tracing events which facilitate
external management of the system in an automated and maintainable manner.

## Background and References

### Redfish LogEntry and Message Registry

In Redfish, the [`LogEntry` schema][LogEntry] is used for a range of items that
could be considered "logs", but one such use within OpenBMC is for an equivalent
of the IPMI "System Event Log (SEL)".

The IPMI SEL is the location where the BMC can collect errors and events,
sometimes coming from other entities, such as the BIOS. Examples of these might
be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
These SEL records are exposed as human-readable strings, either natively by an
OEM SEL design or by tools such as `ipmitool`. The strings are typically unique
to each system or manufacturer and could hypothetically change with a BMC or
firmware update, so they are difficult to create automated tooling around. Two
different vendors might use different strings to represent a critical
temperature threshold exceeded: ["temperature threshold exceeded"][HPE-Example]
and ["Temperature #0x30 Upper Critical going high"][Oracle-Example]. There is
also no mechanism with IPMI to ask the machine "what are all of the SELs you
might create".

In order to solve two aspects of this problem, listing of possible events and
versioning, Redfish has Message Registries.
A message registry is a versioned
collection of all of the error events that a system could generate and hints as
to how they might be parsed and displayed to a user. An [informative
reference][Registry-Example] from the DMTF gives this example:

```json
{
  "@odata.type": "#MessageRegistry.v1_0_0.MessageRegistry",
  "Id": "Alert.1.0.0",
  "RegistryPrefix": "Alert",
  "RegistryVersion": "1.0.0",
  "Messages": {
    "LanDisconnect": {
      "Description": "A LAN Disconnect on %1 was detected on system %2.",
      "Message": "A LAN Disconnect on %1 was detected on system %2.",
      "Severity": "Warning",
      "NumberOfArgs": 2,
      "Resolution": "None"
    }
  }
}
```

This example defines an event, `Alert.1.0.LanDisconnect`, which can record the
disconnect state of a network device and contains placeholders for the affected
device and system. When this event occurs, there might be a `LogEntry` recorded
containing something like:

```json
{
  "Message": "A LAN Disconnect on EthernetInterface 1 was detected on system /redfish/v1/Systems/1.",
  "MessageId": "Alert.1.0.LanDisconnect",
  "MessageArgs": ["EthernetInterface 1", "/redfish/v1/Systems/1"]
}
```

The `Message` contains a human-readable string which was created by applying the
`MessageArgs` to the placeholders from the `Message` field in the registry.
System management software can rely on the message registry (referenced from the
`MessageId` field in the `LogEntry`) and `MessageArgs` to avoid needing to
perform string processing for reacting to the event.

Within OpenBMC, there is currently a [limited design][existing-design] for this
Redfish feature and it requires inserting specially formed Redfish-specific
logging messages into any application that wants to record these events, tightly
coupling all applications to the Redfish implementation.
It has also been
observed that these [strings][app-example], when used, are often out of date
with the [message registry][bmcweb-registry-example] advertised by `bmcweb`.
Some maintainers have rejected adding new Redfish-specific logging messages to
their applications.

[LogEntry]:
  https://github.com/openbmc/bmcweb/blob/de0c960c4262169ea92a4b852dd5ebbe3810bf00/redfish-core/schema/dmtf/json-schema/LogEntry.v1_16_0.json
[HPE-Example]:
  https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002092en_us&docLocale=en_US&page=GUID-D7147C7F-2016-0901-06CE-000000000422.html
[Oracle-Example]:
  https://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html#50602039_63068
[Registry-Example]:
  https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events_0.pdf
[existing-design]:
  https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
[app-example]:
  https://github.com/openbmc/phosphor-post-code-manager/blob/f2da78deb3a105c7270f74d9d747c77f0feaae2c/src/post_code.cpp#L143
[bmcweb-registry-example]:
  https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/include/registries/openbmc.json#L5

### Existing phosphor-logging implementation

**Note**: While the word 'exception' is used in this section, the existing (and
proposed) types can be used by applications and execution contexts with
exceptions disabled. They are 'exceptions' because they do inherit from
`std::exception` and there is support in the `sdbusplus` bindings for them to be
used in exception handling.

The `sdbusplus` bindings have the capability to define new C++ exception types
which can be thrown by a DBus server and turned into an error response to the
client. `phosphor-logging` extended this to also add metadata associated with
the log type. See the following example error definitions and usages.

`sdbusplus` error binding definition (in
`xyz/openbmc_project/Certs.errors.yaml`):

```yaml
- name: InvalidCertificate
  description: Invalid certificate file.
```

`phosphor-logging` metadata definition (in
`xyz/openbmc_project/Certs.metadata.yaml`):

```yaml
- name: InvalidCertificate
  meta:
    - str: "REASON=%s"
      type: string
```

Application code reporting an error:

```cpp
elog<InvalidCertificate>(Reason("Invalid certificate file format"));
// or
report<InvalidCertificate>(Reason("Existing certificate file is corrupted"));
```

In this sample, an error named
`xyz.openbmc_project.Certs.Error.InvalidCertificate` has been defined, which can
be sent between applications as a DBus response. The `InvalidCertificate` is
expected to have additional metadata `REASON` which is a string. The two APIs
`elog` and `report` have slightly different behaviors: `elog` throws an
exception which can either result in an error DBus result or be handled
elsewhere in the application, while `report` sends the event directly to
`phosphor-logging`'s daemon for recording. As a side-effect of both calls, the
metadata is inserted into the `systemd` journal.

When an error is sent to the `phosphor-logging` daemon, it will:

1. Search back through the journal for recorded metadata associated with the
   event (this is a relatively slow operation).
2. Create an [`xyz.openbmc_project.Logging.Entry`][Logging-Entry] DBus object
   with the associated data extracted from the journal.
3. Persist a serialized version of the object.

Within `bmcweb` there is support for translating
`xyz.openbmc_project.Logging.Entry` objects advertised by `phosphor-logging`
into Redfish `LogEntries`, but this support does not reference a Message
Registry.
This makes the events of limited utility for consumption by system
management software, as it cannot know all of the event types and is left to
apply hand-coded regular expressions to extract any information from the
`Message` field of the `LogEntry`. Furthermore, these regular expressions are
likely to become outdated over time as internal OpenBMC error reporting
structure, metadata, or message strings evolve.

[Logging-Entry]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/9012243e543abdc5851b7e878c17c991b2a2a8b7/yaml/xyz/openbmc_project/Logging/Entry.interface.yaml#L1

### Issues with the Status Quo

- There are two different implementations of error logging, neither of which is
  both complete and fully accepted by maintainers. These implementations also do
  not cover tracing events.

- The `REDFISH_MESSAGE_ID` log approach leads to differences between the Redfish
  Message Registry and the reporting application. It also requires every
  application to be "Redfish aware" which limits decoupling between applications
  and external management interfaces. This also leaves gaps for reporting errors
  in different management interfaces, such as inband IPMI and PLDM. The approach
  also does not provide compile-time assurance of appropriate metadata
  collection, which can lead to producing code being out-of-date with the
  message registry definitions.

- The `phosphor-logging` approach does not provide compile-time assurance of
  appropriate metadata collection and requires expensive daemon processing of
  the `systemd` journal on each error report, which limits scalability.

- The `sdbusplus` bindings for error reporting do not currently handle lossless
  transmission of errors between DBus servers and clients.

- Similar applications can result in different Redfish `LogEntry` for the same
  error scenario.
  This has been observed in sensor threshold exceeded events
  between `dbus-sensors`, `phosphor-hwmon`, `phosphor-virtual-sensor`, and
  `phosphor-health-monitor`. One cause of this is two different error reporting
  approaches and disagreements amongst maintainers as to the preferred approach.

## Requirements

- Applications running on the BMC must be able to report errors and failures
  which are persisted and available for external system management through
  standards such as Redfish.

  - These errors must be structured, versioned, and the complete set of errors
    able to be created by the BMC should be available at build-time of a BMC
    image.
  - The set of errors, able to be created by the BMC, must be able to be
    transformed into relevant data sets, such as Redfish Message Registries.
    - For Redfish, the transformation must comply with the Redfish standard
      requirements, such as conforming to semantic versioning expectations.
    - For Redfish, the transformation should allow mapping internally defined
      events to pre-existing Redfish Message Registries for broader
      compatibility.
    - For Redfish, the implementation must also support the EventService
      mechanics for push-reporting.
  - Errors reported by the BMC should contain sufficient information to allow
    service of the system for these failures, either by humans or automation
    (depending on the individual system requirements).

- Applications running on the BMC should be able to report important tracing
  events relevant to system management and/or debug, such as the system
  successfully reaching a running state.

  - All requirements relevant to errors are also applicable to tracing events.
  - The implementation must have a mechanism for vendors to be able to disable
    specific tracing events to conform to their own system design requirements.

- Applications running on the BMC should be able to determine when a previously
  reported error is no longer relevant and mark it as "resolved", while
  maintaining the persistent record for future usages such as debug.

- The BMC should provide a mechanism for managed entities within the server to
  report their own errors and events. Examples of managed entities would be
  firmware, such as the BIOS, and satellite management controllers.

- The implementation on the BMC should scale to a minimum of
  [10,000][error-discussion] errors and events without impacting the BMC or
  managed system performance.

- The implementation should provide a mechanism to allow OEM or vendor
  extensions to the error and event definitions (and generated artifacts such as
  the Redfish Message Registry) for usage in closed-source or non-upstreamed
  code. These extensions must be clearly identified, in all interfaces, as
  vendor-specific and not be tied to the OpenBMC project.

- APIs to implement error and event reporting should have good ergonomics. These
  APIs must provide compile-time identification, for applicable programming
  languages, of call sites which do not conform to the BMC error and event
  specifications.

  - The generated error classes and APIs should not require exceptions but
    should also integrate with the `sdbusplus` client and server bindings, which
    do leverage exceptions.

[error-discussion]:
  https://discord.com/channels/775381525260664832/855566794994221117/867794201897992213

## Proposed Design

The proposed design has a few high-level design elements:

- Consolidate the `sdbusplus` and `phosphor-logging` implementation of error
  reporting; expand it to cover tracing events; improve the ergonomics of the
  associated APIs and add compile-time checking of missing metadata.

- Add APIs to `phosphor-logging` to enable daemons to easily look up their own
  previously reported events (for marking as resolved).

- Add to `phosphor-logging` a compile-time mechanism to disable recording of
  specific tracing events for vendor-level customization.

- Generate a Redfish Message Registry for all errors and events defined in
  `phosphor-dbus-interfaces`, using binding generators from `sdbusplus`. Enhance
  the `bmcweb` implementation of the `Logging.Entry` to `LogEntry`
  transformation to cover the Redfish Message Registry and `phosphor-logging`
  enhancements; leverage the Redfish `LogEntry.DiagnosticData` field to provide
  a Base64-encoded JSON representation of the entire `Logging.Entry` for
  additional diagnostics [[does this need to be optional?]]. Add support to the
  `bmcweb` EventService implementation to support `phosphor-logging`-hosted
  events.

### `sdbusplus`

The `Foo.errors.yaml` content will be combined with the content formerly in the
`Foo.metadata.yaml` files specified by `phosphor-logging` and expressed in a new
file type `Foo.events.yaml`. This `Foo.events.yaml` format will cover both the
current `error` and `metadata` information as well as augment with additional
information necessary to generate external facing datasets, such as Redfish
Message Registries. The current `Foo.errors.yaml` and `Foo.metadata.yaml` files
will be deprecated as their usage is replaced by the new format.

The `sdbusplus` library will be enhanced to provide the following:

- JSON serialization and de-serialization of generated exception types with
  their assigned metadata; assignment of the JSON serialization to the `message`
  field of `sd_bus_error_set` calls when errors are returned from DBus server
  calls.

- A facility to register exception types, at library load time, with the
  `sdbusplus` library for automatic conversion back to C++ exception types in
  DBus clients.

The binding generator(s) will be expanded to do the following:

- Generate complete C++ exception types, with compile-time checking of missing
  metadata and JSON serialization, for errors and events. Metadata can be of one
  of the following types:

  - size-type and signed integer
  - floating-point number
  - string
  - DBus object path

- Generate a format that `bmcweb` can use to create and populate a Redfish
  Message Registry, and translate from `phosphor-logging` to Redfish `LogEntry`
  for a set of errors and events.

For general users of `sdbusplus` these changes should have no impact, except for
the availability of new generated exception types and that specialized instances
of `sdbusplus::exception::generated_exception` will become available in DBus
clients.

### `phosphor-dbus-interfaces`

Refactoring will be done to migrate existing `Foo.metadata.yaml` and
`Foo.errors.yaml` content to the `Foo.events.yaml` as migration is done by
applications. Minor changes will take place to utilize the new binding
generators from `sdbusplus`. A small library enhancement will be done to
register all generated exception types with `sdbusplus`. Future contributors
will be able to contribute new error and tracing event definitions.

### `phosphor-logging`

> TODO: Should a tracing event be a `Logging.Entry` with severity of
> `Informational` or should they be a new type, such as `Logging.Event` and
> managed separately. The `phosphor-logging` default `meson.options` have
> `error_cap=200` and `error_info_cap=10`. If we increase the total number of
> events allowed to 10K, the majority of them are likely going to be information
> / tracing events.

The `Logging.Entry` interface's `AdditionalData` property should change to
`dict[string, variant[string,int64_t,size_t,object_path]]`.

The `Logging.Create` interface will have a new method added:

```yaml
- name: CreateEntry
  parameters:
    - name: Message
      type: string
    - name: Severity
      type: enum[Logging.Entry.Level]
    - name: AdditionalData
      type: dict[string, variant[string,int64_t,size_t,object_path]]
    - name: Hint
      type: string
      default: ""
  returns:
    - name: Entry
      type: object_path
```

The `Hint` parameter is used for daemons to be able to query for their
previously recorded error, for marking as resolved. These strings need to be
globally unique and are suggested to be of the format `"<service_name>:<key>"`.

A `Logging.SearchHint` interface will be created, which will be recorded at the
same object path as a `Logging.Entry` when the `Hint` parameter was not an empty
string:

```yaml
- property: Hint
  type: string
```

The `Logging.Manager` interface will be added with a single method:

```yaml
- name: FindEntry
  parameters:
    - name: Hint
      type: string
  returns:
    - name: Entry
      type: object_path
  errors:
    - xyz.openbmc_project.Common.ResourceNotFound
```

A `lg2::commit` API will be added to support the new `sdbusplus` generated
exception types, calling the new `Logging.Create.CreateEntry` method proposed
earlier. This new API will support `sdbusplus::bus_t` for synchronous DBus
operations and both `sdbusplus::async::context_t` and
`sdbusplus::asio::connection` for asynchronous DBus operations.

There are outstanding performance concerns with the `phosphor-logging`
implementation that may impact the ability to scale to 10,000 event records.
This issue is expected to be self-contained within `phosphor-logging`, except
for potential future changes to the log-retrieval interfaces used by `bmcweb`.
In order to decouple the transition to this design, by callers of the logging
APIs, from the experimentation and improvements in `phosphor-logging`, we will
add a compile option and Yocto `DISTRO_FEATURE` that can turn `lg2::commit`
behavior into an `OPENBMC_MESSAGE_ID` record in the journal, along the same
approach as the previous `REDFISH_MESSAGE_ID`, and corresponding `rsyslog`
configuration and `bmcweb` support to use these directly. This will allow
systems which knowingly scale to a large number of event records, using
`rsyslog` mechanics, to keep the same level of performance. One caveat of this
support is that the hint and resolution behavior will not exist when that option
is enabled.

### `bmcweb`

`bmcweb` already has support for build-time conversion from a Redfish Message
Registry, codified in JSON, to header files it uses to serve the registry; this
will be expanded to support Redfish Message Registries generated by `sdbusplus`.
`bmcweb` will add a Meson option for additional message registries, provided
by bitbake from `phosphor-dbus-interfaces` and vendor-specific event
definitions as a path to a directory of Message Registry JSONs. Support will
also be added for adding `phosphor-dbus-interfaces` as a Meson subproject for
stand-alone testing.

It is desirable for `sdbusplus` to generate a Redfish Message Registry directly,
leveraging the existing scripts for integration with `bmcweb`. As part of this
we would like to support mapping a `Logging.Entry` event to an existing
standardized Redfish event (such as those in the Base registry).
The generated
information must contain the `Logging.Entry::Message` identifier, the
`AdditionalData` to `MessageArgs` mapping, and the translation from the
`Message` identifier to the Redfish Message ID (when the Message ID is not from
"this" registry). In order to facilitate this, we will need to add OEM fields to
the Redfish Message Registry JSON, which are only used by the `bmcweb`
processing scripts, to generate the information necessary for this additional
mapping.

The `xyz.openbmc_project.Logging.Entry` to `LogEntry` conversion needs to be
enhanced, to utilize these Message Registries, in four ways:

1. A Base64-encoded JSON representation of the `Logging.Entry` will be assigned
   to the `DiagnosticData` property.

2. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageId` property will be set to the corresponding
   Redfish Message ID. Otherwise, the `Logging.Entry::Message` will be used
   directly with no further transformation (as is done today).

3. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageArgs` property will be filled in by obtaining the
   corresponding values from the `AdditionalData` dictionary and the `Message`
   field will be generated from combining these values with the `Message` string
   from the Registry.

4. A mechanism should be implemented to translate DBus `object_path` references
   to Redfish Resource URIs. When an `object_path` cannot be translated,
   `bmcweb` will use a prefix such as `object_path:` in the `MessageArgs` value.

The implementation of `EventService` should be enhanced to support
`phosphor-logging` hosted events. The implementation of `LogService` should be
enhanced to support log paging for `phosphor-logging` hosted events.
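
As an illustration of the third conversion step above, the following is a
minimal sketch of how `MessageArgs` might be collected from `AdditionalData` and
applied to a registry `Message` template. It is not the `bmcweb`
implementation: the `RegistryEntry` type, the `collectArgs` and `fillMessage`
helpers, and the `DEVICE` and `SYSTEM` keys are all hypothetical, and
`AdditionalData` is simplified to a string-to-string map.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical view of one registry entry: the Redfish Message ID, the
// message template with %1..%N placeholders, and the AdditionalData keys
// that supply each placeholder, in order.
struct RegistryEntry
{
    std::string messageId;
    std::string messageTemplate;
    std::vector<std::string> argKeys;
};

// Build MessageArgs by looking up each key in AdditionalData. A missing key
// yields an empty string (an assumption made for this sketch).
std::vector<std::string>
    collectArgs(const RegistryEntry& entry,
                const std::map<std::string, std::string>& additionalData)
{
    std::vector<std::string> args;
    for (const auto& key : entry.argKeys)
    {
        auto it = additionalData.find(key);
        args.push_back(it != additionalData.end() ? it->second
                                                  : std::string{});
    }
    return args;
}

// Replace each %N placeholder with args[N-1]. Placeholders that are out of
// range, and any '%' not followed by a digit, are copied through unchanged.
std::string fillMessage(const std::string& tmpl,
                        const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() &&
            std::isdigit(static_cast<unsigned char>(tmpl[i + 1])))
        {
            std::size_t j = i + 1;
            std::size_t n = 0;
            while (j < tmpl.size() &&
                   std::isdigit(static_cast<unsigned char>(tmpl[j])))
            {
                n = n * 10 + static_cast<std::size_t>(tmpl[j] - '0');
                ++j;
            }
            if (n >= 1 && n <= args.size())
            {
                out += args[n - 1];
                i = j - 1; // skip the digits that were consumed
                continue;
            }
        }
        out += tmpl[i];
    }
    return out;
}
```

With the `LanDisconnect` template from the background section, the two
arguments would be substituted in order, producing the same `Message` string
shown in the earlier `LogEntry` example.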

### `phosphor-sel-logger`

The `phosphor-sel-logger` has a meson option `send-to-logger` which toggles
between using `phosphor-logging` or the [`REDFISH_MESSAGE_ID`
mechanism][existing-design]. The `phosphor-logging`-utilizing paths will be
updated to utilize `phosphor-dbus-interfaces` specified errors and events.

### YAML format

Consider an example file in `phosphor-dbus-interfaces` as
`yaml/xyz/openbmc_project/Software/Update.events.yaml` with hypothetical errors
and events:

```yaml
version: 1.3.1

errors:
  - name: UpdateFailure
    severity: critical
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: ERRNO
        type: int64
      - name: CALLOUT_HARDWARE
        type: object_path
        primary: true
    en:
      description: While updating the firmware on a device, the update failed.
      message: A failure occurred updating {TARGET} on {CALLOUT_HARDWARE}.
      resolution: Retry update.

  - name: BMCUpdateFailure
    severity: critical
    deprecated: 1.0.0
    en:
      description: Failed to update the BMC
    redfish-mapping: OpenBMC.FirmwareUpdateFailed

events:
  - name: UpdateProgress
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: COMPLETION
        type: double
        primary: true
    en:
      description: An update is in progress and has reached a checkpoint.
      message: Updating of {TARGET} is {COMPLETION}% complete.
```

Each `foo.events.yaml` file would be used to generate both the C++ classes (via
`sdbusplus`) for exception handling and event reporting, as well as a versioned
Redfish Message Registry for the errors and events.
The YAML schema is as
follows:

```yaml
$id: https://openbmc-project.xyz/sdbusplus/events.schema.yaml
$schema: https://json-schema.org/draft/2020-12/schema
title: Event and error definitions
type: object
$defs:
  event:
    type: array
    items:
      type: object
      properties:
        name:
          type: string
          description:
            An identifier for the event in UpperCamelCase; used as the class and
            Redfish Message ID.
        en:
          type: object
          description: The details for English.
          properties:
            description:
              type: string
              description:
                A developer-applicable description of the error reported. These
                form the "description" of the Redfish message.
            message:
              type: string
              description:
                The end-user message, including placeholders for arguments.
            resolution:
              type: string
              description: The end-user resolution.
        severity:
          enum:
            - emergency
            - alert
            - critical
            - error
            - warning
            - notice
            - informational
            - debug
          description:
            The `xyz.openbmc_project.Logging.Entry.Level` value for this
            error. Only applicable for 'errors'.
        redfish-mapping:
          type: string
          description:
            Used when a `sdbusplus` event should map to a specific Redfish
            Message rather than a generated one. This is useful when an internal
            error has an analog in a standardized registry.
        deprecated:
          type: string
          pattern: "^[0-9]+\\.[0-9]+\\.[0-9]+$"
          description:
            Indicates that the event is now deprecated and should not be created
            by any OpenBMC software, but is required to still exist for
            generation in the Redfish Message Registry. The version listed here
            should be the first version where the error is no longer used.
        metadata:
          type: array
          items:
            type: object
            properties:
              name:
                type: string
                description: The name of the metadata field.
              type:
                enum:
                  - string
                  - size
                  - int64
                  - uint64
                  - double
                  - object_path
                description: The type of the metadata field.
              primary:
                type: boolean
                description:
                  Set to true when the metadata field is expected to be part of
                  the Redfish `MessageArgs` (and not only in the extended
                  `DiagnosticData`).
properties:
  version:
    type: string
    pattern: "^[0-9]+\\.[0-9]+\\.[0-9]+$"
    description:
      The version of the file, which will be used as the Redfish Message
      Registry version.
  errors:
    $ref: "#/$defs/event"
  events:
    $ref: "#/$defs/event"
```

The above example YAML would generate C++ classes similar to:

```cpp
namespace sdbusplus::errors::xyz::openbmc_project::software::update
{

class UpdateFailure
{
  public:
    template <typename... Args>
    UpdateFailure(Args&&... args);
};

}

namespace sdbusplus::events::xyz::openbmc_project::software::update
{

class UpdateProgress
{
  public:
    template <typename... Args>
    UpdateProgress(Args&&... args);
};

}
```

The constructors here are variadic templates because the generated constructor
implementation will provide compile-time assurance that all of the metadata
fields have been populated (in any order). To raise an `UpdateFailure` a
developer might do something like:

```cpp
// Immediately report the event:
lg2::commit(UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc,
                          "CALLOUT_HARDWARE", bmc_object_path));
// or send it in a dbus response (when using sdbusplus generated binding):
throw UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc,
                    "CALLOUT_HARDWARE", bmc_object_path);
```

If one of the fields, such as `ERRNO`, were omitted, a compile failure will be
raised indicating the first missing field.

### Versioning Policy

Assume the version follows the semantic versioning `MAJOR.MINOR.PATCH`
convention.

- Adjusting a description or message should result in a `PATCH` increment.
- Adding a new error or event, or adding metadata to an existing error or event,
  should result in a `MINOR` increment.
- Deprecating an error or event should result in a `MAJOR` increment.

There is [guidance on maintenance][registry-guidance] of the OpenBMC Message
Registry. We will incorporate that guidance into the equivalent
`phosphor-dbus-interfaces` policy.

[registry-guidance]:
  https://github.com/openbmc/bmcweb/blob/master/redfish-core/include/registries/openbmc_message_registry.readmefirst.md

### Generated Redfish Message Registry

[DSP0266][dsp0266], the Redfish specification, gives requirements for Redfish
Message Registries and dictates guidelines for identifiers.

The hypothetical events defined above would create a message registry similar
to:

```json
{
  "Id": "OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.1",
  "Language": "en",
  "Messages": {
    "UpdateFailure": {
      "Description": "While updating the firmware on a device, the update failed.",
      "Message": "A failure occurred updating %1 on %2.",
      "Resolution": "Retry update.",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "string"],
      "Severity": "Critical"
    },
    "UpdateProgress": {
      "Description": "An update is in progress and has reached a checkpoint.",
      "Message": "Updating of %1 is %2% complete.",
      "Resolution": "None",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "number"],
      "Severity": "OK"
    }
  }
}
```

The prefix `OpenBMC_Base` shall be exclusively reserved for use by events from
`phosphor-logging`. Events defined in other repositories will be expected to use
some other prefix. Vendor-defined repositories should use a vendor-owned prefix
as directed by [DSP0266][dsp0266].
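
The registry above replaces the `{NAME}` placeholders of the YAML `message`
with positional `%N` placeholders. A minimal sketch of that rewrite follows,
assuming that `N` is the 1-based position of the field among the metadata
entries marked `primary: true`. The `toRedfishMessage` function is a
hypothetical helper for illustration and is not part of the `sdbusplus`
generators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Rewrite "{NAME}" placeholders into Redfish "%N" placeholders, where N is the
// 1-based index of NAME within primaryFields. Names that are not primary
// fields are copied through unchanged (an assumption made for this sketch).
std::string toRedfishMessage(const std::string& message,
                             const std::vector<std::string>& primaryFields)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < message.size())
    {
        const std::size_t open = message.find('{', pos);
        const std::size_t close = (open == std::string::npos)
                                      ? std::string::npos
                                      : message.find('}', open);
        if (close == std::string::npos)
        {
            // No further complete placeholder: copy the remainder.
            out.append(message, pos, std::string::npos);
            break;
        }

        // Copy the literal text that precedes the placeholder.
        out.append(message, pos, open - pos);

        const std::string name = message.substr(open + 1, close - open - 1);
        bool replaced = false;
        for (std::size_t i = 0; i < primaryFields.size(); ++i)
        {
            if (primaryFields[i] == name)
            {
                out += "%" + std::to_string(i + 1);
                replaced = true;
                break;
            }
        }
        if (!replaced)
        {
            out.append(message, open, close - open + 1);
        }
        pos = close + 1;
    }
    return out;
}
```

For `UpdateFailure`, the primary fields are `TARGET` and `CALLOUT_HARDWARE`
(`ERRNO` is not primary), so they become `%1` and `%2` respectively, matching
`NumberOfArgs: 2` in the registry.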

[dsp0266]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.20.0.pdf

### Vendor implications

As specified above, vendors must use their own identifiers in order to conform
with the Redfish specification (see [DSP0266][dsp0266] for requirements on
identifier naming). The `sdbusplus` (and `phosphor-logging` and `bmcweb`)
implementation(s) will enable vendors to create their own events for downstream
code and Registries for integration with Redfish, by creating downstream
repositories of error definitions. Vendors are responsible for ensuring their
own versioning and identifiers conform to the expectations in the [Redfish
specification][dsp0266].

One potential bad behavior on the part of vendors would be forking and modifying
`phosphor-dbus-interfaces` defined events. Vendors must not add their own events
to `phosphor-dbus-interfaces` in downstream implementations, because doing so
would make their implementation advertise a message as part of an OpenBMC-owned
Registry when it is not. Instead, they should add such events to their own
repositories with a separate identifier. Similarly, if a vendor were to
_backport_ upstream changes into their fork, they would need to ensure that the
`foo.events.yaml` file for that version matches identically with the upstream
implementation.

## Alternatives Considered

Many alternatives have been explored and referenced through earlier work. Within
this proposal there are many minor alternatives that have been assessed.

### Exception inheritance

The original `phosphor-logging` error descriptions allowed inheritance between
two errors. This is not supported by the proposal for two reasons:

- This introduces complexity in the Redfish Message Registry versioning because
  a change in one file should induce version changes in all dependent files.

- It makes it difficult for a developer to clearly identify all of the fields
  they are expected to populate without traversing multiple files.

### sdbusplus Exception APIs

There are a few possible syntaxes I came up with for constructing the generated
exception types. It is important that these have good ergonomics, are easy to
understand, and can provide compile-time awareness of missing metadata fields.

```cpp
using Example = sdbusplus::error::xyz::openbmc_project::Example;

// 1)
throw Example().fru("Motherboard").value(42);

// 2)
throw Example(Example::fru_{}, "Motherboard", Example::value_{}, 42);

// 3)
throw Example("FRU", "Motherboard", "VALUE", 42);

// 4)
throw Example([](auto e) { return e.fru("Motherboard").value(42); });

// 5)
throw Example({.fru = "Motherboard", .value = 42});
```

**Note**: These examples are all shown using `throw` syntax, but could also be
saved in local variables, returned from functions, or immediately passed to
`lg2::commit`.

1. This would be my preference for ergonomics and clarity, as it would allow
   LSP-enabled editors to give completions for the metadata fields, but
   unfortunately there is no mechanism in C++ to define a type which can be
   constructed but not thrown, which means we cannot get compile-time checking
   of all metadata fields.

2. This syntax uses tag-dispatch to enable compile-time checking of all
   metadata fields and potential LSP-completion of the tag-types, but is more
   verbose than option (3).

3. This syntax is less verbose than (2) and follows conventions already used in
   `phosphor-logging`'s `lg2` API, but does not allow LSP-completion of the
   metadata tags.

4. This syntax is similar to option (1) but uses an indirection of a lambda to
   enable compile-time checking that all metadata fields have been populated by
   the lambda.
   The LSP-completion is likely not as strong as option (1), due to
   the use of `auto`, and the lambda necessity will likely be a hang-up for
   unfamiliar developers.

5. This syntax has similar characteristics to option (1) and similarly does not
   provide compile-time confirmation that all fields have been populated.

The proposal therefore suggests option (3) is most suitable.

### Redfish Translation Support

The proposed YAML format allows future addition of translation but it is not
enabled at this time. Future development could enable the Redfish Message
Registry to be generated in multiple languages if the `message:language` exists
for those languages.

### Redfish Registry Versioning

The Redfish Message Registries are required to be versioned and have three
numeric fields (i.e. `XX.YY.ZZ`), but only the first two are supposed to be used
in the Message ID. Rather than using the manually specified version we could
take a few other approaches:

- Use a date code (e.g. `2024.17.x`) representing the ISO 8601 week when the
  registry was built.

  - This does not cover vendors that may choose to branch for stabilization
    purposes, so we can end up with two machines having the same
    OpenBMC-versioned message registry with different content.

- Use the most recent `openbmc/openbmc` tag as the version.

  - This does not cover vendors that build off HEAD and may deploy multiple
    images between two OpenBMC releases.

- Generate the version based on the git-history.

  - This requires `phosphor-dbus-interfaces` to be built from a git repository,
    which may not always be true for Yocto source mirrors, and requires
    non-trivial processing that continues to grow over time.

### Existing OpenBMC Redfish Registry

There are currently 191 messages defined in the existing Redfish Message
Registry at version `OpenBMC.0.4.0`.
Of those, not a single one in the codebase
is emitted with the correct version. 96 of those are only emitted by
Intel-specific code that is not pulled into any upstreamed machine, 39 are
emitted by potentially common code, and 56 are not even referenced in the
codebase outside of the bmcweb registry. Of the 39 common messages, half have an
equivalent in one of the standard registries that should be leveraged, and many
of the others lack attributes that would facilitate a multi-host configuration,
so the registry at a minimum needs to be updated. No part of the current
implementation is able to handle Redfish Resource URIs.

The proposal therefore is to deprecate the existing registry and replace it with
the new generated registries. For repositories that currently emit events in the
existing format, we can maintain those call-sites for a time period of 1-2
years.

If this aspect of the proposal is rejected, the YAML format allows mapping from
`phosphor-dbus-interfaces` defined events to the current `OpenBMC.0.4.0`
registry `MessageIds`.

Potentially common:

- phosphor-post-code-manager
  - BIOSPOSTCode (unique)
- dbus-sensors
  - ChassisIntrusionDetected (unique)
  - ChassisIntrusionReset (unique)
  - FanInserted
  - FanRedundancyLost (unique)
  - FanRedundancyRegained (unique)
  - FanRemoved
  - LanLost
  - LanRegained
  - PowerSupplyConfigurationError (unique)
  - PowerSupplyConfigurationErrorRecovered (unique)
  - PowerSupplyFailed
  - PowerSupplyFailurePredicted (unique)
  - PowerSupplyFanFailed
  - PowerSupplyFanRecovered
  - PowerSupplyPowerLost
  - PowerSupplyPowerRestored
  - PowerSupplyPredictedFailureRecovered (unique)
  - PowerSupplyRecovered
- phosphor-sel-logger
  - IPMIWatchdog (unique)
  - `SensorThreshold*` : 8 different events
- phosphor-net-ipmid
  - InvalidLoginAttempted (unique)
- entity-manager
  - InventoryAdded (unique)
  - InventoryRemoved (unique)
- estoraged
  - ServiceStarted
- x86-power-control
  - NMIButtonPressed (unique)
  - NMIDiagnosticInterrupt (unique)
  - PowerButtonPressed (unique)
  - PowerRestorePolicyApplied (unique)
  - PowerSupplyPowerGoodFailed (unique)
  - ResetButtonPressed (unique)
  - SystemPowerGoodFailed (unique)

Intel-only implementations:

- intel-ipmi-oem
  - ADDDCCorrectable
  - BIOSPostERROR
  - BIOSRecoveryComplete
  - BIOSRecoveryStart
  - FirmwareUpdateCompleted
  - IntelUPILinkWidthReducedToHalf
  - IntelUPILinkWidthReducedToQuarter
  - LegacyPCIPERR
  - LegacyPCISERR
  - `ME*` : 29 different events
  - `Memory*` : 9 different events
  - MirroringRedundancyDegraded
  - MirroringRedundancyFull
  - `PCIeCorrectable*`, `PCIeFatal` : 29 different events
  - SELEntryAdded
  - SparingRedundancyDegraded
- pfr-manager
  - BIOSFirmwareRecoveryReason
  - BIOSFirmwarePanicReason
  - BMCFirmwarePanicReason
  - BMCFirmwareRecoveryReason
  - BMCFirmwareResiliencyError
  - CPLDFirmwarePanicReason
  - CPLDFirmwareResilencyError
  - FirmwareResiliencyError
- host-error-monitor
  - CPUError
  - CPUMismatch
  - CPUThermalTrip
  - ComponentOverTemperature
  - SsbThermalTrip
  - VoltageRegulatorOverheated
- s2600wf-misc
  - DriveError
  - InventoryAdded

## Impacts

- New APIs are defined for error and event logging. This will deprecate existing
  `phosphor-logging` APIs for error reporting, with time allowed for migration.

- The design should improve performance by eliminating the regular parsing of
  the `systemd` journal. The design may decrease performance by allowing the
  number of error and event logs to be dramatically increased, which has an
  impact on file system utilization and potential D-Bus impacts on some services
  such as `ObjectMapper`.

- Backwards compatibility and documentation should be improved by the automatic
  generation of the Redfish Message Registry corresponding to all error and
  event reports.

### Organizational

- **Does this proposal require a new repository?**
  - No
- **Who will be the initial maintainer(s) of this repository?**
  - N/A
- **Which repositories are expected to be modified to execute this design?**
  - `sdbusplus`
  - `phosphor-dbus-interfaces`
  - `phosphor-logging`
  - `bmcweb`
  - Any repository creating an error or event.

## Testing

- Unit tests will be written in `sdbusplus` and `phosphor-logging` for the error
  and event generation, creation APIs, and to provide coverage on any changes to
  the `Logging.Entry` object management.

- Unit tests will be written for `bmcweb` for basic `Logging.Entry`
  transformation and Message Registry generation.

- Integration tests should be leveraged (and enhanced as necessary) from
  `openbmc-test-automation` to cover the end-to-end error creation and Redfish
  reporting.