# Error and Event Logging

Author: [Patrick Williams][patrick-email] `<stwcx>`

[patrick-email]: mailto:patrick@stwcx.xyz

Other contributors:

Created: May 16, 2024

## Problem Description

There is currently not a consistent end-to-end error and event reporting design
for the OpenBMC code stack. There are two different implementations, one
primarily using phosphor-logging and one using rsyslog, both of which have gaps
that a complete solution should address. This proposal is intended to be an
end-to-end design handling both errors and tracing events which facilitate
external management of the system in an automated and maintainable manner.

## Background and References

### Redfish LogEntry and Message Registry

In Redfish, the [`LogEntry` schema][LogEntry] is used for a range of items that
could be considered "logs", but one such use within OpenBMC is for an equivalent
of the IPMI "System Event Log (SEL)".

The IPMI SEL is the location where the BMC can collect errors and events,
sometimes coming from other entities, such as the BIOS. Examples of these might
be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
These SEL records are exposed as human readable strings, either natively by an
OEM SEL design or by tools such as `ipmitool`, which are typically unique to
each system or manufacturer, and could hypothetically change with a BMC or
firmware update, and are thus difficult to create automated tooling around. Two
different vendors might use different strings to represent a critical
temperature threshold exceeded: ["temperature threshold exceeded"][HPE-Example]
and ["Temperature #0x30 Upper Critical going high"][Oracle-Example]. There is
also no mechanism with IPMI to ask the machine "what are all of the SELs you
might create".

In order to solve two aspects of this problem, listing of possible events and
versioning, Redfish has Message Registries. A message registry is a versioned
collection of all of the error events that a system could generate and hints as
to how they might be parsed and displayed to a user.
An [informative reference][Registry-Example] from the DMTF gives this example:

```json
{
  "@odata.type": "#MessageRegistry.v1_0_0.MessageRegistry",
  "Id": "Alert.1.0.0",
  "RegistryPrefix": "Alert",
  "RegistryVersion": "1.0.0",
  "Messages": {
    "LanDisconnect": {
      "Description": "A LAN Disconnect on %1 was detected on system %2.",
      "Message": "A LAN Disconnect on %1 was detected on system %2.",
      "Severity": "Warning",
      "NumberOfArgs": 2,
      "Resolution": "None"
    }
  }
}
```

This example defines an event, `Alert.1.0.LanDisconnect`, which can record the
disconnect state of a network device and contains placeholders for the affected
device and system.
When this event occurs, there might be a `LogEntry` recorded
containing something like:

```json
{
  "Message": "A LAN Disconnect on EthernetInterface 1 was detected on system /redfish/v1/Systems/1.",
  "MessageId": "Alert.1.0.LanDisconnect",
  "MessageArgs": ["EthernetInterface 1", "/redfish/v1/Systems/1"]
}
```

The `Message` contains a human readable string which was created by applying the
`MessageArgs` to the placeholders from the `Message` field in the registry.
System management software can rely on the message registry (referenced from the
`MessageId` field in the `LogEntry`) and `MessageArgs` to avoid needing to
perform string processing for reacting to the event.

Within OpenBMC, there is currently a [limited design][existing-design] for this
Redfish feature and it requires inserting specially formed Redfish-specific
logging messages into any application that wants to record these events, tightly
coupling all applications to the Redfish implementation. It has also been
observed that these [strings][app-example], when used, are often out of date
with the [message registry][registry-example] advertised by `bmcweb`. Some
maintainers have rejected adding new Redfish-specific logging messages to their
applications.
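
To make the registry mechanics described above more concrete, the following is a
minimal C++ sketch of how a `Message` string could be produced from a registry
template and `MessageArgs`. The helper `formatMessage` is hypothetical (it is
not a `bmcweb`, `sdbusplus`, or DMTF API), and the sketch assumes placeholders
are 1-based `%N` tokens as in the `LanDisconnect` example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (illustration only): expand 1-based "%N" placeholders
// in a registry Message template using the entries of MessageArgs.
std::string formatMessage(std::string_view tmpl,
                          const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        if (tmpl[i] != '%')
        {
            out += tmpl[i];
            continue;
        }
        // Collect the decimal index that follows '%'.
        std::size_t j = i + 1;
        std::size_t index = 0;
        while (j < tmpl.size() &&
               std::isdigit(static_cast<unsigned char>(tmpl[j])))
        {
            index = index * 10 + static_cast<std::size_t>(tmpl[j] - '0');
            ++j;
        }
        if (j > i + 1 && index >= 1 && index <= args.size())
        {
            out += args[index - 1]; // known placeholder: substitute
        }
        else
        {
            out.append(tmpl.substr(i, j - i)); // otherwise copy text as-is
        }
        i = j - 1; // resume after the consumed placeholder
    }
    return out;
}
```

Because a consumer can read `MessageArgs` directly, a helper like this is only
needed to render text for humans; automation can key off `MessageId` and the
arguments without parsing the rendered string.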

[LogEntry]:
  https://github.com/openbmc/bmcweb/blob/de0c960c4262169ea92a4b852dd5ebbe3810bf00/redfish-core/schema/dmtf/json-schema/LogEntry.v1_16_0.json
[HPE-Example]:
  https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002092en_us&docLocale=en_US&page=GUID-D7147C7F-2016-0901-06CE-000000000422.html
[Oracle-Example]:
  https://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html#50602039_63068
[Registry-Example]:
  https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events_0.pdf
[existing-design]:
  https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
[app-example]:
  https://github.com/openbmc/phosphor-post-code-manager/blob/f2da78deb3a105c7270f74d9d747c77f0feaae2c/src/post_code.cpp#L143
[registry-example]:
  https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/include/registries/openbmc.json#L5

### Existing phosphor-logging implementation

**Note**: While the word 'exception' is used in this section, the existing (and
proposed) types can be used by applications and execution contexts with
exceptions disabled. They are 'exceptions' because they do inherit from
`std::exception` and there is support in the `sdbusplus` bindings for them to be
used in exception handling.

The `sdbusplus` bindings have the capability to define new C++ exception types
which can be thrown by a DBus server and turned into an error response to the
client. `phosphor-logging` extended this to also add metadata associated to the
log type. See the following example error definitions and usages.

`sdbusplus` error binding definition (in
`xyz/openbmc_project/Certs.errors.yaml`):

```yaml
- name: InvalidCertificate
  description: Invalid certificate file.
```

`phosphor-logging` metadata definition (in
`xyz/openbmc_project/Certs.metadata.yaml`):

```yaml
- name: InvalidCertificate
  meta:
    - str: "REASON=%s"
      type: string
```

Application code reporting an error:

```cpp
elog<InvalidCertificate>(Reason("Invalid certificate file format"));
// or
report<InvalidCertificate>(Reason("Existing certificate file is corrupted"));
```

In this sample, an error named
`xyz.openbmc_project.Certs.Error.InvalidCertificate` has been defined, which can
be sent between applications as a DBus response.
The `InvalidCertificate` is
expected to have additional metadata `REASON` which is a string. The two APIs
`elog` and `report` have slightly different behaviors: `elog` throws an
exception which can either result in an error DBus result or be handled
elsewhere in the application, while `report` sends the event directly to
`phosphor-logging`'s daemon for recording. As a side-effect of both calls, the
metadata is inserted into the `systemd` journal.

When an error is sent to the `phosphor-logging` daemon, it will:

1. Search back through the journal for recorded metadata associated with the
   event (this is a relatively slow operation).
2. Create an [`xyz.openbmc_project.Logging.Entry`][Logging-Entry] DBus object
   with the associated data extracted from the journal.
3. Persist a serialized version of the object.

Within `bmcweb` there is support for translating
`xyz.openbmc_project.Logging.Entry` objects advertised by `phosphor-logging`
into Redfish `LogEntries`, but this support does not reference a Message
Registry. This makes the events of limited utility for consumption by system
management software, as it cannot know all of the event types and is left to
perform (hand-coded) regular-expressions to extract any information from the
`Message` field of the `LogEntry`.
Furthermore, these regular-expressions are
likely to become outdated over time as internal OpenBMC error reporting
structure, metadata, or message strings evolve.

[Logging-Entry]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/9012243e543abdc5851b7e878c17c991b2a2a8b7/yaml/xyz/openbmc_project/Logging/Entry.interface.yaml#L1

### Issues with the Status Quo

- There are two different implementations of error logging, neither of which is
  both complete and fully accepted by maintainers. These implementations also do
  not cover tracing events.

- The `REDFISH_MESSAGE_ID` log approach leads to differences between the Redfish
  Message Registry and the reporting application. It also requires every
  application to be "Redfish aware" which limits decoupling between applications
  and external management interfaces. This also leaves gaps for reporting errors
  in different management interfaces, such as inband IPMI and PLDM. The approach
  also does not provide compile-time assurance of appropriate metadata
  collection, which can lead to producing code being out-of-date with the
  message registry definitions.

- The `phosphor-logging` approach does not provide compile-time assurance of
  appropriate metadata collection and requires expensive daemon processing of
  the `systemd` journal on each error report, which limits scalability.

- The `sdbusplus` bindings for error reporting do not currently handle lossless
  transmission of errors between DBus servers and clients.

- Similar applications can result in different Redfish `LogEntry` for the same
  error scenario. This has been observed in sensor threshold exceeded events
  between `dbus-sensors`, `phosphor-hwmon`, `phosphor-virtual-sensor`, and
  `phosphor-health-monitor`. One cause of this is two different error reporting
  approaches and disagreements amongst maintainers as to the preferred approach.

## Requirements

- Applications running on the BMC must be able to report errors and failures
  which are persisted and available for external system management through
  standards such as Redfish.

  - These errors must be structured, versioned, and the complete set of errors
    able to be created by the BMC should be available at build-time of a BMC
    image.
  - The set of errors, able to be created by the BMC, must be able to be
    transformed into relevant data sets, such as Redfish Message Registries.
    - For Redfish, the transformation must comply with the Redfish standard
      requirements, such as conforming to semantic versioning expectations.
    - For Redfish, the transformation should allow mapping internally defined
      events to pre-existing Redfish Message Registries for broader
      compatibility.
    - For Redfish, the implementation must also support the EventService
      mechanics for push-reporting.
  - Errors reported by the BMC should contain sufficient information to allow
    service of the system for these failures, either by humans or automation
    (depending on the individual system requirements).

- Applications running on the BMC should be able to report important tracing
  events relevant to system management and/or debug, such as the system
  successfully reaching a running state.

  - All requirements relevant to errors are also applicable to tracing events.
  - The implementation must have a mechanism for vendors to be able to disable
    specific tracing events to conform to their own system design requirements.

- Applications running on the BMC should be able to determine when a previously
  reported error is no longer relevant and mark it as "resolved", while
  maintaining the persistent record for future usages such as debug.

- The BMC should provide a mechanism for managed entities within the server to
  report their own errors and events. Examples of managed entities would be
  firmware, such as the BIOS, and satellite management controllers.

- The implementation on the BMC should scale to a minimum of
  [10,000][error-discussion] errors and events without impacting the BMC or
  managed system performance.

- The implementation should provide a mechanism to allow OEM or vendor
  extensions to the error and event definitions (and generated artifacts such as
  the Redfish Message Registry) for usage in closed-source or non-upstreamed
  code. These extensions must be clearly identified, in all interfaces, as
  vendor-specific and not be tied to the OpenBMC project.

- APIs to implement error and event reporting should have good ergonomics. These
  APIs must provide compile-time identification, for applicable programming
  languages, of call sites which do not conform to the BMC error and event
  specifications.

  - The generated error classes and APIs should not require exceptions but
    should also integrate with the `sdbusplus` client and server bindings, which
    do leverage exceptions.

[error-discussion]:
  https://discord.com/channels/775381525260664832/855566794994221117/867794201897992213

## Proposed Design

The proposed design has a few high-level design elements:

- Consolidate the `sdbusplus` and `phosphor-logging` implementation of error
  reporting; expand it to cover tracing events; improve the ergonomics of the
  associated APIs and add compile-time checking of missing metadata.

- Add APIs to `phosphor-logging` to enable daemons to easily look up their own
  previously reported events (for marking as resolved).

- Add to `phosphor-logging` a compile-time mechanism to disable recording of
  specific tracing events for vendor-level customization.

- Generate a Redfish Message Registry for all errors and events defined in
  `phosphor-dbus-interfaces`, using binding generators from `sdbusplus`. Enhance
  the `bmcweb` implementation of the `Logging.Entry` to `LogEntry`
  transformation to cover the Redfish Message Registry and `phosphor-logging`
  enhancements; leverage the Redfish `LogEntry.DiagnosticData` field to provide
  a Base64-encoded JSON representation of the entire `Logging.Entry` for
  additional diagnostics [[does this need to be optional?]].
  Add support to the
  `bmcweb` EventService implementation to support `phosphor-logging`-hosted
  events.

### `sdbusplus`

The `Foo.errors.yaml` content will be combined with the content formerly in the
`Foo.metadata.yaml` files specified by `phosphor-logging` and specified in a new
file type `Foo.events.yaml`. This `Foo.events.yaml` format will cover both the
current `error` and `metadata` information as well as augment with additional
information necessary to generate external facing datasets, such as Redfish
Message Registries. The current `Foo.errors.yaml` and `Foo.metadata.yaml` files
will be deprecated as their usage is replaced by the new format.

The `sdbusplus` library will be enhanced to provide the following:

- JSON serialization and de-serialization of generated exception types with
  their assigned metadata; assignment of the JSON serialization to the `message`
  field of `sd_bus_error_set` calls when errors are returned from DBus server
  calls.

- A facility to register exception types, at library load time, with the
  `sdbusplus` library for automatic conversion back to C++ exception types in
  DBus clients.

The binding generator(s) will be expanded to do the following:

- Generate complete C++ exception types, with compile-time checking of missing
  metadata and JSON serialization, for errors and events. Metadata can be of one
  of the following types:

  - size-type and signed integer
  - floating-point number
  - string
  - DBus object path

- Generate a format that `bmcweb` can use to create and populate a Redfish
  Message Registry, and translate from `phosphor-logging` to Redfish `LogEntry`
  for a set of errors and events.

For general users of `sdbusplus` these changes should have no impact, except for
the availability of new generated exception types and that specialized instances
of `sdbusplus::exception::generated_exception` will become available in DBus
clients.

### `phosphor-dbus-interfaces`

Refactoring will be done to migrate existing `Foo.metadata.yaml` and
`Foo.errors.yaml` content to the `Foo.events.yaml` as migration is done by
applications. Minor changes will take place to utilize the new binding
generators from `sdbusplus`. A small library enhancement will be done to
register all generated exception types with `sdbusplus`.
Future contributors
will be able to contribute new error and tracing event definitions.

### `phosphor-logging`

> TODO: Should a tracing event be a `Logging.Entry` with severity of
> `Informational` or should they be a new type, such as `Logging.Event` and
> managed separately? The `phosphor-logging` default `meson.options` have
> `error_cap=200` and `error_info_cap=10`. If we increase the total number of
> events allowed to 10K, the majority of them are likely going to be
> informational / tracing events.

The `Logging.Entry` interface's `AdditionalData` property should change to
`dict[string, variant[string,int64_t,size_t,object_path]]`.
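
As a rough illustration of what the proposed `AdditionalData` type could look
like on the C++ side, the sketch below models the dictionary with standard
library types and renders each value as text (for example, for later use as a
Redfish `MessageArgs` entry). `ObjectPath`, `AdditionalValue`,
`AdditionalData`, and `toText` are hypothetical names chosen for this example;
`ObjectPath` is only a stand-in for the `sdbusplus` object path type so that
the sketch has no external dependencies.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <variant>

// Stand-in for a DBus object path type (assumption for illustration only).
struct ObjectPath
{
    std::string str;
};

// C++ analogue of dict[string, variant[string,int64_t,size_t,object_path]].
using AdditionalValue =
    std::variant<std::string, std::int64_t, std::size_t, ObjectPath>;
using AdditionalData = std::map<std::string, AdditionalValue>;

// Render one metadata value as text, whatever alternative it holds.
std::string toText(const AdditionalValue& value)
{
    return std::visit(
        [](const auto& v) -> std::string {
            using T = std::decay_t<decltype(v)>;
            if constexpr (std::is_same_v<T, std::string>)
            {
                return v;
            }
            else if constexpr (std::is_same_v<T, ObjectPath>)
            {
                return v.str;
            }
            else
            {
                return std::to_string(v); // int64_t or size_t
            }
        },
        value);
}
```

Keeping the values typed, rather than flattening everything to strings at the
reporting site, is what would allow a consumer to treat an object path
differently from a plain string.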

The `Logging.Create` interface will have a new method added:

```yaml
- name: CreateEntry
  parameters:
    - name: Message
      type: string
    - name: Severity
      type: enum[Logging.Entry.Level]
    - name: AdditionalData
      type: dict[string, variant[string,int64_t,size_t,object_path]]
    - name: Hint
      type: string
      default: ""
  returns:
    - name: Entry
      type: object_path
```

The `Hint` parameter is used for daemons to be able to query for their
previously recorded error, for marking as resolved. These strings need to be
globally unique and are suggested to be of the format `"<service_name>:<key>"`.
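
A small sketch of how a daemon might build and split such a hint is shown
below. The helper names `makeHint` and `splitHint` are hypothetical, and the
sketch assumes the service name itself contains no `:` character, so the first
colon separates the service name from the key.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: build a hint of the form "<service_name>:<key>".
std::string makeHint(std::string_view serviceName, std::string_view key)
{
    std::string hint{serviceName};
    hint += ':';
    hint += key;
    return hint;
}

// Hypothetical helper: split a hint at its first ':'; returns nothing when
// either the service name or the key would be empty.
std::optional<std::pair<std::string, std::string>>
    splitHint(std::string_view hint)
{
    const auto pos = hint.find(':');
    if (pos == std::string_view::npos || pos == 0 || pos + 1 == hint.size())
    {
        return std::nullopt;
    }
    return std::pair{std::string(hint.substr(0, pos)),
                     std::string(hint.substr(pos + 1))};
}
```

Prefixing the key with the service name is what makes a per-daemon key (for
instance, one derived from a sensor or device name) unique across the system.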

A `Logging.SearchHint` interface will be created, which will be recorded at the
same object path as a `Logging.Entry` when the `Hint` parameter was not an empty
string:

```yaml
- property: Hint
  type: string
```

The `Logging.Manager` interface will be added with a single method:

```yaml
- name: FindEntry
  parameters:
    - name: Hint
      type: string
  returns:
    - name: Entry
      type: object_path
  errors:
    - xyz.openbmc_project.Common.ResourceNotFound
```

A `lg2::commit` API will be added to support the new `sdbusplus` generated
exception types, calling the new `Logging.Create.CreateEntry` method proposed
earlier. This new API will support `sdbusplus::bus_t` for synchronous DBus
operations and both `sdbusplus::async::context_t` and
`sdbusplus::asio::connection` for asynchronous DBus operations.

There are outstanding performance concerns with the `phosphor-logging`
implementation that may impact the ability for scaling to 10,000 event records.
This issue is expected to be self-contained within `phosphor-logging`, except
for potential future changes to the log-retrieval interfaces used by `bmcweb`.
In order to decouple the transition to this design, by callers of the logging
APIs, from the experimentation and improvements in `phosphor-logging`, we will
add a compile option and Yocto `DISTRO_FEATURE` that can turn `lg2::commit`
behavior into an `OPENBMC_MESSAGE_ID` record in the journal, along the same
approach as the previous `REDFISH_MESSAGE_ID`, and corresponding `rsyslog`
configuration and `bmcweb` support to use these directly. This will allow
systems which knowingly scale to a large number of event records, using
`rsyslog` mechanics, to retain the same level of performance. One caveat of this
support is that the hint and resolution behavior will not exist when that option
is enabled.

### `bmcweb`

`bmcweb` already has support for build-time conversion from a Redfish Message
Registry, codified in JSON, to header files it uses to serve the registry; this
will be expanded to support Redfish Message Registries generated by `sdbusplus`.
`bmcweb` will add a Meson option for additional message registries, provided
by bitbake from `phosphor-dbus-interfaces` and vendor-specific event
definitions as a path to a directory of Message Registry JSONs.
Support will
also be added for adding `phosphor-dbus-interfaces` as a Meson subproject for
stand-alone testing.

It is desirable for `sdbusplus` to generate a Redfish Message Registry directly,
leveraging the existing scripts for integration with `bmcweb`. As part of this
we would like to support mapping a `Logging.Entry` event to an existing
standardized Redfish event (such as those in the Base registry). The generated
information must contain the `Logging.Entry::Message` identifier, the
`AdditionalData` to `MessageArgs` mapping, and the translation from the
`Message` identifier to the Redfish Message ID (when the Message ID is not from
"this" registry). In order to facilitate this, we will need to add OEM fields to
the Redfish Message Registry JSON, which are only used by the `bmcweb`
processing scripts, to generate the information necessary for this additional
mapping.

The `xyz.openbmc_project.Logging.Entry` to `LogEntry` conversion needs to be
enhanced, to utilize these Message Registries, in four ways:

1. A Base64-encoded JSON representation of the `Logging.Entry` will be assigned
   to the `DiagnosticData` property.

2. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageId` property will be set to the corresponding
   Redfish Message ID.
   Otherwise, the `Logging.Entry::Message` will be used
   directly with no further transformation (as is done today).

3. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageArgs` property will be filled in by obtaining the
   corresponding values from the `AdditionalData` dictionary and the `Message`
   field will be generated from combining these values with the `Message` string
   from the Registry.

4. A mechanism should be implemented to translate DBus `object_path` references
   to Redfish Resource URIs. When an `object_path` cannot be translated,
   `bmcweb` will use a prefix such as `object_path:` in the `MessageArgs` value.

The implementation of `EventService` should be enhanced to support
`phosphor-logging` hosted events. The implementation of `LogService` should be
enhanced to support log paging for `phosphor-logging` hosted events.

### `phosphor-sel-logger`

The `phosphor-sel-logger` has a meson option `send-to-logger` which toggles
between using `phosphor-logging` or the [`REDFISH_MESSAGE_ID`
mechanism][existing-design]. The `phosphor-logging`-utilizing paths will be
updated to utilize `phosphor-dbus-interfaces` specified errors and events.

### YAML format

Consider an example file in `phosphor-dbus-interfaces` as
`yaml/xyz/openbmc_project/Software/Update.events.yaml` with hypothetical errors
and events:

```yaml
version: 1.3.1

errors:
  - name: UpdateFailure
    severity: critical
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: ERRNO
        type: int64
      - name: CALLOUT_HARDWARE
        type: object_path
        primary: true
    en:
      description: While updating the firmware on a device, the update failed.
      message: A failure occurred updating {TARGET} on {CALLOUT_HARDWARE}.
      resolution: Retry update.

  - name: BMCUpdateFailure
    severity: critical
    deprecated: 1.0.0
    en:
      description: Failed to update the BMC
    redfish-mapping: OpenBMC.FirmwareUpdateFailed

events:
  - name: UpdateProgress
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: COMPLETION
        type: double
        primary: true
    en:
      description: An update is in progress and has reached a checkpoint.
      message: Updating of {TARGET} is {COMPLETION}% complete.
```

Each `foo.events.yaml` file would be used to generate both the C++ classes (via
`sdbusplus`) for exception handling and event reporting, as well as a versioned
Redfish Message Registry for the errors and events. The [YAML
schema][yaml-schema] is contained in the sdbusplus repository.

The above example YAML would generate C++ classes similar to:

```cpp
namespace sdbusplus::errors::xyz::openbmc_project::software::update
{

class UpdateFailure
{
  public:
    template <typename... Args>
    UpdateFailure(Args&&... args);
};

}

namespace sdbusplus::events::xyz::openbmc_project::software::update
{

class UpdateProgress
{
  public:
    template <typename... Args>
    UpdateProgress(Args&&... args);
};

}
```

The constructors here are variadic templates because the generated constructor
implementation will provide compile-time assurance that all of the metadata
fields have been populated (in any order). To raise an `UpdateFailure` a
developer might do something like:

```cpp
// Immediately report the event:
lg2::commit(UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc,
                          "CALLOUT_HARDWARE", bmc_object_path));
// or send it in a dbus response (when using sdbusplus generated binding):
throw UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE",
                    bmc_object_path);
```

If one of the fields, such as `ERRNO`, were omitted, a compile failure would be
raised indicating the first missing field.
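
The core of such a check, finding the first required metadata name that the
caller did not supply, regardless of order, could look roughly like the sketch
below. This is only an illustration of the idea and not the code `sdbusplus`
generates: `firstMissing` and `updateFailureFields` are hypothetical names, and
the sketch assumes C++17 and that the metadata names are already available as
compile-time constants (how the generated constructor achieves that is not
shown here).

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of sdbusplus): returns the index of the first
// required metadata name missing from `provided`, or R when none is missing.
// The order of the names in `provided` does not matter.
template <std::size_t R, std::size_t P>
constexpr std::size_t
    firstMissing(const std::array<std::string_view, R>& required,
                 const std::array<std::string_view, P>& provided)
{
    for (std::size_t i = 0; i < R; ++i)
    {
        bool found = false;
        for (std::string_view name : provided)
        {
            if (name == required[i])
            {
                found = true;
                break;
            }
        }
        if (!found)
        {
            return i;
        }
    }
    return R;
}

// Metadata names taken from the UpdateFailure example above.
inline constexpr std::array<std::string_view, 3> updateFailureFields{
    "TARGET", "ERRNO", "CALLOUT_HARDWARE"};

// All three fields supplied, in a different order: nothing is missing.
static_assert(firstMissing(updateFailureFields,
                           std::array<std::string_view, 3>{
                               "ERRNO", "CALLOUT_HARDWARE", "TARGET"}) == 3);

// ERRNO omitted: index 1 (ERRNO) is reported as the first missing field.
static_assert(firstMissing(updateFailureFields,
                           std::array<std::string_view, 2>{
                               "TARGET", "CALLOUT_HARDWARE"}) == 1);
```

Because the helper is `constexpr`, a generated constructor could feed its
result into a `static_assert`, which is what would turn a forgotten field into
a build failure rather than a runtime surprise.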

[yaml-schema]:
  https://github.com/openbmc/sdbusplus/blob/master/tools/sdbusplus/schemas/events.schema.yaml

### Versioning Policy

Assume the version follows the semantic versioning `MAJOR.MINOR.PATCH`
convention.

- Adjusting a description or message should result in a `PATCH` increment.
- Adding a new error or event, or adding metadata to an existing error or event,
  should result in a `MINOR` increment.
- Deprecating an error or event should result in a `MAJOR` increment.

There is [guidance on maintenance][registry-guidance] of the OpenBMC Message
Registry. We will incorporate that guidance into the equivalent
`phosphor-dbus-interfaces` policy.

[registry-guidance]:
  https://github.com/openbmc/bmcweb/blob/master/redfish-core/include/registries/openbmc_message_registry.readmefirst.md

### Generated Redfish Message Registry

[DSP0266][dsp0266], the Redfish specification, gives requirements for Redfish
Message Registries and dictates guidelines for identifiers.

The hypothetical events defined above would create a message registry similar
to:

```json
{
  "Id": "OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.1",
  "Language": "en",
  "Messages": {
    "UpdateFailure": {
      "Description": "While updating the firmware on a device, the update failed.",
      "Message": "A failure occurred updating %1 on %2.",
      "Resolution": "Retry update.",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "string"],
      "Severity": "Critical"
    },
    "UpdateProgress": {
      "Description": "An update is in progress and has reached a checkpoint.",
      "Message": "Updating of %1 is %2% complete.",
      "Resolution": "None",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "number"],
      "Severity": "OK"
    }
  }
}
```

The prefix `OpenBMC_Base` shall be exclusively reserved for use by events from
`phosphor-logging`. Events defined in other repositories will be expected to use
some other prefix. Vendor-defined repositories should use a vendor-owned prefix
as directed by [DSP0266][dsp0266].
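
To make the relationship between `AdditionalData`, `MessageArgs`, and the
registry `Message` string concrete, the following is a minimal sketch of how a
consumer might fill in `%1`..`%N`. It is not `bmcweb`'s implementation: the
`RegistryEntry` struct and both function names are hypothetical, and the sketch
assumes `AdditionalData` is available as a string-to-string dictionary and that
the registry tooling supplies the ordered list of metadata names behind each
placeholder.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of one generated registry entry.
struct RegistryEntry
{
    std::string message;               // Registry "Message" with %N markers
    std::vector<std::string> argNames; // AdditionalData keys, in %1..%N order
};

// Collect MessageArgs from AdditionalData in the order the registry expects.
// A key missing from AdditionalData yields an empty argument in this sketch.
std::vector<std::string> collectMessageArgs(
    const RegistryEntry& entry,
    const std::map<std::string, std::string>& additionalData)
{
    std::vector<std::string> args;
    for (const auto& name : entry.argNames)
    {
        auto it = additionalData.find(name);
        args.push_back(it != additionalData.end() ? it->second
                                                  : std::string{});
    }
    return args;
}

// Substitute %1..%N in the registry Message with the collected arguments.
// Only single-digit placeholders (%1..%9) are handled, for brevity; a
// placeholder with no matching argument is dropped.
std::string formatMessage(const std::string& pattern,
                          const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        const bool isPlaceholder = pattern[i] == '%' &&
                                   i + 1 < pattern.size() &&
                                   pattern[i + 1] >= '1' &&
                                   pattern[i + 1] <= '9';
        if (!isPlaceholder)
        {
            out += pattern[i];
            continue;
        }
        const auto index = static_cast<std::size_t>(pattern[i + 1] - '1');
        if (index < args.size())
        {
            out += args[index];
        }
        ++i; // skip the digit that followed '%'
    }
    return out;
}
```

Note that a literal `%` which is not followed by a digit, as in the
`UpdateProgress` message, passes through unchanged. Only the `primary` metadata
from the YAML (`TARGET` and `CALLOUT_HARDWARE` for `UpdateFailure`) appear as
arguments, which matches `NumberOfArgs` being 2 in the registry above.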

[dsp0266]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.20.0.pdf

### Vendor implications

As specified above, vendors must use their own identifiers in order to conform
with the Redfish specification (see [DSP0266][dsp0266] for requirements on
identifier naming). The `sdbusplus` (and `phosphor-logging` and `bmcweb`)
implementation(s) will enable vendors to create their own events for downstream
code and Registries for integration with Redfish, by creating downstream
repositories of error definitions. Vendors are responsible for ensuring their
own versioning and identifiers conform to the expectations in the [Redfish
specification][dsp0266].

One potential bad behavior on the part of vendors would be forking and modifying
`phosphor-dbus-interfaces` defined events. Vendors must not add their own events
to `phosphor-dbus-interfaces` in downstream implementations, because doing so
would lead to their implementation advertising support for a message in an
OpenBMC-owned Registry, which would not be the case. Instead, they should add
them to their own repositories with a separate identifier. Similarly, if a
vendor were to _backport_ upstream changes into their fork, they would need to
ensure that the `foo.events.yaml` file for that version matches identically with
the upstream implementation.

## Alternatives Considered

Many alternatives have been explored and referenced through earlier work. Within
this proposal there are many minor alternatives that have been assessed.

### Exception inheritance

The original `phosphor-logging` error descriptions allowed inheritance between
two errors. This is not supported by the proposal for two reasons:

- This introduces complexity in the Redfish Message Registry versioning because
  a change in one file should induce version changes in all dependent files.

- It makes it difficult for a developer to clearly identify all of the fields
  they are expected to populate without traversing multiple files.

### sdbusplus Exception APIs

There are a few possible syntaxes I came up with for constructing the generated
exception types. It is important that these have good ergonomics, are easy to
understand, and can provide compile-time awareness of missing metadata fields.

```cpp
using Example = sdbusplus::error::xyz::openbmc_project::Example;

// 1)
throw Example().fru("Motherboard").value(42);

// 2)
throw Example(Example::fru_{}, "Motherboard", Example::value_{}, 42);

// 3)
throw Example("FRU", "Motherboard", "VALUE", 42);

// 4)
throw Example([](auto e) { return e.fru("Motherboard").value(42); });

// 5)
throw Example({.fru = "Motherboard", .value = 42});
```

**Note**: These examples are all shown using `throw` syntax, but could also be
saved in local variables, returned from functions, or immediately passed to
`lg2::commit`.

1. This would be my preference for ergonomics and clarity, as it would allow
   LSP-enabled editors to give completions for the metadata fields, but
   unfortunately there is no mechanism in C++ to define a type which can be
   constructed but not thrown, which means we cannot get compile-time checking
   of all metadata fields.

2. This syntax uses tag-dispatch to enable compile-time checking of all
   metadata fields and potential LSP-completion of the tag-types, but is more
   verbose than option 3.

3. This syntax is less verbose than (2) and follows conventions already used in
   `phosphor-logging`'s `lg2` API, but does not allow LSP-completion of the
   metadata tags.

4. This syntax is similar to option (1) but uses an indirection of a lambda to
   enable compile-time checking that all metadata fields have been populated by
   the lambda. The LSP-completion is likely not as strong as option (1), due to
   the use of `auto`, and the lambda necessity will likely be a hang-up for
   unfamiliar developers.

5. This syntax has similar characteristics to option (1) and likewise does not
   provide compile-time confirmation that all fields have been populated.

The proposal therefore suggests option (3) is most suitable.

### Redfish Translation Support

The proposed YAML format allows future addition of translation but it is not
enabled at this time. Future development could enable the Redfish Message
Registry to be generated in multiple languages if the `message:language` exists
for those languages.

### Redfish Registry Versioning

The Redfish Message Registries are required to be versioned and have three
numeric fields (i.e. `XX.YY.ZZ`), but only the first two are supposed to be used
in the Message ID. Rather than using the manually specified version we could
take a few other approaches:

- Use a date code (e.g. `2024.17.x`) representing the ISO 8601 week when the
  registry was built.

  - This does not cover vendors that may choose to branch for stabilization
    purposes, so we can end up with two machines having the same
    OpenBMC-versioned message registry with different content.

- Use the most recent `openbmc/openbmc` tag as the version.

  - This does not cover vendors that build off HEAD and may deploy multiple
    images between two OpenBMC releases.

- Generate the version based on the git-history.

  - This requires `phosphor-dbus-interfaces` to be built from a git repository,
    which may not always be true for Yocto source mirrors, and requires
    non-trivial processing that continues to scale over time.
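
As an illustration of the rule stated at the start of this section, that only
the first two version fields appear in the Message ID, the sketch below derives
a Message ID from a registry `Id` and a message key. The function name
`makeMessageId` is hypothetical, and the sketch assumes the registry `Id` always
ends in a `.<patch>` field, as in the `OpenBMC_Base_..._Update.1.3.1` example
earlier in this document.

```cpp
#include <string>
#include <string_view>

// Build a Redfish MessageId ("<prefix>.<major>.<minor>.<key>") from a
// registry Id of the form "<prefix>.<major>.<minor>.<patch>" by dropping the
// trailing ".<patch>" field. Returns an empty string when the Id contains no
// '.' at all. No further validation of the Id is attempted in this sketch.
std::string makeMessageId(std::string_view registryId,
                          std::string_view messageKey)
{
    const auto lastDot = registryId.rfind('.');
    if (lastDot == std::string_view::npos)
    {
        return {};
    }
    std::string id{registryId.substr(0, lastDot)}; // drop ".<patch>"
    id += '.';
    id.append(messageKey.data(), messageKey.size());
    return id;
}
```

One consequence worth noting is that a `PATCH` increment never changes the
Message IDs a client sees, whereas a `MINOR` or `MAJOR` increment does.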

### Existing OpenBMC Redfish Registry

There are currently 191 messages defined in the existing Redfish Message
Registry at version `OpenBMC.0.4.0`. Of those, not a single one in the codebase
is emitted with the correct version. 96 of those are only emitted by
Intel-specific code that is not pulled into any upstreamed machine, 39 are
emitted by potentially common code, and 56 are not even referenced in the
codebase outside of the bmcweb registry. Of the 39 common messages, half have an
equivalent in one of the standard registries that should be leveraged, and many
of the others do not have attributes that would facilitate a multi-host
configuration, so the registry at a minimum needs to be updated. None of the
current implementation has the capability to handle Redfish Resource URIs.

The proposal therefore is to deprecate the existing registry and replace it with
the new generated registries. For repositories that currently emit events in the
existing format, we can maintain those call-sites for a time period of 1-2
years.

If this aspect of the proposal is rejected, the YAML format allows mapping from
`phosphor-dbus-interfaces` defined events to the current `OpenBMC.0.4.0`
registry `MessageIds`.

Potentially common:

- phosphor-post-code-manager
  - BIOSPOSTCode (unique)
- dbus-sensors
  - ChassisIntrusionDetected (unique)
  - ChassisIntrusionReset (unique)
  - FanInserted
  - FanRedundancyLost (unique)
  - FanRedundancyRegained (unique)
  - FanRemoved
  - LanLost
  - LanRegained
  - PowerSupplyConfigurationError (unique)
  - PowerSupplyConfigurationErrorRecovered (unique)
  - PowerSupplyFailed
  - PowerSupplyFailurePredicted (unique)
  - PowerSupplyFanFailed
  - PowerSupplyFanRecovered
  - PowerSupplyPowerLost
  - PowerSupplyPowerRestored
  - PowerSupplyPredictedFailureRecovered (unique)
  - PowerSupplyRecovered
- phosphor-sel-logger
  - IPMIWatchdog (unique)
  - `SensorThreshold*` : 8 different events
- phosphor-net-ipmid
  - InvalidLoginAttempted (unique)
- entity-manager
  - InventoryAdded (unique)
  - InventoryRemoved (unique)
- estoraged
  - ServiceStarted
- x86-power-control
  - NMIButtonPressed (unique)
  - NMIDiagnosticInterrupt (unique)
  - PowerButtonPressed (unique)
  - PowerRestorePolicyApplied (unique)
  - PowerSupplyPowerGoodFailed (unique)
  - ResetButtonPressed (unique)
  - SystemPowerGoodFailed (unique)

Intel-only implementations:

- intel-ipmi-oem
  - ADDDCCorrectable
  - BIOSPostERROR
  - BIOSRecoveryComplete
  - BIOSRecoveryStart
  - FirmwareUpdateCompleted
  - IntelUPILinkWidthReducedToHalf
  - IntelUPILinkWidthReducedToQuarter
  - LegacyPCIPERR
  - LegacyPCISERR
  - `ME*` : 29 different events
  - `Memory*` : 9 different events
  - MirroringRedundancyDegraded
  - MirroringRedundancyFull
  - `PCIeCorrectable*`, `PCIeFatal` : 29 different events
  - SELEntryAdded
  - SparingRedundancyDegraded
- pfr-manager
  - BIOSFirmwareRecoveryReason
  - BIOSFirmwarePanicReason
  - BMCFirmwarePanicReason
  - BMCFirmwareRecoveryReason
  - BMCFirmwareResiliencyError
  - CPLDFirmwarePanicReason
  - CPLDFirmwareResiliencyError
  - FirmwareResiliencyError
- host-error-monitor
  - CPUError
  - CPUMismatch
  - CPUThermalTrip
  - ComponentOverTemperature
  - SsbThermalTrip
  - VoltageRegulatorOverheated
- s2600wf-misc
  - DriveError
  - InventoryAdded

## Impacts

- New APIs are defined for error and event logging. This will deprecate existing
  `phosphor-logging` APIs, with a time to migrate, for error reporting.

- The design should improve performance by eliminating the regular parsing of
  the `systemd` journal. The design may decrease performance by allowing the
  number of error and event logs to be dramatically increased, which has an
  impact on file system utilization and potential DBus impacts on some services
  such as `ObjectMapper`.

- Backwards compatibility and documentation should be improved by the automatic
  generation of the Redfish Message Registry corresponding to all error and
  event reports.

### Organizational

- **Does this proposal require a new repository?**
  - No
- **Who will be the initial maintainer(s) of this repository?**
  - N/A
- **Which repositories are expected to be modified to execute this design?**
  - `sdbusplus`
  - `phosphor-dbus-interfaces`
  - `phosphor-logging`
  - `bmcweb`
  - Any repository creating an error or event.

## Testing

- Unit tests will be written in `sdbusplus` and `phosphor-logging` for the error
  and event generation, creation APIs, and to provide coverage on any changes to
  the `Logging.Entry` object management.

- Unit tests will be written for `bmcweb` for basic `Logging.Entry`
  transformation and Message Registry generation.

- Integration tests should be leveraged (and enhanced as necessary) from
  `openbmc-test-automation` to cover the end-to-end error creation and Redfish
  reporting.