# Error and Event Logging

Author: [Patrick Williams][patrick-email] `<stwcx>`

[patrick-email]: mailto:patrick@stwcx.xyz

Other contributors:

Created: May 16, 2024

## Problem Description

There is currently no consistent end-to-end error and event reporting design for
the OpenBMC code stack. There are two different implementations, one primarily
using phosphor-logging and one using rsyslog, both of which have gaps that a
complete solution should address. This proposal is intended to be an end-to-end
design, handling both errors and tracing events, that facilitates external
management of the system in an automated and maintainable manner.

## Background and References

### Redfish LogEntry and Message Registry

In Redfish, the [`LogEntry` schema][LogEntry] is used for a range of items that
could be considered "logs", but one such use within OpenBMC is for an equivalent
of the IPMI "System Event Log (SEL)".

The IPMI SEL is the location where the BMC can collect errors and events,
sometimes coming from other entities, such as the BIOS. Examples of these might
be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
These SEL records are exposed as human-readable strings, either natively by an
OEM SEL design or by tools such as `ipmitool`. These strings are typically
unique to each system or manufacturer and could hypothetically change with a BMC
or firmware update, which makes it difficult to build automated tooling around
them. Two different vendors might use different strings to represent a critical
temperature threshold being exceeded: ["temperature threshold
exceeded"][HPE-Example] and ["Temperature #0x30 Upper Critical going
high"][Oracle-Example]. There is also no mechanism in IPMI to ask the machine
"what are all of the SELs you might create?".

To solve two aspects of this problem, listing possible events and versioning,
Redfish has Message Registries. A message registry is a versioned collection of
all of the error events that a system could generate, along with hints as to how
they might be parsed and displayed to a user.
An [informative reference][Registry-Example] from the DMTF gives this example:

```json
{
  "@odata.type": "#MessageRegistry.v1_0_0.MessageRegistry",
  "Id": "Alert.1.0.0",
  "RegistryPrefix": "Alert",
  "RegistryVersion": "1.0.0",
  "Messages": {
    "LanDisconnect": {
      "Description": "A LAN Disconnect on %1 was detected on system %2.",
      "Message": "A LAN Disconnect on %1 was detected on system %2.",
      "Severity": "Warning",
      "NumberOfArgs": 2,
      "Resolution": "None"
    }
  }
}
```

This example defines an event, `Alert.1.0.LanDisconnect`, which can record the
disconnect state of a network device and contains placeholders for the affected
device and system.
When this event occurs, there might be a `LogEntry` recorded containing
something like:

```json
{
  "Message": "A LAN Disconnect on EthernetInterface 1 was detected on system /redfish/v1/Systems/1.",
  "MessageId": "Alert.1.0.LanDisconnect",
  "MessageArgs": ["EthernetInterface 1", "/redfish/v1/Systems/1"]
}
```

The `Message` contains a human-readable string which was created by applying the
`MessageArgs` to the placeholders from the `Message` field in the registry.
System management software can rely on the message registry (referenced from the
`MessageId` field in the `LogEntry`) and `MessageArgs` to react to the event
without needing to perform string processing.

Within OpenBMC, there is currently a [limited design][existing-design] for this
Redfish feature, and it requires inserting specially formed Redfish-specific
logging messages into any application that wants to record these events, tightly
coupling all applications to the Redfish implementation. It has also been
observed that these [strings][app-example], when used, are often out of date
with the
[message registry](https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/include/registries/openbmc.json#L5)
advertised by `bmcweb`. Some maintainers have rejected adding new
Redfish-specific logging messages to their applications.
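
The `MessageArgs` substitution described above is a mechanical step. The
following is a minimal C++ sketch, not taken from `bmcweb`: `formatMessage` is a
hypothetical helper that replaces each `%N` placeholder with the N-th (1-based)
`MessageArgs` entry. It assumes that malformed or out-of-range placeholders are
copied through unchanged, which is a choice made for this sketch rather than a
rule from the Redfish specification.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expand Redfish-style "%1", "%2", ... placeholders in a
// Message Registry string using the (1-based) MessageArgs array.
// Placeholders that are out of range are copied through as-is.
std::string formatMessage(const std::string& message,
                          const std::vector<std::string>& args)
{
    std::string result;
    std::size_t i = 0;
    while (i < message.size())
    {
        if (message[i] == '%' && i + 1 < message.size() &&
            std::isdigit(static_cast<unsigned char>(message[i + 1])))
        {
            // Parse the decimal index that follows '%'.
            std::size_t j = i + 1;
            std::size_t index = 0;
            while (j < message.size() &&
                   std::isdigit(static_cast<unsigned char>(message[j])))
            {
                index = index * 10 + static_cast<std::size_t>(message[j] - '0');
                ++j;
            }
            if (index >= 1 && index <= args.size())
            {
                result += args[index - 1];
            }
            else
            {
                result.append(message, i, j - i); // keep "%N" untouched
            }
            i = j;
        }
        else
        {
            result += message[i];
            ++i;
        }
    }
    return result;
}
```

For example, applying `formatMessage` to the registry's `LanDisconnect` message
with the `MessageArgs` `{"EthernetInterface 1", "/redfish/v1/Systems/1"}` yields
the `Message` string of the `LogEntry` shown earlier.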

[LogEntry]:
  https://github.com/openbmc/bmcweb/blob/de0c960c4262169ea92a4b852dd5ebbe3810bf00/redfish-core/schema/dmtf/json-schema/LogEntry.v1_16_0.json
[HPE-Example]:
  https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002092en_us&docLocale=en_US&page=GUID-D7147C7F-2016-0901-06CE-000000000422.html
[Oracle-Example]:
  https://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html#50602039_63068
[Registry-Example]:
  https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events_0.pdf
[existing-design]:
  https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
[app-example]:
  https://github.com/openbmc/phosphor-post-code-manager/blob/f2da78deb3a105c7270f74d9d747c77f0feaae2c/src/post_code.cpp#L143
[registry-example]:
  https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/include/registries/openbmc.json#L5

### Existing phosphor-logging implementation

**Note**: While the word 'exception' is used in this section, the existing (and
proposed) types can be used by applications and execution contexts with
exceptions disabled. They are 'exceptions' because they do inherit from
`std::exception` and there is support in the `sdbusplus` bindings for them to be
used in exception handling.

The `sdbusplus` bindings have the capability to define new C++ exception types
which can be thrown by a DBus server and turned into an error response to the
client. `phosphor-logging` extended this to also add metadata associated with
the log type. See the following example error definitions and usages.

`sdbusplus` error binding definition (in
`xyz/openbmc_project/Certs.errors.yaml`):

```yaml
- name: InvalidCertificate
  description: Invalid certificate file.
```

`phosphor-logging` metadata definition (in
`xyz/openbmc_project/Certs.metadata.yaml`):

```yaml
- name: InvalidCertificate
  meta:
    - str: "REASON=%s"
      type: string
```

Application code reporting an error:

```cpp
// Throws; the exception becomes a DBus error response or is handled elsewhere.
elog<InvalidCertificate>(Reason("Invalid certificate file format"));
// or: record the error directly with the phosphor-logging daemon.
report<InvalidCertificate>(Reason("Existing certificate file is corrupted"));
```

In this sample, an error named
`xyz.openbmc_project.Certs.Error.InvalidCertificate` has been defined, which can
be sent between applications as
a DBus response. The `InvalidCertificate` error is expected to have additional
metadata `REASON`, which is a string. The two APIs `elog` and `report` have
slightly different behaviors: `elog` throws an exception which can either result
in a DBus error response or be handled elsewhere in the application, while
`report` sends the event directly to `phosphor-logging`'s daemon for recording.
As a side-effect of both calls, the metadata is inserted into the `systemd`
journal.

When an error is sent to the `phosphor-logging` daemon, it will:

1. Search back through the journal for recorded metadata associated with the
   event (this is a relatively slow operation).
2. Create an [`xyz.openbmc_project.Logging.Entry`][Logging-Entry] DBus object
   with the associated data extracted from the journal.
3. Persist a serialized version of the object.

Within `bmcweb` there is support for translating
`xyz.openbmc_project.Logging.Entry` objects advertised by `phosphor-logging`
into Redfish `LogEntries`, but this support does not reference a Message
Registry. This makes the events of limited utility for consumption by system
management software, as it cannot know all of the event types and is left to
apply (hand-coded) regular expressions to extract any information from the
`Message` field of the `LogEntry`.
Furthermore, these regular expressions are likely to become outdated over time
as internal OpenBMC error reporting structures, metadata, or message strings
evolve.

[Logging-Entry]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/9012243e543abdc5851b7e878c17c991b2a2a8b7/yaml/xyz/openbmc_project/Logging/Entry.interface.yaml#L1

### Issues with the Status Quo

- There are two different implementations of error logging, neither of which is
  both complete and fully accepted by maintainers. These implementations also do
  not cover tracing events.

- The `REDFISH_MESSAGE_ID` log approach leads to differences between the Redfish
  Message Registry and the reporting application. It also requires every
  application to be "Redfish aware", which limits decoupling between
  applications and external management interfaces. This also leaves gaps for
  reporting errors in different management interfaces, such as inband IPMI and
  PLDM. The approach also does not provide compile-time assurance of appropriate
  metadata collection, which can leave the producing code out of date with the
  message registry definitions.

- The `phosphor-logging` approach does not provide compile-time assurance of
  appropriate metadata collection and requires expensive daemon processing of
  the `systemd` journal on each error report, which limits scalability.

- The `sdbusplus` bindings for error reporting do not currently handle lossless
  transmission of errors between DBus servers and clients.

- Similar applications can produce different Redfish `LogEntry` records for the
  same error scenario. This has been observed in sensor threshold exceeded
  events between `dbus-sensors`, `phosphor-hwmon`, `phosphor-virtual-sensor`,
  and `phosphor-health-monitor`. One cause of this is the existence of two
  different error reporting approaches and disagreement amongst maintainers as
  to the preferred approach.

## Requirements

- Applications running on the BMC must be able to report errors and failures
  which are persisted and available for external system management through
  standards such as Redfish.

  - These errors must be structured and versioned, and the complete set of
    errors that the BMC is able to create should be known at BMC image build
    time.
  - The set of errors that the BMC is able to create must be transformable into
    relevant data sets, such as Redfish Message Registries.
    - For Redfish, the transformation must comply with the Redfish standard
      requirements, such as conforming to semantic versioning expectations.
    - For Redfish, the transformation should allow mapping internally defined
      events to pre-existing Redfish Message Registries for broader
      compatibility.
    - For Redfish, the implementation must also support the EventService
      mechanics for push-reporting.
  - Errors reported by the BMC should contain sufficient information to allow
    service of the system for these failures, either by humans or automation
    (depending on the individual system requirements).

- Applications running on the BMC should be able to report important tracing
  events relevant to system management and/or debug, such as the system
  successfully reaching a running state.

  - All requirements relevant to errors are also applicable to tracing events.
  - The implementation must have a mechanism for vendors to be able to disable
    specific tracing events to conform to their own system design requirements.

- Applications running on the BMC should be able to determine when a previously
  reported error is no longer relevant and mark it as "resolved", while
  maintaining the persistent record for future usages such as debug.

- The BMC should provide a mechanism for managed entities within the server to
  report their own errors and events. Examples of managed entities would be
  firmware, such as the BIOS, and satellite management controllers.

- The implementation on the BMC should scale to a minimum of
  [10,000][error-discussion] errors and events without impacting the BMC or
  managed system performance.

- The implementation should provide a mechanism to allow OEM or vendor
  extensions to the error and event definitions (and generated artifacts such as
  the Redfish Message Registry) for usage in closed-source or non-upstreamed
  code. These extensions must be clearly identified, in all interfaces, as
  vendor-specific and not be tied to the OpenBMC project.

- APIs to implement error and event reporting should have good ergonomics. These
  APIs must provide compile-time identification, for applicable programming
  languages, of call sites which do not conform to the BMC error and event
  specifications.

  - The generated error classes and APIs should not require exceptions but
    should also integrate with the `sdbusplus` client and server bindings, which
    do leverage exceptions.
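
One way to meet the compile-time identification and no-exceptions requirements
above is to make every metadata field a required, strongly typed constructor
argument. The sketch below is a hypothetical shape for a generated type, reusing
the `InvalidCertificate` error and its `REASON` metadata from the earlier
example; the class layout, the `Reason` wrapper, and `validateCertificate` are
illustrative assumptions, not an existing `sdbusplus` API.

```cpp
#include <exception>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical generated type for the InvalidCertificate error. Each metadata
// field is a required constructor parameter, so a call site that omits one
// does not compile.
class InvalidCertificate : public std::exception
{
  public:
    // Strongly typed wrapper for the REASON metadata.
    struct Reason
    {
        std::string value;
    };

    explicit InvalidCertificate(Reason reason) : reason_(std::move(reason)) {}

    const char* what() const noexcept override
    {
        return "xyz.openbmc_project.Certs.Error.InvalidCertificate";
    }

    const std::string& reason() const noexcept
    {
        return reason_.value;
    }

  private:
    Reason reason_;
};

// Missing metadata is a compile-time error rather than a runtime surprise.
static_assert(!std::is_default_constructible_v<InvalidCertificate>);

// Hypothetical check that returns the error by value instead of throwing it,
// so it also works in code built with exceptions disabled.
std::optional<InvalidCertificate> validateCertificate(const std::string& pem)
{
    if (pem.find("-----BEGIN CERTIFICATE-----") == std::string::npos)
    {
        return InvalidCertificate{
            InvalidCertificate::Reason{"Invalid certificate file format"}};
    }
    return std::nullopt;
}
```

Because the type still derives from `std::exception`, code that does use
exceptions could throw the same object, while other code passes it around by
value.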

[error-discussion]:
  https://discord.com/channels/775381525260664832/855566794994221117/867794201897992213

## Proposed Design

The proposed design has a few high-level design elements:

- Consolidate the `sdbusplus` and `phosphor-logging` implementation of error
  reporting; expand it to cover tracing events; improve the ergonomics of the
  associated APIs and add compile-time checking of missing metadata.

- Add APIs to `phosphor-logging` to enable daemons to easily look up their own
  previously reported events (for marking as resolved).

- Add to `phosphor-logging` a compile-time mechanism to disable recording of
  specific tracing events for vendor-level customization.

- Generate a Redfish Message Registry for all errors and events defined in
  `phosphor-dbus-interfaces`, using binding generators from `sdbusplus`. Enhance
  the `bmcweb` implementation of the `Logging.Entry` to `LogEntry`
  transformation to cover the Redfish Message Registry and `phosphor-logging`
  enhancements; leverage the Redfish `LogEntry.DiagnosticData` field to provide
  a Base64-encoded JSON representation of the entire `Logging.Entry` for
  additional diagnostics [[does this need to be optional?]].
  Extend the `bmcweb` EventService implementation to support
  `phosphor-logging`-hosted events.

### `sdbusplus`

The `Foo.errors.yaml` content will be combined with the content formerly in the
`Foo.metadata.yaml` files specified by `phosphor-logging` into a new file type,
`Foo.events.yaml`. This `Foo.events.yaml` format will cover both the current
`error` and `metadata` information and augment it with additional information
necessary to generate external-facing datasets, such as Redfish Message
Registries. The current `Foo.errors.yaml` and `Foo.metadata.yaml` files will be
deprecated as their usage is replaced by the new format.

The `sdbusplus` library will be enhanced to provide the following:

- JSON serialization and de-serialization of generated exception types with
  their assigned metadata; assignment of the JSON serialization to the `message`
  field of `sd_bus_error_set` calls when errors are returned from DBus server
  calls.

- A facility to register exception types, at library load time, with the
  `sdbusplus` library for automatic conversion back to C++ exception types in
  DBus clients.
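
The two library enhancements above could look roughly like the following C++
sketch. It assumes flat, string-valued metadata and a hand-rolled JSON encoder
(a real implementation would more likely use a JSON library and parse the
message back into typed fields). The names `GeneratedEvent`, `toJson`,
`registerEvent`, and `fromDbusError` are hypothetical and are not the
`sdbusplus` API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical common base for generated event types. The proposal names
// sdbusplus::exception::generated_exception for this role; its real
// interface may differ from this stand-in.
struct GeneratedEvent : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Escape characters that would break a JSON string literal. Only quote,
// backslash, and newline are handled in this sketch.
std::string jsonEscape(const std::string& s)
{
    std::string out;
    for (char c : s)
    {
        if (c == '"' || c == '\\')
        {
            out += '\\';
            out += c;
        }
        else if (c == '\n')
        {
            out += "\\n";
        }
        else
        {
            out += c;
        }
    }
    return out;
}

// Serialize flat metadata into a JSON object suitable for the `message`
// field of an sd_bus_error.
std::string toJson(const std::map<std::string, std::string>& metadata)
{
    std::string out = "{";
    bool first = true;
    for (const auto& [key, value] : metadata)
    {
        if (!first)
        {
            out += ',';
        }
        first = false;
        out += '"' + jsonEscape(key) + "\":\"" + jsonEscape(value) + '"';
    }
    return out + "}";
}

// Factories keyed by DBus error name. A function-local static avoids
// static-initialization-order problems between translation units.
using EventFactory =
    std::function<std::unique_ptr<GeneratedEvent>(const std::string& json)>;

std::map<std::string, EventFactory>& eventRegistry()
{
    static std::map<std::string, EventFactory> registry;
    return registry;
}

bool registerEvent(const std::string& name, EventFactory factory)
{
    return eventRegistry().emplace(name, std::move(factory)).second;
}

// Client side: rebuild a typed event from a DBus error name and its JSON
// message, or return nullptr when the name was never registered.
std::unique_ptr<GeneratedEvent> fromDbusError(const std::string& name,
                                              const std::string& json)
{
    auto it = eventRegistry().find(name);
    if (it == eventRegistry().end())
    {
        return nullptr;
    }
    return it->second(json);
}

// Example event type; it keeps the raw JSON instead of parsing it.
struct InvalidCertificateEvent : GeneratedEvent
{
    explicit InvalidCertificateEvent(std::string json) :
        GeneratedEvent("xyz.openbmc_project.Certs.Error.InvalidCertificate"),
        metadataJson(std::move(json))
    {}
    std::string metadataJson;
};

// Registration at load time through a namespace-scope initializer.
const bool invalidCertificateRegistered = registerEvent(
    "xyz.openbmc_project.Certs.Error.InvalidCertificate",
    [](const std::string& json) -> std::unique_ptr<GeneratedEvent> {
        return std::make_unique<InvalidCertificateEvent>(json);
    });
```

Keying the registry on the DBus error name mirrors how the error already
travels over DBus (name plus message), so a client needs no extra protocol to
pick the right type.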

The binding generator(s) will be expanded to do the following:

- Generate complete C++ exception types, with compile-time checking of missing
  metadata and JSON serialization, for errors and events. Metadata can be of one
  of the following types:

  - size-type and signed integer
  - floating-point number
  - string
  - DBus object path

- Generate a format that `bmcweb` can use to create and populate a Redfish
  Message Registry, and to translate from `phosphor-logging` to Redfish
  `LogEntry` for a set of errors and events.

For general users of `sdbusplus` these changes should have no impact, except
that new generated exception types will be available and that specialized
instances of `sdbusplus::exception::generated_exception` will become available
in DBus clients.

### `phosphor-dbus-interfaces`

Refactoring will be done to migrate existing `Foo.metadata.yaml` and
`Foo.errors.yaml` content to `Foo.events.yaml` as applications migrate. Minor
changes will take place to utilize the new binding generators from `sdbusplus`.
A small library enhancement will be done to register all generated exception
types with `sdbusplus`.
Future contributors will be able to add new error and tracing event
definitions.

### `phosphor-logging`

> TODO: Should a tracing event be a `Logging.Entry` with severity of
> `Informational`, or should it be a new type, such as `Logging.Event`, managed
> separately? The `phosphor-logging` default `meson.options` set
> `error_cap=200` and `error_info_cap=10`. If we increase the total number of
> events allowed to 10K, the majority of them are likely going to be
> informational or tracing events.

The `Logging.Entry` interface's `AdditionalData` property should change to
`dict[string, variant[string,int64_t,size_t,object_path]]`.
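
In C++, the proposed `AdditionalData` type maps naturally onto a `std::map` of
`std::variant`. A minimal sketch follows; `ObjectPath` is a hypothetical
stand-in for `sdbusplus::message::object_path` (used only so the block compiles
without `sdbusplus`), and `toString` is a hypothetical helper showing how a
consumer might render each value as text.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for sdbusplus::message::object_path.
struct ObjectPath
{
    std::string str;
};

// C++ counterpart of dict[string, variant[string,int64_t,size_t,object_path]].
using AdditionalValue =
    std::variant<std::string, std::int64_t, std::size_t, ObjectPath>;
using AdditionalData = std::map<std::string, AdditionalValue>;

// Render one value as text; every alternative is handled explicitly.
std::string toString(const AdditionalValue& value)
{
    return std::visit(
        [](const auto& v) -> std::string {
            using T = std::decay_t<decltype(v)>;
            if constexpr (std::is_same_v<T, std::string>)
            {
                return v;
            }
            else if constexpr (std::is_same_v<T, ObjectPath>)
            {
                return v.str;
            }
            else
            {
                return std::to_string(v); // int64_t or size_t
            }
        },
        value);
}
```

Keeping object paths as a distinct alternative, rather than flattening them to
plain strings, lets a consumer recognize which values may need translation into
other identifiers.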

The `Logging.Create` interface will have a new method added:

```yaml
- name: CreateEntry
  parameters:
    - name: Message
      type: string
    - name: Severity
      type: enum[Logging.Entry.Level]
    - name: AdditionalData
      type: dict[string, variant[string,int64_t,size_t,object_path]]
    - name: Hint
      type: string
      default: ""
  returns:
    - name: Entry
      type: object_path
```

The `Hint` parameter is used by daemons to query for their previously recorded
errors, for marking as resolved. These strings need to be globally unique and
are suggested to be of the format `"<service_name>:<key>"`.
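
A small sketch of the hint convention: `makeHint` builds the suggested
`"<service_name>:<key>"` string, and `HintIndex` is a hypothetical in-memory
index that a logging daemon might keep to map hints to entry object paths. The
proposal does not say what happens when a hint is reused; this sketch assumes a
duplicate is rejected, and it skips empty hints because only non-empty hints
make an entry searchable. All names here are illustrative.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helper producing the suggested "<service_name>:<key>" format.
std::string makeHint(const std::string& service, const std::string& key)
{
    return service + ":" + key;
}

// Hypothetical hint-to-entry index inside a logging daemon.
class HintIndex
{
  public:
    // Returns false for an empty hint (not searchable) or a reused hint.
    bool add(const std::string& hint, const std::string& entryPath)
    {
        if (hint.empty())
        {
            return false;
        }
        return entries_.emplace(hint, entryPath).second;
    }

    // Look up the entry previously recorded with this hint, if any.
    std::optional<std::string> find(const std::string& hint) const
    {
        auto it = entries_.find(hint);
        if (it == entries_.end())
        {
            return std::nullopt;
        }
        return it->second;
    }

  private:
    std::map<std::string, std::string> entries_;
};
```

A daemon that later clears the condition would rebuild the same hint from its
own service name and key, look up the entry, and mark it as resolved.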

A `Logging.SearchHint` interface will be created, which will be recorded at the
same object path as a `Logging.Entry` when the `Hint` parameter is not an empty
string:

```yaml
- name: Hint
  type: string
```

The `Logging.Manager` interface will be added with a single method:

```yaml
- name: FindEntry
  parameters:
    - name: Hint
      type: string
  returns:
    - name: Entry
      type: object_path
  errors:
    - xyz.openbmc_project.Common.Error.ResourceNotFound
```

An `lg2::commit` API will be added to support the new `sdbusplus` generated
exception types, calling the new `Logging.Create.CreateEntry` method proposed
earlier. This new API will support `sdbusplus::bus_t` for synchronous DBus
operations and both `sdbusplus::async::context_t` and
`sdbusplus::asio::connection` for asynchronous DBus operations.

There are outstanding performance concerns with the `phosphor-logging`
implementation that may impact its ability to scale to 10,000 event records.
406*ede0a25eSPatrick WilliamsThis issue is expected to be self-contained within `phosphor-logging`, except 407*ede0a25eSPatrick Williamsfor potential future changes to the log-retrieval interfaces used by `bmcweb`. 408*ede0a25eSPatrick WilliamsIn order to decouple the transition to this design, by callers of the logging 409*ede0a25eSPatrick WilliamsAPIs, from the experimentation and improvements in `phosphor-logging`, we will 410*ede0a25eSPatrick Williamsadd a compile option and Yocto `DISTRO_FEATURE` that can turn `lg2::commit` 411*ede0a25eSPatrick Williamsbehavior into an `OPENBMC_MESSAGE_ID` record in the journal, along the same 412*ede0a25eSPatrick Williamsapproach as the previous `REDFISH_MESSAGE_ID`, and corresponding `rsyslog` 413*ede0a25eSPatrick Williamsconfiguration and `bmcweb` support to use these directly. This will allow 414*ede0a25eSPatrick Williamssystems which knowingly scale to a large number of event records, using 415*ede0a25eSPatrick Williams`rsyslog` mechanics, the same level of performance. One caveat of this support 416*ede0a25eSPatrick Williamsis that the hint and resolution behavior will not exist when that option is 417*ede0a25eSPatrick Williamsenabled. 418*ede0a25eSPatrick Williams 419*ede0a25eSPatrick Williams### `bmcweb` 420*ede0a25eSPatrick Williams 421*ede0a25eSPatrick Williams`bmcweb` already has support for build-time conversion from a Redfish Message 422*ede0a25eSPatrick WilliamsRegistry, codified in JSON, to header files it uses to serve the registry; this 423*ede0a25eSPatrick Williamswill be expanded to support Redfish Message Registries generated by `sdbusplus`. 424*ede0a25eSPatrick Williams`bmcweb` will add a Meson option for additional message registries, provided 425*ede0a25eSPatrick Williamsfrom bitbake from `phosphor-dbus-interfaces` and vendor-specific event 426*ede0a25eSPatrick Williamsdefinitions as a path to a directory of Message Registry JSONs. 
Support will also be added for using `phosphor-dbus-interfaces` as a Meson
subproject for stand-alone testing.

It is desirable for `sdbusplus` to generate a Redfish Message Registry directly,
leveraging the existing scripts for integration with `bmcweb`. As part of this,
we would like to support mapping a `Logging.Entry` event to an existing
standardized Redfish event (such as those in the Base registry). The generated
information must contain the `Logging.Entry::Message` identifier, the
`AdditionalData` to `MessageArgs` mapping, and the translation from the
`Message` identifier to the Redfish Message ID (when the Message ID is not from
"this" registry). To facilitate this, we will need to add OEM fields to the
Redfish Message Registry JSON, used only by the `bmcweb` processing scripts, to
generate the information necessary for this additional mapping.

The `xyz.openbmc_project.Logging.Entry` to `LogEntry` conversion needs to be
enhanced in four ways to utilize these Message Registries:

1. A Base64-encoded JSON representation of the `Logging.Entry` will be assigned
   to the `DiagnosticData` property.

2. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageId` property will be set to the corresponding
   Redfish Message ID. Otherwise, the `Logging.Entry::Message` will be used
   directly with no further transformation (as is done today).

3. If the `Logging.Entry::Message` contains an identifier corresponding to a
   Registry entry, the `MessageArgs` property will be filled in by obtaining the
   corresponding values from the `AdditionalData` dictionary, and the `Message`
   field will be generated by combining these values with the `Message` string
   from the Registry.

4. A mechanism should be implemented to translate DBus `object_path` references
   to Redfish Resource URIs. When an `object_path` cannot be translated,
   `bmcweb` will use a prefix such as `object_path:` in the `MessageArgs` value.

The implementation of `EventService` should be enhanced to support
`phosphor-logging`-hosted events. The implementation of `LogService` should be
enhanced to support log paging for `phosphor-logging`-hosted events.

### `phosphor-sel-logger`

The `phosphor-sel-logger` has a meson option `send-to-logger` which toggles
between using `phosphor-logging` and the [`REDFISH_MESSAGE_ID`
mechanism][existing-design].
The code paths that use `phosphor-logging` will be updated to use errors and
events specified in `phosphor-dbus-interfaces`.

### YAML format

Consider an example file in `phosphor-dbus-interfaces`,
`yaml/xyz/openbmc_project/Software/Update.events.yaml`, with hypothetical errors
and events:

```yaml
version: 1.3.1

errors:
  - name: UpdateFailure
    severity: critical
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: ERRNO
        type: int64
      - name: CALLOUT_HARDWARE
        type: object_path
        primary: true
    en:
      description: While updating the firmware on a device, the update failed.
      message: A failure occurred updating {TARGET} on {CALLOUT_HARDWARE}.
      resolution: Retry update.

  - name: BMCUpdateFailure
    severity: critical
    deprecated: 1.0.0
    en:
      description: Failed to update the BMC
    redfish-mapping: OpenBMC.FirmwareUpdateFailed

events:
  - name: UpdateProgress
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: COMPLETION
        type: double
        primary: true
    en:
      description: An update is in progress and has reached a checkpoint.
      message: Updating of {TARGET} is {COMPLETION}% complete.
```

Each `foo.events.yaml` file would be used to generate the C++ classes (via
`sdbusplus`) for exception handling and event reporting. It would also be used
to generate a versioned Redfish Message Registry for the errors and events.
The YAML schema is as follows:

```yaml
$id: https://openbmc-project.xyz/sdbusplus/events.schema.yaml
$schema: https://json-schema.org/draft/2020-12/schema
title: Event and error definitions
type: object
$defs:
  event:
    type: array
    items:
      type: object
      properties:
        name:
          type: string
          description:
            An identifier for the event in UpperCamelCase; used as the class and
            Redfish Message ID.
        en:
          type: object
          description: The details for English.
          properties:
            description:
              type: string
              description:
                A developer-applicable description of the error reported. These
                form the "description" of the Redfish message.
            message:
              type: string
              description:
                The end-user message, including placeholders for arguments.
            resolution:
              type: string
              description: The end-user resolution.
        severity:
          enum:
            - emergency
            - alert
            - critical
            - error
            - warning
            - notice
            - informational
            - debug
          description:
            The `xyz.openbmc_project.Logging.Entry.Level` value for this
            error. Only applicable for 'errors'.
        redfish-mapping:
          type: string
          description:
            Used when a `sdbusplus` event should map to a specific Redfish
            Message rather than a generated one. This is useful when an internal
            error has an analog in a standardized registry.
        deprecated:
          type: string
          pattern: "^[0-9]+\\.[0-9]+\\.[0-9]+$"
          description:
            Indicates that the event is now deprecated and should not be created
            by any OpenBMC software, but is required to still exist for
            generation in the Redfish Message Registry. The version listed here
            should be the first version where the error is no longer used.
        metadata:
          type: array
          items:
            type: object
            properties:
              name:
                type: string
                description: The name of the metadata field.
              type:
                enum:
                  - string
                  - size
                  - int64
                  - uint64
                  - double
                  - object_path
                description: The type of the metadata field.
              primary:
                type: boolean
                description:
                  Set to true when the metadata field is expected to be part of
                  the Redfish `MessageArgs` (and not only in the extended
                  `DiagnosticData`).
properties:
  version:
    type: string
    pattern: "^[0-9]+\\.[0-9]+\\.[0-9]+$"
    description:
      The version of the file, which will be used as the Redfish Message
      Registry version.
  errors:
    $ref: "#/$defs/event"
  events:
    $ref: "#/$defs/event"
```

The above example YAML would generate C++ classes similar to:

```cpp
namespace sdbusplus::errors::xyz::openbmc_project::software::update
{

class UpdateFailure
{
  public:
    template <typename... Args>
    UpdateFailure(Args&&... args);
};

} // namespace sdbusplus::errors::xyz::openbmc_project::software::update

namespace sdbusplus::events::xyz::openbmc_project::software::update
{

class UpdateProgress
{
  public:
    template <typename... Args>
    UpdateProgress(Args&&... args);
};

} // namespace sdbusplus::events::xyz::openbmc_project::software::update
```

The constructors here are variadic templates because the generated constructor
implementation will provide compile-time assurance that all of the metadata
fields have been populated (in any order). To raise an `UpdateFailure`, a
developer might do something like:

```cpp
// Immediately report the event:
lg2::commit(UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc,
                          "CALLOUT_HARDWARE", bmc_object_path));
// or send it in a DBus response (when using sdbusplus generated bindings):
throw UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE",
                    bmc_object_path);
```

If one of the fields, such as `ERRNO`, were omitted, a compile failure would be
raised indicating the first missing field.

### Versioning Policy

Assume the version follows the semantic versioning `MAJOR.MINOR.PATCH`
convention.

- Adjusting a description or message should result in a `PATCH` increment.
- Adding a new error or event, or adding metadata to an existing error or event,
  should result in a `MINOR` increment.
- Deprecating an error or event should result in a `MAJOR` increment.

There is [guidance on maintenance][registry-guidance] of the OpenBMC Message
Registry. We will incorporate that guidance into the equivalent
`phosphor-dbus-interfaces` policy.

[registry-guidance]:
  https://github.com/openbmc/bmcweb/blob/master/redfish-core/include/registries/openbmc_message_registry.readmefirst.md

### Generated Redfish Message Registry

[DSP0266][dsp0266], the Redfish specification, gives requirements for Redfish
Message Registries and dictates guidelines for identifiers.
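As a rough illustration of how identifiers could be derived, the C++ sketch below builds a registry name from the path of a `foo.events.yaml` file. It then forms the registry `Id` (name plus the full `MAJOR.MINOR.PATCH` version) and a Redfish `MessageId` of the form `RegistryName.MAJOR.MINOR.MessageKey`, which drops the `PATCH` field. The path-to-name rule here is an assumption inferred from the single example registry `Id` in this document. The helper names `upperCamel`, `registryName`, `registryId`, and `messageId` are hypothetical and are not part of `sdbusplus` or `bmcweb`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: "openbmc_project" -> "OpenbmcProject", "xyz" -> "Xyz".
// Each underscore-separated word gets an upper-case first letter; the rest of
// the word is left unchanged.
std::string upperCamel(const std::string& segment)
{
    std::string out;
    bool capitalize = true;
    for (char c : segment)
    {
        if (c == '_')
        {
            capitalize = true;
            continue;
        }
        out += capitalize ? static_cast<char>(
                                std::toupper(static_cast<unsigned char>(c)))
                          : c;
        capitalize = false;
    }
    return out;
}

// Hypothetical rule (inferred from the example registry Id): join the prefix
// and each UpperCamelCase path segment with '_'.
std::string registryName(const std::string& prefix, const std::string& path)
{
    std::string name = prefix;
    std::size_t start = 0;
    while (start < path.size())
    {
        std::size_t end = path.find('/', start);
        if (end == std::string::npos)
        {
            end = path.size();
        }
        if (end > start) // skip empty segments such as "a//b"
        {
            name += '_';
            name += upperCamel(path.substr(start, end - start));
        }
        start = end + 1;
    }
    return name;
}

// Registry Id: the name plus the full MAJOR.MINOR.PATCH version.
std::string registryId(const std::string& name, const std::string& version)
{
    return name + "." + version;
}

// MessageId: the name, MAJOR.MINOR only (PATCH dropped), then the message key.
std::string messageId(const std::string& name, const std::string& version,
                      const std::string& key)
{
    const std::size_t lastDot = version.rfind('.');
    assert(lastDot != std::string::npos); // the schema pattern requires X.Y.Z
    return name + "." + version.substr(0, lastDot) + "." + key;
}
```

For the example file, `registryName("OpenBMC_Base", "xyz/openbmc_project/Software/Update")` yields `OpenBMC_Base_Xyz_OpenbmcProject_Software_Update`. With version `1.3.1`, `messageId` yields `OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.UpdateFailure`.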

The hypothetical events defined above would create a message registry similar
to:

```json
{
  "Id": "OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.1",
  "Language": "en",
  "Messages": {
    "UpdateFailure": {
      "Description": "While updating the firmware on a device, the update failed.",
      "Message": "A failure occurred updating %1 on %2.",
      "Resolution": "Retry update.",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "string"],
      "Severity": "Critical"
    },
    "UpdateProgress": {
      "Description": "An update is in progress and has reached a checkpoint.",
      "Message": "Updating of %1 is %2% complete.",
      "Resolution": "None",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "number"],
      "Severity": "OK"
    }
  }
}
```

The prefix `OpenBMC_Base` shall be exclusively reserved for use by events from
`phosphor-logging`. Events defined in other repositories are expected to use a
different prefix. Vendor-defined repositories should use a vendor-owned prefix
as directed by [DSP0266][dsp0266].
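The mapping from the YAML example to this registry can be sketched in C++. The primary metadata fields become `%1`, `%2`, ... in `Message`, their count gives `NumberOfArgs`, and each field's type gives its `ParamTypes` entry. This is a minimal sketch under stated assumptions:

- Argument order follows the declaration order of the primary fields. The example does not distinguish this from the order of appearance in the message.
- `object_path` is reported as a Redfish `string`.
- `Metadata`, `toParamType`, `primaryNames`, and `toRedfishMessage` are hypothetical names, not the actual `sdbusplus` generator.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mirror of one `metadata` entry in a `foo.events.yaml` file.
struct Metadata
{
    std::string name;
    std::string type; // string, size, int64, uint64, double, object_path
    bool primary = false;
};

// Assumed mapping from metadata types to Redfish `ParamTypes`; object paths
// are reported as strings (for example, a translated Resource URI).
std::string toParamType(const std::string& type)
{
    if (type == "string" || type == "object_path")
    {
        return "string";
    }
    return "number"; // size, int64, uint64, double
}

// Primary metadata names in declaration order; index i becomes %(i + 1).
std::vector<std::string> primaryNames(const std::vector<Metadata>& metadata)
{
    std::vector<std::string> names;
    for (const auto& m : metadata)
    {
        if (m.primary)
        {
            names.push_back(m.name);
        }
    }
    return names;
}

// Rewrite "{NAME}" placeholders as Redfish "%N" arguments. Text outside the
// braces (including a literal '%') is copied unchanged.
std::string toRedfishMessage(const std::string& message,
                             const std::vector<std::string>& args)
{
    std::string out;
    std::size_t pos = 0;
    while (true)
    {
        const std::size_t open = message.find('{', pos);
        const std::size_t close = (open == std::string::npos)
                                      ? std::string::npos
                                      : message.find('}', open);
        if (close == std::string::npos)
        {
            out.append(message, pos, std::string::npos);
            return out;
        }
        out.append(message, pos, open - pos);
        const std::string name = message.substr(open + 1, close - open - 1);
        const auto it = std::find(args.begin(), args.end(), name);
        if (it == args.end())
        {
            throw std::invalid_argument("not a primary metadata field: " +
                                        name);
        }
        out += '%';
        out += std::to_string(it - args.begin() + 1);
        pos = close + 1;
    }
}
```

With the `UpdateFailure` metadata above, `primaryNames` yields `TARGET` and `CALLOUT_HARDWARE`. `toRedfishMessage` then turns the YAML `message` into `A failure occurred updating %1 on %2.`, and both fields map to `"string"` in `ParamTypes`.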

[dsp0266]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.20.0.pdf

### Vendor implications

As specified above, vendors must use their own identifiers in order to conform
with the Redfish specification (see [DSP0266][dsp0266] for requirements on
identifier naming). The `sdbusplus` (and `phosphor-logging` and `bmcweb`)
implementation(s) will enable vendors to create their own events for downstream
code and Registries for integration with Redfish, by creating downstream
repositories of error definitions. Vendors are responsible for ensuring that
their own versioning and identifiers conform to the expectations in the [Redfish
specification][dsp0266].

One potential bad behavior on the part of vendors would be forking and modifying
`phosphor-dbus-interfaces`-defined events. Vendors must not add their own events
to `phosphor-dbus-interfaces` in downstream implementations. Doing so would make
their implementation advertise a message as part of an OpenBMC-owned Registry
when it is not. Instead, they should add those events to their own repositories
with a separate identifier.
Similarly, if a vendor were to _backport_ upstream changes into their fork, they
would need to ensure that the `foo.events.yaml` file for that version is
identical to the upstream version.

## Alternatives Considered

Many alternatives have been explored and referenced through earlier work. Within
this proposal, many minor alternatives have also been assessed.

### Exception inheritance

The original `phosphor-logging` error descriptions allowed inheritance between
two errors. This proposal does not support inheritance, for two reasons:

- It would introduce complexity in Redfish Message Registry versioning, because
  a change in one file would require version changes in all dependent files.

- It makes it difficult for a developer to clearly identify all of the fields
  they are expected to populate without traversing multiple files.

### sdbusplus Exception APIs

There are a few possible syntaxes I came up with for constructing the generated
exception types. It is important that these have good ergonomics, are easy to
understand, and can provide compile-time awareness of missing metadata fields.

```cpp
using Example = sdbusplus::error::xyz::openbmc_project::Example;

// 1)
throw Example().fru("Motherboard").value(42);

// 2)
throw Example(Example::fru_{}, "Motherboard", Example::value_{}, 42);

// 3)
throw Example("FRU", "Motherboard", "VALUE", 42);

// 4)
throw Example([](auto e) { return e.fru("Motherboard").value(42); });

// 5)
throw Example({.fru = "Motherboard", .value = 42});
```

**Note**: These examples are all shown using `throw` syntax, but the objects
could also be saved in local variables, returned from functions, or immediately
passed to `lg2::commit`.

1. This would be my preference for ergonomics and clarity, as it would allow
   LSP-enabled editors to give completions for the metadata fields.
   Unfortunately, there is no mechanism in C++ to define a type which can be
   constructed but not thrown, which means we cannot get compile-time checking
   of all metadata fields.

2. This syntax uses tag-dispatch to enable compile-time checking of all metadata
   fields and potential LSP-completion of the tag-types, but is more verbose
   than option (3).

3. This syntax is less verbose than option (2) and follows conventions already
   used in `phosphor-logging`'s `lg2` API, but does not allow LSP-completion of
   the metadata tags.

4. This syntax is similar to option (1) but uses a lambda as an indirection to
   enable compile-time checking that all metadata fields have been populated by
   the lambda. The LSP-completion is likely not as strong as option (1), due to
   the use of `auto`, and the need for a lambda will likely be a stumbling
   block for unfamiliar developers.

5. This syntax has characteristics similar to option (1) and likewise does not
   provide compile-time confirmation that all fields have been populated.

The proposal therefore considers option (3) the most suitable.

### Redfish Translation Support

The proposed YAML format allows translations to be added in the future, but
this is not enabled at this time. Future development could enable the Redfish
Message Registry to be generated in multiple languages. This would apply when a
language section (like the existing `en`) is provided for each of those
languages.

### Redfish Registry Versioning

Redfish Message Registries are required to be versioned with three numeric
fields (i.e. `XX.YY.ZZ`), but only the first two are supposed to be used in the
Message ID. Rather than using a manually specified version, we could take a few
other approaches:

- Use a date code (e.g. `2024.17.x`) representing the ISO 8601 week when the
  registry was built.

  - This does not cover vendors that may choose to branch for stabilization
    purposes, so we could end up with two machines having the same
    OpenBMC-versioned message registry with different content.

- Use the most recent `openbmc/openbmc` tag as the version.

  - This does not cover vendors that build off HEAD and may deploy multiple
    images between two OpenBMC releases.

- Generate the version based on the git history.

  - This requires `phosphor-dbus-interfaces` to be built from a git repository,
    which may not always be true for Yocto source mirrors. It also requires
    non-trivial processing whose cost grows over time.

### Existing OpenBMC Redfish Registry

There are currently 191 messages defined in the existing Redfish Message
Registry at version `OpenBMC.0.4.0`. None of them is emitted in the codebase
with the correct version. Of these:

- 96 are emitted only by Intel-specific code that is not pulled into any
  upstreamed machine.
- 39 are emitted by potentially common code.
- 56 are not referenced anywhere in the codebase outside of the `bmcweb`
  registry.

Half of the 39 common messages have an equivalent in one of the standard
registries, which should be leveraged instead. Many of the others lack
attributes that would support a multi-host configuration, so at a minimum the
registry needs to be updated. The current implementation also has no capability
to handle Redfish Resource URIs.

The proposal therefore is to deprecate the existing registry and replace it with
the new generated registries. For repositories that currently emit events in the
existing format, we can maintain those call-sites for a period of 1-2 years.

If this aspect of the proposal is rejected, the YAML format allows mapping from
`phosphor-dbus-interfaces`-defined events to the current `OpenBMC.0.4.0`
registry `MessageIds`.

Potentially common:

- phosphor-post-code-manager
  - BIOSPOSTCode (unique)
- dbus-sensors
  - ChassisIntrusionDetected (unique)
  - ChassisIntrusionReset (unique)
  - FanInserted
  - FanRedundancyLost (unique)
  - FanRedudancyRegained (unique)
  - FanRemoved
  - LanLost
  - LanRegained
  - PowerSupplyConfigurationError (unique)
  - PowerSupplyConfigurationErrorRecovered (unique)
  - PowerSupplyFailed
  - PowerSupplyFailurePredicted (unique)
  - PowerSupplyFanFailed
  - PowerSupplyFanRecovered
  - PowerSupplyPowerLost
  - PowerSupplyPowerRestored
  - PowerSupplyPredictiedFailureRecovered (unique)
  - PowerSupplyRecovered
- phosphor-sel-logger
  - IPMIWatchdog (unique)
  - `SensorThreshold*`: 8 different events
- phosphor-net-ipmid
  - InvalidLoginAttempted (unique)
- entity-manager
  - InventoryAdded (unique)
  - InventoryRemoved (unique)
- estoraged
  - ServiceStarted
- x86-power-control
  - NMIButtonPressed (unique)
  - NMIDiagnosticInterrupt (unique)
  - PowerButtonPressed (unique)
  - PowerRestorePolicyApplied (unique)
  - PowerSupplyPowerGoodFailed (unique)
  - ResetButtonPressed (unique)
  - SystemPowerGoodFailed (unique)

Intel-only implementations:

- intel-ipmi-oem
  - ADDDCCorrectable
  - BIOSPostERROR
  - BIOSRecoveryComplete
  - BIOSRecoveryStart
  - FirmwareUpdateCompleted
  - IntelUPILinkWidthReducedToHalf
  - IntelUPILinkWidthReducedToQuarter
  - LegacyPCIPERR
  - LegacyPCISERR
  - `ME*`: 29 different events
  - `Memory*`: 9 different events
  - MirroringRedundancyDegraded
  - MirroringRedundancyFull
  - `PCIeCorrectable*`, `PCIeFatal`: 29 different events
  - SELEntryAdded
  - SparingRedundancyDegraded
- pfr-manager
  - BIOSFirmwareRecoveryReason
  - BIOSFirmwarePanicReason
  - BMCFirmwarePanicReason
  - BMCFirmwareRecoveryReason
  - BMCFirmwareResiliencyError
  - CPLDFirmwarePanicReason
  - CPLDFirmwareResilencyError
  - FirmwareResiliencyError
- host-error-monitor
  - CPUError
  - CPUMismatch
  - CPUThermalTrip
  - ComponentOverTemperature
  - SsbThermalTrip
  - VoltageRegulatorOverheated
- s2600wf-misc
  - DriveError
  - InventoryAdded

## Impacts

- New APIs are defined for error and event logging. These will deprecate the
  existing `phosphor-logging` error-reporting APIs, with a period allowed for
  migration.

- The design should improve performance by eliminating the regular parsing of
  the `systemd` journal. The design may decrease performance by allowing the
  number of error and event logs to increase dramatically. That affects file
  system utilization and may impact DBus performance for some services, such as
  `ObjectMapper`.

- Backwards compatibility and documentation should be improved by the automatic
  generation of the Redfish Message Registry corresponding to all error and
  event reports.

### Organizational

- **Does this design require a new repository?**
  - No
- **Who will be the initial maintainer(s) of this repository?**
  - N/A
- **Which repositories are expected to be modified to execute this design?**
  - `sdbusplus`
  - `phosphor-dbus-interfaces`
  - `phosphor-logging`
  - `bmcweb`
  - Any repository creating an error or event.

## Testing

- Unit tests will be written in `sdbusplus` and `phosphor-logging` to cover
  error and event generation and the creation APIs. They will also provide
  coverage of any changes to the `Logging.Entry` object management.

- Unit tests will be written for `bmcweb` for basic `Logging.Entry`
  transformation and Message Registry generation.

- Integration tests from `openbmc-test-automation` should be leveraged (and
  enhanced as necessary) to cover end-to-end error creation and Redfish
  reporting.