# Error and Event Logging

Author: [Patrick Williams][patrick-email] `<stwcx>`

[patrick-email]: mailto:patrick@stwcx.xyz

Other contributors:

Created: May 16, 2024

## Problem Description

There is currently no consistent end-to-end error and event reporting design
for the OpenBMC code stack. There are two different implementations, one
primarily using phosphor-logging and one using rsyslog, both of which have gaps
that a complete solution should address. This proposal is intended to be an
end-to-end design, handling both errors and tracing events, that facilitates
external management of the system in an automated and maintainable manner.

## Background and References

### Redfish LogEntry and Message Registry

In Redfish, the [`LogEntry` schema][LogEntry] is used for a range of items that
could be considered "logs", but one such use within OpenBMC is for an equivalent
of the IPMI "System Event Log (SEL)".

The IPMI SEL is the location where the BMC can collect errors and events,
sometimes coming from other entities, such as the BIOS. Examples of these might
be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
These SEL records are exposed as human-readable strings, either natively by an
OEM SEL design or by tools such as `ipmitool`. The strings are typically unique
to each system or manufacturer and could hypothetically change with a BMC or
firmware update, so it is difficult to build automated tooling around them. Two
different vendors might use different strings to represent a critical
temperature threshold exceeded: ["temperature threshold exceeded"][HPE-Example]
and ["Temperature #0x30 Upper Critical going high"][Oracle-Example]. There is
also no mechanism with IPMI to ask the machine "what are all of the SELs you
might create".

In order to solve two aspects of this problem, listing of possible events and
versioning, Redfish has Message Registries. A message registry is a versioned
collection of all of the error events that a system could generate and hints as
to how they might be parsed and displayed to a user. An [informative
reference][Registry-Example] from the DMTF gives this example:

```json
{
  "@odata.type": "#MessageRegistry.v1_0_0.MessageRegistry",
  "Id": "Alert.1.0.0",
  "RegistryPrefix": "Alert",
  "RegistryVersion": "1.0.0",
  "Messages": {
    "LanDisconnect": {
      "Description": "A LAN Disconnect on %1 was detected on system %2.",
      "Message": "A LAN Disconnect on %1 was detected on system %2.",
      "Severity": "Warning",
      "NumberOfArgs": 2,
      "Resolution": "None"
    }
  }
}
```

This example defines an event, `Alert.1.0.LanDisconnect`, which can record the
disconnect state of a network device and contains placeholders for the affected
device and system. When this event occurs, there might be a `LogEntry` recorded
containing something like:

```json
{
  "Message": "A LAN Disconnect on EthernetInterface 1 was detected on system /redfish/v1/Systems/1.",
  "MessageId": "Alert.1.0.LanDisconnect",
  "MessageArgs": ["EthernetInterface 1", "/redfish/v1/Systems/1"]
}
```

The `Message` contains a human-readable string which was created by applying the
`MessageArgs` to the placeholders from the `Message` field in the registry.
System management software can rely on the message registry (referenced from the
`MessageId` field in the `LogEntry`) and `MessageArgs` to avoid needing to
perform string processing for reacting to the event.
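
As a rough illustration of how a registry `Message` template and `MessageArgs`
combine into the final `Message`, the following C++ sketch expands 1-based `%N`
placeholders. The function name `formatMessage` is hypothetical, and leaving
out-of-range placeholders untouched is an assumption made for this example
rather than behavior taken from Redfish or `bmcweb`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expand "%1", "%2", ... in a registry Message template
// using the corresponding (1-based) entries of MessageArgs.
std::string formatMessage(const std::string& tmpl,
                          const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() &&
            std::isdigit(static_cast<unsigned char>(tmpl[i + 1])))
        {
            // Parse the full decimal index that follows the '%'.
            std::size_t j = i + 1;
            std::size_t index = 0;
            while (j < tmpl.size() &&
                   std::isdigit(static_cast<unsigned char>(tmpl[j])))
            {
                index = index * 10 + static_cast<std::size_t>(tmpl[j] - '0');
                ++j;
            }
            if (index >= 1 && index <= args.size())
            {
                out += args[index - 1];
                i = j - 1; // Skip past the placeholder digits.
                continue;
            }
            // Assumption: an out-of-range placeholder is copied verbatim.
        }
        out += tmpl[i];
    }
    return out;
}
```

Applied to the `LanDisconnect` template with the `MessageArgs` from the
`LogEntry` above, this reproduces the `Message` string shown there.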

Within OpenBMC, there is currently a [limited design][existing-design] for this
Redfish feature and it requires inserting specially formed Redfish-specific
logging messages into any application that wants to record these events, tightly
coupling all applications to the Redfish implementation. It has also been
observed that these [strings][app-example], when used, are often out of date
with the [message registry][bmcweb-registry-example] advertised by `bmcweb`.
Some maintainers have rejected adding new Redfish-specific logging messages to
their applications.

[LogEntry]:
  https://github.com/openbmc/bmcweb/blob/de0c960c4262169ea92a4b852dd5ebbe3810bf00/redfish-core/schema/dmtf/json-schema/LogEntry.v1_16_0.json
[HPE-Example]:
  https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002092en_us&docLocale=en_US&page=GUID-D7147C7F-2016-0901-06CE-000000000422.html
[Oracle-Example]:
  https://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html#50602039_63068
[Registry-Example]:
  https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events_0.pdf
[existing-design]:
  https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
[app-example]:
  https://github.com/openbmc/phosphor-post-code-manager/blob/f2da78deb3a105c7270f74d9d747c77f0feaae2c/src/post_code.cpp#L143
[bmcweb-registry-example]:
  https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/include/registries/openbmc.json#L5

### Existing phosphor-logging implementation

**Note**: While the word 'exception' is used in this section, the existing (and
proposed) types can be used by applications and execution contexts with
exceptions disabled. They are 'exceptions' because they do inherit from
`std::exception` and there is support in the `sdbusplus` bindings for them to be
used in exception handling.

The `sdbusplus` bindings have the capability to define new C++ exception types
which can be thrown by a DBus server and turned into an error response to the
client. `phosphor-logging` extended this to also add metadata associated with
the log type. See the following example error definitions and usages.

`sdbusplus` error binding definition (in
`xyz/openbmc_project/Certs.errors.yaml`):

```yaml
- name: InvalidCertificate
  description: Invalid certificate file.
```

`phosphor-logging` metadata definition (in
`xyz/openbmc_project/Certs.metadata.yaml`):

```yaml
- name: InvalidCertificate
  meta:
    - str: "REASON=%s"
      type: string
```

Application code reporting an error:

```cpp
elog<InvalidCertificate>(Reason("Invalid certificate file format"));
// or
report<InvalidCertificate>(Reason("Existing certificate file is corrupted"));
```

In this sample, an error named
`xyz.openbmc_project.Certs.Error.InvalidCertificate` has been defined, which can
be sent between applications as a DBus response. The `InvalidCertificate` is
expected to have additional metadata `REASON` which is a string. The two APIs
`elog` and `report` have slightly different behaviors: `elog` throws an
exception which can either result in an error DBus result or be handled
elsewhere in the application, while `report` sends the event directly to
`phosphor-logging`'s daemon for recording. As a side-effect of both calls, the
metadata is inserted into the `systemd` journal.

When an error is sent to the `phosphor-logging` daemon, it will:

1. Search back through the journal for recorded metadata associated with the
   event (this is a relatively slow operation).
2. Create an [`xyz.openbmc_project.Logging.Entry`][Logging-Entry] DBus object
   with the associated data extracted from the journal.
3. Persist a serialized version of the object.

Within `bmcweb` there is support for translating
`xyz.openbmc_project.Logging.Entry` objects advertised by `phosphor-logging`
into Redfish `LogEntries`, but this support does not reference a Message
Registry. This makes the events of limited utility for consumption by system
management software, as it cannot know all of the event types and is left to
apply (hand-coded) regular expressions to extract any information from the
`Message` field of the `LogEntry`. Furthermore, these regular expressions are
likely to become outdated over time as internal OpenBMC error reporting
structure, metadata, or message strings evolve.

[Logging-Entry]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/9012243e543abdc5851b7e878c17c991b2a2a8b7/yaml/xyz/openbmc_project/Logging/Entry.interface.yaml#L1

### Issues with the Status Quo

- There are two different implementations of error logging, neither of which is
  both complete and fully accepted by maintainers. These implementations also do
  not cover tracing events.

- The `REDFISH_MESSAGE_ID` log approach leads to differences between the Redfish
  Message Registry and the reporting application. It also requires every
  application to be "Redfish aware" which limits decoupling between applications
  and external management interfaces. This also leaves gaps for reporting errors
  in different management interfaces, such as inband IPMI and PLDM. The approach
  also does not provide compile-time assurance of appropriate metadata
  collection, which can lead to producing code being out-of-date with the
  message registry definitions.

- The `phosphor-logging` approach does not provide compile-time assurance of
  appropriate metadata collection and requires expensive daemon processing of
  the `systemd` journal on each error report, which limits scalability.

- The `sdbusplus` bindings for error reporting do not currently handle lossless
  transmission of errors between DBus servers and clients.

- Similar applications can result in different Redfish `LogEntry` for the same
  error scenario. This has been observed in sensor threshold exceeded events
  between `dbus-sensors`, `phosphor-hwmon`, `phosphor-virtual-sensor`, and
  `phosphor-health-monitor`. One cause of this is two different error reporting
  approaches and disagreements amongst maintainers as to the preferred approach.

## Requirements

- Applications running on the BMC must be able to report errors and failures
  which are persisted and available for external system management through
  standards such as Redfish.

  - These errors must be structured and versioned, and the complete set of
    errors able to be created by the BMC should be available at build-time of a
    BMC image.
  - The set of errors able to be created by the BMC must be able to be
    transformed into relevant data sets, such as Redfish Message Registries.
    - For Redfish, the transformation must comply with the Redfish standard
      requirements, such as conforming to semantic versioning expectations.
    - For Redfish, the transformation should allow mapping internally defined
      events to pre-existing Redfish Message Registries for broader
      compatibility.
    - For Redfish, the implementation must also support the EventService
      mechanics for push-reporting.
  - Errors reported by the BMC should contain sufficient information to allow
    service of the system for these failures, either by humans or automation
    (depending on the individual system requirements).

- Applications running on the BMC should be able to report important tracing
  events relevant to system management and/or debug, such as the system
  successfully reaching a running state.

  - All requirements relevant to errors are also applicable to tracing events.
  - The implementation must have a mechanism for vendors to be able to disable
    specific tracing events to conform to their own system design requirements.

- Applications running on the BMC should be able to determine when a previously
  reported error is no longer relevant and mark it as "resolved", while
  maintaining the persistent record for future usages such as debug.

- The BMC should provide a mechanism for managed entities within the server to
  report their own errors and events. Examples of managed entities would be
  firmware, such as the BIOS, and satellite management controllers.

- The implementation on the BMC should scale to a minimum of
  [10,000][error-discussion] errors and events without impacting the BMC or
  managed system performance.

- The implementation should provide a mechanism to allow OEM or vendor
  extensions to the error and event definitions (and generated artifacts such as
  the Redfish Message Registry) for usage in closed-source or non-upstreamed
  code. These extensions must be clearly identified, in all interfaces, as
  vendor-specific and not be tied to the OpenBMC project.

- APIs to implement error and event reporting should have good ergonomics. These
  APIs must provide compile-time identification, for applicable programming
  languages, of call sites which do not conform to the BMC error and event
  specifications.

  - The generated error classes and APIs should not require exceptions but
    should also integrate with the `sdbusplus` client and server bindings, which
    do leverage exceptions.

[error-discussion]:
  https://discord.com/channels/775381525260664832/855566794994221117/867794201897992213

## Proposed Design

The proposed design has a few high-level design elements:

- Consolidate the `sdbusplus` and `phosphor-logging` implementation of error
  reporting; expand it to cover tracing events; improve the ergonomics of the
  associated APIs and add compile-time checking of missing metadata.

- Add APIs to `phosphor-logging` to enable daemons to easily look up their own
  previously reported events (for marking as resolved).

- Add to `phosphor-logging` a compile-time mechanism to disable recording of
  specific tracing events for vendor-level customization.

- Generate a Redfish Message Registry for all errors and events defined in
  `phosphor-dbus-interfaces`, using binding generators from `sdbusplus`. Enhance
  the `bmcweb` implementation of the `Logging.Entry` to `LogEntry`
  transformation to cover the Redfish Message Registry and `phosphor-logging`
  enhancements. Leverage the Redfish `LogEntry.DiagnosticData` field to provide
  a Base64-encoded JSON representation of the entire `Logging.Entry` for
  additional diagnostics [[does this need to be optional?]]. Extend the `bmcweb`
  EventService implementation to support `phosphor-logging`-hosted events.
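
To make the `DiagnosticData` idea concrete, here is a minimal, self-contained
C++ sketch of standard Base64 encoding (RFC 4648 alphabet, with `=` padding)
applied to a JSON string. The function name `base64Encode` is hypothetical and
written only for this example; it does not describe any existing `bmcweb`
utility, and the JSON layout of a serialized `Logging.Entry` is not defined
here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: Base64-encode an arbitrary byte string, for example
// the JSON text of a Logging.Entry destined for LogEntry.DiagnosticData.
std::string base64Encode(const std::string& in)
{
    static constexpr char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    auto byte = [&in](std::size_t pos) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(in[pos]));
    };

    std::string out;
    out.reserve(((in.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Encode every complete 3-byte group as 4 output characters.
    for (; i + 2 < in.size(); i += 3)
    {
        std::uint32_t n = (byte(i) << 16) | (byte(i + 1) << 8) | byte(i + 2);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }

    // Encode the 1 or 2 leftover bytes, padding with '='.
    std::size_t remaining = in.size() - i;
    if (remaining == 1)
    {
        std::uint32_t n = byte(i) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    }
    else if (remaining == 2)
    {
        std::uint32_t n = (byte(i) << 16) | (byte(i + 1) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```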

### `sdbusplus`

The `Foo.errors.yaml` content will be combined with the content formerly in the
`Foo.metadata.yaml` files specified by `phosphor-logging` and specified in a new
file type, `Foo.events.yaml`. This `Foo.events.yaml` format will cover both the
current `error` and `metadata` information as well as augment with additional
information necessary to generate external facing datasets, such as Redfish
Message Registries. The current `Foo.errors.yaml` and `Foo.metadata.yaml` files
will be deprecated as their usage is replaced by the new format.

The `sdbusplus` library will be enhanced to provide the following:

- JSON serialization and de-serialization of generated exception types with
  their assigned metadata; assignment of the JSON serialization to the `message`
  field of `sd_bus_error_set` calls when errors are returned from DBus server
  calls.

- A facility to register exception types, at library load time, with the
  `sdbusplus` library for automatic conversion back to C++ exception types in
  DBus clients.
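
The serialization step in the first item could look roughly like the sketch
below, which turns an error name and its metadata into a compact JSON string
that a server might place in the `message` field. Everything here is an
assumption made for illustration: the wire format (`{"<name>":{...}}`), the
string-only metadata, and the helper names `jsonEscape` and `serializeError`
are hypothetical, and a real implementation would more likely use a JSON
library than hand-rolled escaping.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper: escape a string for embedding in a JSON document.
std::string jsonEscape(const std::string& s)
{
    std::string out;
    for (unsigned char c : s)
    {
        switch (c)
        {
            case '"':
                out += "\\\"";
                break;
            case '\\':
                out += "\\\\";
                break;
            case '\n':
                out += "\\n";
                break;
            default:
                if (c < 0x20)
                {
                    // Other control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                }
                else
                {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Hypothetical format: {"<error name>":{"<KEY>":"<value>",...}}
std::string serializeError(const std::string& name,
                           const std::map<std::string, std::string>& metadata)
{
    std::string out = "{\"" + jsonEscape(name) + "\":{";
    bool first = true;
    for (const auto& [key, value] : metadata)
    {
        if (!first)
        {
            out += ',';
        }
        first = false;
        out += "\"" + jsonEscape(key) + "\":\"" + jsonEscape(value) + "\"";
    }
    out += "}}";
    return out;
}
```

Because the metadata travels with the error name, a client that has the
matching exception type registered could, in principle, rebuild the typed
exception from this string instead of receiving only a name and free text.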

The binding generator(s) will be expanded to do the following:

- Generate complete C++ exception types, with compile-time checking of missing
  metadata and JSON serialization, for errors and events. Metadata can be of one
  of the following types:

  - size-type and signed integer
  - floating-point number
  - string
  - DBus object path

- Generate a format that `bmcweb` can use to create and populate a Redfish
  Message Registry, and translate from `phosphor-logging` to Redfish `LogEntry`
  for a set of errors and events.
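
One way to picture "compile-time checking of missing metadata" is a generated
type whose only constructor requires every metadata field, so a call site that
omits one fails to compile. The sketch below is a hand-written stand-in, not
actual generator output: `ObjectPath`, `MetadataValue`, and the layout of
`InvalidCertificate` are hypothetical names chosen for this example, reusing
the `REASON` field from the earlier `Certs` definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical stand-in for a DBus object path type.
struct ObjectPath
{
    std::string path;
};

// One alternative per metadata type listed above.
using MetadataValue =
    std::variant<std::int64_t, std::size_t, double, std::string, ObjectPath>;

// Hand-written approximation of what a generated event type might look like.
struct InvalidCertificate
{
    static constexpr const char* name =
        "xyz.openbmc_project.Certs.Error.InvalidCertificate";

    // The only constructor takes every required field, so forgetting REASON
    // at a call site is a compile error rather than a runtime surprise.
    explicit InvalidCertificate(std::string r) : reason(std::move(r)) {}

    // Flatten the typed fields into a key/value dictionary.
    std::map<std::string, MetadataValue> metadata() const
    {
        std::map<std::string, MetadataValue> md;
        md.emplace("REASON", MetadataValue{reason});
        return md;
    }

    std::string reason;
};

// Demonstrates the check: the type cannot be created without its metadata.
static_assert(!std::is_default_constructible_v<InvalidCertificate>,
              "REASON must always be supplied");
```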

For general users of `sdbusplus` these changes should have no impact, except for
the availability of new generated exception types and that specialized instances
of `sdbusplus::exception::generated_exception` will become available in DBus
clients.

### `phosphor-dbus-interfaces`

Refactoring will be done to migrate existing `Foo.metadata.yaml` and
`Foo.errors.yaml` content to the `Foo.events.yaml` as migration is done by
applications. Minor changes will take place to utilize the new binding
generators from `sdbusplus`. A small library enhancement will be done to
register all generated exception types with `sdbusplus`. Future contributors
will be able to contribute new error and tracing event definitions.

### `phosphor-logging`

> TODO: Should a tracing event be a `Logging.Entry` with severity of
> `Informational`, or should it be a new type, such as `Logging.Event`, and
> managed separately? The `phosphor-logging` default `meson.options` have
> `error_cap=200` and `error_info_cap=10`. If we increase the total number of
> events allowed to 10K, the majority of them are likely going to be
> informational / tracing events.

The `Logging.Entry` interface's `AdditionalData` property should change to
`dict[string, variant[string,int64_t,size_t,object_path]]`.

The `Logging.Create` interface will have a new method added:

```yaml
- name: CreateEntry
  parameters:
    - name: Message
      type: string
    - name: Severity
      type: enum[Logging.Entry.Level]
    - name: AdditionalData
      type: dict[string, variant[string,int64_t,size_t,object_path]]
    - name: Hint
      type: string
      default: ""
  returns:
    - name: Entry
      type: object_path
```

The `Hint` parameter allows daemons to query for their previously recorded
errors in order to mark them as resolved. These strings need to be globally
unique and are suggested to be of the format `"<service_name>:<key>"`.

A `Logging.SearchHint` interface will be created and hosted at the same object
path as the `Logging.Entry` whenever the `Hint` parameter is not an empty
string:

```yaml
- property: Hint
  type: string
```

The `Logging.Manager` interface will be added with a single method:

```yaml
- name: FindEntry
  parameters:
    - name: Hint
      type: string
  returns:
    - name: Entry
      type: object_path
  errors:
    - xyz.openbmc_project.Common.ResourceNotFound
```
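
The interplay between `CreateEntry`, the `Hint`, and `FindEntry` can be modeled
in a few lines of C++. This is only an in-memory sketch under stated
assumptions: `makeHint` and `EntryIndex` are hypothetical names, the object
path prefix is illustrative, an empty `std::optional` stands in for the
`ResourceNotFound` error, and a repeated hint simply points at the newest
entry (the proposal does not say how duplicates are handled).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Build a hint in the suggested "<service_name>:<key>" format.
std::string makeHint(const std::string& service, const std::string& key)
{
    return service + ":" + key;
}

// Hypothetical in-memory model of the hint bookkeeping in the logging daemon.
class EntryIndex
{
  public:
    // Models CreateEntry: allocate an entry path and remember the hint,
    // but only when the hint is not an empty string.
    std::string createEntry(const std::string& hint = "")
    {
        std::string path = "/xyz/openbmc_project/logging/entry/" +
                           std::to_string(++nextId);
        if (!hint.empty())
        {
            hints[hint] = path;
        }
        return path;
    }

    // Models FindEntry: std::nullopt stands in for ResourceNotFound.
    std::optional<std::string> findEntry(const std::string& hint) const
    {
        auto it = hints.find(hint);
        if (it == hints.end())
        {
            return std::nullopt;
        }
        return it->second;
    }

  private:
    std::uint64_t nextId = 0;
    std::map<std::string, std::string> hints;
};
```

A daemon would keep only the hint string (for example, derived from the
failing component's name) and could later ask for the matching entry in order
to mark it as resolved, without tracking entry object paths itself.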

A `lg2::commit` API will be added to support the new `sdbusplus` generated
exception types, calling the new `Logging.Create.CreateEntry` method proposed
earlier. This new API will support `sdbusplus::bus_t` for synchronous DBus
operations and both `sdbusplus::async::context_t` and
`sdbusplus::asio::connection` for asynchronous DBus operations.

There are outstanding performance concerns with the `phosphor-logging`
implementation that may limit its ability to scale to 10,000 event records.
This issue is expected to be self-contained within `phosphor-logging`, except
for potential future changes to the log-retrieval interfaces used by `bmcweb`.
In order to decouple the transition of logging API callers to this design from
the experimentation and improvements in `phosphor-logging`, we will add a
compile option and Yocto `DISTRO_FEATURE` that can turn `lg2::commit` behavior
into an `OPENBMC_MESSAGE_ID` record in the journal, along the same approach as
the previous `REDFISH_MESSAGE_ID`, and corresponding `rsyslog` configuration and
`bmcweb` support to use these directly. This will give systems that are known to
scale to a large number of event records using `rsyslog` mechanics the same
level of performance. One caveat of this support is that the hint and
resolution behavior will not exist when that option is enabled.

### `bmcweb`

`bmcweb` already has support for build-time conversion from a Redfish Message
Registry, codified in JSON, to header files it uses to serve the registry; this
will be expanded to support Redfish Message Registries generated by `sdbusplus`.
`bmcweb` will add a Meson option for additional message registries, supplied by
bitbake from `phosphor-dbus-interfaces` and vendor-specific event definitions,
as a path to a directory of Message Registry JSON files. Support will also be
added for adding `phosphor-dbus-interfaces` as a Meson subproject for
stand-alone testing.

It is desirable for `sdbusplus` to generate a Redfish Message Registry directly,
leveraging the existing scripts for integration with `bmcweb`. As part of this
we would like to support mapping a `Logging.Entry` event to an existing
standardized Redfish event (such as those in the Base registry). The generated
information must contain the `Logging.Entry::Message` identifier, the
`AdditionalData` to `MessageArgs` mapping, and the translation from the
`Message` identifier to the Redfish Message ID (when the Message ID is not from
"this" registry). In order to facilitate this, we will need to add OEM fields to
the Redfish Message Registry JSON, which are only used by the `bmcweb`
processing scripts, to generate the information necessary for this additional
mapping.
441ede0a25eSPatrick Williams
442ede0a25eSPatrick WilliamsThe `xyz.openbmc_project.Logging.Entry` to `LogEvent` conversion needs to be
443ede0a25eSPatrick Williamsenhanced, to utilize these Message Registries, in four ways:
444ede0a25eSPatrick Williams
445ede0a25eSPatrick Williams1. A Base64-encoded JSON representation of the `Logging.Entry` will be assigned
446ede0a25eSPatrick Williams   to the `DiagnosticData` property.
447ede0a25eSPatrick Williams
448ede0a25eSPatrick Williams2. If the `Logging.Entry::Message` contains an identifier corresponding to a
449ede0a25eSPatrick Williams   Registry entry, the `MessageId` property will be set to the corresponding
450ede0a25eSPatrick Williams   Redfish Message ID. Otherwise, the `Logging.Entry::Message` will be used
451ede0a25eSPatrick Williams   directly with no further transformation (as is done today).
452ede0a25eSPatrick Williams
453ede0a25eSPatrick Williams3. If the `Logging.Entry::Message` contains an identifier corresponding to a
454ede0a25eSPatrick Williams   Registry entry, the `MessageArgs` property will be filled in by obtaining the
455ede0a25eSPatrick Williams   corresponding values from the `AdditionalData` dictionary and the `Message`
456ede0a25eSPatrick Williams   field will be generated from combining these values with the `Message` string
457ede0a25eSPatrick Williams   from the Registry.
458ede0a25eSPatrick Williams
459ede0a25eSPatrick Williams4. A mechanism should be implemented to translate DBus `object_path` references
460ede0a25eSPatrick Williams   to Redfish Resource URIs. When an `object_path` cannot be translated,
461ede0a25eSPatrick Williams   `bmcweb` will use a prefix such as `object_path:` in the `MessageArgs` value.
462ede0a25eSPatrick Williams
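Items 3 and 4 of the list above can be illustrated with a minimal C++ sketch.
This is not the `bmcweb` implementation: the helper names (`translatePath`,
`collectArgs`, `fillMessage`), the flat string-to-string model of
`AdditionalData`, the choice to emit an empty argument for a missing key, and
the single hard-coded path mapping are all assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Assumption: AdditionalData is modeled as a flat string-to-string dictionary.
using AdditionalData = std::map<std::string, std::string>;

// Hypothetical translation of a DBus object_path to a Redfish Resource URI.
// Paths that cannot be translated get an `object_path:` prefix (item 4).
std::string translatePath(const std::string& path)
{
    // Illustrative mapping only; not taken from any real inventory.
    if (path == "/xyz/openbmc_project/inventory/system/chassis/bmc")
    {
        return "/redfish/v1/Managers/bmc";
    }
    return "object_path:" + path;
}

// Collect MessageArgs, in registry order, from AdditionalData (item 3).
// Assumption: a missing key yields an empty argument rather than an error.
std::vector<std::string> collectArgs(const std::vector<std::string>& argNames,
                                     const AdditionalData& data)
{
    std::vector<std::string> args;
    for (const auto& name : argNames)
    {
        const auto it = data.find(name);
        args.push_back(it == data.end() ? std::string{} : it->second);
    }
    return args;
}

// Substitute %1, %2, ... in a registry Message string with MessageArgs.
// A '%' that is not followed by a valid argument index is copied verbatim.
std::string fillMessage(std::string_view tmpl,
                        const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() &&
            std::isdigit(static_cast<unsigned char>(tmpl[i + 1])))
        {
            std::size_t n = 0;
            std::size_t j = i + 1;
            while (j < tmpl.size() &&
                   std::isdigit(static_cast<unsigned char>(tmpl[j])))
            {
                n = n * 10 + static_cast<std::size_t>(tmpl[j] - '0');
                ++j;
            }
            if (n >= 1 && n <= args.size())
            {
                out += args[n - 1];
                i = j - 1; // skip the digits that were consumed
                continue;
            }
        }
        out += tmpl[i];
    }
    return out;
}
```
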
The implementation of `EventService` should be enhanced to support
`phosphor-logging` hosted events. The implementation of `LogService` should be
enhanced to support log paging for `phosphor-logging` hosted events.

### `phosphor-sel-logger`

The `phosphor-sel-logger` has a meson option `send-to-logger` which toggles
between using `phosphor-logging` or the [`REDFISH_MESSAGE_ID`
mechanism][existing-design]. The `phosphor-logging`-utilizing paths will be
updated to utilize `phosphor-dbus-interfaces` specified errors and events.

### YAML format

Consider an example file in `phosphor-dbus-interfaces` as
`yaml/xyz/openbmc_project/Software/Update.events.yaml` with hypothetical errors
and events:

```yaml
version: 1.3.1

errors:
  - name: UpdateFailure
    severity: critical
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: ERRNO
        type: int64
      - name: CALLOUT_HARDWARE
        type: object_path
        primary: true
    en:
      description: While updating the firmware on a device, the update failed.
      message: A failure occurred updating {TARGET} on {CALLOUT_HARDWARE}.
      resolution: Retry update.

  - name: BMCUpdateFailure
    severity: critical
    deprecated: 1.0.0
    en:
      description: Failed to update the BMC
    redfish-mapping: OpenBMC.FirmwareUpdateFailed

events:
  - name: UpdateProgress
    metadata:
      - name: TARGET
        type: string
        primary: true
      - name: COMPLETION
        type: double
        primary: true
    en:
      description: An update is in progress and has reached a checkpoint.
      message: Updating of {TARGET} is {COMPLETION}% complete.
```

Each `foo.events.yaml` file would be used to generate both the C++ classes (via
`sdbusplus`) for exception handling and event reporting, as well as a versioned
Redfish Message Registry for the errors and events. The [YAML
schema][yaml-schema] is contained in the sdbusplus repository.

The above example YAML would generate C++ classes similar to:

```cpp
namespace sdbusplus::errors::xyz::openbmc_project::software::update
{

class UpdateFailure
{
  public:
    // Accepts alternating metadata names and values, in any order.
    template <typename... Args>
    UpdateFailure(Args&&... args);
};

} // namespace sdbusplus::errors::xyz::openbmc_project::software::update

namespace sdbusplus::events::xyz::openbmc_project::software::update
{

class UpdateProgress
{
  public:
    // Accepts alternating metadata names and values, in any order.
    template <typename... Args>
    UpdateProgress(Args&&... args);
};

} // namespace sdbusplus::events::xyz::openbmc_project::software::update
```

The constructors here are variadic templates because the generated constructor
implementation will provide compile-time assurance that all of the metadata
fields have been populated (in any order). To raise an `UpdateFailure`, a
developer might do something like:

```cpp
// Immediately report the event:
lg2::commit(UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE", bmc_object_path));
// or send it in a dbus response (when using sdbusplus generated binding):
throw UpdateFailure("TARGET", "BMC Flash A", "ERRNO", rc, "CALLOUT_HARDWARE", bmc_object_path);
```

If one of the fields, such as `ERRNO`, were omitted, a compile failure would be
raised indicating the first missing field.

[yaml-schema]:
  https://github.com/openbmc/sdbusplus/blob/master/tools/sdbusplus/schemas/events.schema.yaml

### Versioning Policy

Assume the version follows the semantic versioning `MAJOR.MINOR.PATCH`
convention.

- Adjusting a description or message should result in a `PATCH` increment.
- Adding a new error or event, or adding metadata to an existing error or event,
  should result in a `MINOR` increment.
- Deprecating an error or event should result in a `MAJOR` increment.

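The three rules above can be expressed as a small C++ sketch. The `Version`
struct, the `Change` enum, and the `bump` and `toString` helpers are
hypothetical names, not part of `sdbusplus`. Resetting the lower fields to zero
on a `MINOR` or `MAJOR` increment follows the usual semantic versioning
convention and is an assumption; the policy above does not state it.

```cpp
#include <string>

// Hypothetical representation of a registry version.
struct Version
{
    unsigned majorVer;
    unsigned minorVer;
    unsigned patchVer;
};

// The kinds of change named by the versioning policy.
enum class Change
{
    TextEdit,    // description or message adjusted
    Addition,    // new error/event, or new metadata on an existing one
    Deprecation, // error or event deprecated
};

// Return the version that should follow `v` after a change of kind `c`.
Version bump(Version v, Change c)
{
    switch (c)
    {
        case Change::Deprecation:
            return {v.majorVer + 1, 0, 0};
        case Change::Addition:
            return {v.majorVer, v.minorVer + 1, 0};
        case Change::TextEdit:
            return {v.majorVer, v.minorVer, v.patchVer + 1};
    }
    return v; // unreachable for valid enum values
}

// Render as MAJOR.MINOR.PATCH.
std::string toString(const Version& v)
{
    return std::to_string(v.majorVer) + "." + std::to_string(v.minorVer) +
           "." + std::to_string(v.patchVer);
}
```
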
There is [guidance on maintenance][registry-guidance] of the OpenBMC Message
Registry. We will incorporate that guidance into the equivalent
`phosphor-dbus-interfaces` policy.

[registry-guidance]:
  https://github.com/openbmc/bmcweb/blob/master/redfish-core/include/registries/openbmc_message_registry.readmefirst.md

### Generated Redfish Message Registry

[DSP0266][dsp0266], the Redfish specification, gives requirements for Redfish
Message Registries and dictates guidelines for identifiers.

The hypothetical events defined above would create a message registry similar
to:

```json
{
  "Id": "OpenBMC_Base_Xyz_OpenbmcProject_Software_Update.1.3.1",
  "Language": "en",
  "Messages": {
    "UpdateFailure": {
      "Description": "While updating the firmware on a device, the update failed.",
      "Message": "A failure occurred updating %1 on %2.",
      "Resolution": "Retry update.",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "string"],
      "Severity": "Critical"
    },
    "UpdateProgress": {
      "Description": "An update is in progress and has reached a checkpoint.",
      "Message": "Updating of %1 is %2% complete.",
      "Resolution": "None",
      "NumberOfArgs": 2,
      "ParamTypes": ["string", "number"],
      "Severity": "OK"
    }
  }
}
```

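The step from the YAML `message` strings to the registry `Message` strings can
be sketched in C++ as follows. This is an illustration, not the `sdbusplus`
generator: `RegistryMessage` and `toRegistryMessage` are hypothetical names, and
numbering the arguments by first appearance in the message is an assumption
that happens to match the example above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type: the registry Message string plus the metadata
// names that back %1, %2, ... in order.
struct RegistryMessage
{
    std::string message;
    std::vector<std::string> argNames;
};

// Rewrite "{NAME}" placeholders as positional "%N" arguments.
// Assumption: arguments are numbered by first appearance in the message.
RegistryMessage toRegistryMessage(std::string_view yamlMessage)
{
    RegistryMessage out;
    std::size_t pos = 0;
    while (pos < yamlMessage.size())
    {
        const std::size_t open = yamlMessage.find('{', pos);
        const std::size_t close = (open == std::string_view::npos)
                                      ? std::string_view::npos
                                      : yamlMessage.find('}', open + 1);
        if (close == std::string_view::npos)
        {
            // No further complete placeholder: copy the remainder verbatim.
            out.message += std::string(yamlMessage.substr(pos));
            break;
        }
        out.message += std::string(yamlMessage.substr(pos, open - pos));

        const std::string name(yamlMessage.substr(open + 1, close - open - 1));
        const auto it =
            std::find(out.argNames.begin(), out.argNames.end(), name);
        const auto index =
            static_cast<std::size_t>(std::distance(out.argNames.begin(), it));
        if (index == out.argNames.size())
        {
            out.argNames.push_back(name); // first time this name is seen
        }
        out.message += '%' + std::to_string(index + 1);
        pos = close + 1;
    }
    return out;
}
```
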
The prefix `OpenBMC_Base` shall be exclusively reserved for use by events from
`phosphor-logging`. Events defined in other repositories will be expected to use
some other prefix. Vendor-defined repositories should use a vendor-owned prefix
as directed by [DSP0266][dsp0266].

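As an illustration of how a registry `Id` might be derived from a prefix and
the YAML file location, consider the following C++ sketch. The rule it applies
(capitalize the first letter of each `/`-separated or `_`-separated piece, drop
the `_` inside a path segment, join segments with `_`) is inferred from the
single example `Id` shown above and is an assumption, as is the `registryId`
helper itself.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Build "<prefix>_<Segments>.<version>" from a YAML path such as
// "xyz/openbmc_project/Software/Update".
// Assumption: the naming rule is inferred from one example in this document.
std::string registryId(std::string_view prefix, std::string_view yamlPath,
                       std::string_view version)
{
    std::string id(prefix);
    id += '_';
    bool upperNext = true;
    for (const char c : yamlPath)
    {
        if (c == '/')
        {
            id += '_'; // path separators become underscores
            upperNext = true;
        }
        else if (c == '_')
        {
            upperNext = true; // drop the underscore, capitalize what follows
        }
        else if (upperNext)
        {
            id += static_cast<char>(
                std::toupper(static_cast<unsigned char>(c)));
            upperNext = false;
        }
        else
        {
            id += c;
        }
    }
    id += '.';
    id += std::string(version);
    return id;
}
```
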
[dsp0266]:
  https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.20.0.pdf

### Vendor implications

As specified above, vendors must use their own identifiers in order to conform
with the Redfish specification (see [DSP0266][dsp0266] for requirements on
identifier naming). The `sdbusplus` (and `phosphor-logging` and `bmcweb`)
implementation(s) will enable vendors to create their own events for downstream
code and Registries for integration with Redfish, by creating downstream
repositories of error definitions. Vendors are responsible for ensuring their
own versioning and identifiers conform to the expectations in the [Redfish
specification][dsp0266].

One potential bad behavior on the part of vendors would be forking and modifying
`phosphor-dbus-interfaces` defined events. Vendors must not add their own events
to `phosphor-dbus-interfaces` in downstream implementations, because doing so
would cause their implementation to advertise support for a message in an
OpenBMC-owned Registry that does not actually contain it. Instead, they should
add such events to their own repositories with a separate identifier.
Similarly, if a vendor were to _backport_ upstream changes into their fork, they
would need to ensure that the `foo.events.yaml` file for that version matches
identically with the upstream implementation.

## Alternatives Considered

Many alternatives have been explored and referenced through earlier work. Within
this proposal there are many minor alternatives that have been assessed.

### Exception inheritance

The original `phosphor-logging` error descriptions allowed inheritance between
two errors. This is not supported by the proposal for two reasons:

- This introduces complexity in the Redfish Message Registry versioning because
  a change in one file should induce version changes in all dependent files.

- It makes it difficult for a developer to clearly identify all of the fields
  they are expected to populate without traversing multiple files.

### sdbusplus Exception APIs

There are a few possible syntaxes I came up with for constructing the generated
exception types. It is important that these have good ergonomics, are easy to
understand, and can provide compile-time awareness of missing metadata fields.

```cpp
    using Example = sdbusplus::error::xyz::openbmc_project::Example;

    // 1)
    throw Example().fru("Motherboard").value(42);

    // 2)
    throw Example(Example::fru_{}, "Motherboard", Example::value_{}, 42);

    // 3)
    throw Example("FRU", "Motherboard", "VALUE", 42);

    // 4)
    throw Example([](auto e) { return e.fru("Motherboard").value(42); });

    // 5)
    throw Example({.fru = "Motherboard", .value = 42});
```

**Note**: These examples are all shown using `throw` syntax, but could also be
saved in local variables, returned from functions, or immediately passed to
`lg2::commit`.

1. This would be my preference for ergonomics and clarity, as it would allow
   LSP-enabled editors to give completions for the metadata fields, but
   unfortunately there is no mechanism in C++ to define a type which can be
   constructed but not thrown, which means we cannot get compile-time checking
   of all metadata fields.

2. This syntax uses tag-dispatch to enable compile-time checking of all
   metadata fields and potential LSP-completion of the tag-types, but is more
   verbose than option (3).

3. This syntax is less verbose than option (2) and follows conventions already
   used in `phosphor-logging`'s `lg2` API, but does not allow LSP-completion of
   the metadata tags.

4. This syntax is similar to option (1) but uses an indirection of a lambda to
   enable compile-time checking that all metadata fields have been populated by
   the lambda. The LSP-completion is likely not as strong as option (1), due to
   the use of `auto`, and the lambda necessity will likely be a hang-up for
   unfamiliar developers.

5. This syntax has similar characteristics to option (1) but similarly does not
   provide compile-time confirmation that all fields have been populated.

The proposal therefore suggests option (3) is most suitable.

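For comparison, the compile-time check described for option (2) can be sketched
in C++17 with tag types and a fold expression. The names `Example`, `fru_`, and
`value_` come from the snippet above; the `hasTag` helper, the accessors, and
the private `assign` overloads are hypothetical and are not what `sdbusplus`
generates. Omitting either tag makes the corresponding `static_assert` fail at
compile time, and the tag/value pairs may appear in any order.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// True when Tag is the (decayed) type of at least one constructor argument.
template <typename Tag, typename... Args>
constexpr bool hasTag = (std::is_same_v<Tag, std::decay_t<Args>> || ...);

class Example
{
  public:
    // Tag types naming each metadata field.
    struct fru_
    {};
    struct value_
    {};

    // Takes alternating (tag, value) pairs in any order.
    template <typename... Args>
    explicit Example(Args&&... args)
    {
        static_assert(hasTag<fru_, Args...>, "missing metadata field: fru");
        static_assert(hasTag<value_, Args...>,
                      "missing metadata field: value");
        assign(std::forward<Args>(args)...);
    }

    const std::string& fru() const
    {
        return fruField;
    }
    int value() const
    {
        return valueField;
    }

  private:
    // Recursion terminator: all pairs consumed.
    void assign() {}

    // Each overload consumes one (tag, value) pair, selected by the tag type.
    template <typename V, typename... Rest>
    void assign(fru_, V&& v, Rest&&... rest)
    {
        fruField = std::forward<V>(v);
        assign(std::forward<Rest>(rest)...);
    }

    template <typename V, typename... Rest>
    void assign(value_, V&& v, Rest&&... rest)
    {
        valueField = std::forward<V>(v);
        assign(std::forward<Rest>(rest)...);
    }

    std::string fruField;
    int valueField = 0;
};
```
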
### Redfish Translation Support

The proposed YAML format allows future addition of translation but it is not
enabled at this time. Future development could enable the Redfish Message
Registry to be generated in multiple languages if the `message:language` exists
for those languages.

### Redfish Registry Versioning

The Redfish Message Registries are required to be versioned and have three
numeric fields (i.e. `XX.YY.ZZ`), but only the first two are supposed to be used
in the Message ID. Rather than using the manually specified version, we could
take a few other approaches:

- Use a date code (e.g. `2024.17.x`) representing the ISO 8601 week when the
  registry was built.

  - This does not cover vendors that may choose to branch for stabilization
    purposes, so we can end up with two machines having the same
    OpenBMC-versioned message registry with different content.

- Use the most recent `openbmc/openbmc` tag as the version.

  - This does not cover vendors that build off HEAD and may deploy multiple
    images between two OpenBMC releases.

- Generate the version based on the git-history.

  - This requires `phosphor-dbus-interfaces` to be built from a git repository,
    which may not always be true for Yocto source mirrors, and requires
    non-trivial processing that continues to scale over time.

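To make the "first two fields" point concrete, the following C++ sketch forms a
Message ID from a registry `Id` and a message key by dropping the final version
field. The `messageId` helper is hypothetical, and it assumes the registry `Id`
ends in `.MAJOR.MINOR.PATCH` as in the generated registry example earlier in
this document.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Form "<Registry>.<Major>.<Minor>.<MessageKey>" from a registry Id of the
// form "<Registry>.<Major>.<Minor>.<Patch>".
// Assumption: the Id always ends with a three-field version.
std::string messageId(std::string_view registryId, std::string_view messageKey)
{
    // Everything before the last '.' is "<Registry>.<Major>.<Minor>".
    const std::size_t lastDot = registryId.rfind('.');
    std::string id(registryId.substr(0, lastDot));
    id += '.';
    id += std::string(messageKey);
    return id;
}
```
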
### Existing OpenBMC Redfish Registry

There are currently 191 messages defined in the existing Redfish Message
Registry at version `OpenBMC.0.4.0`. Of those, not a single one in the codebase
is emitted with the correct version. 96 of those are only emitted by
Intel-specific code that is not pulled into any upstreamed machine, 39 are
emitted by potentially common code, and 56 are not even referenced in the
codebase outside of the bmcweb registry. Of the 39 common messages, half have an
equivalent in one of the standard registries that should be leveraged, and many
of the others do not have attributes that would facilitate a multi-host
configuration, so the registry at a minimum needs to be updated. No part of the
current implementation has the capability to handle Redfish Resource URIs.

The proposal therefore is to deprecate the existing registry and replace it with
the new generated registries. For repositories that currently emit events in the
existing format, we can maintain those call-sites for a time period of 1-2
years.

If this aspect of the proposal is rejected, the YAML format allows mapping from
`phosphor-dbus-interfaces` defined events to the current `OpenBMC.0.4.0`
registry `MessageIds`.

Potentially common:

- phosphor-post-code-manager
  - BIOSPOSTCode (unique)
- dbus-sensors
  - ChassisIntrusionDetected (unique)
  - ChassisIntrusionReset (unique)
  - FanInserted
  - FanRedundancyLost (unique)
  - FanRedudancyRegained (unique)
  - FanRemoved
  - LanLost
  - LanRegained
  - PowerSupplyConfigurationError (unique)
  - PowerSupplyConfigurationErrorRecovered (unique)
  - PowerSupplyFailed
  - PowerSupplyFailurePredicted (unique)
  - PowerSupplyFanFailed
  - PowerSupplyFanRecovered
  - PowerSupplyPowerLost
  - PowerSupplyPowerRestored
  - PowerSupplyPredictiedFailureRecovered (unique)
  - PowerSupplyRecovered
- phosphor-sel-logger
  - IPMIWatchdog (unique)
  - `SensorThreshold*` : 8 different events
- phosphor-net-ipmid
  - InvalidLoginAttempted (unique)
- entity-manager
  - InventoryAdded (unique)
  - InventoryRemoved (unique)
- estoraged
  - ServiceStarted
- x86-power-control
  - NMIButtonPressed (unique)
  - NMIDiagnosticInterrupt (unique)
  - PowerButtonPressed (unique)
  - PowerRestorePolicyApplied (unique)
  - PowerSupplyPowerGoodFailed (unique)
  - ResetButtonPressed (unique)
  - SystemPowerGoodFailed (unique)

Intel-only implementations:

- intel-ipmi-oem
  - ADDDCCorrectable
  - BIOSPostERROR
  - BIOSRecoveryComplete
  - BIOSRecoveryStart
  - FirmwareUpdateCompleted
  - IntelUPILinkWidthReducedToHalf
  - IntelUPILinkWidthReducedToQuarter
  - LegacyPCIPERR
  - LegacyPCISERR
  - `ME*` : 29 different events
  - `Memory*` : 9 different events
  - MirroringRedundancyDegraded
  - MirroringRedundancyFull
  - `PCIeCorrectable*`, `PCIeFatal` : 29 different events
  - SELEntryAdded
  - SparingRedundancyDegraded
- pfr-manager
  - BIOSFirmwareRecoveryReason
  - BIOSFirmwarePanicReason
  - BMCFirmwarePanicReason
  - BMCFirmwareRecoveryReason
  - BMCFirmwareResiliencyError
  - CPLDFirmwarePanicReason
  - CPLDFirmwareResilencyError
  - FirmwareResiliencyError
- host-error-monitor
  - CPUError
  - CPUMismatch
  - CPUThermalTrip
  - ComponentOverTemperature
  - SsbThermalTrip
  - VoltageRegulatorOverheated
- s2600wf-misc
  - DriveError
  - InventoryAdded

## Impacts

- New APIs are defined for error and event logging. This will deprecate existing
  `phosphor-logging` APIs, with a time to migrate, for error reporting.

- The design should improve performance by eliminating the regular parsing of
  the `systemd` journal. The design may decrease performance by allowing the
  number of error and event logs to be dramatically increased, which has an
  impact on file system utilization and potential DBus impacts on some services
  such as `ObjectMapper`.

- Backwards compatibility and documentation should be improved by the automatic
  generation of the Redfish Message Registry corresponding to all error and
  event reports.

### Organizational

- **Does this proposal require a new repository?**
  - No
- **Who will be the initial maintainer(s) of this repository?**
  - N/A
- **Which repositories are expected to be modified to execute this design?**
  - `sdbusplus`
  - `phosphor-dbus-interfaces`
  - `phosphor-logging`
  - `bmcweb`
  - Any repository creating an error or event.

## Testing

- Unit tests will be written in `sdbusplus` and `phosphor-logging` for the error
  and event generation, creation APIs, and to provide coverage on any changes to
  the `Logging.Entry` object management.

- Unit tests will be written for `bmcweb` for basic `Logging.Entry`
  transformation and Message Registry generation.

- Integration tests should be leveraged (and enhanced as necessary) from
  `openbmc-test-automation` to cover the end-to-end error creation and Redfish
  reporting.