Lines Matching +full:systemd +full:- +full:resolved

3 Author: [Patrick Williams][patrick-email] `<stwcx>`
5 [patrick-email]: mailto:patrick@stwcx.xyz
13 There is currently not a consistent end-to-end error and event reporting design
15 primarily using phosphor-logging and one using rsyslog, both of which have gaps
17 end-to-end design handling both errors and tracing events which facilitate
30 be "DIMM-A0 encountered an uncorrectable ECC error" or "System boot successful".
36 temperature threshold exceeded: ["temperature threshold exceeded"][HPE-Example]
37 and ["Temperature #0x30 Upper Critical going high"][Oracle-Example]. There is
45 reference][Registry-Example] from the DMTF gives this example:
84 Within OpenBMC, there is currently a [limited design][existing-design] for this
85 Redfish feature and it requires inserting specially formed Redfish-specific
88 observed that these [strings][app-example], when used, are often out of date
89 with the [message registry][obmc-registry-example] advertised by `bmcweb`. Some
90 maintainers have rejected adding new Redfish-specific logging messages to their
94 …nbmc/bmcweb/blob/de0c960c4262169ea92a4b852dd5ebbe3810bf00/redfish-core/schema/dmtf/json-schema/Log…
95 [HPE-Example]:
96 …ublic/docDisplay?docId=sd00002092en_us&docLocale=en_US&page=GUID-D7147C7F-2016-0901-06CE-000000000…
97 [Oracle-Example]:
98 https://docs.oracle.com/cd/E19464-01/820-6850-11/IPMItool.html#50602039_63068
99 [Registry-Example]:
100 https://www.dmtf.org/sites/default/files/Redfish%20School%20-%20Events_0.pdf
101 [existing-design]:
102 https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md
103 [app-example]:
104 …https://github.com/openbmc/phosphor-post-code-manager/blob/f2da78deb3a105c7270f74d9d747c77f0feaae2…
105 [obmc-registry-example]:
106 …https://github.com/openbmc/bmcweb/blob/4ba5be51e3fcbeed49a6a312b4e6b2f1ea7447ba/redfish-core/inclu…
108 ### Existing phosphor-logging implementation
118 client. `phosphor-logging` extended this to also add metadata associated to the
125 - name: InvalidCertificate
129 `phosphor-logging` metadata definition (in
133 - name: InvalidCertificate
135 - str: "REASON=%s"
154 `phosphor-logging`'s daemon for recording. As a side-effect of both calls, the
155 metadata is inserted into the `systemd` journal.
157 When an error is sent to the `phosphor-logging` daemon, it will:
161 2. Create an [`xyz.openbmc_project.Logging.Entry`][Logging-Entry] DBus object
166 `xyz.openbmc_project.Logging.Entry` objects advertised by `phosphor-logging`
170 perform (hand-coded) regular-expressions to extract any information from the
171 `Message` field of the `LogEntry`. Furthermore, these regular-expressions are
175 [Logging-Entry]:
176 …https://github.com/openbmc/phosphor-dbus-interfaces/blob/9012243e543abdc5851b7e878c17c991b2a2a8b7/…
180 - There are two different implementations of error logging, neither of which is
184 - The `REDFISH_MESSAGE_ID` log approach leads to differences between the Redfish
189 also does not provide compile-time assurance of appropriate metadata
190 collection, which can lead to producing code being out-of-date with the
193 - The `phosphor-logging` approach does not provide compile-time assurance of
195 the `systemd` journal on each error report, which limits scalability.
197 - The `sdbusplus` bindings for error reporting do not currently handle lossless
200 - Similar applications can result in different Redfish `LogEntry` for the same
202 between `dbus-sensors`, `phosphor-hwmon`, `phosphor-virtual-sensor`, and
203 `phosphor-health-monitor`. One cause of this is two different error reporting
208 - Applications running on the BMC must be able to report errors and failure
211 - These errors must be structured, versioned, and the complete set of errors
212 able to be created by the BMC should be available at build-time of a BMC
214 - The set of errors, able to be created by the BMC, must be able to be
216 - For Redfish, the transformation must comply with the Redfish standard
218 - For Redfish, the transformation should allow mapping internally defined
219 events to pre-existing Redfish Message Registries for broader
221 - For Redfish, the implementation must also support the EventService
222 mechanics for push-reporting.
223 - Errors reported by the BMC should contain sufficient information to allow
227 - Applications running on the BMC should be able to report important tracing
230 - All requirements relevant to errors are also applicable to tracing events.
231 - The implementation must have a mechanism for vendors to be able to disable
234 - Applications running on the BMC should be able to determine when a previously
235 reported error is no longer relevant and mark it as "resolved", while
238 - The BMC should provide a mechanism for managed entities within the server to
242 - The implementation on the BMC should scale to a minimum of
243 [10,000][error-discussion] errors and events without impacting the BMC or
246 - The implementation should provide a mechanism to allow OEM or vendor
248 the Redfish Message Registry) for usage in closed-source or non-upstreamed
250 vendor-specific and not be tied to the OpenBMC project.
252 - APIs to implement error and event reporting should have good ergonomics. These
253 APIs must provide compile-time identification, for applicable programming
256 - The generated error classes and APIs should not require exceptions but
260 [error-discussion]:
265 The proposed design has a few high-level design elements:
267 - Consolidate the `sdbusplus` and `phosphor-logging` implementation of error
269 associated APIs and add compile-time checking of missing metadata.
271 - Add APIs to `phosphor-logging` to enable daemons to easily look up their own
272 previously reported events (for marking as resolved).
274 - Add to `phosphor-logging` a compile-time mechanism to disable recording of
275 specific tracing events for vendor-level customization.
277 - Generate a Redfish Message Registry for all errors and events defined in
278 `phosphor-dbus-interfaces`, using binding generators from `sdbusplus`. Enhance
280 cover the Redfish Message Registry and `phosphor-logging` enhancements;
282 Base64-encoded JSON representation of the entire `Logging.Entry` for
284 `bmcweb` EventService implementation to support `phosphor-logging`-hosted
290 `Foo.metadata.yaml` files specified by `phosphor-logging` and specified by a new
299 - JSON serialization and de-serialization of generated exception types with
304 - A facility to register exception types, at library load time, with the
310 - Generate complete C++ exception types, with compile-time checking of missing
313 - size-type and signed integer
314 - floating-point number
315 - string
316 - DBus object path
318 - Generate a format that `bmcweb` can use to create and populate a Redfish
319 Message Registry, and translate from `phosphor-logging` to Redfish `LogEntry`
327 ### `phosphor-dbus-interfaces`
336 ### `phosphor-logging`
340 > managed separately. The `phosphor-logging` default `meson.options` have
351 - name: CreateEntry
353 - name: Message
355 - name: Severity
357 - name: AdditionalData
359 - name: Hint
363 - name: Entry
368 previously recorded error, for marking as resolved. These strings need to be
376 - property: Hint
383 - name: FindEntry
385 - name: Hint
388 - name: Entry
391 - xyz.openbmc_project.Common.ResourceNotFound
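
The `CreateEntry` and `FindEntry` methods listed above can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not the `phosphor-logging` implementation: the class `LoggingModel`, the Python method names, the object path format, and the "most recent match wins" rule are all hypothetical. Only the parameter names (`Message`, `Severity`, `AdditionalData`, `Hint`, `Entry`) and the `ResourceNotFound` error come from the listing.

```python
class ResourceNotFound(Exception):
    """Stand-in for xyz.openbmc_project.Common.ResourceNotFound."""


class LoggingModel:
    """Toy model of a log store that can look entries up by hint."""

    def __init__(self):
        self._entries = []  # list of (path, record) in creation order
        self._next_id = 1

    def create_entry(self, message, severity, additional_data, hint=""):
        # Hypothetical path format, used only to give each entry a handle.
        path = f"/example/logging/entry/{self._next_id}"
        self._next_id += 1
        record = {
            "Message": message,
            "Severity": severity,
            "AdditionalData": dict(additional_data),
            "Hint": hint,
        }
        self._entries.append((path, record))
        return path

    def find_entry(self, hint):
        # Assumption: the most recently created entry with this hint wins.
        for path, record in reversed(self._entries):
            if record["Hint"] == hint:
                return path
        raise ResourceNotFound(hint)
```

A daemon that wants to mark an earlier error as resolved would, in this model, call `find_entry` with the same hint it passed to `create_entry` and then operate on the returned path.
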
400 There are outstanding performance concerns with the `phosphor-logging`
402 This issue is expected to be self-contained within `phosphor-logging`, except
403 for potential future changes to the log-retrieval interfaces used by `bmcweb`.
405 APIs, from the experimentation and improvements in `phosphor-logging`, we will
417 `bmcweb` already has support for build-time conversion from a Redfish Message
421 from bitbake from `phosphor-dbus-interfaces` and vendor-specific event
423 also be added for adding `phosphor-dbus-interfaces` as a Meson subproject for
424 stand-alone testing.
441 1. A Base64-encoded JSON representation of the `Logging.Entry` will be assigned
460 `phosphor-logging` hosted events. The implementation of `LogService` should be
461 enhanced to support log paging for `phosphor-logging` hosted events.
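
The Base64-encoded JSON representation of a `Logging.Entry` mentioned in this section can be sketched as a simple round trip. The field that carries the encoded value is not shown in this listing, so the sketch covers only the encoding itself. The helper names and the example dictionary keys are hypothetical and chosen for illustration.

```python
import base64
import json


def encode_entry(entry: dict) -> str:
    """Serialize an entry to compact JSON, then Base64-encode it."""
    raw = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def decode_entry(blob: str) -> dict:
    """Reverse encode_entry: Base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(blob))


# Hypothetical entry contents, for illustration only.
example = {"Message": "example.Error", "AdditionalData": {"TARGET": "bmc"}}
blob = encode_entry(example)
```

Because the encoding is lossless, a client that receives `blob` can recover every field of the original entry with `decode_entry(blob)`.
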
463 ### `phosphor-sel-logger`
465 The `phosphor-sel-logger` has a meson option `send-to-logger` which toggles
466 between using `phosphor-logging` or the [`REDFISH_MESSAGE_ID`
467 mechanism][existing-design]. The `phosphor-logging`-utilizing paths will be
468 updated to utilize `phosphor-dbus-interfaces` specified errors and events.
472 Consider an example file in `phosphor-dbus-interfaces` as
480 - name: UpdateFailure
483 - name: TARGET
486 - name: ERRNO
488 - name: CALLOUT_HARDWARE
496 - name: BMCUpdateFailure
501 redfish-mapping: OpenBMC.FirmwareUpdateFailed
504 - name: UpdateProgress
506 - name: TARGET
509 - name: COMPLETION
520 schema][yaml-schema] is contained in the sdbusplus repository.
550 implementation will provide compile-time assurance that all of the metadata
564 [yaml-schema]:
571 - Adjusting a description or message should result in a `PATCH` increment.
572 - Adding a new error or event, or adding metadata to an existing error or event,
574 - Deprecating an error or event should result in a `MAJOR` increment.
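
The versioning rules above can be sketched as a small function. The change-kind labels (`"reword"`, `"add"`, `"deprecate"`) are hypothetical names. The increment for additions falls on a line not shown in this listing, so the sketch assumes `MINOR`, following the usual semantic versioning convention.

```python
def bump(version: str, change: str) -> str:
    """Return the next registry version for a given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "deprecate":  # deprecating an error or event
        return f"{major + 1}.0.0"
    if change == "add":  # new error/event or new metadata (assumed MINOR)
        return f"{major}.{minor + 1}.0"
    if change == "reword":  # adjusted description or message
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change}")
```

For example, `bump("1.2.3", "reword")` yields `"1.2.4"`, while `bump("1.2.3", "deprecate")` yields `"2.0.0"`.
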
576 There is [guidance on maintenance][registry-guidance] of the OpenBMC Message
578 `phosphor-dbus-interfaces` policy.
580 [registry-guidance]:
581 …https://github.com/openbmc/bmcweb/blob/master/redfish-core/include/registries/openbmc_message_regi…
617 `phosphor-logging`. Events defined in other repositories will be expected to use
618 some other prefix. Vendor-defined repositories should use a vendor-owned prefix
628 identifier naming). The `sdbusplus` (and `phosphor-logging` and `bmcweb`)
636 `phosphor-dbus-interfaces` defined events. Vendors must not add their own events
637 to `phosphor-dbus-interfaces` in downstream implementations because it would
639 OpenBMC-owned Registry which is not the case, but they should add them to their
648 this proposal there are many minor alternatives that have been assessed.
652 The original `phosphor-logging` error descriptions allowed inheritance between
655 - This introduces complexity in the Redfish Message Registry versioning because
658 - It makes it difficult for a developer to clearly identify all of the fields
665 understand, and can provide compile-time awareness of missing metadata fields.
691 LSP-enabled editors to give completions for the metadata fields but
693 constructed but not thrown, which means we cannot get compile-time checking
696 2. This syntax uses tag-dispatch to enable compile-time checking of all
697 metadata fields and potential LSP-completion of the tag-types, but is more
701 `phosphor-logging`'s `lg2` API, but does not allow LSP-completion of the
705 enable compile-time checking that all metadata fields have been populated by
706 the lambda. The LSP-completion is likely not as strong as option (1), due to
707 the use of `auto`, and the lambda necessity will likely be a hang-up for
711 provide compile-time confirmation that all fields have been populated.
729 - Use a date code (ex. `2024.17.x`) representing the ISO 8601 week when the
731 - This does not cover vendors that may choose to branch for stabilization
733 OpenBMC-versioned message registry with different content.
735 - Use the most recent `openbmc/openbmc` tag as the version.
736 - This does not cover vendors that build off HEAD and may deploy multiple
739 - Generate the version based on the git-history.
740 - This requires `phosphor-dbus-interfaces` to be built from a git repository,
742 non-trivial processing that continues to scale over time.
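
The date-code alternative listed above (for example `2024.17.x`) can be illustrated with the standard library's ISO 8601 calendar support. This is a sketch only: the function name `date_code` is hypothetical, and treating the trailing `x` as a numeric patch counter is an assumption.

```python
from datetime import date


def date_code(day: date, patch: int = 0) -> str:
    """Build a '<ISO year>.<ISO week>.<patch>' version string."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return f"{iso_year}.{iso_week}.{patch}"
```

With this helper, `date_code(date(2024, 4, 24))` produces `"2024.17.0"`, since 24 April 2024 falls in ISO week 17.
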
749 Intel-specific code that is not pulled into any upstreamed machine, 39 are
753 and many of the others do not have attributes that would facilitate a multi-host
759 existing format, we can maintain those call-sites for a time period of 1-2
763 `phosphor-dbus-interfaces` defined events to the current `OpenBMC.0.4.0`
768 - phosphor-post-code-manager
769 - BIOSPOSTCode (unique)
770 - dbus-sensors
771 - ChassisIntrusionDetected (unique)
772 - ChassisIntrusionReset (unique)
773 - FanInserted
774 - FanRedundancyLost (unique)
775 - FanRedundancyRegained (unique)
776 - FanRemoved
777 - LanLost
778 - LanRegained
779 - PowerSupplyConfigurationError (unique)
780 - PowerSupplyConfigurationErrorRecovered (unique)
781 - PowerSupplyFailed
782 - PowerSupplyFailurePredicted (unique)
783 - PowerSupplyFanFailed
784 - PowerSupplyFanRecovered
785 - PowerSupplyPowerLost
786 - PowerSupplyPowerRestored
787 - PowerSupplyPredictedFailureRecovered (unique)
788 - PowerSupplyRecovered
789 - phosphor-sel-logger
790 - IPMIWatchdog (unique)
791 - `SensorThreshold*` : 8 different events
792 - phosphor-net-ipmid
793 - InvalidLoginAttempted (unique)
794 - entity-manager
795 - InventoryAdded (unique)
796 - InventoryRemoved (unique)
797 - estoraged
798 - ServiceStarted
799 - x86-power-control
800 - NMIButtonPressed (unique)
801 - NMIDiagnosticInterrupt (unique)
802 - PowerButtonPressed (unique)
803 - PowerRestorePolicyApplied (unique)
804 - PowerSupplyPowerGoodFailed (unique)
805 - ResetButtonPressed (unique)
806 - SystemPowerGoodFailed (unique)
808 Intel-only implementations:
810 - intel-ipmi-oem
811 - ADDDCCorrectable
812 - BIOSPostERROR
813 - BIOSRecoveryComplete
814 - BIOSRecoveryStart
815 - FirmwareUpdateCompleted
816 - IntelUPILinkWidthReducedToHalf
817 - IntelUPILinkWidthReducedToQuarter
818 - LegacyPCIPERR
819 - LegacyPCISERR
820 - `ME*` : 29 different events
821 - `Memory*` : 9 different events
822 - MirroringRedundancyDegraded
823 - MirroringRedundancyFull
824 - `PCIeCorrectable*`, `PCIeFatal` : 29 different events
825 - SELEntryAdded
826 - SparingRedundancyDegraded
827 - pfr-manager
828 - BIOSFirmwareRecoveryReason
829 - BIOSFirmwarePanicReason
830 - BMCFirmwarePanicReason
831 - BMCFirmwareRecoveryReason
832 - BMCFirmwareResiliencyError
833 - CPLDFirmwarePanicReason
834 - CPLDFirmwareResilencyError
835 - FirmwareResiliencyError
836 - host-error-monitor
837 - CPUError
838 - CPUMismatch
839 - CPUThermalTrip
840 - ComponentOverTemperature
841 - SsbThermalTrip
842 - VoltageRegulatorOverheated
843 - s2600wf-misc
844 - DriveError
845 - InventoryAdded
849 - New APIs are defined for error and event logging. This will deprecate existing
850 `phosphor-logging` APIs, with a time to migrate, for error reporting.
852 - The design should improve performance by eliminating the regular parsing of
853 the `systemd` journal. The design may decrease performance by allowing the
858 - Backwards compatibility and documentation should be improved by the automatic
864 - **Does this design require a new repository?**
865 - No
866 - **Who will be the initial maintainer(s) of this repository?**
867 - N/A
868 - **Which repositories are expected to be modified to execute this design?**
869 - `sdbusplus`
870 - `phosphor-dbus-interfaces`
871 - `phosphor-logging`
872 - `bmcweb`
873 - Any repository creating an error or event.
877 - Unit tests will be written in `sdbusplus` and `phosphor-logging` for the error
881 - Unit tests will be written for `bmcweb` for basic `Logging.Entry`
884 - Integration tests should be leveraged (and enhanced as necessary) from
885 `openbmc-test-automation` to cover the end-to-end error creation and Redfish