Lines Matching full:fault

1 # Hardware Fault Monitor
12 The goal is to create a new hardware fault monitor which will provide a
13 framework for collecting various fault and sensor information and making it
17 information through BMC interfaces, the hardware fault monitor will also receive
22 Future expansion of the hardware fault monitor would include adding the means to
23 locally analyze fault and sensor information and then based on specified
25 hardware fault monitor could receive repair action requests via Redfish from
52 - FRU fault manager controls the blinking of LEDs when faults occur:
53 https://github.com/openbmc/phosphor-led-manager/blob/master/fault-monitor/fru-fault-monitor.hpp
60 There is an OpenCompute Fault Management Infrastructure proposal that also
67 goal of the fault monitor is to enable rich error logging (OEM and CPU vendor
71 - The fault monitor must be able to handle receiving fault information that is
72 polled periodically as well as fault information that may come in sporadically
73 based on fault incidents (e.g. crash dumps).
75 - The fault monitor should allow for logging of a variety of sizes of fault
77 severe errors which require more fault information to be collected tend to
81 - Fault information must be added to a Redfish LogService in a timely manner
85 - The fault monitor must allow for custom overwrite rules for its log entries
93 A generic fault monitor will be created to collect fault information. First we
98 platform-specific system-level data. The fault monitor would therefore
104 - The fault monitor would monitor link level retries and link retrainings of
107 crash. The fault monitor in the BMC could check link level retries and link
109 the fault monitor could then add additional information such as high speed
115 could collect the register data). For corrected memory errors, the fault
120 The fault monitor will not have its own dedicated OpenBMC repository, but will
125 functionality needed for the fault monitor. For instance, based on the needs of
126 the OEM, the fault monitor will register to be notified of D-Bus signals of
127 interest in order to be alerted when fault events occur. The fault monitor will
128 also poll registers of interest and log their values to the fault log (described
129 in more detail later). In addition, the host will be able to write fault information to
130 the fault log (via a POST (Create) request to its corresponding Redfish log
131 resource collection). When the fault monitor becomes aware of a new fault
132 occurrence by any of these means, it may add fault information to the fault
133 log. The fault monitor may also gather relevant sensor data (read via D-Bus from
134 the dbus-sensors services) and add it to the fault log, with a reference to the
135 original fault event information. The EventGroupId property of a Redfish LogEntry could
136 potentially be used to associate multiple log entries related to the same fault
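
The flow in lines 126-136 (fault information arriving from D-Bus signals, register polls, or host POST requests, with sensor data attached afterwards) can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: `FaultLog`, `FaultEntry`, `Source`, and the drop-oldest behavior are hypothetical names and choices, not part of the design or of any OpenBMC API. The `eventGroupId` field only mirrors the role the Redfish LogEntry `EventGroupId` property might play in tying related entries together.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical origin of a fault log entry (names are illustrative only).
enum class Source { DBusSignal, RegisterPoll, HostPost, SensorSnapshot };

// Minimal stand-in for one fault log entry. Entries that share an
// eventGroupId are taken to describe the same fault occurrence.
struct FaultEntry {
    std::uint64_t id;
    std::uint64_t eventGroupId;
    Source source;
    std::string payload;
};

class FaultLog {
  public:
    explicit FaultLog(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Record a new fault occurrence and open a new event group for it.
    // Returns the group id so follow-up data can reference it.
    std::uint64_t addFault(Source source, std::string payload) {
        const std::uint64_t group = nextGroup_++;
        append(group, source, std::move(payload));
        return group;
    }

    // Attach follow-up data (e.g. sensor readings) to an existing group.
    void addRelated(std::uint64_t group, Source source, std::string payload) {
        append(group, source, std::move(payload));
    }

    // Return every entry still held in the log for one event group.
    std::vector<FaultEntry> entriesFor(std::uint64_t group) const {
        std::vector<FaultEntry> out;
        for (const auto& e : entries_) {
            if (e.eventGroupId == group) {
                out.push_back(e);
            }
        }
        return out;
    }

    std::size_t size() const { return entries_.size(); }

  private:
    void append(std::uint64_t group, Source source, std::string payload) {
        // Assumed placeholder overwrite rule: drop the oldest entry when full.
        if (entries_.size() == capacity_) {
            entries_.pop_front();
        }
        entries_.push_back({nextId_++, group, source, std::move(payload)});
    }

    std::size_t capacity_;
    std::uint64_t nextId_ = 1;
    std::uint64_t nextGroup_ = 1;
    std::deque<FaultEntry> entries_;
};
```

A caller would keep the group id returned by `addFault` and pass it to `addRelated` when sensor data for that fault is gathered. `entriesFor` then yields the original event together with its follow-up entries. A real implementation would persist entries as dump files and apply whatever overwrite rules the platform configures, rather than the fixed drop-oldest rule assumed here.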
139 The fault log for storing relevant fault information (and exposing it to
144 implementation of the fault log including saving and managing log files will be
148 and clearing the log. The fault log will be implemented as a new dump type in an
150 function is in dump_manager_main.cpp). The new fault log would contain dump
152 fault log dump entry class (deriving from the "Entry" class in dump_entry.hpp)
154 type of data that a fault log dump entry's corresponding dump file contains.
157 read and write the fault log. Functionality for handling a POST (Create) request
159 Redfish fault log entry to a Redfish client, large-sized fault information (e.g.
162 send external notifications, such as when the fault monitor needs to notify
163 external data center monitoring software of new fault information being
165 of any repair actions that need to be triggered based on the latest fault
170 We considered adding the fault logs into the main system event log
179 There may be situations where external consumers of fault monitor logs (e.g.
182 consumers can ignore any types of fault information provided by the fault
191 error conditions that will be logged by the fault monitor module.
194 intend to add unit testing for the fault monitor.