# BMC Boot Cause Event Log

Author: Patrick Lin (patrick_lin_wiwynn)

Other contributors: Delphine Chiu (delphinechiu), Bonnie Lo (bonnielo), Ricky
Wu (ricky_cx_wu)

Created: Aug 19, 2024

## Problem Description

OpenBMC currently lacks a unified method, suitable for the needs of various
vendors, for recording the different types of BMC reboot cause event logs. This
proposal updates the existing method and consolidates more BMC reboot causes to
better match current usage.

## Background and References

In the current approach, the only defined reboot causes are **WDIOF_EXTERN1**
and **WDIOF_CARDRESET**, which is insufficient for today's needs. Because needs
vary among vendors, it is not feasible to cover every possible reboot cause.
This update adds support for several of the more common types.

## Requirements

1. Each BMC vendor needs to provide or update their driver to retrieve the
   corresponding BMC reboot cause.
2. Each BMC vendor needs to record the retrieved reboot cause at the specified
   path.
3. Each vendor needs to ensure that the reboot cause results are interpreted
   accurately.
4. New reboot cause types need to be defined to cover the requirements.
5. Revise the definitions of certain existing reboot cause types to better
   represent their respective conditions.
6. Ensure that methods based on the original design can adapt well to this
   change.

## Proposed Design

```mermaid
flowchart TD
    A[BMC reboot] --> B[Driver gets the reboot cause]
    B --> C[Driver sets the flag corresponding to the reboot cause in /sys/class/watchdog/watchdog0/bootstatus]
    C --> D[phosphor-state-manager reads the flag]
    D --> E[Log the event corresponding to the flag]
```

After a BMC reboot, each BMC vendor’s driver first retrieves the reboot cause.
Then, based on the reboot cause, it sets the corresponding flag at the specified
path (`/sys/class/watchdog/watchdog0/bootstatus`). The PSM
(phosphor-state-manager) reads the flag from that path to determine the type of
reboot cause. Finally, it generates the event log that corresponds to that type.

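As a rough illustration of the read step in this flow, the following Python
sketch reads a bootstatus value from a sysfs-style file. The path is the one
named in the flowchart. The function name `read_bootstatus`, the choice to
return `None` when the file cannot be read, and the assumption that the file
holds a single integer (decimal or `0x`-prefixed) are illustrative only. This is
not phosphor-state-manager's actual implementation.

```python
from pathlib import Path
from typing import Optional

# Path taken from the flowchart in this design.
BOOTSTATUS_PATH = Path("/sys/class/watchdog/watchdog0/bootstatus")


def read_bootstatus(path: Path = BOOTSTATUS_PATH) -> Optional[int]:
    """Return the bootstatus flags as an int, or None if they cannot be read.

    Assumption: the file holds one integer, decimal or 0x-prefixed hex.
    """
    try:
        text = path.read_text().strip()
        # Base 0 lets int() accept both "32" and "0x20".
        return int(text, 0)
    except (OSError, ValueError):
        # Missing file, permission problem, or unparsable content.
        return None
```

Returning `None` rather than 0 for an unreadable file keeps a read failure from
being mistaken for the "flag unchanged" case.
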
This process ensures accurate logging and handling of different BMC reboot
causes, improving system reliability and monitoring. Below are the details of
the new additions and changes:

1. Driver Provision by BMC Vendors:

- Each BMC vendor must provide a driver to retrieve the BMC reboot cause and
  record the result at the specified location.

2. Definition and Identification of Reboot Cause Type **Software Reset**:

- This type is still under discussion. Please refer to the following link for
  more detail:
  https://lore.kernel.org/all/9565c496-44d8-4214-8038-931926210d0f@roeck-us.net/

3. Revise the Definition of **WDIOF_CARDRESET**:

- The **WDIOF_CARDRESET** type will now specifically indicate resets caused by
  the watchdog.

4. Clarification of the **Power-on-reset** Case:

- When a BMC reset has occurred but the flag in bootstatus was left unchanged
  by the watchdog driver (i.e., it stays at 0), this indicates that a
  **Power-on-reset** has occurred.

5. Update Reboot Cause Interpretation:

| phosphor-state-manager reboot cause | bootstatus value | watchdog driver behavior             |
| ----------------------------------- | ---------------- | ------------------------------------ |
| WDIOF_CARDRESET                     | 0x20             | Returns 0x20 if reset by the watchdog |
| POR                                 | 0x00             | Does nothing                         |

6. Generate Corresponding Event Log:

- After interpreting the reboot cause, the system should issue the corresponding
  event log based on the determined type.

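The interpretation table and the logging step can be sketched in Python as
follows. The value `0x20` for `WDIOF_CARDRESET` is the one in the table (and in
the kernel's `linux/watchdog.h`). The `RebootCause` enum, the function names,
the `UNKNOWN` fallback, and the message text are hypothetical names made up for
this sketch. They are not phosphor-state-manager's actual types or log format.

```python
from enum import Enum

# Matches the 0x20 row in the interpretation table.
WDIOF_CARDRESET = 0x0020


class RebootCause(Enum):
    """Hypothetical labels for this sketch, not PSM's real enum."""
    WATCHDOG = "Watchdog"
    POR = "Power-on-reset"
    UNKNOWN = "Unknown"


# Exact-value lookup, mirroring the two rows of the table.
_INTERPRETATION = {
    WDIOF_CARDRESET: RebootCause.WATCHDOG,
    0x00: RebootCause.POR,  # driver left bootstatus untouched
}


def interpret_bootstatus(value: int) -> RebootCause:
    """Map a bootstatus value to a reboot cause, following the table.

    Values the table does not list (for example the Software Reset type,
    which is still under discussion) fall back to UNKNOWN in this sketch.
    """
    return _INTERPRETATION.get(value, RebootCause.UNKNOWN)


def event_message(cause: RebootCause) -> str:
    """Build a placeholder event-log message for the interpreted cause."""
    return f"BMC reboot cause: {cause.value}"
```

For example, `event_message(interpret_bootstatus(0x20))` yields
`BMC reboot cause: Watchdog`.
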
## Alternatives Considered

In the original approach, **WDIOF_CARDRESET** was used to represent a **POR**
(power-on-reset). However, with the new method, we need to distinguish between
watchdog resets and power-on resets. Therefore, we now use **WDIOF_CARDRESET**
to represent watchdog resets, aligning with the existing kernel documentation
and driver implementation.

## Impacts

1. Common reboot causes across vendors can be consolidated to issue a unified
   event log.
2. If any functions rely on the original reboot cause, the code must be adjusted
   to align with the new definitions.

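To make the second impact more concrete, the Python sketch below contrasts the
original interpretation with the new one, so that values whose meaning changes
can be listed. Only two facts come from this document: **WDIOF_CARDRESET** used
to represent **POR** and now represents a watchdog reset, and a bootstatus of 0
now represents **POR**. The string labels, the `"Unknown"` fallback for values
a mapping does not define, and the function names are hypothetical.
**WDIOF_EXTERN1** is left out because this document does not say how it is
interpreted.

```python
WDIOF_CARDRESET = 0x0020  # value from linux/watchdog.h

# Original design: WDIOF_CARDRESET was reported as a power-on reset.
LEGACY_CAUSES = {WDIOF_CARDRESET: "POR"}

# This proposal: WDIOF_CARDRESET means watchdog reset; 0 means power-on reset.
NEW_CAUSES = {WDIOF_CARDRESET: "Watchdog", 0x00: "POR"}


def interpret(value: int, table: dict) -> str:
    """Look up a bootstatus value in one of the mappings.

    Undefined values fall back to "Unknown" (an assumption of this sketch).
    """
    return table.get(value, "Unknown")


def changed_values() -> dict:
    """Return {value: (legacy, new)} for values whose meaning differs."""
    changed = {}
    for value in sorted(set(LEGACY_CAUSES) | set(NEW_CAUSES)):
        old = interpret(value, LEGACY_CAUSES)
        new = interpret(value, NEW_CAUSES)
        if old != new:
            changed[value] = (old, new)
    return changed
```

Code that branches on the old meaning of `0x20` is the first place to check when
adapting to the new definitions.
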
### Organizational

Which repositories are expected to be modified to execute this design?

- phosphor-state-manager

## Testing

Reboot the BMC under various conditions and check whether the corresponding
event logs are generated correctly.
