# Error handling for power Hardware Abstraction Layer (pHAL)

Author:
Devender Rao <devenrao@in.ibm.com> <devenrao>

Primary assignee:
Devender Rao <devenrao@in.ibm.com> <devenrao>

Other contributors:
None

Created:
14/01/2020

## Problem Description
This proposal provides a mechanism to convert the failure data captured during
power Hardware Abstraction Layer (pHAL) library calls to
[Platform Event Log][1] (PEL) format.

## Background and References
OpenBMC applications use the pHAL layer for hardware access and hardware
initialization. Any software or hardware error returned by the pHAL layer needs
to be converted to PEL format so that an error entry can be logged. PELs help
to improve firmware and platform serviceability during product development,
manufacturing, and in customer environments.

Error data includes register data and the targets to [guard][2] and call out.
Guard refers to the action of "guarding" faulty hardware from impacting
future system operation. A callout points to a specific piece of hardware
within the server that relates to the identified error.

The [Phosphor-logging][3] [Create][4] interface is used for creating PELs.

The pHAL layer consists of the libraries below, and each of these libraries
returns different return codes.

1. libipl, used for initial program load
2. libfdt, used for device tree access
3. libekb, used for hardware procedure execution
4. libpdbg, used for hardware access

The proposal is to structure the returned data in a standard return code
format so that the caller only has to handle a single format when converting
to PEL.

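One way to picture such a standard format is a small record that keeps the
originating library, the library's own return code, and any extra failure data
as strings. The following is a minimal sketch under that assumption. The names
`PhalLibrary`, `PhalError`, `libraryName` and `toAdditionalData`, as well as
the `LIBRARY` and `RC` keys, are hypothetical and are not part of any pHAL
library.

```cpp
#include <map>
#include <string>

// Hypothetical: identifies which pHAL library reported the failure.
enum class PhalLibrary
{
    Ipl,
    Fdt,
    Ekb,
    Pdbg
};

// Hypothetical common error record wrapping a library-specific return code.
struct PhalError
{
    PhalLibrary library;                        // where the failure came from
    int nativeRc;                               // unmodified library return code
    std::map<std::string, std::string> details; // extra failure data
};

inline std::string libraryName(PhalLibrary lib)
{
    switch (lib)
    {
        case PhalLibrary::Ipl:
            return "libipl";
        case PhalLibrary::Fdt:
            return "libfdt";
        case PhalLibrary::Ekb:
            return "libekb";
        case PhalLibrary::Pdbg:
            return "libpdbg";
    }
    return "unknown";
}

// Flatten the record into string key/value pairs. The caller handles this
// one shape regardless of which library failed.
inline std::map<std::string, std::string>
    toAdditionalData(const PhalError& err)
{
    std::map<std::string, std::string> data = err.details;
    data["LIBRARY"] = libraryName(err.library);
    data["RC"] = std::to_string(err.nativeRc);
    return data;
}
```

With a record like this, the code that creates the PEL does not need to know
the return code conventions of each individual library.
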
### Glossary
pHAL: power Hardware Abstraction Layer. pHAL is a group of libraries running
on the BMC. These libraries are used by Open Power specific applications for
hardware complex interactions, hostboot and Self Boot Engine initialization,
diagnostics, and debugging.

libfdt: Library that pHAL uses to construct the in-memory tree structure of
all targets. [Reference][5]

libpdbg: Library that allows debugging of the host POWER processors from the
BMC. [Reference][6]

MRW: Machine readable workbook. An XML description of a machine as specified
by the system owner.

HWP: Hardware procedure. A "black box" code module supplied by the hardware
team that initializes host processor and memory subsystems in a
platform-independent fashion.

Device Tree: A data structure describing the hardware components of a
particular computer so that the operating system's kernel can use and manage
those components, including the CPU or CPUs, the memory, the buses, and the
peripherals. [Reference][7]

EKB: The EKB library contains all the hardware procedures (HWP) for the
specific platform and the corresponding error XML files for each hardware
procedure.

PEL: [Platform Event Log][1]

## Requirement
### libekb
The EKB library contains hardware procedures for the specific platform and the
corresponding error XML files for each hardware procedure. The error XML
specifies attribute data, targets to call out, targets to guard, and targets to
deconfigure for a specific error. Parsers in the EKB library parse the error
XML file and generate a C++ header file, which the hardware procedure uses to
capture the failure data.

Add a parser in libekb to parse the error XML file, and provide methods that
parse the failure data returned by the hardware procedure methods and return
it as key, value pairs that can be used in the creation of a PEL.

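To make the key, value requirement more concrete, the sketch below flattens
one parsed failure into string pairs. It is an illustration only: the
`HwpFailure` structure, the `flattenFailure` function, and the key names
(`RC_NAME`, `FFDC_`, `CALLOUT_`, `GUARD_`, `DECONFIG_`) are assumptions made
for this example and do not describe the actual libekb interface.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical view of the failure data described by one error XML entry.
struct HwpFailure
{
    std::string rcName;                      // error name from the error XML
    std::map<std::string, std::string> ffdc; // captured attribute/register data
    std::vector<std::string> callouts;       // targets to call out
    std::vector<std::string> guards;         // targets to guard
    std::vector<std::string> deconfigs;      // targets to deconfigure
};

using KeyValues = std::map<std::string, std::string>;

// Store each target under an indexed key such as CALLOUT_0, CALLOUT_1.
inline void addIndexed(KeyValues& out, const std::string& prefix,
                       const std::vector<std::string>& targets)
{
    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        out[prefix + std::to_string(i)] = targets[i];
    }
}

// Flatten the whole failure into string key/value pairs.
inline KeyValues flattenFailure(const HwpFailure& failure)
{
    KeyValues out;
    out["RC_NAME"] = failure.rcName;
    for (const auto& entry : failure.ffdc)
    {
        out["FFDC_" + entry.first] = entry.second;
    }
    addIndexed(out, "CALLOUT_", failure.callouts);
    addIndexed(out, "GUARD_", failure.guards);
    addIndexed(out, "DECONFIG_", failure.deconfigs);
    return out;
}
```

Keeping every value as a string means the result can be handed to an interface
that accepts string pairs without further conversion.
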
### libipl
The initial program load library is used for booting the system. The library
internally calls hardware procedures (HWP) of the EKB library. Hardware
procedure execution status needs to be returned to the caller so that the
caller can create a PEL on hardware procedure execution failure.

### libpdbg
The libpdbg library is used for hardware access. Any hardware access errors
need to be captured as part of the PEL.

### Message Registry Entries
For errors to be raised in pHAL, corresponding error message registry entries
need to be created in the [message registry][8].

## Proposed design
### Hardware procedure failure
Add a parser in libekb to parse the error XML file, and provide methods that
parse the failure data returned by the hardware procedure methods and return
it as key, value pairs that can be used in the [Create][4] interface for the
creation of a PEL.

Inventory strings for the targets to Callout/Guard/Deconfig need to be added
to the additional data section of the Create interface.

Applications need to register callback methods in the libekb library to get
back the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

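The inventory-string step above could look roughly like the sketch below,
which resolves each target to an inventory string and stores it in the
additional data map. The `InventoryLookup` table, the `addInventoryEntries`
function, and the `<ACTION>_INVENTORY_<n>` key naming are all hypothetical
choices made for illustration. How targets are actually mapped to inventory
strings is not specified in this document.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using AdditionalData = std::map<std::string, std::string>;

// Hypothetical lookup table: hardware target name -> inventory string.
using InventoryLookup = std::map<std::string, std::string>;

// Adds one entry per target, for example CALLOUT_INVENTORY_0.
// A target without a known inventory string is recorded under its raw name
// instead of being dropped, so no callout is lost silently.
inline void addInventoryEntries(AdditionalData& data,
                                const std::string& action,
                                const std::vector<std::string>& targets,
                                const InventoryLookup& lookup)
{
    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        const auto found = lookup.find(targets[i]);
        const std::string& value =
            (found != lookup.end()) ? found->second : targets[i];
        data[action + "_INVENTORY_" + std::to_string(i)] = value;
    }
}
```

The function would be called once per action, for example with `"CALLOUT"`,
`"GUARD"` and `"DECONFIG"`, before the map is passed on for PEL creation.
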
### libipl internal failure
Applications need to register callback methods in the libipl library to get
back the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libpdbg internal failure
Applications need to register callback methods to get the debug traces from
the libpdbg library.

Debug traces returned through the callback method will be added to the PEL.

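The callback registration described for libekb, libipl and libpdbg can be
sketched with a printf-style logging hook, a common pattern in C libraries.
Everything in the example is a stand-in: `FakePhalLibrary`, `LogFunc`,
`TraceBuffer` and `collectTrace` are hypothetical names, and the real
registration functions and their signatures should be taken from each
library's own headers.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical callback signature, modelled on printf-style C logging hooks.
using LogFunc = void (*)(void* priv, const char* fmt, va_list ap);

// Hypothetical stand-in for a pHAL library that accepts one log callback.
class FakePhalLibrary
{
  public:
    void setLogFunc(LogFunc fn, void* priv)
    {
        fn_ = fn;
        priv_ = priv;
    }

    // Called internally by the library whenever it has something to trace.
    void log(const char* fmt, ...)
    {
        if (fn_ == nullptr)
        {
            return; // no callback registered, trace is discarded
        }
        va_list ap;
        va_start(ap, fmt);
        fn_(priv_, fmt, ap);
        va_end(ap);
    }

  private:
    LogFunc fn_ = nullptr;
    void* priv_ = nullptr;
};

// Application side: traces are buffered so they can be added to a PEL later.
struct TraceBuffer
{
    std::vector<std::string> lines;
};

// Callback that formats one trace and appends it to the buffer.
// Traces longer than 255 characters are truncated by vsnprintf.
void collectTrace(void* priv, const char* fmt, va_list ap)
{
    char buf[256];
    std::vsnprintf(buf, sizeof(buf), fmt, ap);
    static_cast<TraceBuffer*>(priv)->lines.emplace_back(buf);
}
```

An application would register `collectTrace` once at startup and read the
buffered lines only when a failure has to be logged.
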
## Sequence diagrams
### Register for debug traces and boot errors
![image](https://user-images.githubusercontent.com/26330444/76838214-e4e7dc80-6859-11ea-818c-031bf5a191d6.png)

### Process debug traces
![image](https://user-images.githubusercontent.com/26330444/76838355-152f7b00-685a-11ea-9975-4091ae1064cc.png)

### Process boot failures
![image](https://user-images.githubusercontent.com/26330444/76838503-3a23ee00-685a-11ea-9f2a-559e233b408f.png)

## Alternatives Considered
None

## Impacts
None

## Future changes
At present, data is passed to the [Create][4] interface in `std::map` format.
This will change to JSON format once the Create interface supports receiving
JSON files.

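For illustration, the same string pairs held in a `std::map` can be rendered
as a flat JSON object. The sketch below is one possible rendering and assumes
that all values stay strings. The helper names `escapeJson` and `toJson` are
hypothetical, and the JSON layout that the Create interface will eventually
accept is not defined in this document.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Escape the characters that may not appear raw inside a JSON string.
inline std::string escapeJson(const std::string& in)
{
    std::string out;
    for (unsigned char c : in)
    {
        switch (c)
        {
            case '"':
                out += "\\\"";
                break;
            case '\\':
                out += "\\\\";
                break;
            case '\n':
                out += "\\n";
                break;
            case '\r':
                out += "\\r";
                break;
            case '\t':
                out += "\\t";
                break;
            default:
                if (c < 0x20)
                {
                    // Remaining control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned int>(c));
                    out += buf;
                }
                else
                {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Render string pairs as a single flat JSON object, keys in map order.
inline std::string toJson(const std::map<std::string, std::string>& data)
{
    std::string out = "{";
    bool first = true;
    for (const auto& entry : data)
    {
        if (!first)
        {
            out += ",";
        }
        first = false;
        out += "\"" + escapeJson(entry.first) + "\":\"" +
               escapeJson(entry.second) + "\"";
    }
    out += "}";
    return out;
}
```

In production code a JSON library would normally be preferred over
hand-written escaping. The sketch only shows that the existing key, value data
maps directly onto a JSON object.
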
## Testing
1. Simulate a hardware procedure failure and check that a PEL is created.

[1]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/README.md
[2]: https://gerrit.openbmc-project.xyz/#/c/openbmc/docs/+/27804/2/designs/gard_on_bmc.md
[3]: https://github.com/openbmc/phosphor-logging
[4]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Logging/Create.interface.yaml
[5]: https://github.com/dgibson/dtc
[6]: https://github.com/open-power/pdbg
[7]: https://elinux.org/Device_Tree_Reference
[8]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/registry/message_registry.json