# Error handling for power Hardware Abstraction Layer (pHAL)

Author:
Devender Rao <devenrao@in.ibm.com> <devenrao>

Primary assignee:
Devender Rao <devenrao@in.ibm.com> <devenrao>

Other contributors:
None

Created:
14/01/2020

## Problem Description
This proposal provides a mechanism to convert the failure data captured during
power Hardware Abstraction Layer (pHAL) library calls into
[Platform Event Log][1] (PEL) format.

## Background and References
OpenBMC applications use the pHAL layer for hardware access and hardware
initialization. Any software or hardware error returned by the pHAL layer needs
to be converted to PEL format when the error entry is logged. PELs help
improve firmware and platform serviceability during product development,
manufacturing, and in the customer environment.

Error data includes register data, targets to [guard][2], and callouts.
Guard refers to the action of "guarding" faulty hardware from impacting
future system operation. A callout points to a specific piece of hardware
within the server that relates to the identified error.

The [phosphor-logging][3] implementation of the [Create][4] interface is used
to create PELs.

The pHAL layer consists of the following libraries, each of which returns its
own set of return codes:
1. libipl for initial program load
2. libfdt for device tree access
3. libekb for hardware procedure execution
4. libpdbg for hardware access

The proposal is to structure the return data into a standard return code
format so that the caller only has to handle a single return code format when
converting errors to PEL.
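
A minimal sketch of what such a standard return format could look like, assuming a single C++ error type into which each library's result is translated. `PhalSource`, `PhalError`, `toAdditionalData` and the `PHAL_*` key names are hypothetical and not part of any existing pHAL API; only the string-to-string map shape follows the `AdditionalData` argument of the [Create][4] interface.

```cpp
#include <map>
#include <string>

// Hypothetical identifiers for the pHAL libraries listed above.
enum class PhalSource
{
    libipl,
    libfdt,
    libekb,
    libpdbg
};

// Hypothetical "standard return code format": every pHAL library result is
// translated into this one type before the caller creates a PEL.
struct PhalError
{
    PhalSource source;  // library that reported the failure
    int rawCode = 0;    // library-specific return code, kept for debugging
    std::string reason; // short description of the failure
    std::map<std::string, std::string> data; // key/value failure data
};

// Returns a printable library name.
inline const char* toString(PhalSource src)
{
    switch (src)
    {
        case PhalSource::libipl:
            return "LIBIPL";
        case PhalSource::libfdt:
            return "LIBFDT";
        case PhalSource::libekb:
            return "LIBEKB";
        case PhalSource::libpdbg:
            return "LIBPDBG";
    }
    return "UNKNOWN";
}

// Flattens a PhalError into the string map shape used by AdditionalData.
// The PHAL_* key names are hypothetical.
inline std::map<std::string, std::string> toAdditionalData(const PhalError& err)
{
    std::map<std::string, std::string> out = err.data;
    out["PHAL_SOURCE"] = toString(err.source);
    out["PHAL_RC"] = std::to_string(err.rawCode);
    out["PHAL_REASON"] = err.reason;
    return out;
}
```

With one type, the PEL-creation code needs a single conversion path regardless of which library failed; the raw code is preserved so library-specific details are not lost.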

### Glossary
pHAL: power Hardware Abstraction Layer. pHAL is a group of libraries running
on the BMC. These libraries are used by OpenPOWER-specific applications for
hardware complex interactions, Hostboot and Self Boot Engine initialization,
diagnostics, and debugging.

libfdt: library that pHAL uses to construct the in-memory tree structure of
all targets. [Reference][5]

libpdbg: library that allows debugging of the host POWER processors from the
BMC. [Reference][6]

MRW: Machine Readable Workbook. An XML description of a machine as specified
by the system owner.

HWP: Hardware procedure. A "black box" code module supplied by the hardware
team that initializes host processor and memory subsystems in a
platform-independent fashion.

Device Tree: A device tree is a data structure describing the hardware
components of a particular computer so that the operating system's kernel can
use and manage those components, including the CPU or CPUs, the memory, the
buses, and the peripherals. [Reference][7]

EKB: The EKB library contains all the hardware procedures (HWPs) for the
specific platform and the corresponding error XML files for each hardware
procedure.

PEL: [Platform Event Log][1]

## Requirement
### libekb
The EKB library contains the hardware procedures for a specific platform and
the corresponding error XML file for each hardware procedure. The error XML
specifies attribute data, targets to call out, targets to guard, and targets
to deconfigure for a specific error. Parsers in the EKB library parse the
error XML file and generate a C++ header file, which the hardware procedure
uses to capture failure data.

Add a parser in libekb that parses the error XML file and provides methods to
parse the failure data returned by the hardware procedures and return it as
key/value pairs, so that it can be used to create the PEL.
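
The sketch below illustrates the key/value flattening, assuming the parser has already decoded a hardware procedure failure into a simple structure. `HwpFailure`, `HwpFfdcField`, `toKeyValue` and the `HWP_*` keys are hypothetical; the real libekb types and the error XML schema will differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical decoded failure data; real libekb types differ.
struct HwpFfdcField
{
    std::string name;   // attribute name taken from the error XML
    uint64_t value = 0; // captured register or attribute value
};

struct HwpFailure
{
    std::string errorId;               // error name from the error XML
    std::vector<HwpFfdcField> ffdc;    // captured register/attribute data
    std::vector<std::string> callouts; // target paths to call out
    std::vector<std::string> guards;   // target paths to guard
};

// Formats a value as zero-padded 64-bit hex, e.g. 0x00000000000000ff.
inline std::string toHex(uint64_t v)
{
    char buf[19]; // "0x" + 16 hex digits + terminating NUL
    std::snprintf(buf, sizeof(buf), "0x%016llx",
                  static_cast<unsigned long long>(v));
    return buf;
}

// Flattens the decoded failure into key/value pairs for PEL creation.
// The HWP_* key names are hypothetical.
inline std::map<std::string, std::string> toKeyValue(const HwpFailure& f)
{
    std::map<std::string, std::string> kv;
    kv["HWP_RC"] = f.errorId;
    for (const auto& field : f.ffdc)
    {
        kv["HWP_FFDC_" + field.name] = toHex(field.value);
    }
    for (std::size_t i = 0; i < f.callouts.size(); ++i)
    {
        kv["HWP_CALLOUT_" + std::to_string(i)] = f.callouts[i];
    }
    for (std::size_t i = 0; i < f.guards.size(); ++i)
    {
        kv["HWP_GUARD_" + std::to_string(i)] = f.guards[i];
    }
    return kv;
}
```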

### libipl
The initial program load library is used to boot the system. Internally, the
library calls hardware procedures (HWPs) from the EKB library. The hardware
procedure execution status needs to be returned to the caller so that the
caller can create a PEL when a hardware procedure fails.
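
A sketch of how a caller might consume that status, assuming libipl exposes each boot step as a callable that reports success or failure. `IplStep`, `StepStatus`, `StepFailure` and `runIpl` are hypothetical names used only to show the control flow: execution stops at the first failing step, and that step is returned so the caller can create the PEL.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical status a libipl step could hand back to the caller.
enum class StepStatus
{
    success,
    hwpFailure,     // a hardware procedure reported an error
    internalFailure // libipl itself failed, not an HWP
};

struct IplStep
{
    std::string name;
    std::function<StepStatus()> run; // would invoke one or more HWPs
};

struct StepFailure
{
    std::string step;  // name of the step that failed
    StepStatus status; // why it failed
};

// Runs the steps in order and stops at the first failure, returning the
// failing step so the caller can build a PEL. Returns nullopt on success.
inline std::optional<StepFailure> runIpl(const std::vector<IplStep>& steps)
{
    for (const auto& s : steps)
    {
        StepStatus st = s.run();
        if (st != StepStatus::success)
        {
            return StepFailure{s.name, st};
        }
    }
    return std::nullopt;
}
```

Distinguishing `hwpFailure` from `internalFailure` lets the caller choose between an HWP-specific PEL (with callouts from the error XML) and a generic libipl error.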

### libpdbg
The libpdbg library is used for hardware access; any hardware access errors
need to be captured as part of the PEL.

### Message Registry Entries
For errors raised in pHAL, corresponding entries need to be created in the
[message registry][8].

## Proposed design
### Hardware procedure failure
Add a parser in libekb that parses the error XML file and provides methods to
parse the failure data returned by the hardware procedures and return it as
key/value pairs, so that it can be passed to the [Create][4] interface to
create the PEL.

Inventory strings for the targets to call out, guard, or deconfigure need to
be added to the AdditionalData argument of the Create interface.
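
The following sketch shows one way to add those inventory strings, assuming a lookup table from pHAL target paths to BMC inventory object paths is available. `ActionTarget`, `InventoryMap`, `addInventoryStrings`, the `*_INV_*` keys and any example paths are hypothetical; targets without an inventory mapping fall back to their pHAL path so the information is not lost.

```cpp
#include <map>
#include <string>
#include <vector>

// Action requested for a target by the error XML.
enum class TargetAction
{
    callout,
    guard,
    deconfig
};

struct ActionTarget
{
    TargetAction action;
    std::string phalPath; // target path as known to pHAL (device tree)
};

// Hypothetical lookup from pHAL target path to BMC inventory object path.
using InventoryMap = std::map<std::string, std::string>;

// Adds one AdditionalData entry per target, numbered per action type
// (CALLOUT_INV_0, GUARD_INV_0, ...). Key names are hypothetical.
inline void addInventoryStrings(const std::vector<ActionTarget>& targets,
                                const InventoryMap& inventory,
                                std::map<std::string, std::string>& additionalData)
{
    std::map<TargetAction, int> counters; // next index for each action type
    for (const auto& t : targets)
    {
        const char* prefix = t.action == TargetAction::callout ? "CALLOUT_INV_"
                             : t.action == TargetAction::guard ? "GUARD_INV_"
                                                               : "DECONFIG_INV_";
        auto it = inventory.find(t.phalPath);
        // Fall back to the pHAL path when no inventory mapping exists.
        const std::string& value =
            it != inventory.end() ? it->second : t.phalPath;
        additionalData[prefix + std::to_string(counters[t.action]++)] = value;
    }
}
```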

Applications need to register callback methods in the libekb library to
receive error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libipl internal failure
Applications need to register callback methods in the libipl library to
receive error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libpdbg internal failure
Applications need to register callback methods to receive debug traces from
the libpdbg library.

Debug traces returned through the callback method will be added to the PEL.
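
The callback flow is the same for libekb, libipl and libpdbg, so a single sketch covers it. It assumes each library accepts a `std::function` trace callback; `FakePhalLibrary` stands in for the real libraries, whose actual registration functions and signatures are not specified here. `TraceCollector` keeps only the most recent lines, because the space available in a PEL is limited, and copies them into AdditionalData-style keys (`TRACE_<n>`, a hypothetical naming).

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical callback type a pHAL library could accept for trace output.
using TraceCallback = std::function<void(const std::string&)>;

// Stand-in for a pHAL library's registration hook; the real libekb, libipl
// and libpdbg registration functions may differ.
struct FakePhalLibrary
{
    TraceCallback cb;
    void registerTraceCallback(TraceCallback fn) { cb = std::move(fn); }
    void trace(const std::string& msg) const
    {
        if (cb)
        {
            cb(msg); // forward the trace line to the application
        }
    }
};

// Keeps only the most recent maxLines traces.
class TraceCollector
{
  public:
    explicit TraceCollector(std::size_t maxLines) : maxLines_(maxLines) {}

    void add(const std::string& line)
    {
        lines_.push_back(line);
        if (lines_.size() > maxLines_)
        {
            lines_.pop_front(); // drop the oldest line
        }
    }

    // Copies collected traces into AdditionalData-style keys, oldest first.
    void appendTo(std::map<std::string, std::string>& additionalData) const
    {
        for (std::size_t i = 0; i < lines_.size(); ++i)
        {
            additionalData["TRACE_" + std::to_string(i)] = lines_[i];
        }
    }

  private:
    std::size_t maxLines_;
    std::deque<std::string> lines_;
};
```

In use, the application registers a lambda such as `[&collector](const std::string& m) { collector.add(m); }` at startup and calls `appendTo` only when it builds a PEL.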

## Sequence diagrams
### Register for debug traces and boot errors
![Register for debug traces and boot errors](https://user-images.githubusercontent.com/26330444/76838214-e4e7dc80-6859-11ea-818c-031bf5a191d6.png)

### Process debug traces
![Process debug traces](https://user-images.githubusercontent.com/26330444/76838355-152f7b00-685a-11ea-9975-4091ae1064cc.png)

### Process boot failures
![Process boot failures](https://user-images.githubusercontent.com/26330444/76838503-3a23ee00-685a-11ea-9f2a-559e233b408f.png)

## Alternatives Considered
None

## Impacts
None

## Future changes
At present, the data is passed to the [Create][4] interface in std::map
format. This will be changed to JSON format once support for passing JSON
files to the Create interface is added.
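
For illustration, the map passed today could be serialized into a flat JSON object with a small helper such as the one below. The JSON layout that the Create interface will eventually accept is not defined in this document, so `toJson` and `jsonEscape` are a sketch, not a specification; a production implementation would more likely use an existing JSON library.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Escapes a string for use inside a JSON string literal.
inline std::string jsonEscape(const std::string& in)
{
    std::string out;
    for (unsigned char c : in)
    {
        switch (c)
        {
            case '"':
                out += "\\\"";
                break;
            case '\\':
                out += "\\\\";
                break;
            case '\n':
                out += "\\n";
                break;
            default:
                if (c < 0x20)
                {
                    // Other control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", c);
                    out += buf;
                }
                else
                {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Serializes the key/value map used today into a flat JSON object.
inline std::string toJson(const std::map<std::string, std::string>& data)
{
    std::string out = "{";
    bool first = true;
    for (const auto& [key, value] : data)
    {
        if (!first)
        {
            out += ",";
        }
        first = false;
        out += "\"" + jsonEscape(key) + "\":\"" + jsonEscape(value) + "\"";
    }
    out += "}";
    return out;
}
```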

## Testing
1. Simulate a hardware procedure failure and check that a PEL is created.
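
One way to automate this check without real hardware is to inject a failing hardware procedure and record what would have been passed to the Create interface. `FakeLogCreator`, `runHwpAndLog` and the message ID `example.PHAL.Error.HwpFailure` are hypothetical placeholders; a real test would use the message registry entry defined for pHAL and the actual D-Bus call.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

using AdditionalData = std::map<std::string, std::string>;

// Records what would have been passed to Create, standing in for D-Bus.
struct FakeLogCreator
{
    struct Call
    {
        std::string message;
        AdditionalData data;
    };
    std::vector<Call> calls;

    void create(const std::string& message, const AdditionalData& data)
    {
        calls.push_back({message, data});
    }
};

// Hypothetical handler under test: runs an HWP and logs on failure.
// The injected hwp returns 0 on success or a nonzero return code.
inline bool runHwpAndLog(const std::function<int()>& hwp,
                         FakeLogCreator& logger)
{
    int rc = hwp();
    if (rc == 0)
    {
        return true;
    }
    // Placeholder message ID; the real one comes from the message registry.
    logger.create("example.PHAL.Error.HwpFailure",
                  {{"HWP_RC", std::to_string(rc)}});
    return false;
}
```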

[1]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/README.md
[2]: https://gerrit.openbmc-project.xyz/#/c/openbmc/docs/+/27804/2/designs/gard_on_bmc.md
[3]: https://github.com/openbmc/phosphor-logging
[4]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Logging/Create.interface.yaml
[5]: https://github.com/dgibson/dtc
[6]: https://github.com/open-power/pdbg
[7]: https://elinux.org/Device_Tree_Reference
[8]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/registry/message_registry.json