# Error handling for power Hardware Abstraction Layer (pHAL)

Author:
Devender Rao <devenrao@in.ibm.com> <devenrao>

Other contributors:
None

Created:
14/01/2020

## Problem Description
This proposal provides a mechanism to convert the failure data captured by
power Hardware Abstraction Layer (pHAL) library calls to
[Platform Event Log][1] (PEL) format.

## Background and References
OpenBMC applications use the pHAL layer for hardware access and hardware
initialization. Any software or hardware error returned by the pHAL layer
needs to be converted to PEL format so that an error entry can be logged. PELs
help improve firmware and platform serviceability during product development,
manufacturing, and in customer environments.

Error data includes register data and the targets to [guard][2] and call out.
Guard refers to the action of "guarding" faulty hardware from impacting
future system operation. A callout points to the specific hardware within the
server that relates to the identified error.

The [Phosphor-logging][3] [Create][4] interface is used for creating PELs.

The pHAL layer consists of the libraries below, and each library returns its
own set of return codes.
1. libipl for initial program load
2. libfdt for device tree access
3. libekb for hardware procedure execution
4. libpdbg for hardware access

The proposal is to structure the return data in a standard return code format
so that the caller only has to handle a single format when converting to PEL.

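One way to picture a single return format is a small structure that records which library failed and keeps that library's own return code unmodified. The sketch below is illustrative only: `PhalSource`, `PhalError`, `toString` and `makeError` are hypothetical names and are not part of any pHAL library.

```cpp
#include <string>
#include <utility>

// Hypothetical types: nothing here exists in pHAL as written.
enum class PhalSource
{
    Ipl,
    Fdt,
    Ekb,
    Pdbg
};

struct PhalError
{
    PhalSource source;   // library that reported the failure
    int nativeCode;      // the library's own return code, kept unmodified
    std::string message; // short human-readable description
};

// Name of the originating library, usable as a value in the PEL data.
inline std::string toString(PhalSource source)
{
    switch (source)
    {
        case PhalSource::Ipl:
            return "libipl";
        case PhalSource::Fdt:
            return "libfdt";
        case PhalSource::Ekb:
            return "libekb";
        case PhalSource::Pdbg:
            return "libpdbg";
    }
    return "unknown";
}

// Wrap a library-specific return code in the common format.
inline PhalError makeError(PhalSource source, int nativeCode,
                           std::string message)
{
    return PhalError{source, nativeCode, std::move(message)};
}
```

With a structure of this kind, the caller needs only one conversion routine to PEL data, regardless of which of the four libraries produced the failure.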
### Glossary
pHAL: power Hardware Abstraction Layer. pHAL is a group of libraries running
on the BMC. These libraries are used by OpenPOWER-specific applications for
hardware complex interactions, hostboot and Self Boot Engine initialization,
diagnostics, and debugging.

libfdt: Library that pHAL uses to construct the in-memory tree structure of all
targets. [Reference][5]

libpdbg: Library that allows debugging of the host POWER processors from the
BMC. [Reference][6]

MRW: Machine-readable workbook. An XML description of a machine as specified
by the system owner.

HWP: Hardware procedure. A "black box" code module supplied by the hardware
team that initializes host processor and memory subsystems in a
platform-independent fashion.

Device Tree: A data structure describing the hardware components of a
particular computer so that the operating system's kernel can use and manage
those components, including the CPU or CPUs, the memory, the buses, and the
peripherals. [Reference][7]

EKB: The EKB library contains all the hardware procedures (HWP) for the specific
platform and the corresponding error XML files for each hardware procedure.

PEL: [Platform Event Log][1]

## Requirement
### libekb
The EKB library contains hardware procedures for the specific platform and the
corresponding error XML files for each hardware procedure. The error XML
specifies attribute data, targets to call out, targets to guard, and targets to
deconfigure for a specific error. Parsers in the EKB library parse the error
XML file and generate a C++ header file, which the hardware procedure uses to
capture the failure data.

Add a parser in libekb that parses the error XML file and provides methods to
parse the failure data returned by the hardware procedure methods. These
methods return the data as key-value pairs so that it can be used to create
the PEL.

### libipl
The initial program load library is used for booting the system. The library
internally calls hardware procedures (HWP) of the EKB library. The hardware
procedure execution status needs to be returned to the caller so that the
caller can create a PEL when a hardware procedure fails.

### libpdbg
The libpdbg library is used for hardware access. Any hardware access errors
need to be captured as part of the PEL.

### Message Registry Entries
For errors to be raised in pHAL, corresponding error message registry entries
need to be created in the [message registry][8].

## Proposed design
### Hardware procedure failure
Add a parser in libekb that parses the error XML file and provides methods to
parse the failure data returned by the hardware procedure methods. These
methods return the data as key-value pairs so that it can be passed to the
[Create][4] interface to create the PEL.

Inventory strings for the targets to call out, guard, or deconfigure need to be
added to the additional data section of the Create interface.

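As a rough illustration of the key-value conversion described above, the sketch below flattens parsed failure data and target lists into a string-to-string map of the kind the Create interface takes as additional data. The `HwpFailure` structure, the function names, and the key names (`RC`, `CALLOUT_n`, `GUARD_n`, `DECONFIG_n`) are assumptions made for this example, not the format libekb or the PEL code defines.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical shape of the failure data parsed from a hardware procedure.
struct HwpFailure
{
    std::string rcName;                      // error name from the error XML
    std::map<std::string, std::string> data; // captured register/attribute data
    std::vector<std::string> callouts;       // inventory strings per action
    std::vector<std::string> guards;
    std::vector<std::string> deconfigs;
};

using AdditionalData = std::map<std::string, std::string>;

// Add one numbered entry per target, e.g. CALLOUT_0, CALLOUT_1, ...
inline void addTargets(AdditionalData& out, const std::string& prefix,
                       const std::vector<std::string>& targets)
{
    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        out[prefix + "_" + std::to_string(i)] = targets[i];
    }
}

// Flatten the failure into key-value pairs (key names are illustrative).
inline AdditionalData toAdditionalData(const HwpFailure& failure)
{
    AdditionalData out = failure.data; // start from the captured data pairs
    out["RC"] = failure.rcName;
    addTargets(out, "CALLOUT", failure.callouts);
    addTargets(out, "GUARD", failure.guards);
    addTargets(out, "DECONFIG", failure.deconfigs);
    return out;
}
```

Numbering the keys keeps every target in its own entry, which matters because a map of strings cannot hold a list under a single key.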
Applications need to register callback methods with the libekb library to
receive the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

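The callback pattern can be sketched as follows. `TraceSource` stands in for a library that reports traces, and `TraceCollector` for the application that buffers them until a PEL is created. Both classes and the callback signature are hypothetical; the real registration functions are defined by the individual libraries.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback type: a log level plus an already formatted line.
using TraceCallback = std::function<void(int level, const std::string& text)>;

// Stand-in for a library that reports traces through a registered callback.
class TraceSource
{
  public:
    void registerCallback(TraceCallback cb)
    {
        callback = std::move(cb);
    }

    // Called by the library whenever it has a trace line to report.
    void emit(int level, const std::string& text) const
    {
        if (callback)
        {
            callback(level, text);
        }
    }

  private:
    TraceCallback callback;
};

// Application-side buffer; its contents would later be attached to the PEL.
class TraceCollector
{
  public:
    void attach(TraceSource& source)
    {
        source.registerCallback([this](int level, const std::string& text) {
            lines.push_back(std::to_string(level) + ": " + text);
        });
    }

    const std::vector<std::string>& traces() const
    {
        return lines;
    }

  private:
    std::vector<std::string> lines;
};
```

Traces emitted before `attach` is called are dropped in this sketch, and the collector must outlive the source because the callback captures `this`.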
### libipl internal failure
Applications need to register callback methods with the libipl library to
receive the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libpdbg internal failure
Applications need to register callback methods to receive the debug traces from
the libpdbg library.

Debug traces returned through the callback method will be added to the PEL.

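C libraries commonly hand traces to the application as a printf-style format string plus a `va_list`, so the callback has to format the text before storing it. The sketch below assumes that callback shape (log level, format string, `va_list`); check the library header for the actual signature. `traceBuffer`, `logCallback` and `emitTrace` are hypothetical names, and `emitTrace` only plays the role of the library so the example is self-contained.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Application-owned trace buffer (hypothetical name).
inline std::vector<std::string>& traceBuffer()
{
    static std::vector<std::string> buffer;
    return buffer;
}

// printf-style callback: formats the message and stores it for the PEL.
inline void logCallback(int level, const char* fmt, va_list ap)
{
    // First pass on a copy of the argument list to learn the required length.
    va_list copy;
    va_copy(copy, ap);
    int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    if (needed < 0)
    {
        return; // formatting error, nothing to store
    }

    // Second pass writes the text; the extra byte holds the terminating NUL.
    std::string text(static_cast<std::size_t>(needed) + 1, '\0');
    std::vsnprintf(&text[0], text.size(), fmt, ap);
    text.resize(static_cast<std::size_t>(needed));

    traceBuffer().push_back("[" + std::to_string(level) + "] " + text);
}

// Plays the role of the library: forwards variadic arguments to the callback.
inline void emitTrace(int level, const char* fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    logCallback(level, fmt, ap);
    va_end(ap);
}
```

Formatting in two passes avoids a fixed-size buffer, so long register dumps are not truncated before they reach the PEL.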
## Sequence diagrams
### Register for debug traces and boot errors
![image](https://user-images.githubusercontent.com/26330444/76838214-e4e7dc80-6859-11ea-818c-031bf5a191d6.png)

### Process debug traces
![image](https://user-images.githubusercontent.com/26330444/76838355-152f7b00-685a-11ea-9975-4091ae1064cc.png)

### Process boot failures
![image](https://user-images.githubusercontent.com/26330444/76838503-3a23ee00-685a-11ea-9f2a-559e233b408f.png)

## Alternatives Considered
None

## Impacts
None

## Future changes
At present, data is passed to the [Create][4] interface in std::map format. This
will change to JSON format once the Create interface supports receiving JSON
files.

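To give a feel for the change, the sketch below serializes a `std::map` of strings to a flat JSON object by hand. It is an illustration only: `escapeJson` and `toJson` are hypothetical helpers, only the most common escapes are handled, and the JSON layout the Create interface will eventually accept is not defined here. A real implementation would more likely use an existing JSON library.

```cpp
#include <map>
#include <string>

// Escape common characters that must not appear raw inside a JSON string.
// Other control characters are not handled in this sketch.
inline std::string escapeJson(const std::string& in)
{
    std::string out;
    for (char c : in)
    {
        switch (c)
        {
            case '"':
                out += "\\\"";
                break;
            case '\\':
                out += "\\\\";
                break;
            case '\n':
                out += "\\n";
                break;
            case '\t':
                out += "\\t";
                break;
            default:
                out += c;
                break;
        }
    }
    return out;
}

// Serialize the key-value pairs as one flat JSON object, in map (key) order.
inline std::string toJson(const std::map<std::string, std::string>& data)
{
    std::string out = "{";
    bool first = true;
    for (const auto& entry : data)
    {
        if (!first)
        {
            out += ",";
        }
        first = false;
        out += "\"" + escapeJson(entry.first) + "\":\"" +
               escapeJson(entry.second) + "\"";
    }
    return out + "}";
}
```

Because the input is the same map that is passed to Create today, callers would not need to change how they collect the data when the transport format changes.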
## Testing
1. Simulate a hardware procedure failure and check that a PEL is created.

[1]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/README.md
[2]: https://gerrit.openbmc.org/#/c/openbmc/docs/+/27804/2/designs/gard_on_bmc.md
[3]: https://github.com/openbmc/phosphor-logging
[4]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Logging/Create.interface.yaml
[5]: https://github.com/dgibson/dtc
[6]: https://github.com/open-power/pdbg
[7]: https://elinux.org/Device_Tree_Reference
[8]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/registry/message_registry.json