# Error handling for power Hardware Abstraction Layer (pHAL)

Author: Devender Rao <devenrao@in.ibm.com> <devenrao>

Other contributors: None

Created: 14/01/2020

## Problem Description

This proposal provides a mechanism to convert the failure data captured as part
of power Hardware Abstraction Layer (pHAL) library calls to
[Platform Event Log][1] (PEL) format.

## Background and References

OpenBMC applications use the pHAL layer for hardware access and hardware
initialization. Any software or hardware error returned by the pHAL layer needs
to be converted to PEL format for logging the error entry. PELs help improve
firmware and platform serviceability during product development, manufacturing,
and in customer environments.

Error data includes register data and the targets to [guard][2] and call out.
Guard refers to the action of "guarding" faulty hardware from impacting future
system operation. A callout points to specific hardware within the server that
relates to the identified error.

The [Phosphor-logging][3] [Create][4] interface is used for creating PELs.

The pHAL layer consists of the libraries below, and these libraries return
different return codes.

1. libipl used for initial program load
2. libfdt for device tree access
3. libekb for hardware procedure execution
4. libpdbg for hardware access

The proposal is to structure the return data into a standard return code format
so that the caller only has to handle a single return code format for
conversion to PEL.

### Glossary

pHAL: power Hardware Abstraction Layer. pHAL is a group of libraries running on
the BMC. These libraries are used by OpenPOWER-specific applications for
hardware complex interactions, hostboot and Self Boot Engine initialization,
diagnostics, and debugging.

libfdt: The library pHAL uses to construct the in-memory tree structure of all
targets.
[Reference][5]

libpdbg: A library that allows debugging of the host POWER processors from the
BMC. [Reference][6]

MRW: Machine readable workbook. An XML description of a machine as specified by
the system owner.

HWP: Hardware procedure. A "black box" code module supplied by the hardware team
that initializes host processor and memory subsystems in a platform-independent
fashion.

Device Tree: A device tree is a data structure describing the hardware
components of a particular computer so that the operating system's kernel can
use and manage those components, including the CPU or CPUs, the memory, the
buses and the peripherals. [Reference][7]

EKB: The EKB library contains all the hardware procedures (HWP) for the specific
platform and the corresponding error XML files for each hardware procedure.

PEL: [Platform Event Log][1]

## Requirement

### libekb

The EKB library contains hardware procedures for the specific platform and the
corresponding error XML files for each hardware procedure. The error XML
specifies attribute data, targets to call out, targets to guard, and targets to
deconfigure for a specific error. Parsers in the EKB library parse the error XML
file and generate a C++ header file, which the hardware procedure uses to
capture the failure data.

Add a parser in libekb that parses the error XML file, and provide methods that
parse the failure data returned by the hardware procedure methods and return it
as key-value pairs, so that the data can be used in the creation of a PEL.

### libipl

The initial program load library is used for booting the system. The library
internally calls hardware procedures (HWP) of the EKB library. The hardware
procedure execution status needs to be returned to the caller so that the caller
can create a PEL on hardware procedure execution failure.

### libpdbg

The libpdbg library is used for hardware access. Any hardware access errors
need to be captured as part of the PEL.
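
As a rough illustration of the key-value requirement above, the sketch below
flattens a failure record into a string map of the kind that could be handed to
a PEL creation call. Everything in it is an assumption made for illustration:
the `HwpFailure` structure, the `toAdditionalData` and `addTargets` helpers, and
the key names (`HWP_RC`, `CALLOUT_n`, `GUARD_n`, `DECONFIG_n`, `REG_*`) are
hypothetical and are not taken from libekb or from the PEL message registry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical view of the failure data reported by a hardware procedure.
// These names are illustrative only and do not come from libekb.
struct HwpFailure
{
    std::string rcName;                        // error name from the error XML
    std::vector<std::string> callouts;         // inventory strings to call out
    std::vector<std::string> guards;           // inventory strings to guard
    std::vector<std::string> deconfigs;        // inventory strings to deconfigure
    std::map<std::string, uint64_t> registers; // captured register data
};

using AdditionalData = std::map<std::string, std::string>;

// Flatten a list of targets into numbered keys: PREFIX_0, PREFIX_1, ...
static void addTargets(AdditionalData& data, const std::string& prefix,
                       const std::vector<std::string>& targets)
{
    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        data[prefix + "_" + std::to_string(i)] = targets[i];
    }
}

// Convert the failure record into key-value pairs (hypothetical key names).
AdditionalData toAdditionalData(const HwpFailure& failure)
{
    assert(!failure.rcName.empty()); // a failure must name its return code

    AdditionalData data;
    data["HWP_RC"] = failure.rcName;
    addTargets(data, "CALLOUT", failure.callouts);
    addTargets(data, "GUARD", failure.guards);
    addTargets(data, "DECONFIG", failure.deconfigs);

    // Register values are stored as lowercase hexadecimal strings.
    for (const auto& reg : failure.registers)
    {
        std::ostringstream hex;
        hex << "0x" << std::hex << reg.second;
        data["REG_" + reg.first] = hex.str();
    }
    return data;
}
```

A flat string map is used here because the Future changes section states that
data is currently passed to the Create interface in std::map format; the exact
keys a real implementation emits would be dictated by the error XML and the
message registry.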

### Message Registry Entries

For errors to be raised in pHAL, corresponding error message registry entries
need to be created in the [message registry][8].

## Proposed design

### Hardware procedure failure

Add a parser in libekb that parses the error XML file, and provide methods that
parse the failure data returned by the hardware procedure methods and return it
as key-value pairs, so that the data can be passed to the [Create][4] interface
for the creation of a PEL.

Inventory strings for the targets to Callout/Guard/Deconfig need to be added to
the additional data section of the Create interface.

Applications need to register callback methods in the libekb library to get
back the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libipl internal failure

Applications need to register callback methods in the libipl library to get
back the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libpdbg internal failure

Applications need to register callback methods to get the debug traces from the
libpdbg library.

Debug traces returned through the callback method will be added to the PEL.
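
The callback pattern described for libekb, libipl, and libpdbg can be sketched
roughly as follows. This is a minimal illustration under stated assumptions,
not the libraries' actual API: `TraceSource` is a hypothetical stand-in for a
pHAL library that emits trace lines, `TraceCollector` is a hypothetical
application-side helper, and the `TRACE_n` key names are invented for the
example. The real registration functions of the libraries are not reproduced
here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pHAL library that produces debug traces.
class TraceSource
{
  public:
    using Callback = std::function<void(const std::string&)>;

    // The application registers the function that should receive traces.
    void registerCallback(Callback cb)
    {
        assert(cb != nullptr); // an empty callback could never be invoked
        callback = std::move(cb);
    }

    // Called by the library whenever it produces a trace line.
    // Traces are dropped if no callback has been registered.
    void emit(const std::string& line) const
    {
        if (callback)
        {
            callback(line);
        }
    }

  private:
    Callback callback;
};

// Hypothetical application-side helper that gathers traces for a PEL.
// The collector must outlive every source it is attached to, because the
// registered callbacks capture a pointer to it.
class TraceCollector
{
  public:
    void attach(TraceSource& source, const std::string& name)
    {
        source.registerCallback([this, name](const std::string& line) {
            traces.push_back(name + ": " + line);
        });
    }

    // Expose the collected traces as key-value pairs (hypothetical keys).
    std::map<std::string, std::string> toAdditionalData() const
    {
        std::map<std::string, std::string> data;
        for (std::size_t i = 0; i < traces.size(); ++i)
        {
            data["TRACE_" + std::to_string(i)] = traces[i];
        }
        return data;
    }

  private:
    std::vector<std::string> traces;
};
```

In this sketch an application would create one collector, attach it to each
library's trace source before starting the boot, and merge the result of
`toAdditionalData()` into the data used to create the PEL when a failure is
reported.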

## Sequence diagrams

### Register for debug traces and boot errors

![image](https://user-images.githubusercontent.com/26330444/76838214-e4e7dc80-6859-11ea-818c-031bf5a191d6.png)

### Process debug traces

![image](https://user-images.githubusercontent.com/26330444/76838355-152f7b00-685a-11ea-9975-4091ae1064cc.png)

### Process boot failures

![image](https://user-images.githubusercontent.com/26330444/76838503-3a23ee00-685a-11ea-9f2a-559e233b408f.png)

## Alternatives Considered

None

## Impacts

None

## Future changes

At present, data is passed to the [Create][4] interface in std::map format.
This will change to JSON format once the Create interface supports passing JSON
files.

## Testing

1. Simulate a hardware procedure failure and check that a PEL is created.

[1]:
  https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/README.md
[2]:
  https://gerrit.openbmc.org/#/c/openbmc/docs/+/27804/2/designs/gard_on_bmc.md
[3]: https://github.com/openbmc/phosphor-logging
[4]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Logging/Create.interface.yaml
[5]: https://github.com/dgibson/dtc
[6]: https://github.com/open-power/pdbg
[7]: https://elinux.org/Device_Tree_Reference
[8]:
  https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/registry/message_registry.json