# Error handling for power Hardware Abstraction Layer (pHAL)

Author:
Devender Rao <devenrao@in.ibm.com> <devenrao>

Primary assignee:
Devender Rao <devenrao@in.ibm.com> <devenrao>

Other contributors:
None

Created:
14/01/2020

## Problem Description
This proposal provides a mechanism to convert the failure data captured by
power Hardware Abstraction Layer (pHAL) library calls into the
[Platform Event Log][1] (PEL) format.

## Background and References
OpenBMC applications use the pHAL layer for hardware access and hardware
initialization. Any software or hardware error returned by the pHAL layer needs
to be converted to PEL format so that an error entry can be logged. PELs help
improve firmware and platform serviceability during product development,
manufacturing, and in customer environments.

Error data includes register data and the targets to [guard][2] and call out.
Guard refers to the action of "guarding" faulty hardware from impacting
future system operation. A callout points to specific hardware within the
server that relates to the identified error.

The [Phosphor-logging][3] [Create][4] interface is used for creating PELs.

The pHAL layer consists of the libraries below, and each of these libraries
returns different return codes.
1. libipl used for initial program load
2. libfdt for device tree access
3. libekb for hardware procedure execution
4. libpdbg for hardware access

The proposal is to structure the return data into a standard return code
format so that the caller only has to handle a single return code format when
converting to PEL.

### Glossary
pHAL: power Hardware Abstraction Layer. pHAL is a group of libraries running
on the BMC. These libraries are used by Open Power specific applications for
hardware complex interactions, hostboot and Self Boot Engine initialization,
diagnostics and debugging.

libfdt: library that pHAL uses to construct the in-memory tree structure of
all targets.
[Reference][5]

libpdbg: library that allows debugging of the host POWER processors from the
BMC. [Reference][6]

MRW: Machine readable workbook. An XML description of a machine as specified
by the system owner.

HWP: Hardware procedure. A "black box" code module supplied by the hardware
team that initializes host processor and memory subsystems in a
platform-independent fashion.

Device Tree: A device tree is a data structure describing the hardware
components of a particular computer so that the operating system's kernel can
use and manage those components, including the CPU or CPUs, the memory, the
buses and the peripherals. [Reference][7]

EKB: The EKB library contains all the hardware procedures (HWP) for the
specific platform and the corresponding error XML files for each hardware
procedure.

PEL: [Platform Event Log][1]

## Requirement
### libekb
The EKB library contains hardware procedures for the specific platform and the
corresponding error XML files for each hardware procedure. The error XML
specifies attribute data, targets to call out, targets to guard, and targets to
deconfigure for a specific error. Parsers in the EKB library parse the error
XML file and generate a C++ header file, which the hardware procedure uses to
capture the failure data.

Add a parser in libekb to parse the error XML file, and provide methods that
parse the failure data returned by the hardware procedure methods and return
it as key/value pairs so that it can be used in the creation of a PEL.

### libipl
The initial program load library is used for booting the system. The library
internally calls hardware procedures (HWP) of the EKB library. The hardware
procedure execution status needs to be returned to the caller so that the
caller can create a PEL when a hardware procedure fails.

### libpdbg
The libpdbg library is used for hardware access. Any hardware access errors
need to be captured as part of the PEL.
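
As a rough illustration of the libekb requirement above, the sketch below flattens failure data into key/value pairs of the kind that could be passed as additional data when creating a PEL. The `HwpFailure` structure, the `toAdditionalData` function, and the key names (`RC`, `CALLOUT_n`, `GUARD_n`, `REG_name`) are hypothetical and chosen only for this example; they are not the actual libekb types or the keys expected by the PEL code.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical failure record; the real libekb structures may differ.
struct HwpFailure
{
    std::string rc;                            // error name from the error XML
    std::vector<std::string> callouts;         // targets to call out
    std::vector<std::string> guards;           // targets to guard
    std::map<std::string, uint64_t> registers; // captured register data
};

// Key/value form assumed for the additional data of a log entry.
using AdditionalData = std::map<std::string, std::string>;

// Flatten the failure record into string key/value pairs.
AdditionalData toAdditionalData(const HwpFailure& failure)
{
    AdditionalData data;
    data["RC"] = failure.rc;

    for (std::size_t i = 0; i < failure.callouts.size(); ++i)
    {
        data["CALLOUT_" + std::to_string(i)] = failure.callouts[i];
    }

    for (std::size_t i = 0; i < failure.guards.size(); ++i)
    {
        data["GUARD_" + std::to_string(i)] = failure.guards[i];
    }

    // Register values are rendered as lowercase hexadecimal strings.
    for (const auto& entry : failure.registers)
    {
        std::ostringstream hex;
        hex << "0x" << std::hex << entry.second;
        data["REG_" + entry.first] = hex.str();
    }

    return data;
}
```

Keeping every value as a string means a caller only has to deal with one data shape, regardless of which pHAL library reported the failure.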


### Message Registry Entries
For errors to be raised in pHAL, corresponding error message registry entries
need to be created in the [message registry][8].

## Proposed design
### Hardware procedure failure
Add a parser in libekb to parse the error XML file, and provide methods that
parse the failure data returned by the hardware procedure methods and return
it as key/value pairs so that it can be used in the [Create][4] interface
for the creation of a PEL.

Inventory strings for the targets to Callout/Guard/Deconfig need to be added
to the additional data section of the Create interface.

Applications need to register callback methods in the libekb library to get
back the error logging traces.

Debug traces returned through the callback method will be added to the PEL.


### libipl internal failure
Applications need to register callback methods in the libipl library to get
back the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libpdbg internal failure
Applications need to register callback methods to get the debug traces from
the libpdbg library.

Debug traces returned through the callback method will be added to the PEL.
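
The callback pattern described in the three subsections above could look roughly like the following sketch. `TraceSource` is a hypothetical stand-in for a pHAL library that emits debug traces, and `TraceCollector` is a hypothetical application-side helper. Neither reflects the actual registration API of libekb, libipl, or libpdbg, which are C libraries and may use plain function pointers instead.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pHAL library that emits debug traces.
class TraceSource
{
  public:
    using Callback = std::function<void(const std::string&)>;

    // The application registers the callback that receives each trace.
    void registerCallback(Callback cb)
    {
        callback = std::move(cb);
    }

    // Called by the library whenever it has a trace to report.
    void emit(const std::string& message) const
    {
        if (callback)
        {
            callback(message);
        }
    }

  private:
    Callback callback;
};

// Hypothetical application-side helper that gathers traces from several
// libraries so they can later be added to a PEL.
// The collector must outlive every source it is attached to, because the
// registered callback keeps a pointer to it.
class TraceCollector
{
  public:
    // Register with a source; each trace is tagged with the library name.
    void attach(TraceSource& source, const std::string& library)
    {
        source.registerCallback([this, library](const std::string& message) {
            traces.push_back(library + ": " + message);
        });
    }

    const std::vector<std::string>& get() const
    {
        return traces;
    }

  private:
    std::vector<std::string> traces;
};
```

An application would attach one collector to each library at startup and, when a failure is reported, copy the collected traces into the data used to create the PEL.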

## Sequence diagrams
### Register for debug traces and boot errors
![image](https://user-images.githubusercontent.com/26330444/76838214-e4e7dc80-6859-11ea-818c-031bf5a191d6.png)

### Process debug traces
![image](https://user-images.githubusercontent.com/26330444/76838355-152f7b00-685a-11ea-9975-4091ae1064cc.png)

### Process boot failures
![image](https://user-images.githubusercontent.com/26330444/76838503-3a23ee00-685a-11ea-9f2a-559e233b408f.png)

## Alternatives Considered
None

## Impacts
None

## Future changes
At present, data is passed to the [Create][4] interface in std::map format.
This will change to JSON format once the Create interface supports receiving
JSON files.

## Testing
1. Simulate a hardware procedure failure and check that a PEL is created.

[1]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/README.md
[2]: https://gerrit.openbmc-project.xyz/#/c/openbmc/docs/+/27804/2/designs/gard_on_bmc.md
[3]: https://github.com/openbmc/phosphor-logging
[4]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Logging/Create.interface.yaml
[5]: https://github.com/dgibson/dtc
[6]: https://github.com/open-power/pdbg
[7]: https://elinux.org/Device_Tree_Reference
[8]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/registry/message_registry.json