# Error handling for power Hardware Abstraction Layer (pHAL)

Author:
Devender Rao <devenrao@in.ibm.com> <devenrao>

Other contributors:
None

Created:
14/01/2020

## Problem Description
Proposal to provide a mechanism to convert the failure data captured as part
of power Hardware Abstraction Layer (pHAL) library calls to
[Platform Event Log][1] (PEL) format.

## Background and References
OpenBMC applications use the pHAL layer for hardware access and hardware
initialization. Any software or hardware error returned by the pHAL layer needs
to be converted to PEL format so that an error entry can be logged. PEL helps to
improve firmware and platform serviceability during product development,
manufacturing, and in customer environments.

Error data includes register data and the targets to [guard][2] and call out.
Guard refers to the action of "guarding" faulty hardware from impacting
future system operation. A callout points to the specific hardware within the
server that relates to the identified error.

The [Phosphor-logging][3] [Create][4] interface is used for creating PELs.

The pHAL layer consists of the libraries below, and these libraries return
different return codes.
1. libipl used for initial program load
2. libfdt for device tree access
3. libekb for hardware procedure execution
4. libpdbg for hardware access

The proposal is to structure the return data in a standard return code format so
that the caller only has to handle a single return code format for conversion
to PEL.

### Glossary
pHAL: power Hardware Abstraction Layer. pHAL is a group of libraries running
on the BMC. These libraries are used by OpenPOWER-specific applications for
hardware complex interactions, hostboot and Self Boot Engine initialization,
diagnostics and debugging.

libfdt: library that pHAL uses to construct the in-memory tree structure of all targets.
[Reference][5]

libpdbg: library to allow debugging of the host POWER processors from the BMC.
[Reference][6]

MRW: Machine-readable workbook. An XML description of a machine as specified
by the system owner.

HWP: Hardware procedure. A "black box" code module supplied by the hardware
team that initializes host processor and memory subsystems in a
platform-independent fashion.

Device Tree: A device tree is a data structure describing the hardware
components of a particular computer so that the operating system's kernel can
use and manage those components, including the CPU or CPUs, the memory, the
buses and the peripherals. [Reference][7]

EKB: The EKB library contains all the hardware procedures (HWP) for the specific
platform and the corresponding error XML files for each hardware procedure.

PEL: [Platform Event Log][1]

## Requirement
### libekb
The EKB library contains hardware procedures for the specific platform and the
corresponding error XML files for each hardware procedure. The error XML specifies
attribute data, targets to call out, targets to guard, and targets to
deconfigure for a specific error. Parsers in the EKB library parse the error XML
file and generate a C++ header file, which the hardware procedure uses
to capture the failure data.

Add a parser in libekb to parse the error XML file and provide methods that can
parse the failure data returned by the hardware procedure methods and return
the data as key/value pairs so that it can be used in the creation of a PEL.

### libipl
The initial program load library is used for booting the system. The library
internally calls hardware procedures (HWP) of the EKB library. The hardware
procedure execution status needs to be returned to the caller so that the caller
can create a PEL on hardware procedure execution failure.

### libpdbg
The libpdbg library is used for hardware access. Any hardware access errors need
to be captured as part of the PEL.
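
As a rough illustration of the libekb and libipl requirements above, the sketch
below flattens the failure data of a hardware procedure into key/value pairs of
the kind that could be handed to the PEL creation step. The `HwpFailure` struct,
the `toAdditionalData` function and all key names (`HWP_RC`, `CALLOUT_TARGET_0`
and so on) are hypothetical and chosen for illustration only. They are not the
libekb data model or an agreed PEL key set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for what a failed hardware procedure reports.
// The field names are illustrative and not taken from libekb.
struct HwpFailure
{
    std::uint32_t returnCode;           // return code of the failed HWP
    std::string hwpName;                // name of the failed procedure
    std::vector<std::string> callouts;  // inventory paths to call out
    std::vector<std::string> guards;    // inventory paths to guard
    std::vector<std::string> deconfigs; // inventory paths to deconfigure
};

using AdditionalData = std::map<std::string, std::string>;

// Add one numbered entry per target, e.g. CALLOUT_TARGET_0, CALLOUT_TARGET_1.
static void addTargets(AdditionalData& data, const std::string& prefix,
                       const std::vector<std::string>& targets)
{
    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        // An empty inventory path would be useless in the log entry.
        assert(!targets[i].empty());
        data[prefix + "_" + std::to_string(i)] = targets[i];
    }
}

// Convert the failure data into flat key/value pairs (assumed key names).
AdditionalData toAdditionalData(const HwpFailure& failure)
{
    AdditionalData data;

    // Format the return code as a zero-padded 8-digit hex string.
    std::ostringstream rc;
    rc << "0x" << std::hex << std::setw(8) << std::setfill('0')
       << failure.returnCode;
    data["HWP_RC"] = rc.str();
    data["HWP_NAME"] = failure.hwpName;

    addTargets(data, "CALLOUT_TARGET", failure.callouts);
    addTargets(data, "GUARD_TARGET", failure.guards);
    addTargets(data, "DECONFIG_TARGET", failure.deconfigs);
    return data;
}
```

A caller would fill an `HwpFailure` from the status returned by the library and
pass the resulting map on as the additional data of the log entry. Keeping the
conversion in one function means the caller only deals with one flat format,
whichever library reported the failure.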

### Message Registry Entries
For errors to be raised in pHAL, corresponding error message registry entries
need to be created in the [message registry][8].

## Proposed design
### Hardware procedure failure
Add a parser in libekb to parse the error XML file and provide methods that can
parse the failure data returned by the hardware procedure methods and return
the data as key/value pairs so that it can be used in the [Create][4]
interface for the creation of a PEL.

Inventory strings for the targets to call out, guard, and deconfigure need to be
added to the additional data section of the Create interface.

Applications need to register callback methods in the libekb library to get back
the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libipl internal failure
Applications need to register callback methods in the libipl library to get back
the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libpdbg internal failure
Applications need to register callback methods to get the debug traces from
the libpdbg library.

Debug traces returned through the callback method will be added to the PEL.
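
The callback registration described in the three subsections above could look
roughly like the following sketch. `TraceSource` is a hypothetical stand-in for
a pHAL library, and `TraceCollector`, `registerCallback`, `emit` and the
`TRACE_n` key names are likewise assumptions made for illustration. The actual
registration functions in libekb, libipl and libpdbg may have different names
and signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed callback shape: a log level plus an already formatted message.
using TraceCallback = std::function<void(int level, const std::string& msg)>;

// Hypothetical stand-in for a pHAL library that emits debug traces.
class TraceSource
{
  public:
    void registerCallback(TraceCallback cb)
    {
        // An empty callback would silently drop every trace.
        assert(static_cast<bool>(cb));
        callback = std::move(cb);
    }

    // Called by the library whenever it has something to report.
    void emit(int level, const std::string& msg) const
    {
        if (callback)
        {
            callback(level, msg);
        }
    }

  private:
    TraceCallback callback;
};

// Application-side buffer that keeps traces until a PEL is created.
// The collector must outlive every source it is attached to.
class TraceCollector
{
  public:
    void attach(TraceSource& source, const std::string& name)
    {
        source.registerCallback(
            [this, name](int level, const std::string& msg) {
                traces.push_back(name + "[" + std::to_string(level) +
                                 "]: " + msg);
            });
    }

    // Flatten the buffered traces into key/value pairs (assumed key names).
    std::map<std::string, std::string> toAdditionalData() const
    {
        std::map<std::string, std::string> data;
        for (std::size_t i = 0; i < traces.size(); ++i)
        {
            data["TRACE_" + std::to_string(i)] = traces[i];
        }
        return data;
    }

  private:
    std::vector<std::string> traces;
};
```

An application would attach one collector to each library at startup and, when
a failure is reported, merge the collected pairs into the additional data of
the log entry. Buffering in the application keeps the libraries free of any
dependency on the logging service.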

## Sequence diagrams
### Register for debug traces and boot errors
![image](https://user-images.githubusercontent.com/26330444/76838214-e4e7dc80-6859-11ea-818c-031bf5a191d6.png)

### Process debug traces
![image](https://user-images.githubusercontent.com/26330444/76838355-152f7b00-685a-11ea-9975-4091ae1064cc.png)

### Process boot failures
![image](https://user-images.githubusercontent.com/26330444/76838503-3a23ee00-685a-11ea-9f2a-559e233b408f.png)

## Alternatives Considered
None

## Impacts
None

## Future changes
At present, the data is passed to [Create][4] in std::map format. This
will be changed to JSON format when support for passing JSON files
to the Create interface is added.

## Testing
1. Simulate a hardware procedure failure and check that a PEL is created.

[1]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/README.md
[2]: https://gerrit.openbmc.org/#/c/openbmc/docs/+/27804/2/designs/gard_on_bmc.md
[3]: https://github.com/openbmc/phosphor-logging
[4]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Logging/Create.interface.yaml
[5]: https://github.com/dgibson/dtc
[6]: https://github.com/open-power/pdbg
[7]: https://elinux.org/Device_Tree_Reference
[8]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/registry/message_registry.json