# Error handling for power Hardware Abstraction Layer (pHAL)

Author:
  Devender Rao <devenrao@in.ibm.com> <devenrao>

Primary assignee:
  Devender Rao <devenrao@in.ibm.com> <devenrao>

Other contributors:
  None

Created:
  14/01/2020

## Problem Description
This document proposes a mechanism to convert the failure data captured as
part of power Hardware Abstraction Layer (pHAL) library calls to
[Platform Event Log][1] (PEL) format.

## Background and References
OpenBMC applications use the pHAL layer for hardware access and hardware
initialization. Any software or hardware error returned by the pHAL layer
needs to be converted to PEL format so that an error log entry can be created.
PELs help to improve firmware and platform serviceability during product
development, during manufacturing, and in customer environments.

Error data includes register data, targets to [guard][2], and callouts.
Guard refers to the action of "guarding" faulty hardware so that it does not
impact future system operation. A callout points to specific hardware within
the server that relates to the identified error.

The [phosphor-logging][3] [Create][4] interface is used for creating PELs.

The pHAL layer consists of the following libraries, each of which returns its
own set of return codes:
1. libipl for initial program load
2. libfdt for device tree access
3. libekb for hardware procedure execution
4. libpdbg for hardware access

The proposal is to map these return codes onto a single standard return code
format, so that the caller needs to handle only one format when converting a
failure to a PEL.

### Glossary
pHAL: power Hardware Abstraction Layer. pHAL is a group of libraries running
on the BMC. These libraries are used by OpenPOWER-specific applications for
hardware complex interactions, hostboot and Self Boot Engine initialization,
diagnostics, and debugging.

libfdt: Flattened device tree library; pHAL uses it to construct the in-memory
tree structure of all targets. [Reference][5]

libpdbg: Library that allows debugging of the host POWER processors from the
BMC. [Reference][6]

MRW: Machine Readable Workbook. An XML description of a machine as specified
by the system owner.

HWP: Hardware procedure. A "black box" code module supplied by the hardware
team that initializes host processor and memory subsystems in a
platform-independent fashion.

Device Tree: A device tree is a data structure describing the hardware
components of a particular computer so that the operating system's kernel can
use and manage those components, including the CPU or CPUs, the memory, the
buses and the peripherals. [Reference][7]

EKB: The EKB library contains all the hardware procedures (HWPs) for the
specific platform and the corresponding error XML file for each hardware
procedure.

PEL: [Platform Event Log][1]

## Requirement
### libekb
The EKB library contains the hardware procedures for the specific platform and
the corresponding error XML file for each hardware procedure. The error XML
specifies the attribute data, targets to call out, targets to guard, and
targets to deconfigure for a specific error. Parsers in the EKB library parse
the error XML file and generate a C++ header file, which the hardware
procedure uses to capture the failure data.

Add a parser to libekb to parse the error XML file, and provide methods that
parse the failure data returned by the hardware procedure methods and return
the data as key-value pairs, so that it can be used to create the PEL.

### libipl
The initial program load library is used for booting the system. It
internally calls the hardware procedures (HWPs) of the EKB library. The
hardware procedure execution status needs to be returned to the caller so
that the caller can create a PEL when a hardware procedure fails.
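
The key-value conversion required above can be sketched as follows. This is a minimal illustration under stated assumptions, not the libekb or libipl API: `PhalRc` (a single return code shared by the pHAL libraries), `FailureData`, `toAdditionalData`, and the key names (`PHAL_RC`, `HWP_NAME`, `REG_*`, `CALLOUT_*`, `GUARD_*`, `DECONFIG_*`) are all hypothetical. The sketch flattens a failing return code and its parsed failure data into a `std::map<std::string, std::string>`, the format in which this design passes data to the Create interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical common return code for all pHAL libraries, so that a caller
// handles a single format (illustrative only; not an existing pHAL type).
enum class PhalRc : int
{
    Success = 0,
    HwpFailure = 1,  // hardware procedure (libekb) reported an error
    IplFailure = 2,  // libipl internal failure
    PdbgFailure = 3, // libpdbg hardware access failure
};

// Hypothetical failure data, as a libekb error XML parser might expose it.
struct FailureData
{
    std::string hwpName;                       // failing hardware procedure
    std::map<std::string, uint64_t> registers; // register name -> value
    std::vector<std::string> callouts;         // inventory paths to call out
    std::vector<std::string> guards;           // inventory paths to guard
    std::vector<std::string> deconfigs;        // inventory paths to deconfigure
};

// Format a 64-bit value as a zero-padded hexadecimal string.
std::string toHex(uint64_t value)
{
    std::ostringstream os;
    os << "0x" << std::hex << std::setw(16) << std::setfill('0') << value;
    return os.str();
}

// Flatten the return code and failure data into string key-value pairs
// suitable for the additional data of a PEL.
std::map<std::string, std::string> toAdditionalData(PhalRc rc,
                                                    const FailureData& fd)
{
    std::map<std::string, std::string> data;
    data["PHAL_RC"] = std::to_string(static_cast<int>(rc));
    if (rc == PhalRc::Success)
    {
        return data; // no failure data to record
    }
    data["HWP_NAME"] = fd.hwpName;

    for (const auto& [name, value] : fd.registers)
    {
        data["REG_" + name] = toHex(value);
    }

    // Numbered keys keep several targets of the same kind distinct.
    auto addTargets = [&data](const std::string& prefix,
                              const std::vector<std::string>& paths) {
        for (std::size_t i = 0; i < paths.size(); ++i)
        {
            data[prefix + std::to_string(i)] = paths[i];
        }
    };
    addTargets("CALLOUT_", fd.callouts);
    addTargets("GUARD_", fd.guards);
    addTargets("DECONFIG_", fd.deconfigs);
    return data;
}
```

In this sketch the caller checks the returned `PhalRc` and, on failure, uses the resulting map as the additional data of a PEL. Numbered keys such as `CALLOUT_0` and `CALLOUT_1` are one possible way to carry several targets of the same kind in a flat string map, since a map cannot hold duplicate keys.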

### libpdbg
The libpdbg library is used for hardware access. Any hardware access errors
need to be captured as part of the PEL.

### Message Registry Entries
For errors raised in pHAL, the corresponding error message registry entries
need to be created in the [message registry][8].

## Proposed design
### Hardware procedure failure
Add a parser to libekb to parse the error XML file, and provide methods that
parse the failure data returned by the hardware procedure methods and return
the data as key-value pairs, so that it can be passed to the [Create][4]
interface to create the PEL.

The inventory strings of the targets to call out, guard, and deconfigure need
to be added to the additional data section of the Create interface.

Applications need to register a callback method with the libekb library to
receive the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libipl internal failure
Applications need to register a callback method with the libipl library to
receive the error logging traces.

Debug traces returned through the callback method will be added to the PEL.

### libpdbg internal failure
Applications need to register a callback method with the libpdbg library to
receive the debug traces.

Debug traces returned through the callback method will be added to the PEL.

## Sequence diagrams
### Register for debug traces and boot errors

### Process debug traces

### Process boot failures

## Alternatives Considered
None

## Impacts
None

## Future changes
Currently, the data is passed to the [Create][4] interface in std::map format.
This will be changed to JSON format once support for passing JSON files to the
Create interface is added.

## Testing
1. Simulate a hardware procedure failure and check that a PEL is created.

[1]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/README.md
[2]: https://gerrit.openbmc-project.xyz/#/c/openbmc/docs/+/27804/2/designs/gard_on_bmc.md
[3]: https://github.com/openbmc/phosphor-logging
[4]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Logging/Create.interface.yaml
[5]: https://github.com/dgibson/dtc
[6]: https://github.com/open-power/pdbg
[7]: https://elinux.org/Device_Tree_Reference
[8]: https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/registry/message_registry.json