# OpenPower Platform Event Log (PEL) extension

This extension will create PELs for every OpenBMC event log. It is also
possible to point to the raw PEL to use in the OpenBMC event, and then that
will be used instead of creating one.

## Contents
* [Passing in data when creating PELs](#passing-pel-related-data-within-an-openbmc-event-log)
* [Default UserData sections for BMC created PELs](#default-userdata-sections-for-bmc-created-pels)
* [The PEL Message Registry](#the-pel-message-registry)
* [Callouts](#callouts)
* [Action Flags and Event Type Rules](#action-flags-and-event-type-rules)
* [D-Bus Interfaces](#d-bus-interfaces)
* [PEL Retention](#pel-retention)
* [Adding python3 modules for PEL UserData and SRC parsing](#adding-python3-modules-for-pel-userdata-and-src-parsing)
* [Fail Boot on Host Errors](#fail-boot-on-host-errors)

## Passing PEL related data within an OpenBMC event log

An error log creator can pass in data that is relevant to a PEL by using
certain keywords in the AdditionalData property of the event log.

### AdditionalData keywords

#### RAWPEL

This keyword is used to point to an existing PEL in a binary file that should
be associated with this event log. The syntax is:
```
RAWPEL=<path to PEL File>
e.g.
RAWPEL="/tmp/pels/pel.5"
```
The code will assign its own error log ID to this PEL, and also update the
commit timestamp field to the current time.

#### POWER_THERMAL_CRITICAL_FAULT

This keyword is used to set the power fault bit in PEL. The syntax is:
```
POWER_THERMAL_CRITICAL_FAULT=<FLAG>
e.g.
POWER_THERMAL_CRITICAL_FAULT=TRUE
```

Note that TRUE is the only value supported.

#### SEVERITY_DETAIL

This is used when the passed in event log severity determines the PEL
severity and a more granular PEL severity is needed beyond what the normal
event log to PEL severity conversion could give.

The syntax is:
```
SEVERITY_DETAIL=<SEVERITY_TYPE>
e.g.
SEVERITY_DETAIL=SYSTEM_TERM
```
Supported option:
- SYSTEM_TERM, changes the Severity value from 0x50 to 0x51

#### ESEL

This keyword's data contains a full PEL in string format. This is how hostboot
sends down PELs when it is configured in IPMI communication mode. The PEL is
handled just like the PEL obtained using the RAWPEL keyword.

The syntax is:

```
ESEL=
"00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30 01 00 33 00 00..."
```

Note that there are 16 bytes of IPMI SEL data before the PEL data starts.

#### _PID

This keyword, which contains the application's PID, is added automatically by
the phosphor-logging daemon when the `commit` or `report` APIs are used to
create an event log, but not when the `Create` D-Bus method is used. If a
caller of the `Create` API wishes to have their PID captured in the PEL, this
keyword should be used.

This will be added to the PEL in a section of type User Data (UD), along with
the application name it corresponds to.

The syntax is:
```
_PID=<PID of application>
e.g.
_PID="12345"
```

#### CALLOUT_INVENTORY_PATH

This is used to pass in an inventory item to use as a callout. See [here for
details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_PRIORITY

This can be used along with CALLOUT_INVENTORY_PATH to specify the priority of
that FRU callout. If not specified, the default priority is "H"/High Priority.

The possible values are:
- "H": High Priority
- "M": Medium Priority
- "L": Low Priority

See [here for details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

This is used to pass in a device path to create callouts from. See [here for
details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

This is used to pass in an I2C bus and address to create callouts from. See
[here for details](#passing-callouts-in-with-the-additionaldata-property)

### FFDC Intended For UserData PEL sections

When one needs to add FFDC into the PEL UserData sections, the
`CreateWithFFDCFiles` D-Bus method on the `xyz.openbmc_project.Logging.Create`
interface must be used when creating a new event log. This method takes a list
of files to store in the PEL UserData sections.

That API is the same as the 'Create' one, except it has a new parameter:

```
std::vector<std::tuple<enum[FFDCFormat],
                       uint8_t,
                       uint8_t,
                       sdbusplus::message::unix_fd>>
```

Each entry in the vector contains a file descriptor for a file that will
be stored in a unique UserData section. The tuple's arguments are:

- enum[FFDCFormat]: The data format type, the options are:
    - 'JSON'
        - The parser will use nlohmann::json's pretty print
    - 'CBOR'
        - The parser will use nlohmann::json's pretty print
    - 'Text'
        - The parser will output ASCII text
    - 'Custom'
        - The parser will hexdump the data, unless there is a parser registered
          for this component ID and subtype.
- uint8_t: subType
    - Useful for the 'custom' type. Not used with the other types.
- uint8_t: version
    - The version of the data.
    - Used for the custom type.
    - Not planning on using for JSON/CBOR unless a reason to do so appears.
- unixfd - The file descriptor for the opened file that contains the
    contents. The file descriptor can be closed and the file can be deleted if
    desired after the method call.

An example of saving JSON data to a file and getting its file descriptor is:

```
nlohmann::json json = ...;
auto jsonString = json.dump();
// Open for reading and writing so the descriptor can also be read from.
FILE* fp = fopen(filename, "w+");
fwrite(jsonString.data(), 1, jsonString.size(), fp);
// Flush the stdio buffer so the data is in the file before the fd is used.
fflush(fp);
int fd = fileno(fp);
```

Alternatively, 'open()' can be used to obtain the file descriptor of the file.

Upon receiving this data, the PEL code will create UserData sections for each
entry in that vector with the following UserData fields:

- Section header component ID:
    - If the type field from the tuple is "custom", use the component ID from
      the message registry.
    - Otherwise, set the component ID to the phosphor-logging component ID so
      that the parser knows to use the built-in parsers (e.g. json) for the
      type.
- Section header subtype: The subtype field from the tuple.
- Section header version: The version field from the tuple.
- Section data: The data from the file.

If there is a peltool parser registered for the custom type (method is TBD),
that will be used by peltool to print the data, otherwise it will be hexdumped.

Before adding each of these UserData sections, a check will be done to see if
the PEL size will remain under the maximum size of 16KB. If not, the UserData
section will be truncated enough that the PEL fits into the 16KB.

## Default UserData sections for BMC created PELs

The extension code that creates PELs will add these UserData sections to every
PEL:

- The AdditionalData property contents
  - If the AdditionalData property in the OpenBMC event log has anything in it,
    it will be saved in a UserData section as a JSON string.

- System information
  - This section contains various pieces of system information, such as the
    full code level and the BMC, chassis, and host state properties.

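As a rough illustration of the AdditionalData item above, the sketch below turns AdditionalData entries into the kind of JSON string that could be stored in a UserData section. It assumes the entries arrive as `KEY=VALUE` strings. The function name `additional_data_to_json` is hypothetical and is not taken from the PEL code, which is written in C++.

```python
import json


def additional_data_to_json(entries):
    """Convert 'KEY=VALUE' strings into a JSON string (illustrative only)."""
    data = {}
    for entry in entries:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = entry.partition("=")
        if not sep:
            # Assumption for this sketch: an entry without '=' gets an
            # empty value instead of being dropped.
            value = ""
        data[key] = value
    return json.dumps(data)
```

For example, `additional_data_to_json(["_PID=12345"])` yields a JSON object with a single `_PID` key whose value is the string `12345`.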

## The PEL Message Registry

The PEL message registry is used to create PELs from OpenBMC event logs.
Documentation can be found [here](registry/README.md).

## Callouts

A callout points to a FRU, a symbolic FRU, or an isolation procedure. There
can be from zero to ten of them in each PEL, where they are located in the SRC
section.

There are a few different ways to add callouts to a PEL. In all cases, the
callouts will be sorted from highest to lowest priority within the PEL after
they are added.

### Passing callouts in with the AdditionalData property

The PEL code can add callouts based on the values of special entries in the
AdditionalData event log property. They are:

- CALLOUT_INVENTORY_PATH

  This keyword is used to call out a single FRU by passing in its D-Bus
  inventory path. When the PEL code sees this, it will create a single FRU
  callout, using the VPD properties (location code, FN, CCIN) from that
  inventory item. If that item is not a FRU itself and does not have a
  location code, it will keep searching its parents until it finds one that
  is.

  The priority of the FRU callout will be high, unless the CALLOUT_PRIORITY
  keyword is also present and contains a different priority, in which case
  that priority will be used instead. This can be useful when a maintenance
  procedure with a high priority callout is specified for this error in the
  message registry and the FRU callout needs to have a different priority.

  ```
  CALLOUT_INVENTORY_PATH=
  "/xyz/openbmc_project/inventory/system/chassis/motherboard"
  ```

- CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

  These keywords are required as a pair to indicate faulty device
  communication, usually detected by a failure accessing a device at that
  sysfs path. The PEL code will use a data table generated by the MRW to map
  these device paths to FRU callout lists. The errno value may influence the
  callout.

  I2C, FSI, FSI-I2C, and FSI-SPI paths are supported.

  ```
  CALLOUT_DEVICE_PATH="/sys/bus/i2c/devices/3-0069"
  CALLOUT_ERRNO="2"
  ```

- CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

  These 3 keywords can be used to call out a failing I2C device path when the
  full device path isn't known. It is similar to CALLOUT_DEVICE_PATH in that
  it will use data tables generated by the MRW to create the callouts.

  CALLOUT_IIC_BUS is in the form "/dev/i2c-X" where X is the bus number, or
  just the bus number by itself.
  CALLOUT_IIC_ADDR is the 7-bit address, either as a decimal number or as a
  hex number if preceded with a "0x".

  ```
  CALLOUT_IIC_BUS="/dev/i2c-7"
  CALLOUT_IIC_ADDR="81"
  CALLOUT_ERRNO=62
  ```

### Defining callouts in the message registry

Callouts can be completely defined inside that error's definition in the PEL
message registry. This method allows the callouts to vary based on the system
type or on any AdditionalData item.

At a high level, this involves defining a callout section inside the registry
entry that contains the location codes or procedure names to use, along with
their priority. If these can vary based on system type, the type provided by
the entity manager will be one of the keys. If they can also depend on an
AdditionalData entry, then that will also be a key.

See the message registry [README](registry/README.md) and
[schema](registry/schema/schema.json) for the details.

### Using the message registry along with CALLOUT_ entries

If the message registry entry contains a callout definition and the event log
also contains one of the aforementioned CALLOUT keys in the AdditionalData
property, then the PEL code will first add the callouts stemming from the
CALLOUT items, followed by the callouts from the message registry.
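The ordering rules in this section can be sketched as follows. This is a minimal illustration, not the PEL code itself: the function name `merge_callouts`, the dictionary layout, and the use of a stable sort (so that callouts of equal priority keep the order in which they were added) are all assumptions. The relative rank of the medium groups A, B, and C is also an assumption made for the sketch.

```python
# Assumed ranking, highest priority first. The position of the medium
# groups (A, B, C) relative to each other is an assumption of this sketch.
PRIORITY_ORDER = ["H", "M", "A", "B", "C", "L"]


def merge_callouts(additional_data_callouts, registry_callouts):
    """Combine both callout sources and order them by priority."""
    # Callouts from CALLOUT_ keywords are added first, then the registry ones.
    combined = list(additional_data_callouts) + list(registry_callouts)
    # sorted() is stable, so equal priorities keep their insertion order.
    return sorted(combined, key=lambda c: PRIORITY_ORDER.index(c["Priority"]))
```

With a medium priority FRU callout from CALLOUT_INVENTORY_PATH and a high priority procedure from the registry, the procedure ends up first in the result even though the FRU callout was added first.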

### Specifying multiple callouts using JSON format FFDC files

Multiple callouts can be passed in by the creator at the time of PEL creation.
This is done by specifying them in a JSON file that is then passed in as an
[FFDC file](#ffdc-intended-for-userdata-pel-sections). The JSON will still be
added into a PEL UserData section for debug.

To specify that an FFDC file contains callouts, the format value for that FFDC
entry must be set to JSON, and the subtype field must be set to 0xCA:

```
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::JSON,
    0xCA, // Callout subtype
    0x01, // Callout version, set to 0x01
    fd};
```

The JSON contains an array of callouts that must be in order of highest
priority to lowest, with a maximum of 10. Any callouts after the 10th will
just be thrown away as there is no room for them in the PEL. The format looks
like:

```
[
    {
        // First callout
    },
    {
        // Second callout
    },
    {
        // Nth callout
    }
]
```

A callout entry can be a normal hardware callout, a maintenance procedure
callout, or a symbolic FRU callout. Each callout must contain a Priority
field, where the possible values are:

* "H" = High
* "M" = Medium
* "A" = Medium Group A
* "B" = Medium Group B
* "C" = Medium Group C
* "L" = Low

Either unexpanded location codes or D-Bus inventory object paths can be used to
specify the called out part. An unexpanded location code does not have the
system VPD information embedded in it, and the 'Ufcs-' prefix is optional (so
can be either Ufcs-P1 or just P1).

#### Normal hardware FRU callout

Normal hardware callouts must contain either the location code or inventory
path, and priority. Even though the PEL code doesn't do any guarding or
deconfiguring itself, it needs to know if either of those things occurred as
there are status bits in the PEL to reflect them. The Guarded and Deconfigured
fields are used for this. Those fields are optional and if omitted then their
values will be false.

When the inventory path of a sub-FRU is passed in, the PEL code will put the
location code of the parent FRU into the callout.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H"
}

{
    "InventoryPath": "/xyz/openbmc_project/inventory/motherboard/cpu0/core5",
    "Priority": "H",
    "Deconfigured": true,
    "Guarded": true
}
```

MRUs (Manufacturing Replaceable Units) are 4 byte numbers that can optionally
be added to callouts to specify failing devices on a FRU. These may be used
during the manufacturing test process, where there may be the ability to do
these replacements. There can be up to 15 MRUs, each with its own priority,
embedded in a callout. The possible priority values match the FRU priority
values.

Note that since JSON only supports numbers in decimal and not in hex, MRU IDs
will show up as decimal when visually inspecting the JSON.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [
        {
            "ID": 1234,
            "Priority": "H"
        },
        {
            "ID": 5678,
            "Priority": "H"
        }
    ]
}
```

#### Maintenance procedure callout

The LocationCode field is not used with procedure callouts. Only the first 7
characters of the Procedure field will be used by the PEL.

```
{
    "Procedure": "PRONAME",
    "Priority": "H"
}
```

#### Symbolic FRU callout

Only the first seven characters of the SymbolicFRU field will be used by the PEL.
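Pulling together the rules stated so far for JSON callouts, a creator could sanity-check a callout list before writing it to the FFDC file. The sketch below is only an illustration under assumptions: the helper names (`check_callout`, `effective_name`, `callouts_to_json`) are hypothetical, the checks are a reading of this document rather than of the PEL code, and the PEL code remains the authority on what it accepts.

```python
import json

VALID_PRIORITIES = {"H", "M", "A", "B", "C", "L"}
MAX_CALLOUTS = 10  # callouts after the 10th are thrown away
MAX_MRUS = 15      # up to 15 MRUs per callout
NAME_LENGTH = 7    # characters of Procedure/SymbolicFRU used by the PEL


def check_callout(callout):
    """Raise ValueError if a callout breaks one of the documented rules."""
    if callout.get("Priority") not in VALID_PRIORITIES:
        raise ValueError("missing or invalid Priority")
    is_named = "Procedure" in callout or "SymbolicFRU" in callout
    # A normal hardware callout needs a location code or an inventory path.
    if not is_named and not (callout.keys() & {"LocationCode", "InventoryPath"}):
        raise ValueError("hardware callout needs LocationCode or InventoryPath")
    mrus = callout.get("MRUs", [])
    if len(mrus) > MAX_MRUS:
        raise ValueError("too many MRUs")
    for mru in mrus:
        if mru.get("Priority") not in VALID_PRIORITIES:
            raise ValueError("missing or invalid MRU Priority")


def effective_name(callout):
    """Return the part of Procedure/SymbolicFRU that the PEL would keep."""
    name = callout.get("Procedure") or callout.get("SymbolicFRU") or ""
    return name[:NAME_LENGTH]


def callouts_to_json(callouts):
    """Validate the callouts and serialize at most the first ten."""
    for callout in callouts:
        check_callout(callout)
    return json.dumps(callouts[:MAX_CALLOUTS], indent=2)
```

The string returned by `callouts_to_json` is what would be written to the file whose descriptor is passed in the FFDC tuple with the JSON format and the 0xCA subtype.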

If the TrustedLocationCode field is present and set to true, this means the
location code may be used to turn on service indicators, so the LocationCode
field is required. If TrustedLocationCode is false or missing, then the
LocationCode field is optional.

```
{
    "TrustedLocationCode": true,
    "LocationCode": "P0-C1",
    "Priority": "H",
    "SymbolicFRU": "FRUNAME"
}
```

## `Action Flags` and `Event Type` Rules

The `Action Flags` and `Event Type` PEL fields are optional in the message
registry, and if not present the code will set them based on certain rules
laid out in the PEL spec.

These rules are:
1. Always set the `Report` flag, unless the `Do Not Report` flag is already on.
2. Always clear the `SP Call Home` flag, as that feature isn't supported.
3. If the severity is `Non-error Event`:
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
    - If the `Event Type` field is `Not Applicable`, change it to
      `Information Only`.
    - If the `Event Type` field is `Information Only` or `Tracing`, set the
      `Hidden` flag.
4. If the severity is `Recovered`:
    - Set the `Hidden` flag.
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
5. For all other severities:
    - Clear the `Hidden` flag.
    - Set the `Service Action` flag.
    - Set the `Call Home` flag.

Additional rules may be added in the future if necessary.

## D-Bus Interfaces

See the org.open_power.Logging.PEL interface definition for the most up to date
information.

## PEL Retention

The PEL repository is allocated a set amount of space on the BMC. When that
space gets close to being full, the code will remove a percentage of PELs to
make room for new ones. In addition, the code will keep a cap on the total
number of PELs allowed. Note that removing a PEL will also remove the
corresponding OpenBMC event log.

The disk capacity limit is set to 20MB, and the number limit is 3000.

The rules used to remove logs are listed below. The checks will be run after a
PEL has been added and the method to create the PEL has returned to the caller,
i.e. run when control gets back to the event loop.

### Removal Algorithm

If the size used is 95% or less of the allocated space and the number of PELs
is under the limit, nothing further needs to be done. Otherwise, continue and
run all 5 of the following steps. Each step only deletes PELs until it meets
its requirement and then it stops.

The steps are:

1. Remove BMC created informational PELs until they take up 15% or less of the
   allocated space.

2. Remove BMC created non-informational PELs until they take up 30% or less of
   the allocated space.

3. Remove non-BMC created informational PELs until they take up 15% or less of
   the allocated space.

4. Remove non-BMC created non-informational PELs until they take up 30% or less
   of the allocated space.

5. After the previous 4 steps are complete, if there are still more than the
   maximum number of PELs, remove PELs down to 80% of the maximum.

PELs with associated guard records will never be deleted. Each step above
makes the following 4 passes, stopping as soon as its limit is reached:

Pass 1. Remove HMC acknowledged PELs.<br>
Pass 2. Remove OS acknowledged PELs.<br>
Pass 3. Remove PHYP acknowledged PELs.<br>
Pass 4. Remove all PELs.

After all these steps, disk usage will be at most 90% of the allocated space
(15% + 30% + 15% + 30%).

## Adding python3 modules for PEL UserData and SRC parsing

In order to support python3 modules for the parsing of PEL User Data sections
and to decode SRC data, setuptools is used to import python3 packages from
external repos to be included in the OpenBMC image.
```
Sample layout for setuptools:

setup.py
src/usr/scom/plugins/ebmc/b0300.py
src/usr/i2c/plugins/ebmc/b0700.py
src/build/tools/ebmc/errludP_Helpers.py
```

`setup.py` is the build script for setuptools. It contains information about the
package (such as the name and version) as well as which code files to include.

The setup.py template to be used for eBMC User Data parsers:
```
import os.path
from setuptools import setup

# Update this dict with a new key/value pair for every component added
# Key: The package name to be installed as
# Value: The path containing the package's python modules
dirmap = {
    "b0300": "src/usr/scom/plugins/ebmc",
    "b0700": "src/usr/i2c/plugins/ebmc",
    "helpers": "src/build/tools/ebmc"
}

# All packages will be installed under 'udparsers' namespace
def get_package_name(dirmap_key):
    return "udparsers.{}".format(dirmap_key)

def get_package_dirent(dirmap_item):
    package_name = get_package_name(dirmap_item[0])
    package_dir = dirmap_item[1]
    return (package_name, package_dir)

def get_packages():
    return map(get_package_name, dirmap.keys())

def get_package_dirs():
    return map(get_package_dirent, dirmap.items())

setup(
        name="Hostboot",
        version="0.1",
        packages=list(get_packages()),
        package_dir=dict(get_package_dirs())
)
```
- User Data parser module
  - Module name: `xzzzz.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII) and `zzzz` is the 2 byte Component ID
    from the User Data section itself (in HEX). All should be converted to
    lowercase.
    - For example: `b0100.py` for Hostboot created UserData with CompID 0x0100
  - Function to provide: `parseUDToJson`
    - Argument list:
      1. (int) Sub-section type
      2. (int) Section version
      3. (memoryview): Data
    - Return data:
      1. (str) JSON string

  - Sample User Data parser module:
    ```
    import json
    def parseUDToJson(subType, ver, data):
        d = dict()
        ...
        # Parse and populate data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```
- SRC parser module
  - Module name: `xsrc.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII, converted to lowercase).
    - For example: `bsrc.py` for Hostboot generated SRCs
  - Function to provide: `parseSRCToJson`
    - Argument list:
      1. (str) Refcode ASCII string
      2. (str) Hexword 2
      3. (str) Hexword 3
      4. (str) Hexword 4
      5. (str) Hexword 5
      6. (str) Hexword 6
      7. (str) Hexword 7
      8. (str) Hexword 8
      9. (str) Hexword 9
    - Return data:
      1. (str) JSON string

  - Sample SRC parser module:
    ```
    import json
    def parseSRCToJson(ascii_str, word2, word3, word4, word5, word6, word7,
                       word8, word9):
        d = dict()
        ...
        # Decode SRC data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```

## Fail Boot on Host Errors

The fail boot on hw error [design][1] provides a function where a system owner
can tell the firmware to fail the boot of a system if a BMC phosphor-logging
event has a hardware callout in it.

When this fail boot on hardware error setting is enabled, the BMC is required
to fail the boot for **any** error from the host which satisfies the following
criteria:
- not SeverityType::nonError
- has a callout of any kind from the `FailingComponentType` structure

[1]: https://github.com/openbmc/docs/blob/master/designs/fail-boot-on-hw-error.md
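The two criteria above amount to a simple predicate, sketched here in Python purely for illustration. The function name `should_fail_boot`, its parameters, and the use of the string `"nonError"` to stand in for `SeverityType::nonError` are assumptions of this sketch. The real check lives in the C++ PEL code.

```python
def should_fail_boot(setting_enabled, severity, callout_types):
    """Return True when a host error meets the fail-boot criteria.

    setting_enabled: whether the fail boot on hardware error setting is on.
    severity: severity name; "nonError" stands in for SeverityType::nonError.
    callout_types: the FailingComponentType values found in the callouts.
    """
    if not setting_enabled:
        return False
    if severity == "nonError":
        return False
    # Any callout at all is enough, whatever its FailingComponentType.
    return len(callout_types) > 0
```

A non-error event never fails the boot, and neither does an error without callouts. Any other host error with at least one callout does, provided the setting is enabled.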