# OpenPower Platform Event Log (PEL) extension

This extension creates a PEL for every OpenBMC event log. It is also
possible to point the OpenBMC event log at an existing raw PEL, in which case
that PEL will be used instead of creating a new one.

## Contents
* [Passing in data when creating PELs](#passing-pel-related-data-within-an-openbmc-event-log)
* [Default UserData sections for BMC created PELs](#default-userdata-sections-for-bmc-created-pels)
* [The PEL Message Registry](#the-pel-message-registry)
* [Callouts](#callouts)
* [Action Flags and Event Type Rules](#action-flags-and-event-type-rules)
* [D-Bus Interfaces](#d-bus-interfaces)
* [PEL Retention](#pel-retention)
* [Adding python3 modules for PEL UserData and SRC parsing](#adding-python3-modules-for-pel-userdata-and-src-parsing)
* [Fail Boot on Host Errors](#fail-boot-on-host-errors)

## Passing PEL related data within an OpenBMC event log

An error log creator can pass in data that is relevant to a PEL by using
certain keywords in the AdditionalData property of the event log.

### AdditionalData keywords

#### RAWPEL

This keyword is used to point to an existing PEL in a binary file that should
be associated with this event log. The syntax is:
```
RAWPEL=<path to PEL File>
e.g.
RAWPEL="/tmp/pels/pel.5"
```
The code will assign its own error log ID to this PEL, and also update the
commit timestamp field to the current time.

#### POWER_THERMAL_CRITICAL_FAULT

This keyword is used to set the power fault bit in the PEL. The syntax is:
```
POWER_THERMAL_CRITICAL_FAULT=<FLAG>
e.g.
POWER_THERMAL_CRITICAL_FAULT=TRUE
```

Note that TRUE is the only value supported.

#### ESEL

This keyword's data contains a full PEL in string format. This is how hostboot
sends down PELs when it is configured in IPMI communication mode. The PEL is
handled just like the PEL obtained using the RAWPEL keyword.
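
As a rough sketch of how an ESEL value relates to the PEL it carries, the Python function below parses the space-separated hex string, drops the 16 bytes of IPMI SEL data that precede the PEL, and checks that the remainder starts with `PH`, the section ID of a PEL Private Header. The function name `esel_to_pel` is hypothetical and is not part of phosphor-logging. The sketch also assumes the hex string is complete, unlike the truncated sample in this section.

```python
# Sketch: extract the raw PEL bytes from an ESEL AdditionalData value.
# Assumption: the value is space-separated hex bytes whose first 16 bytes
# are IPMI SEL data that precede the PEL itself.
IPMI_SEL_HEADER_SIZE = 16
PRIVATE_HEADER_ID = b"PH"  # section ID of the Private Header that starts a PEL


def esel_to_pel(esel: str) -> bytes:
    """Return the PEL bytes embedded in an ESEL hex string."""
    # Drop surrounding whitespace/quotes; bytes.fromhex() skips inner spaces.
    data = bytes.fromhex(esel.strip().strip('"'))
    if len(data) <= IPMI_SEL_HEADER_SIZE:
        raise ValueError("ESEL data too short to contain a PEL")
    pel = data[IPMI_SEL_HEADER_SIZE:]
    if pel[:2] != PRIVATE_HEADER_ID:
        raise ValueError("PEL does not start with a Private Header")
    return pel


if __name__ == "__main__":
    sample = ("00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 "
              "50 48 00 30 01 00 33 00")
    pel = esel_to_pel(sample)
    print(pel[:2], len(pel))  # b'PH' 8
```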

The syntax is:

```
ESEL=
"00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30 01 00 33 00 00..."
```

Note that there are 16 bytes of IPMI SEL data before the PEL data starts.

#### _PID

This keyword, which contains the application's PID, is added automatically by
the phosphor-logging daemon when the `commit` or `report` APIs are used to
create an event log. It is not added when the `Create` D-Bus method is used,
so a caller of the `Create` API that wants its PID captured in the PEL should
pass this keyword itself.

This will be added to the PEL in a section of type User Data (UD), along with
the application name it corresponds to.

The syntax is:
```
_PID=<PID of application>
e.g.
_PID="12345"
```

#### CALLOUT_INVENTORY_PATH

This is used to pass in an inventory item to use as a callout. See [here for
details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_PRIORITY

This can be used along with CALLOUT_INVENTORY_PATH to specify the priority of
that FRU callout. If not specified, the default priority is "H"/High Priority.

The possible values are:
- "H": High Priority
- "M": Medium Priority
- "L": Low Priority

See [here for details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

This is used to pass in a device path to create callouts from. See [here for
details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

This is used to pass in an I2C bus and address to create callouts from. See
[here for details](#passing-callouts-in-with-the-additionaldata-property).

### FFDC Intended For UserData PEL sections

When one needs to add FFDC into the PEL UserData sections, the
`CreateWithFFDCFiles` D-Bus method on the `xyz.openbmc_project.Logging.Create`
interface must be used when creating a new event log.
This method takes a list of files to store in the PEL UserData sections.

That API is the same as the `Create` one, except that it has an additional
parameter:

```
std::vector<std::tuple<enum[FFDCFormat],
                       uint8_t,
                       uint8_t,
                       sdbusplus::message::unix_fd>>
```

Each entry in the vector contains a file descriptor for a file that will
be stored in a unique UserData section. The tuple's arguments are:

- enum[FFDCFormat]: The data format type. The options are:
    - 'JSON'
        - The parser will use nlohmann::json's pretty print.
    - 'CBOR'
        - The parser will use nlohmann::json's pretty print.
    - 'Text'
        - The parser will output ASCII text.
    - 'Custom'
        - The parser will hexdump the data, unless there is a parser registered
          for this component ID and subtype.
- uint8_t: subType
    - Useful for the 'Custom' type. Not used with the other types.
- uint8_t: version
    - The version of the data.
    - Used for the 'Custom' type.
    - Not planned for use with JSON/CBOR unless a reason to do so appears.
- sdbusplus::message::unix_fd: The file descriptor for the opened file that
  contains the contents. The file descriptor can be closed and the file can be
  deleted, if desired, after the method call.

An example of saving JSON data to a file and getting its file descriptor is:

```
#include <cstdio>
#include <nlohmann/json.hpp>
nlohmann::json json = ...;
auto jsonString = json.dump();
// Open read/write ("w+") so the descriptor can also be read from.
FILE* fp = fopen(filename, "w+");
fwrite(jsonString.data(), 1, jsonString.size(), fp);
fflush(fp); // Write out the stdio buffer before handing off the descriptor.
int fd = fileno(fp);
```

Alternatively, `open()` can be used to obtain the file descriptor of the file.

Upon receiving this data, the PEL code will create UserData sections for each
entry in that vector with the following UserData fields:

- Section header component ID:
    - If the type field from the tuple is "custom", use the component ID from
      the message registry.
    - Otherwise, set the component ID to the phosphor-logging component ID so
      that the parser knows to use the built-in parsers (e.g. JSON) for the
      type.
- Section header subtype: The subtype field from the tuple.
- Section header version: The version field from the tuple.
- Section data: The data from the file.

If there is a peltool parser registered for the custom type (method is TBD),
peltool will use it to print the data. Otherwise, the data will be hexdumped.

Before adding each of these UserData sections, the code checks whether the PEL
will remain under the maximum size of 16KB. If not, the UserData section will
be truncated just enough for the PEL to fit within 16KB.

## Default UserData sections for BMC created PELs

The extension code that creates PELs will add these UserData sections to every
PEL:

- The AdditionalData property contents
    - If the AdditionalData property in the OpenBMC event log has anything in it,
      it will be saved in a UserData section as a JSON string.

- System information
    - This section contains various pieces of system information, such as the
      full code level and the BMC, chassis, and host state properties.

## The PEL Message Registry

The PEL message registry is used to create PELs from OpenBMC event logs.
Documentation can be found [here](registry/README.md).

## Callouts

A callout points to a FRU, a symbolic FRU, or an isolation procedure. A PEL
can contain from zero to ten callouts, which are located in its SRC section.

There are a few different ways to add callouts to a PEL. In all cases, the
callouts will be sorted from highest to lowest priority within the PEL after
they are added.

### Passing callouts in with the AdditionalData property

The PEL code can add callouts based on the values of special entries in the
AdditionalData event log property.
They are:

- CALLOUT_INVENTORY_PATH

    This keyword is used to call out a single FRU by passing in its D-Bus
    inventory path. When the PEL code sees this, it will create a single FRU
    callout, using the VPD properties (location code, FN, CCIN) from that
    inventory item. If that item is not a FRU itself and does not have a
    location code, it will keep searching its parents until it finds one that
    is.

    The priority of the FRU callout will be high, unless the CALLOUT_PRIORITY
    keyword is also present and contains a different priority, in which case
    that priority is used instead. This can be useful when a maintenance
    procedure with a high priority callout is specified for this error in the
    message registry and the FRU callout needs to have a different priority.

    ```
    CALLOUT_INVENTORY_PATH=
    "/xyz/openbmc_project/inventory/system/chassis/motherboard"
    ```

- CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

    These keywords are required as a pair to indicate faulty device
    communication, usually detected by a failure accessing a device at that
    sysfs path. The PEL code will use a data table generated by the MRW to map
    these device paths to FRU callout lists. The errno value may influence the
    callout.

    I2C, FSI, FSI-I2C, and FSI-SPI paths are supported.

    ```
    CALLOUT_DEVICE_PATH="/sys/bus/i2c/devices/3-0069"
    CALLOUT_ERRNO="2"
    ```

- CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

    These 3 keywords can be used to call out a failing I2C device path when the
    full device path isn't known. They are similar to CALLOUT_DEVICE_PATH in
    that the PEL code will use data tables generated by the MRW to create the
    callouts.

    CALLOUT_IIC_BUS is in the form "/dev/i2c-X" where X is the bus number, or
    just the bus number by itself.
    CALLOUT_IIC_ADDR is the 7-bit address, either as a decimal number or as a
    hex number if preceded with "0x".

    ```
    CALLOUT_IIC_BUS="/dev/i2c-7"
    CALLOUT_IIC_ADDR="81"
    CALLOUT_ERRNO="62"
    ```

### Defining callouts in the message registry

Callouts can be completely defined inside that error's definition in the PEL
message registry. This method allows the callouts to vary based on the system
type or on any AdditionalData item.

At a high level, this involves defining a callout section inside the registry
entry that contains the location codes or procedure names to use, along with
their priority. If these can vary based on system type, the type provided by
the entity manager will be one of the keys. If they can also depend on an
AdditionalData entry, then that will also be a key.

See the message registry [README](registry/README.md) and
[schema](registry/schema/schema.json) for the details.

### Using the message registry along with CALLOUT_ entries

If the message registry entry contains a callout definition and the event log
also contains one of the aforementioned CALLOUT keys in the AdditionalData
property, then the PEL code will first add the callouts stemming from the
CALLOUT items, followed by the callouts from the message registry.

### Specifying multiple callouts using JSON format FFDC files

Multiple callouts can be passed in by the creator at the time of PEL creation.
This is done by specifying them in a JSON file that is then passed in as an
[FFDC file](#ffdc-intended-for-userdata-pel-sections). The JSON will still be
added into a PEL UserData section for debug.
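
The following Python sketch shows one way a caller might prepare such a file. It writes a list of callout dictionaries to a temporary JSON file and returns an open, readable descriptor in a tuple tagged with the JSON format, callout subtype 0xCA, and version 0x01. The helper name `make_callout_ffdc` is hypothetical, and so is the use of the string `"JSON"` as a stand-in for the D-Bus `FFDCFormat` enum value. Sending the tuple to `CreateWithFFDCFiles` is not shown.

```python
import json
import os
import tempfile

CALLOUT_SUBTYPE = 0xCA  # subtype that marks a JSON FFDC file as callouts
CALLOUT_VERSION = 0x01
MAX_CALLOUTS = 10  # a PEL has room for at most 10 callouts


def make_callout_ffdc(callouts):
    """Write callouts to a JSON file; return (ffdc_tuple, path).

    ffdc_tuple mirrors (FFDCFormat, subtype, version, fd). The string
    "JSON" is a hypothetical placeholder for the FFDCFormat enum value.
    """
    # Callouts must already be ordered from highest to lowest priority;
    # anything past the 10th would be thrown away by the PEL code anyway.
    kept = list(callouts)[:MAX_CALLOUTS]
    fd, path = tempfile.mkstemp(suffix=".json")  # opened read/write
    try:
        view = memoryview(json.dumps(kept).encode())
        while view:  # os.write() may write fewer bytes than requested
            view = view[os.write(fd, view):]
        os.lseek(fd, 0, os.SEEK_SET)  # rewind so readers start at offset 0
    except BaseException:
        os.close(fd)
        os.unlink(path)
        raise
    return ("JSON", CALLOUT_SUBTYPE, CALLOUT_VERSION, fd), path
```

After the D-Bus call returns, the caller can close `fd` and delete `path`, as the FFDC section above allows.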

To specify that an FFDC file contains callouts, the format value for that FFDC
entry must be set to JSON, and the subtype field must be set to 0xCA:

```
// CreateIface is assumed to alias the sdbusplus binding for the
// xyz.openbmc_project.Logging.Create interface, and fd to be the open
// descriptor of the JSON callout file.
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::JSON,
    0xCA, // Callout subtype
    0x01, // Callout version, set to 0x01
    fd};
```

The JSON contains an array of callouts that must be in order of highest
priority to lowest, with a maximum of 10. Any callouts after the 10th will
be discarded, as there is no room for them in the PEL. The format looks
like:

```
[
    {
        // First callout
    },
    {
        // Second callout
    },
    {
        // Nth callout
    }
]
```

A callout entry can be a normal hardware callout, a maintenance procedure
callout, or a symbolic FRU callout. Each callout must contain a Priority
field, where the possible values are:

* "H" = High
* "M" = Medium
* "A" = Medium Group A
* "B" = Medium Group B
* "C" = Medium Group C
* "L" = Low

Either unexpanded location codes or D-Bus inventory object paths can be used to
specify the called out part. An unexpanded location code does not have the
system VPD information embedded in it, and the 'Ufcs-' prefix is optional (so
it can be either Ufcs-P1 or just P1).

#### Normal hardware FRU callout

Normal hardware callouts must contain a priority and either the location code
or the inventory path. Even though the PEL code doesn't do any guarding or
deconfiguring itself, it needs to know whether either of those things occurred,
because the PEL has status bits that reflect them. The Guarded and Deconfigured
fields are used for this. Those fields are optional, and if omitted their
values will be false.
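
Here is a minimal Python sketch of building one normal hardware callout entry. It checks the priority against the values listed above. It also emits `Guarded` and `Deconfigured` only when they are true, because omitted fields default to false. The helper name `hardware_callout` is hypothetical. Requiring exactly one of a location code or an inventory path is a conservative assumption of this sketch, not a documented rule.

```python
import json

# Priority values allowed in JSON callout entries.
PRIORITIES = {"H", "M", "A", "B", "C", "L"}


def hardware_callout(priority, location_code=None, inventory_path=None,
                     guarded=False, deconfigured=False):
    """Build a normal hardware FRU callout dict (illustrative sketch)."""
    if priority not in PRIORITIES:
        raise ValueError(f"invalid priority: {priority!r}")
    # Assumption: exactly one way of identifying the part is supplied.
    if (location_code is None) == (inventory_path is None):
        raise ValueError("pass either location_code or inventory_path")
    callout = {"Priority": priority}
    if location_code is not None:
        callout["LocationCode"] = location_code
    else:
        callout["InventoryPath"] = inventory_path
    # Guarded/Deconfigured are optional and are false when omitted.
    if deconfigured:
        callout["Deconfigured"] = True
    if guarded:
        callout["Guarded"] = True
    return callout


print(json.dumps(hardware_callout("H", location_code="P0-C1")))
# {"Priority": "H", "LocationCode": "P0-C1"}
```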

When the inventory path of a sub-FRU is passed in, the PEL code will put the
location code of the parent FRU into the callout.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H"
}

{
    "InventoryPath": "/xyz/openbmc_project/inventory/motherboard/cpu0/core5",
    "Priority": "H",
    "Deconfigured": true,
    "Guarded": true
}
```

MRUs (Manufacturing Replaceable Units) are 4-byte numbers that can optionally
be added to callouts to specify failing devices on a FRU. These may be used
during the manufacturing test process, where it may be possible to replace
those devices. There can be up to 15 MRUs, each with its own priority,
embedded in a callout. The possible priority values match the FRU priority
values.

Note that since JSON only supports numbers in decimal and not in hex, MRU IDs
will show up as decimal when visually inspecting the JSON.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [
        {
            "ID": 1234,
            "Priority": "H"
        },
        {
            "ID": 5678,
            "Priority": "H"
        }
    ]
}
```

#### Maintenance procedure callout

The LocationCode field is not used with procedure callouts. Only the first 7
characters of the Procedure field will be used by the PEL.

```
{
    "Procedure": "PRONAME",
    "Priority": "H"
}
```

#### Symbolic FRU callout

Only the first 7 characters of the SymbolicFRU field will be used by the PEL.

If the TrustedLocationCode field is present and set to true, this means the
location code may be used to turn on service indicators, so the LocationCode
field is required. If TrustedLocationCode is false or missing, then the
LocationCode field is optional.
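
The Symbolic FRU rules above can be sketched as a small builder that validates its inputs. This Python example is illustrative only, and the function name `symbolic_fru_callout` is hypothetical. It truncates the name to 7 characters itself only to show what the PEL will keep, since the PEL code applies that limit on its own.

```python
import json

SYMBOLIC_FRU_MAX_LEN = 7  # the PEL only uses the first 7 characters


def symbolic_fru_callout(name, priority, location_code=None, trusted=False):
    """Build a symbolic FRU callout dict (illustrative sketch)."""
    # A trusted location code may be used to turn on service indicators,
    # so it must actually be supplied.
    if trusted and not location_code:
        raise ValueError("TrustedLocationCode requires a LocationCode")
    # Truncate up front so the JSON shows exactly what the PEL will keep.
    callout = {"SymbolicFRU": name[:SYMBOLIC_FRU_MAX_LEN], "Priority": priority}
    if trusted:
        callout["TrustedLocationCode"] = True
    if location_code:
        callout["LocationCode"] = location_code  # optional when untrusted
    return callout


print(json.dumps(symbolic_fru_callout("FRUNAME_LONG", "H", "P0-C1", True)))
# {"SymbolicFRU": "FRUNAME", "Priority": "H", "TrustedLocationCode": true, "LocationCode": "P0-C1"}
```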

```
{
    "TrustedLocationCode": true,
    "LocationCode": "P0-C1",
    "Priority": "H",
    "SymbolicFRU": "FRUNAME"
}
```

## `Action Flags` and `Event Type` Rules

The `Action Flags` and `Event Type` PEL fields are optional in the message
registry, and if not present the code will set them based on certain rules
laid out in the PEL spec.

These rules are:

1. Always set the `Report` flag, unless the `Do Not Report` flag is already on.
2. Always clear the `SP Call Home` flag, as that feature isn't supported.
3. If the severity is `Non-error Event`:
   - Clear the `Service Action` flag.
   - Clear the `Call Home` flag.
   - If the `Event Type` field is `Not Applicable`, change it to
     `Information Only`.
   - If the `Event Type` field is `Information Only` or `Tracing`, set the
     `Hidden` flag.
4. If the severity is `Recovered`:
   - Set the `Hidden` flag.
   - Clear the `Service Action` flag.
   - Clear the `Call Home` flag.
5. For all other severities:
   - Clear the `Hidden` flag.
   - Set the `Service Action` flag.
   - Set the `Call Home` flag.

Additional rules may be added in the future if necessary.

## D-Bus Interfaces

See the `org.open_power.Logging.PEL` interface definition for the most
up-to-date information.

## PEL Retention

The PEL repository is allocated a set amount of space on the BMC. When that
space gets close to being full, the code will remove a percentage of PELs to
make room for new ones. In addition, the code caps the total number of PELs
allowed. Note that removing a PEL will also remove the corresponding OpenBMC
event log.

The disk capacity limit is set to 20MB, and the number limit is 3000.

The rules used to remove logs are listed below. The checks will be run after a
PEL has been added and the method to create the PEL has returned to the caller,
i.e.
run when control gets back to the event loop.

### Removal Algorithm

If the space used is at or below 95% of the allocated space and the number of
PELs is under the limit, nothing further needs to be done. Otherwise, run all
5 of the following steps. Each step deletes PELs only until it meets its own
requirement, and then it stops.

The steps are:

1. Remove BMC created informational PELs until they take up 15% or less of the
   allocated space.

2. Remove BMC created non-informational PELs until they take up 30% or less of
   the allocated space.

3. Remove non-BMC created informational PELs until they take up 15% or less of
   the allocated space.

4. Remove non-BMC created non-informational PELs until they take up 30% or less
   of the allocated space.

5. After the previous 4 steps are complete, if there are still more than the
   maximum number of PELs, remove PELs down to 80% of the maximum.

PELs with associated guard records will never be deleted. Each step above
makes the following 4 passes, stopping as soon as its limit is reached:

Pass 1. Remove HMC acknowledged PELs.<br>
Pass 2. Remove OS acknowledged PELs.<br>
Pass 3. Remove PHYP acknowledged PELs.<br>
Pass 4. Remove all PELs.

After all these steps, disk usage will be at most 90% of the allocated space
(15% + 30% + 15% + 30%).

## Adding python3 modules for PEL UserData and SRC parsing

In order to support python3 modules for the parsing of PEL User Data sections
and to decode SRC data, setuptools is used to import python3 packages from
external repos to be included in the OpenBMC image.

Sample layout for setuptools:
```
setup.py
src/usr/scom/plugins/ebmc/b0300.py
src/usr/i2c/plugins/ebmc/b0700.py
src/build/tools/ebmc/errludP_Helpers.py
```

`setup.py` is the build script for setuptools.
It contains information about the package (such as the name and version) as
well as which code files to include.

The setup.py template to be used for eBMC User Data parsers:
```
import os.path
from setuptools import setup

# Update this dict with a new key/value pair for every component added
# Key: The package name to be installed as
# Value: The path containing the package's python modules
dirmap = {
    "b0300": "src/usr/scom/plugins/ebmc",
    "b0700": "src/usr/i2c/plugins/ebmc",
    "helpers": "src/build/tools/ebmc"
}

# All packages will be installed under the 'udparsers' namespace
def get_package_name(dirmap_key):
    return "udparsers.{}".format(dirmap_key)

def get_package_dirent(dirmap_item):
    package_name = get_package_name(dirmap_item[0])
    package_dir = dirmap_item[1]
    return (package_name, package_dir)

def get_packages():
    return map(get_package_name, dirmap.keys())

def get_package_dirs():
    return map(get_package_dirent, dirmap.items())

setup(
    name="Hostboot",
    version="0.1",
    packages=list(get_packages()),
    package_dir=dict(get_package_dirs())
)
```
- User Data parser module
    - Module name: `xzzzz.py`, where `x` is the Creator Subsystem from the
      Private Header section (in ASCII) and `zzzz` is the 2-byte Component ID
      from the User Data section itself (in HEX). All should be converted to
      lowercase.
        - For example: `b0100.py` for Hostboot created UserData with CompID 0x0100
    - Function to provide: `parseUDToJson`
        - Argument list:
            1. (int) Sub-section type
            2. (int) Section version
            3. (memoryview): Data
        - Return data:
            1. (str) JSON string

    - Sample User Data parser module:
        ```
        import json
        def parseUDToJson(subType, ver, data):
            d = dict()
            ...
            # Parse and populate data into dictionary
            ...
            jsonStr = json.dumps(d)
            return jsonStr
        ```
- SRC parser module
    - Module name: `xsrc.py`, where `x` is the Creator Subsystem from the
      Private Header section (in ASCII, converted to lowercase).
        - For example: `bsrc.py` for Hostboot generated SRCs
    - Function to provide: `parseSRCToJson`
        - Argument list:
            1. (str) Refcode ASCII string
            2. (str) Hexword 2
            3. (str) Hexword 3
            4. (str) Hexword 4
            5. (str) Hexword 5
            6. (str) Hexword 6
            7. (str) Hexword 7
            8. (str) Hexword 8
            9. (str) Hexword 9
        - Return data:
            1. (str) JSON string

    - Sample SRC parser module:
        ```
        import json
        def parseSRCToJson(ascii_str, word2, word3, word4, word5, word6, word7,
                           word8, word9):
            d = dict()
            ...
            # Decode SRC data into dictionary
            ...
            jsonStr = json.dumps(d)
            return jsonStr
        ```

## Fail Boot on Host Errors

The fail boot on hw error [design][1] provides a function where a system owner
can tell the firmware to fail the boot of a system if a BMC phosphor-logging
event has a hardware callout in it.

When this fail boot on hardware error setting is enabled, the BMC is required
to fail the boot for **any** error from the host that satisfies the following
criteria:
- not SeverityType::nonError
- has a callout of any kind from the `FailingComponentType` structure

[1]: https://github.com/openbmc/docs/blob/master/designs/fail-boot-on-hw-error.md
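
The two fail boot criteria above can be expressed as a small predicate. This Python sketch models only the decision itself. It assumes a host error is described by a hypothetical severity string (such as "nonError" or "unrecoverable") and a list of callouts. None of these names come from phosphor-logging, and reading the setting and actually stopping the boot are out of scope.

```python
def should_fail_boot(setting_enabled, severity, callouts):
    """Decide whether a host error should fail the boot (sketch only).

    severity: hypothetical string form of SeverityType, e.g. "nonError".
    callouts: the error's callouts of any kind; only their presence matters.
    """
    if not setting_enabled:
        return False  # the fail boot on hw error setting is off
    # Criterion 1: not a non-error severity. Criterion 2: has any callout.
    return severity != "nonError" and len(callouts) > 0


print(should_fail_boot(True, "unrecoverable", [{"Procedure": "PRONAME"}]))  # True
print(should_fail_boot(True, "nonError", [{"Procedure": "PRONAME"}]))  # False
```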