# OpenPower Platform Event Log (PEL) extension

This extension will create PELs for every OpenBMC event log. It is also possible
to point the OpenBMC event log at an existing raw PEL, in which case that PEL
will be used instead of a new one being created.

## Contents

- [Passing in data when creating PELs](#passing-pel-related-data-within-an-openbmc-event-log)
- [Default UserData sections for BMC created PELs](#default-userdata-sections-for-bmc-created-pels)
- [The PEL Message Registry](#the-pel-message-registry)
- [Callouts](#callouts)
- [Action Flags and Event Type Rules](#action-flags-and-event-type-rules)
- [D-Bus Interfaces](#d-bus-interfaces)
- [PEL Retention](#pel-retention)
- [Adding python3 modules for PEL UserData and SRC parsing](#adding-python3-modules-for-pel-userdata-and-src-parsing)
- [Fail Boot on Host Errors](#fail-boot-on-host-errors)
- [Self Boot Engine (SBE) First Failure Data Capture (FFDC) Support](#self-boot-enginesbe-first-failure-data-captureffdc-support)
- [PEL Archiving](#pel-archiving)

## Passing PEL related data within an OpenBMC event log

An error log creator can pass in data that is relevant to a PEL by using certain
keywords in the AdditionalData property of the event log.

### AdditionalData keywords

#### RAWPEL

This keyword is used to point to an existing PEL in a binary file that should be
associated with this event log. The syntax is:

```
RAWPEL=<path to PEL File>
e.g.
RAWPEL="/tmp/pels/pel.5"
```

The code will assign its own error log ID to this PEL, and also update the
commit timestamp field to the current time.

#### POWER_THERMAL_CRITICAL_FAULT

This keyword is used to set the power fault bit in the PEL. The syntax is:

```
POWER_THERMAL_CRITICAL_FAULT=<FLAG>
e.g.
POWER_THERMAL_CRITICAL_FAULT=TRUE
```

Note that TRUE is the only supported value.

#### SEVERITY_DETAIL

This is used when the passed-in event log severity determines the PEL severity
and a more granular PEL severity is needed than the normal event log to PEL
severity conversion can provide.

The syntax is:

```
SEVERITY_DETAIL=<SEVERITY_TYPE>
e.g.
SEVERITY_DETAIL=SYSTEM_TERM
```

Supported option:

- SYSTEM_TERM: changes the Severity value from 0x50 to 0x51

#### ESEL

This keyword's data contains a full PEL in string format. This is how Hostboot
sends down PELs when it is configured in IPMI communication mode. The PEL is
handled just like a PEL obtained using the RAWPEL keyword.

The syntax is:

```
ESEL=
"00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30 01 00 33 00 00..."
```

Note that there are 16 bytes of IPMI SEL data before the PEL data starts.

#### \_PID

This keyword, which contains the application's PID, is added automatically by
the phosphor-logging daemon when the `commit` or `report` APIs are used to
create an event log, but not when the `Create` D-Bus method is used. A caller of
the `Create` API that wants its PID captured in the PEL should pass this keyword.

The PID will be added to the PEL in a section of type User Data (UD), along with
the name of the application it corresponds to.

The syntax is:

```
_PID=<PID of application>
e.g.
_PID="12345"
```

#### CALLOUT_INVENTORY_PATH

This is used to pass in an inventory item to use as a callout. See
[here for details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_PRIORITY

This can be used along with CALLOUT_INVENTORY_PATH to specify the priority of
that FRU callout. If not specified, the default priority is "H"/High Priority.

The possible values are:

- "H": High Priority
- "M": Medium Priority
- "L": Low Priority

See [here for details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

This is used to pass in a device path to create callouts from.
See [here for details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

This is used to pass in an I2C bus and address to create callouts from. See
[here for details](#passing-callouts-in-with-the-additionaldata-property).

#### PEL_SUBSYSTEM

This keyword is used to pass in the subsystem that should be associated with
this event log. The syntax is `PEL_SUBSYSTEM=<subsystem value in hex>`, e.g.
`PEL_SUBSYSTEM=0x20`.

### FFDC Intended For UserData PEL sections

To add FFDC to the PEL UserData sections, use the `CreateWithFFDCFiles` D-Bus
method on the `xyz.openbmc_project.Logging.Create` interface to create the new
event log. This method takes a list of files to store in the PEL UserData
sections.

That API is the same as the `Create` one, except that it has an additional
parameter:

```
std::vector<std::tuple<enum[FFDCFormat],
                       uint8_t,
                       uint8_t,
                       sdbusplus::message::unix_fd>>
```

Each entry in the vector contains a file descriptor for a file that will be
stored in its own UserData section. The tuple's elements are:

- enum[FFDCFormat]: The data format type. The options are:
  - 'JSON'
    - The parser will use nlohmann::json's pretty print
  - 'CBOR'
    - The parser will use nlohmann::json's pretty print
  - 'Text'
    - The parser will output ASCII text
  - 'Custom'
    - The parser will hexdump the data, unless there is a parser registered for
      this component ID and subtype.
- uint8_t: subType
  - Useful for the 'Custom' type. Not used with the other types, except for the
    JSON callout subtype described in
    [Specifying multiple callouts using JSON format FFDC files](#specifying-multiple-callouts-using-json-format-ffdc-files).
- uint8_t: version
  - The version of the data.
  - Used for the 'Custom' type.
  - Not planned to be used for JSON/CBOR unless a reason to do so appears.
- unix_fd: The file descriptor for the opened file that contains the contents.
  The file descriptor can be closed and the file can be deleted, if desired,
  after the method call.

An example of saving JSON data to a file and getting its file descriptor is:

```
#include <cstdio>
#include <nlohmann/json.hpp>

nlohmann::json json = ...;
auto jsonString = json.dump();
FILE* fp = fopen(filename, "w"); // check for nullptr in real code
fwrite(jsonString.data(), 1, jsonString.size(), fp);
fflush(fp);          // push the stdio buffer into the file behind the fd
int fd = fileno(fp); // the fd stays valid until fclose(fp)
```

Alternatively, `open()` can be used to obtain the file descriptor of the file.

Upon receiving this data, the PEL code will create a UserData section for each
entry in that vector with the following fields:

- Section header component ID:
  - If the type field from the tuple is "Custom", use the component ID from the
    message registry.
  - Otherwise, set the component ID to the phosphor-logging component ID so that
    the parser knows to use the built-in parsers (e.g. JSON) for the type.
- Section header subtype: The subtype field from the tuple.
- Section header version: The version field from the tuple.
- Section data: The data from the file.

If there is a peltool parser registered for the custom type (method is TBD),
peltool will use it to print the data; otherwise the data will be hexdumped.

Before each of these UserData sections is added, a check is done to see whether
the PEL size will remain under the maximum size of 16KB. If not, the UserData
section will be truncated just enough to fit within the 16KB.

## Default UserData sections for BMC created PELs

The extension code that creates PELs will add these UserData sections to every
PEL:

- The AdditionalData property contents

  - If the AdditionalData property in the OpenBMC event log has anything in it,
    it will be saved in a UserData section as a JSON string.

- System information
  - This section contains various pieces of system information, such as the full
    code level and the BMC, chassis, and host state properties.

## The PEL Message Registry

The PEL message registry is used to create PELs from OpenBMC event logs.
Documentation can be found [here](registry/README.md).

## Callouts

A callout points to a FRU, a symbolic FRU, or an isolation procedure. A PEL can
contain from zero to ten callouts, all of which are located in the SRC section.

There are a few different ways to add callouts to a PEL. In all cases, the
callouts will be sorted from highest to lowest priority within the PEL after
they are added.

### Passing callouts in with the AdditionalData property

The PEL code can add callouts based on the values of special entries in the
AdditionalData event log property. They are:

- CALLOUT_INVENTORY_PATH

  This keyword is used to call out a single FRU by passing in its D-Bus
  inventory path. When the PEL code sees this, it will create a single FRU
  callout, using the VPD properties (location code, FN, CCIN) from that
  inventory item. If that item is not a FRU itself and does not have a location
  code, the code will keep searching its parents until it finds one that is.

  The priority of the FRU callout will be high, unless the CALLOUT_PRIORITY
  keyword is also present and contains a different priority, in which case that
  priority is used instead. This can be useful when a maintenance procedure
  with a high priority callout is specified for this error in the message
  registry and the FRU callout needs to have a different priority.

  ```
  CALLOUT_INVENTORY_PATH=
  "/xyz/openbmc_project/inventory/system/chassis/motherboard"
  ```

- CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

  These keywords are required as a pair to indicate faulty device
  communication, usually detected by a failure accessing a device at that sysfs
  path. The PEL code will use a data table generated by the MRW to map these
  device paths to FRU callout lists. The errno value may influence the callout.

  I2C, FSI, FSI-I2C, and FSI-SPI paths are supported.

  ```
  CALLOUT_DEVICE_PATH="/sys/bus/i2c/devices/3-0069"
  CALLOUT_ERRNO="2"
  ```

- CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

  These 3 keywords can be used to call out a failing I2C device path when the
  full device path isn't known. Like CALLOUT_DEVICE_PATH, they use data tables
  generated by the MRW to create the callouts.

  CALLOUT_IIC_BUS is in the form "/dev/i2c-X", where X is the bus number, or
  just the bus number by itself. CALLOUT_IIC_ADDR is the 7-bit address, given
  either as a decimal number or as a hex number prefixed with "0x".

  ```
  CALLOUT_IIC_BUS="/dev/i2c-7"
  CALLOUT_IIC_ADDR="81"
  CALLOUT_ERRNO=62
  ```

### Defining callouts in the message registry

Callouts can be completely defined inside that error's definition in the PEL
message registry. This method allows the callouts to vary based on the system
type or on any AdditionalData item.

At a high level, this involves defining a callout section inside the registry
entry that contains the location codes or procedure names to use, along with
their priority. If these can vary based on system type, the type provided by the
entity manager will be one of the keys. If they can also depend on an
AdditionalData entry, then that will also be a key.
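
To make that two-level keying concrete, below is a minimal Python sketch of how such a lookup could be resolved. The nested-dict layout, the `"default"` fallback entry, the `select_callouts` helper, and the `PROC_NUM` key are all hypothetical and are used only for illustration. The actual JSON layout and key names are defined by the registry schema, and this sketch does not reproduce them.

```python
from typing import Dict, List, Optional, Tuple, Union

# (location code or procedure name, priority), e.g. ("P0-C1", "H")
Callout = Tuple[str, str]

# Hypothetical in-memory layout, NOT the registry's real JSON format:
#   system type -> list of callouts, or
#   system type -> {value of one AdditionalData entry -> list of callouts}
CalloutTable = Dict[str, Union[List[Callout], Dict[str, List[Callout]]]]


def select_callouts(table: CalloutTable, system_type: str,
                    additional_data: Dict[str, str],
                    ad_key: Optional[str] = None) -> List[Callout]:
    """Pick the callouts for a system type, optionally keyed by AdditionalData."""
    # "default" is a hypothetical fallback for system types that are not listed.
    entry = table.get(system_type, table.get("default", []))
    if isinstance(entry, dict):
        # Second level: look up the value of the AdditionalData entry ad_key.
        value = additional_data.get(ad_key) if ad_key else None
        return entry.get(value, []) if value is not None else []
    return entry


table: CalloutTable = {
    "systemA": [("P0-C1", "H")],
    "default": {"0": [("PROC001", "H")], "1": [("P0-C2", "M")]},
}
print(select_callouts(table, "systemA", {}))  # [('P0-C1', 'H')]
print(select_callouts(table, "systemB", {"PROC_NUM": "1"}, "PROC_NUM"))  # [('P0-C2', 'M')]
```
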

See the message registry [README](registry/README.md) and
[schema](registry/schema/schema.json) for the details.

### Using the message registry along with CALLOUT\_ entries

If the message registry entry contains a callout definition and the event log
also contains one of the aforementioned CALLOUT keys in the AdditionalData
property, then the PEL code will first add the callouts stemming from the
CALLOUT items, followed by the callouts from the message registry.

### Specifying multiple callouts using JSON format FFDC files

Multiple callouts can be passed in by the creator at the time of PEL creation.
This is done by specifying them in a JSON file that is then passed in as an
[FFDC file](#ffdc-intended-for-userdata-pel-sections). The JSON will still be
added to a PEL UserData section for debug.

To specify that an FFDC file contains callouts, the format value for that FFDC
entry must be set to JSON, and the subtype field must be set to 0xCA:

```
// CreateIface is assumed to alias the sdbusplus-generated class for the
// xyz.openbmc_project.Logging.Create interface; fd is the open file's fd.
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::JSON,
    0xCA, // Callout subtype
    0x01, // Callout version, set to 0x01
    fd};
```

The JSON contains an array of callouts that must be ordered from highest
priority to lowest, with a maximum of 10. Any callouts after the 10th will be
discarded because there is no room for them in the PEL. The format looks like
this (the comments are placeholders only, since JSON does not allow comments):

```
[
    {
        // First callout
    },
    {
        // Second callout
    },
    {
        // Nth callout
    }
]
```

A callout entry can be a normal hardware callout, a maintenance procedure
callout, or a symbolic FRU callout.
Each callout must contain a Priority field, where the possible values are:

- "H" = High
- "M" = Medium
- "A" = Medium Group A
- "B" = Medium Group B
- "C" = Medium Group C
- "L" = Low

Either unexpanded location codes or D-Bus inventory object paths can be used to
specify the called out part. An unexpanded location code does not have the
system VPD information embedded in it, and the 'Ufcs-' prefix is optional (so it
can be either Ufcs-P1 or just P1).

#### Normal hardware FRU callout

Normal hardware callouts must contain a priority and either a location code or
an inventory path. Even though the PEL code doesn't do any guarding or
deconfiguring itself, it needs to know if either of those things occurred,
because the PEL has status bits that reflect them. The Guarded and Deconfigured
fields are used for this. Those fields are optional, and if they are omitted,
their values will be false.

When the inventory path of a sub-FRU is passed in, the PEL code will put the
location code of the parent FRU into the callout.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H"
}

{
    "InventoryPath": "/xyz/openbmc_project/inventory/motherboard/cpu0/core5",
    "Priority": "H",
    "Deconfigured": true,
    "Guarded": true
}
```

MRUs (Manufacturing Replaceable Units) are 4-byte numbers that can optionally be
added to callouts to specify failing devices on a FRU. These may be used during
the manufacturing test process, where it may be possible to do these
replacements. There can be up to 15 MRUs, each with its own priority, embedded
in a callout. The possible priority values match the FRU priority values.

Note that since JSON only supports numbers in decimal and not in hex, MRU IDs
will show up as decimal when visually inspecting the JSON.
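
To illustrate that note, here is a minimal Python sketch that builds one normal hardware callout in the JSON format described above. The `make_callout` helper and its `(ID, priority)` tuple input are hypothetical and are not part of phosphor-logging. The sketch shows that an MRU ID written as a hex literal is serialized by `json.dumps` as a decimal number. It also checks the 15-MRU limit and the 4-byte ID range stated in the text.

```python
import json
from typing import List, Optional, Tuple

MAX_MRUS = 15  # limit on MRUs per callout, as stated above


def make_callout(location_code: str, priority: str,
                 mrus: Optional[List[Tuple[int, str]]] = None) -> dict:
    """Build a hardware callout dict; mrus is a list of (ID, priority) pairs."""
    callout = {"LocationCode": location_code, "Priority": priority}
    if mrus:
        if len(mrus) > MAX_MRUS:
            raise ValueError("a callout can hold at most 15 MRUs")
        for mru_id, _ in mrus:
            if not 0 <= mru_id <= 0xFFFFFFFF:
                raise ValueError("MRU IDs are 4-byte numbers")
        callout["MRUs"] = [{"ID": mru_id, "Priority": p} for mru_id, p in mrus]
    return callout


# 0x1234 is written in hex here, but the JSON text contains the decimal 4660.
print(json.dumps(make_callout("P0-C1", "H", [(0x1234, "H")])))
```
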

```
{
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [
        {
            "ID": 1234,
            "Priority": "H"
        },
        {
            "ID": 5678,
            "Priority": "H"
        }
    ]
}
```

#### Maintenance procedure callout

The LocationCode field is not used with procedure callouts. Only the first 7
characters of the Procedure field will be used by the PEL.

```
{
    "Procedure": "PRONAME",
    "Priority": "H"
}
```

#### Symbolic FRU callout

Only the first seven characters of the SymbolicFRU field will be used by the
PEL.

If the TrustedLocationCode field is present and set to true, the location code
may be used to turn on service indicators, so the LocationCode field is
required. If TrustedLocationCode is false or missing, the LocationCode field is
optional.

```
{
    "TrustedLocationCode": true,
    "LocationCode": "P0-C1",
    "Priority": "H",
    "SymbolicFRU": "FRUNAME"
}
```

## `Action Flags` and `Event Type` Rules

The `Action Flags` and `Event Type` PEL fields are optional in the message
registry. If they are not present, the code will set them based on certain
rules laid out in the PEL spec.

These rules are:

1. Always set the `Report` flag, unless the `Do Not Report` flag is already on.
2. Always clear the `SP Call Home` flag, as that feature isn't supported.
3. If the severity is `Non-error Event`:
   - Clear the `Service Action` flag.
   - Clear the `Call Home` flag.
   - If the `Event Type` field is `Not Applicable`, change it to
     `Information Only`.
   - If the `Event Type` field is `Information Only` or `Tracing`, set the
     `Hidden` flag.
4. If the severity is `Recovered`:
   - Set the `Hidden` flag.
   - Clear the `Service Action` flag.
   - Clear the `Call Home` flag.
5. For all other severities:
   - Clear the `Hidden` flag.
   - Set the `Service Action` flag.
   - Set the `Call Home` flag.
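
The rules above can be sketched in Python as follows. This is a minimal illustration and not the phosphor-logging implementation. It models the action flags as a set of flag names and the severity and event type as plain strings, whereas the real PEL fields are bit fields and enumerated values defined by the PEL spec. The function name `apply_default_rules` is hypothetical. Following the text, it would only be applied when the registry entry does not specify these fields.

```python
from typing import Set, Tuple


def apply_default_rules(severity: str, event_type: str,
                        flags: Set[str]) -> Tuple[Set[str], str]:
    """Apply the default Action Flags / Event Type rules; return (flags, event_type)."""
    flags = set(flags)  # work on a copy of the caller's flags

    # Rule 1: report unless Do Not Report is already on.
    if "Do Not Report" not in flags:
        flags.add("Report")
    # Rule 2: SP Call Home is not supported.
    flags.discard("SP Call Home")

    if severity == "Non-error Event":  # Rule 3
        flags -= {"Service Action", "Call Home"}
        if event_type == "Not Applicable":
            event_type = "Information Only"
        if event_type in ("Information Only", "Tracing"):
            flags.add("Hidden")
    elif severity == "Recovered":  # Rule 4
        flags.add("Hidden")
        flags -= {"Service Action", "Call Home"}
    else:  # Rule 5: all other severities
        flags.discard("Hidden")
        flags |= {"Service Action", "Call Home"}

    return flags, event_type
```

For example, `apply_default_rules("Non-error Event", "Not Applicable", set())` changes the event type to `Information Only` and leaves only the `Report` and `Hidden` flags set. Using a set keeps each rule to a single add or discard. A real implementation would use bit masks instead.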

Additional rules may be added in the future if necessary.

## D-Bus Interfaces

See the `org.open_power.Logging.PEL` interface definition for the most
up-to-date information.

## PEL Retention

The PEL repository is allocated a set amount of space on the BMC. When that
space gets close to being full, the code will remove a percentage of PELs to
make room for new ones. In addition, the code will keep a cap on the total
number of PELs allowed. Note that removing a PEL will also remove the
corresponding OpenBMC event log.

The disk capacity limit is set to 20MB, and the number limit is 3000.

The rules used to remove logs are listed below. The checks are run after a PEL
has been added and the method that created the PEL has returned to the caller,
i.e. when control gets back to the event loop.

### Removal Algorithm

If the space used is 95% or less of the allocated space and the number of PELs
is under the limit, nothing further needs to be done. Otherwise, all 5 of the
following steps are run. Each step deletes PELs only until it meets its own
requirement and then stops.

The steps are:

1. Remove BMC created informational PELs until they take up 15% or less of the
   allocated space.

2. Remove BMC created non-informational PELs until they take up 30% or less of
   the allocated space.

3. Remove non-BMC created informational PELs until they take up 15% or less of
   the allocated space.

4. Remove non-BMC created non-informational PELs until they take up 30% or less
   of the allocated space.

5. After the previous 4 steps are complete, if there are still more than the
   maximum number of PELs, remove PELs down to 80% of the maximum.

PELs with associated guard records will never be deleted. Each step above makes
the following 4 passes, stopping as soon as its limit is reached:

Pass 1.
Remove HMC acknowledged PELs.<br> Pass 2. Remove OS acknowledged
PELs.<br> Pass 3. Remove PHYP acknowledged PELs.<br> Pass 4. Remove all PELs.

After all these steps, disk usage will be at most 90% of the allocated space
(15% + 30% + 15% + 30%).

## Adding python3 modules for PEL UserData and SRC parsing

To support python3 modules that parse PEL UserData sections and decode SRC data,
setuptools is used to import python3 packages from external repos so they can be
included in the OpenBMC image.

```
Sample layout for setuptools:

setup.py
src/usr/scom/plugins/ebmc/b0300.py
src/usr/i2c/plugins/ebmc/b0700.py
src/build/tools/ebmc/errludP_Helpers.py
```

`setup.py` is the build script for setuptools. It contains information about the
package (such as the name and version) as well as which code files to include.

The setup.py template to be used for eBMC User Data parsers:

```
import os.path
from setuptools import setup

# Update this dict with a new key/value pair for every component added
# Key: The package name to be installed as
# Value: The path containing the package's python modules
dirmap = {
    "b0300": "src/usr/scom/plugins/ebmc",
    "b0700": "src/usr/i2c/plugins/ebmc",
    "helpers": "src/build/tools/ebmc"
}

# All packages will be installed under the 'udparsers' namespace
def get_package_name(dirmap_key):
    return "udparsers.{}".format(dirmap_key)

def get_package_dirent(dirmap_item):
    package_name = get_package_name(dirmap_item[0])
    package_dir = dirmap_item[1]
    return (package_name, package_dir)

def get_packages():
    return map(get_package_name, dirmap.keys())

def get_package_dirs():
    return map(get_package_dirent, dirmap.items())

setup(
    name="Hostboot",
    version="0.1",
    packages=list(get_packages()),
    package_dir=dict(get_package_dirs())
)
```

- User Data parser module

  - Module name: `xzzzz.py`, where `x` is the Creator Subsystem from the Private
    Header section (in ASCII) and `zzzz` is the 2-byte Component ID from the
    User Data section itself (in hex). The whole name should be lowercase.
  - For example: `b0100.py` for Hostboot created UserData with CompID 0x0100
  - Function to provide: `parseUDToJson`

    - Argument list:
      1. (int) Sub-section type
      2. (int) Section version
      3. (memoryview) Data
    - Return data:
      1. (str) JSON string

  - Sample User Data parser module:
    ```
    import json
    def parseUDToJson(subType, ver, data):
        d = dict()
        ...
        # Parse and populate data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```

- SRC parser module

  - Module name: `xsrc.py`, where `x` is the Creator Subsystem from the Private
    Header section (in ASCII, converted to lowercase).
  - For example: `bsrc.py` for Hostboot generated SRCs
  - Function to provide: `parseSRCToJson`

    - Argument list:
      1. (str) Refcode ASCII string
      2. (str) Hexword 2
      3. (str) Hexword 3
      4. (str) Hexword 4
      5. (str) Hexword 5
      6. (str) Hexword 6
      7. (str) Hexword 7
      8. (str) Hexword 8
      9. (str) Hexword 9
    - Return data:
      1. (str) JSON string

  - Sample SRC parser module:
    ```
    import json
    def parseSRCToJson(ascii_str, word2, word3, word4, word5, word6, word7,
                       word8, word9):
        d = dict({'A': 1, 'B': 2})
        ...
        # Decode SRC data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```

## Fail Boot on Host Errors

The fail boot on hw error [design][1] provides a function where a system owner
can tell the firmware to fail the boot of a system if a BMC phosphor-logging
event has a hardware callout in it.

When this fail boot on hardware error setting is enabled, the BMC is required to
fail the boot for **any** error from the host that meets the following
criteria:

- not SeverityType::nonError
- has a callout of any kind from the `FailingComponentType` structure

## Self Boot Engine(SBE) First Failure Data Capture(FFDC) Support

When an SBE chip-op fails, the SBE creates FFDC in a custom data format. The SBE
FFDC contains different packets. These include traces and user data related to
SBE internal failures, as well as hardware procedure failure FFDC created by the
FAPI infrastructure. The PEL infrastructure processes the SBE FFDC packets that
the FAPI infrastructure creates during hardware procedure execution failures,
and adds callouts and user data sections based on that FAPI processing. For
non-FAPI-based failures, it just keeps the raw FFDC data in a UserData section
to support SBE parser plugins.

The `CreatePELWithFFDCFiles` D-Bus method on the `org.open_power.Logging.PEL`
interface must be used when creating a new event log.

To specify that an FFDC file contains SBE FFDC, the format value for that FFDC
entry must be set to "Custom", and the subtype field must be set to 0xCB:

```
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::Custom,
    0xCB, // SBE FFDC subtype
    0x01, // SBE FFDC version, set to 0x01
    fd};
```

The "SRC6" keyword in the AdditionalData property should be populated as
follows:

- [0:15] chip position (hex)
- [16:23] command class (hex)
- [24:31] command (hex)

For example, for GetSCOM:

`SRC6="0002A201"`

Note: The "phal" build-time configure option must be "enabled" to enable this
feature.

## PEL Archiving

When an OpenBMC event log is deleted, its corresponding PEL is moved to an
archive folder.
These archived PELs will be available in the BMC dump. The archive path is
`/var/lib/phosphor-logging/extensions/pels/logs/archive`.

The key points are:

- PELs whose corresponding event logs have been deleted will be available in the
  archive folder.
- The archive folder size is tracked along with the logs folder size. If the
  combined size exceeds the warning size, all archived PELs will be deleted.
- Archived PELs can be viewed using peltool with the `--archive` flag.
- If a PEL is deleted using peltool, it is not archived.

[1]:
  https://github.com/openbmc/docs/blob/master/designs/fail-boot-on-hw-error.md
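
The archive size check described in the PEL Archiving list can be expressed as the minimal Python sketch below. The function name `check_archive`, the use of byte counts, and the `warning_size` parameter are assumptions made for illustration. This document does not state the actual warning size or how phosphor-logging measures the folders.

```python
from typing import Dict


def check_archive(logs_size: int, archived: Dict[str, int],
                  warning_size: int) -> Dict[str, int]:
    """Return the archived PELs (hypothetical name -> size in bytes) that remain.

    If the logs folder and the archive folder together exceed warning_size,
    every archived PEL is deleted, so an empty dict is returned.
    """
    combined = logs_size + sum(archived.values())
    if combined > warning_size:
        return {}
    return dict(archived)
```
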