# OpenPower Platform Event Log (PEL) extension

This extension will create PELs for every OpenBMC event log. It is also
possible to point to a raw PEL from within the OpenBMC event log, in which case
that PEL will be used instead of creating one.

## Contents
* [Passing in data when creating PELs](#passing-pel-related-data-within-an-openbmc-event-log)
* [Default UserData sections for BMC created PELs](#default-userdata-sections-for-bmc-created-pels)
* [The PEL Message Registry](#the-pel-message-registry)
* [Callouts](#callouts)
* [Action Flags and Event Type Rules](#action-flags-and-event-type-rules)
* [D-Bus Interfaces](#d-bus-interfaces)
* [PEL Retention](#pel-retention)
* [Adding python3 modules for PEL UserData and SRC parsing](#adding-python3-modules-for-pel-userdata-and-src-parsing)
* [Fail Boot on Host Errors](#fail-boot-on-host-errors)

## Passing PEL related data within an OpenBMC event log

An error log creator can pass in data that is relevant to a PEL by using
certain keywords in the AdditionalData property of the event log.

### AdditionalData keywords

#### RAWPEL

This keyword is used to point to an existing PEL in a binary file that should
be associated with this event log. The syntax is:
```
RAWPEL=<path to PEL File>
e.g.
RAWPEL="/tmp/pels/pel.5"
```
The code will assign its own error log ID to this PEL, and also update the
commit timestamp field to the current time.

#### POWER_THERMAL_CRITICAL_FAULT

This keyword is used to set the power fault bit in the PEL. The syntax is:
```
POWER_THERMAL_CRITICAL_FAULT=<FLAG>
e.g.
POWER_THERMAL_CRITICAL_FAULT=TRUE
```

Note that TRUE is the only value supported.

#### SEVERITY_DETAIL

This is used when the passed in event log severity determines the PEL
severity and a more granular PEL severity is needed beyond what the normal
event log to PEL severity conversion could give.

The syntax is:
```
SEVERITY_DETAIL=<SEVERITY_TYPE>
e.g.
SEVERITY_DETAIL=SYSTEM_TERM
```
Supported option:
- SYSTEM_TERM, changes the Severity value from 0x50 to 0x51

#### ESEL

This keyword's data contains a full PEL in string format. This is how hostboot
sends down PELs when it is configured in IPMI communication mode. The PEL is
handled just like the PEL obtained using the RAWPEL keyword.

The syntax is:

```
ESEL=
"00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30 01 00 33 00 00..."
```

Note that there are 16 bytes of IPMI SEL data before the PEL data starts.

#### _PID

This keyword, which contains the application's PID, is added automatically by
the phosphor-logging daemon when the `commit` or `report` APIs are used to
create an event log, but not when the `Create` D-Bus method is used. If a
caller of the `Create` API wishes to have their PID captured in the PEL, this
keyword should be used.

This will be added to the PEL in a section of type User Data (UD), along with
the application name it corresponds to.

The syntax is:
```
_PID=<PID of application>
e.g.
_PID="12345"
```

#### CALLOUT_INVENTORY_PATH

This is used to pass in an inventory item to use as a callout. See [here for
details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_PRIORITY

This can be used along with CALLOUT_INVENTORY_PATH to specify the priority of
that FRU callout. If not specified, the default priority is "H"/High Priority.

The possible values are:
- "H": High Priority
- "M": Medium Priority
- "L": Low Priority

See [here for details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

This is used to pass in a device path to create callouts from.
See [here for
details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

This is used to pass in an I2C bus and address to create callouts from. See
[here for details](#passing-callouts-in-with-the-additionaldata-property)

#### PEL_SUBSYSTEM

This keyword is used to pass in the subsystem that should be associated with
this event log. The syntax is `PEL_SUBSYSTEM=<subsystem value in hex>`, e.g.
`PEL_SUBSYSTEM=0x20`.

### FFDC Intended For UserData PEL sections

When one needs to add FFDC into the PEL UserData sections, the
`CreateWithFFDCFiles` D-Bus method on the `xyz.openbmc_project.Logging.Create`
interface must be used when creating a new event log. This method takes a list
of files to store in the PEL UserData sections.

That API is the same as the 'Create' one, except it has a new parameter:

```
std::vector<std::tuple<enum[FFDCFormat],
                       uint8_t,
                       uint8_t,
                       sdbusplus::message::unix_fd>>
```

Each entry in the vector contains a file descriptor for a file that will
be stored in a unique UserData section. The tuple's arguments are:

- enum[FFDCFormat]: The data format type, the options are:
    - 'JSON'
        - The parser will use nlohmann::json's pretty print
    - 'CBOR'
        - The parser will use nlohmann::json's pretty print
    - 'Text'
        - The parser will output ASCII text
    - 'Custom'
        - The parser will hexdump the data, unless there is a parser registered
          for this component ID and subtype.
- uint8_t: subType
    - Useful for the 'custom' type. Not used with the other types.
- uint8_t: version
    - The version of the data.
    - Used for the custom type.
    - There are no plans to use it for JSON or CBOR unless a reason to do so
      appears.
- unixfd - The file descriptor for the opened file that contains the
  contents.
  The file descriptor can be closed and the file can be deleted if
  desired after the method call.

An example of saving JSON data to a file and getting its file descriptor is:

```
nlohmann::json json = ...;
auto jsonString = json.dump();
// "w+" so that the receiver can also read through the descriptor
FILE* fp = fopen(filename, "w+");
fwrite(jsonString.data(), 1, jsonString.size(), fp);
// Flush the stdio buffer so all of the data is in the file
fflush(fp);
int fd = fileno(fp);
```

Alternatively, 'open()' can be used to obtain the file descriptor of the file.

Upon receiving this data, the PEL code will create UserData sections for each
entry in that vector with the following UserData fields:

- Section header component ID:
    - If the type field from the tuple is "custom", use the component ID from
      the message registry.
    - Otherwise, set the component ID to the phosphor-logging component ID so
      that the parser knows to use the built in parsers (e.g. json) for the
      type.
- Section header subtype: The subtype field from the tuple.
- Section header version: The version field from the tuple.
- Section data: The data from the file.

If there is a peltool parser registered for the custom type (method is TBD),
that will be used by peltool to print the data, otherwise it will be hexdumped.

Before adding each of these UserData sections, a check will be done to see if
the PEL size will remain under the maximum size of 16KB. If not, the UserData
section will be truncated enough that the PEL fits into the 16KB.

## Default UserData sections for BMC created PELs

The extension code that creates PELs will add these UserData sections to every
PEL:

- The AdditionalData property contents
  - If the AdditionalData property in the OpenBMC event log has anything in it,
    it will be saved in a UserData section as a JSON string.

- System information
  - This section contains various pieces of system information, such as the
    full code level and the BMC, chassis, and host state properties.

## The PEL Message Registry

The PEL message registry is used to create PELs from OpenBMC event logs.
Documentation can be found [here](registry/README.md).

## Callouts

A callout points to a FRU, a symbolic FRU, or an isolation procedure. There
can be from zero to ten of them in each PEL, where they are located in the SRC
section.

There are a few different ways to add callouts to a PEL. In all cases, the
callouts will be sorted from highest to lowest priority within the PEL after
they are added.

### Passing callouts in with the AdditionalData property

The PEL code can add callouts based on the values of special entries in the
AdditionalData event log property. They are:

- CALLOUT_INVENTORY_PATH

    This keyword is used to call out a single FRU by passing in its D-Bus
    inventory path. When the PEL code sees this, it will create a single FRU
    callout, using the VPD properties (location code, FN, CCIN) from that
    inventory item. If that item is not a FRU itself and does not have a
    location code, the code will keep searching the item's parents until it
    finds one that is a FRU.

    The priority of the FRU callout will be high, unless the CALLOUT_PRIORITY
    keyword is also present and contains a different priority, in which case
    that priority will be used instead. This can be useful when a maintenance
    procedure with a high priority callout is specified for this error in the
    message registry and the FRU callout needs to have a different priority.

    ```
    CALLOUT_INVENTORY_PATH=
    "/xyz/openbmc_project/inventory/system/chassis/motherboard"
    ```

- CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

    These keywords are required as a pair to indicate faulty device
    communication, usually detected by a failure accessing a device at that
    sysfs path. The PEL code will use a data table generated by the MRW to map
    these device paths to FRU callout lists. The errno value may influence the
    callout.

    I2C, FSI, FSI-I2C, and FSI-SPI paths are supported.

    ```
    CALLOUT_DEVICE_PATH="/sys/bus/i2c/devices/3-0069"
    CALLOUT_ERRNO="2"
    ```

- CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

    These 3 keywords can be used to call out a failing I2C device path when the
    full device path isn't known. It is similar to CALLOUT_DEVICE_PATH in that
    it will use data tables generated by the MRW to create the callouts.

    CALLOUT_IIC_BUS is in the form "/dev/i2c-X" where X is the bus number, or
    just the bus number by itself.
    CALLOUT_IIC_ADDR is the 7 bit address, given either as a decimal number or
    as a hex number if preceded with a "0x".

    ```
    CALLOUT_IIC_BUS="/dev/i2c-7"
    CALLOUT_IIC_ADDR="81"
    CALLOUT_ERRNO=62
    ```

### Defining callouts in the message registry

Callouts can be completely defined inside that error's definition in the PEL
message registry. This method allows the callouts to vary based on the system
type or on any AdditionalData item.

At a high level, this involves defining a callout section inside the registry
entry that contains the location codes or procedure names to use, along with
their priority. If these can vary based on system type, the type provided by
the entity manager will be one of the keys. If they can also depend on an
AdditionalData entry, then that will also be a key.
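
As a rough illustration of how those keys select a callout list, the following
Python sketch uses a hypothetical, simplified data shape. The field names
(`ad_key`, `by_system`, `location_code`, `procedure`), the system names, and
the `PROC_NUM` AdditionalData key are all assumptions made for this example and
do not match the real registry schema.

```python
# Hypothetical stand-in for a registry entry's callout section. None of
# these names come from the real registry schema.
CALLOUT_SECTION = {
    "ad_key": "PROC_NUM",  # AdditionalData item the callouts depend on
    "by_system": {
        "system-a": {
            "0": [{"priority": "H", "location_code": "P1-C5"}],
            "1": [{"priority": "H", "location_code": "P1-C6"}],
        },
        "system-b": {
            "0": [{"priority": "M", "procedure": "PRONAME"}],
        },
    },
}


def select_callouts(section, system_type, additional_data):
    """Return the callout list for this system type and AdditionalData."""
    # First key: the system type.
    by_ad_value = section["by_system"].get(system_type, {})
    # Second key: the value of the AdditionalData item named in the section.
    ad_value = additional_data.get(section["ad_key"])
    return by_ad_value.get(ad_value, [])


callouts = select_callouts(CALLOUT_SECTION, "system-a", {"PROC_NUM": "1"})
```

An unknown system type or a missing AdditionalData value simply yields an empty
list in this sketch; how the real code handles those cases is not covered here.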

See the message registry [README](registry/README.md) and
[schema](registry/schema/schema.json) for the details.

### Using the message registry along with CALLOUT_ entries

If the message registry entry contains a callout definition and the event log
also contains one of the aforementioned CALLOUT keys in the AdditionalData
property, then the PEL code will first add the callouts stemming from the
CALLOUT items, followed by the callouts from the message registry.

### Specifying multiple callouts using JSON format FFDC files

Multiple callouts can be passed in by the creator at the time of PEL creation.
This is done by specifying them in a JSON file that is then passed in as an
[FFDC file](#ffdc-intended-for-userdata-pel-sections). The JSON will still be
added into a PEL UserData section for debug.

To specify that an FFDC file contains callouts, the format value for that FFDC
entry must be set to JSON, and the subtype field must be set to 0xCA:

```
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::JSON,
    0xCA, // Callout subtype
    0x01, // Callout version, set to 0x01
    fd};
```

The JSON contains an array of callouts that must be in order of highest
priority to lowest, with a maximum of 10. Any callouts after the 10th will
be discarded as there is no room for them in the PEL. The format looks
like:

```
[
    {
        // First callout
    },
    {
        // Second callout
    },
    {
        // Nth callout
    }
]
```

A callout entry can be a normal hardware callout, a maintenance procedure
callout, or a symbolic FRU callout.
Each callout must contain a Priority
field, where the possible values are:

* "H" = High
* "M" = Medium
* "A" = Medium Group A
* "B" = Medium Group B
* "C" = Medium Group C
* "L" = Low

Either unexpanded location codes or D-Bus inventory object paths can be used to
specify the called out part. An unexpanded location code does not have the
system VPD information embedded in it, and the 'Ufcs-' prefix is optional (so
can be either Ufcs-P1 or just P1).

#### Normal hardware FRU callout

Normal hardware callouts must contain either the location code or inventory
path, and priority. Even though the PEL code doesn't do any guarding or
deconfiguring itself, it needs to know if either of those things occurred as
there are status bits in the PEL to reflect them. The Guarded and Deconfigured
fields are used for this. Those fields are optional and if omitted then their
values will be false.

When the inventory path of a sub-FRU is passed in, the PEL code will put the
location code of the parent FRU into the callout.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H"
}

{
    "InventoryPath": "/xyz/openbmc_project/inventory/motherboard/cpu0/core5",
    "Priority": "H",
    "Deconfigured": true,
    "Guarded": true
}

```

MRUs (Manufacturing Replaceable Units) are 4 byte numbers that can optionally
be added to callouts to specify failing devices on a FRU. These may be used
during the manufacturing test process, where there may be the ability to do
these replacements. There can be up to 15 MRUs, each with its own priority,
embedded in a callout. The possible priority values match the FRU priority
values.

Note that since JSON only supports numbers in decimal and not in hex, MRU IDs
will show up as decimal when visually inspecting the JSON.
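
Because of this, an MRU ID that is documented in hex ends up as a plain
integer once the callout JSON is built. A minimal Python sketch of that,
where the `make_mru` helper and the two ID values are made up for
illustration:

```python
import json


def make_mru(mru_id, priority):
    """Build one MRU entry; the ID has to fit in 4 bytes."""
    if not 0 <= mru_id <= 0xFFFFFFFF:
        raise ValueError("MRU ID must fit in 4 bytes")
    return {"ID": mru_id, "Priority": priority}


callout = {
    "LocationCode": "P0-C1",
    "Priority": "H",
    # Hex literals in the source are serialized as decimal numbers.
    "MRUs": [make_mru(0x04D2, "H"), make_mru(0x162E, "H")],
}

callout_json = json.dumps(callout, indent=4)
```

Here `0x04D2` and `0x162E` are written to the JSON as `1234` and `5678`.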

```
{
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [
        {
            "ID": 1234,
            "Priority": "H"
        },
        {
            "ID": 5678,
            "Priority": "H"
        }
    ]
}
```

#### Maintenance procedure callout

The LocationCode field is not used with procedure callouts. Only the first 7
characters of the Procedure field will be used by the PEL.

```
{
    "Procedure": "PRONAME",
    "Priority": "H"
}
```

#### Symbolic FRU callout

Only the first 7 characters of the SymbolicFRU field will be used by the PEL.

If the TrustedLocationCode field is present and set to true, this means the
location code may be used to turn on service indicators, so the LocationCode
field is required. If TrustedLocationCode is false or missing, then the
LocationCode field is optional.

```
{
    "TrustedLocationCode": true,
    "LocationCode": "P0-C1",
    "Priority": "H",
    "SymbolicFRU": "FRUNAME"
}
```

## `Action Flags` and `Event Type` Rules

The `Action Flags` and `Event Type` PEL fields are optional in the message
registry, and if not present the code will set them based on certain rules
laid out in the PEL spec.

These rules are:
1. Always set the `Report` flag, unless the `Do Not Report` flag is already on.
2. Always clear the `SP Call Home` flag, as that feature isn't supported.
3. If the severity is `Non-error Event`:
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
    - If the `Event Type` field is `Not Applicable`, change it to
      `Information Only`.
    - If the `Event Type` field is `Information Only` or `Tracing`, set the
      `Hidden` flag.
4. If the severity is `Recovered`:
    - Set the `Hidden` flag.
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
5. For all other severities:
    - Clear the `Hidden` flag.
    - Set the `Service Action` flag.
    - Set the `Call Home` flag.
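
The rules above can be sketched in Python as follows. This is only an
illustration: representing the action flags as a set of names and the severity
and event type as strings is an assumption of this example, not how the PEL
code stores them, and `apply_default_rules` is a hypothetical helper name.

```python
def apply_default_rules(severity, event_type, action_flags):
    """Return (action_flags, event_type) after applying the rules above."""
    flags = set(action_flags)

    # Rule 1: report unless 'Do Not Report' is already on.
    if "Do Not Report" not in flags:
        flags.add("Report")

    # Rule 2: 'SP Call Home' isn't supported.
    flags.discard("SP Call Home")

    if severity == "Non-error Event":
        # Rule 3
        flags.discard("Service Action")
        flags.discard("Call Home")
        if event_type == "Not Applicable":
            event_type = "Information Only"
        if event_type in ("Information Only", "Tracing"):
            flags.add("Hidden")
    elif severity == "Recovered":
        # Rule 4
        flags.add("Hidden")
        flags.discard("Service Action")
        flags.discard("Call Home")
    else:
        # Rule 5: all other severities
        flags.discard("Hidden")
        flags.add("Service Action")
        flags.add("Call Home")

    return flags, event_type


info_flags, info_type = apply_default_rules(
    "Non-error Event", "Not Applicable", {"Call Home", "SP Call Home"})
```

In this sketch the event type change in rule 3 is applied before the `Hidden`
check, so an event that starts out as `Not Applicable` also ends up hidden.
That follows the order in which the rules are listed.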

Additional rules may be added in the future if necessary.

## D-Bus Interfaces

See the `org.open_power.Logging.PEL` interface definition for the most up to
date information.

## PEL Retention

The PEL repository is allocated a set amount of space on the BMC. When that
space gets close to being full, the code will remove a percentage of PELs to
make room for new ones. In addition, the code will keep a cap on the total
number of PELs allowed. Note that removing a PEL will also remove the
corresponding OpenBMC event log.

The disk capacity limit is set to 20MB, and the number limit is 3000.

The rules used to remove logs are listed below. The checks will be run after a
PEL has been added and the method to create the PEL has returned to the caller,
i.e. run when control gets back to the event loop.

### Removal Algorithm

If the size used is at or under 95% of the allocated space and the number of
PELs is under its limit, nothing further needs to be done. Otherwise, run all
5 of the following steps. Each step only deletes PELs until it meets its
requirement and then it stops.

The steps are:

1. Remove BMC created informational PELs until they take up 15% or less of the
   allocated space.

2. Remove BMC created non-informational PELs until they take up 30% or less of
   the allocated space.

3. Remove non-BMC created informational PELs until they take up 15% or less of
   the allocated space.

4. Remove non-BMC created non-informational PELs until they take up 30% or less
   of the allocated space.

5. After the previous 4 steps are complete, if there are still more than the
   maximum number of PELs, remove PELs down to 80% of the maximum.

PELs with associated guard records will never be deleted. Each step above
makes the following 4 passes, stopping as soon as its limit is reached:

Pass 1.
Remove HMC acknowledged PELs.<br>
Pass 2. Remove OS acknowledged PELs.<br>
Pass 3. Remove PHYP acknowledged PELs.<br>
Pass 4. Remove all PELs.

After all these steps, disk usage will be at most 90% of the allocated space
(15% + 30% + 15% + 30%).

## Adding python3 modules for PEL UserData and SRC parsing

In order to support python3 modules for the parsing of PEL User Data sections
and to decode SRC data, setuptools is used to import python3 packages from
external repos to be included in the OpenBMC image.

Sample layout for setuptools:
```
setup.py
src/usr/scom/plugins/ebmc/b0300.py
src/usr/i2c/plugins/ebmc/b0700.py
src/build/tools/ebmc/errludP_Helpers.py
```

`setup.py` is the build script for setuptools. It contains information about the
package (such as the name and version) as well as which code files to include.

The setup.py template to be used for eBMC User Data parsers:
```
import os.path
from setuptools import setup

# Update this dict with a new key/value pair for every component added
# Key: The package name to be installed as
# Value: The path containing the package's python modules
dirmap = {
    "b0300": "src/usr/scom/plugins/ebmc",
    "b0700": "src/usr/i2c/plugins/ebmc",
    "helpers": "src/build/tools/ebmc"
}

# All packages will be installed under 'udparsers' namespace
def get_package_name(dirmap_key):
    return "udparsers.{}".format(dirmap_key)

def get_package_dirent(dirmap_item):
    package_name = get_package_name(dirmap_item[0])
    package_dir = dirmap_item[1]
    return (package_name, package_dir)

def get_packages():
    return map(get_package_name, dirmap.keys())

def get_package_dirs():
    return map(get_package_dirent, dirmap.items())

setup(
    name="Hostboot",
    version="0.1",
    packages=list(get_packages()),
    package_dir=dict(get_package_dirs())
)
```
- User Data parser module
  - Module name:
    `xzzzz.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII) and `zzzz` is the 2 byte Component ID
    from the User Data section itself (in HEX). All should be converted to
    lowercase.
    - For example: `b0100.py` for Hostboot created UserData with CompID 0x0100
  - Function to provide: `parseUDToJson`
    - Argument list:
      1. (int) Sub-section type
      2. (int) Section version
      3. (memoryview): Data
    - Return data:
      1. (str) JSON string

  - Sample User Data parser module:
    ```
    import json
    def parseUDToJson(subType, ver, data):
        d = dict()
        ...
        # Parse and populate data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```
- SRC parser module
  - Module name: `xsrc.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII, converted to lowercase).
    - For example: `bsrc.py` for Hostboot generated SRCs
  - Function to provide: `parseSRCToJson`
    - Argument list:
      1. (str) Refcode ASCII string
      2. (str) Hexword 2
      3. (str) Hexword 3
      4. (str) Hexword 4
      5. (str) Hexword 5
      6. (str) Hexword 6
      7. (str) Hexword 7
      8. (str) Hexword 8
      9. (str) Hexword 9
    - Return data:
      1. (str) JSON string

  - Sample SRC parser module:
    ```
    import json
    def parseSRCToJson(ascii_str, word2, word3, word4, word5, word6, word7,
                       word8, word9):
        d = dict({'A': 1, 'B': 2})
        ...
        # Decode SRC data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```

## Fail Boot on Host Errors

The fail boot on hw error [design][1] provides a function where a system owner
can tell the firmware to fail the boot of a system if a BMC phosphor-logging
event has a hardware callout in it.

When this fail boot on hardware error setting is enabled, the BMC is required
to fail the boot for **any** error from the host that satisfies the following
criteria:
- not SeverityType::nonError
- has a callout of any kind from the `FailingComponentType` structure

## Self Boot Engine(SBE) First Failure Data Capture(FFDC) Support

When an SBE chip-op fails, the SBE creates FFDC in a custom data format. The
SBE FFDC is made up of different packets: some carry trace and user data
related to SBE internal failures, and others carry hardware procedure failure
FFDC created by the FAPI infrastructure. The PEL infrastructure can process the
SBE FFDC packets created by the FAPI infrastructure during hardware procedure
execution failures, and it adds callouts and UserData section information based
on that FAPI processing. For failures that are not FAPI based, it just keeps
the raw FFDC data in a UserData section so that SBE parser plugins can handle
it.

The `CreatePELWithFFDCFiles` D-Bus method on the `org.open_power.Logging.PEL`
interface must be used when creating a new event log.

To specify that an FFDC file contains SBE FFDC, the format value for that FFDC
entry must be set to "custom", and the subtype field must be set to 0xCB:

```
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::Custom,
    0xCB, // SBE FFDC subtype
    0x01, // SBE FFDC version, set to 0x01
    fd};
```

The "SRC6" keyword in the AdditionalData property should be populated as
follows:

 - [0:15] chip position (hex)
 - [16:23] command class (hex)
 - [24:31] command (hex)

e.g. for GetSCOM:

    SRC6="0002A201"

Note: The "phal" build-time configure option must be set to "enabled" for this
feature to be available.

## PEL Archiving

When an OpenBMC event log is deleted its corresponding PEL is moved to
an archive folder.
These archived PELs will be available in BMC dumps.
The archive path is `/var/lib/phosphor-logging/extensions/pels/logs/archive`.

Key points:
- PELs whose corresponding event logs have been deleted will be available
  in the archive folder.
- The archive folder size is tracked along with the logs folder size, and if
  the combined size exceeds the warning size, all archived PELs will be
  deleted.
- Archived PELs can be viewed using peltool with the `--archive` flag.
- If a PEL is deleted using peltool, it is not archived.

[1]: https://github.com/openbmc/docs/blob/master/designs/fail-boot-on-hw-error.md