# Platform Event Log Message Registry

On the BMC, PELs are created from the standard event logs provided by
phosphor-logging using a message registry that provides the PEL related fields.
The message registry is a JSON file.

## Contents

- [Component IDs](#component-ids)
- [Message Registry](#message-registry-fields)
- [Modifying and Testing](#modifying-and-testing)

## Component IDs

A component ID is a 2 byte value of the form 0xYY00 used in a PEL to:

1. Provide the upper byte (the YY from above) of an SRC reason code in `BD`
   SRCs.
2. Reside in the section header of the Private Header PEL section to specify the
   error log creator's component ID.
3. Reside in the section header of the User Header section to specify the error
   log committer's component ID.
4. Reside in the section header in the User Data section to specify which parser
   to call to parse that section.

Component IDs are specified in the message registry either as the upper byte of
the SRC reason code field for `BD` SRCs, or in the standalone `ComponentID`
field.

Component IDs will be unique on a per-repository basis for errors unique to that
repository. When the same errors are created by multiple repositories, those
errors will all share the same component ID. The master list of component IDs is
[available](O_component_ids.json). That file can be used by PEL parsers to
display a name for the component ID. The 'O' in the name is the creator ID value
for BMC created PELs.

## Message Registry Fields

The [message registry schema](schema/schema.json) and the
[message registry](message_registry.json) are available. The registry will be
validated against the schema either during a bitbake build or during CI, or
eventually possibly both.

In the message registry, there are fields for specifying:

### Name

This is the key into the message registry, and is the Message property of the
OpenBMC event log that the PEL is being created from.

```json
"Name": "xyz.openbmc_project.Power.Fault"
```

### Subsystem

This field is part of the PEL User Header section, and is used to specify the
subsystem pertaining to the error. It is an enumeration that maps to the actual
PEL value. If the subsystem isn't known ahead of time, it can be passed in at
the time of PEL creation using the 'PEL_SUBSYSTEM' AdditionalData field. In this
case, 'Subsystem' isn't required, though 'PossibleSubsystems' is.

```json
"Subsystem": "power_supply"
```

### PossibleSubsystems

This field is used by scripts that build documentation from the message registry
to know which subsystems are possible for an error when it can't be hardcoded
using the 'Subsystem' field. It is mutually exclusive with the 'Subsystem'
field.

```json
"PossibleSubsystems": ["memory", "processor"]
```

### Severity

This field is part of the PEL User Header section, and is used to specify the
PEL severity. It is an optional field; if it isn't specified, then the severity
of the OpenBMC event log will be converted into a PEL severity value.

It can either be the plain severity value, or an array of severity values that
are based on system type, where an entry without a system type will match
anything unless another entry has a matching system type.

```json
"Severity": "unrecoverable"
```

```json
"Severity":
[
    {
        "System": "system1",
        "SevValue": "recovered"
    },
    {
        "SevValue": "unrecoverable"
    }
]
```

The above example shows that on system 'system1' the severity will be recovered,
and on every other system it will be unrecoverable.

### Mfg Severity

This is an optional field and is used to override the Severity field when a
specific manufacturing isolation mode is enabled. It has the same format as
Severity.

```json
"MfgSeverity": "unrecoverable"
```

### Event Scope

This field is part of the PEL User Header section, and is used to specify the
event scope, as defined by the PEL spec. It is optional and defaults to "entire
platform".

```json
"EventScope": "entire_platform"
```

### Event Type

This field is part of the PEL User Header section, and is used to specify the
event type, as defined by the PEL spec. It is optional and defaults to "not
applicable" for non-informational logs, and "misc_information_only" for
informational ones.

```json
"EventType": "na"
```

### Action Flags

This field is part of the PEL User Header section, and is used to specify the
PEL action flags, as defined by the PEL spec. It is an array of enumerations.

The action flags can usually be deduced from other PEL fields, such as the
severity or if there are any callouts. As such, this is an optional field and if
not supplied the code will fill them in based on those fields.

In fact, even if supplied here, the code may still modify them to ensure they
are correct. The rules used for this are in the
[OpenPower PELs README](../README.md#action-flags-and-event-type-rules).

```json
"ActionFlags": ["service_action", "report", "call_home"]
```

### Mfg Action Flags

This is an optional field and is used to override the Action Flags field when a
specific manufacturing isolation mode is enabled.

```json
"MfgActionFlags": ["service_action", "report", "call_home"]
```

### Component ID

This is the component ID of the PEL creator, in the form 0xYY00. For `BD` SRCs,
this is an optional field and if not present the value will be taken from the
upper byte of the reason code. If present for `BD` SRCs, then this byte must
match the upper byte of the reason code.

```json
"ComponentID": "0x5500"
```

### SRC Type

This specifies the type of SRC to create. The type is the first 2 characters of
the 8 character ASCII string field of the PEL. The allowed types are `BD`, for
the standard OpenBMC error, and `11`, for power related errors. It is optional
and if not specified will default to `BD`.

Note: The ASCII string for BD SRCs looks like: `BDBBCCCC`, where:

- BD = SRC type
- BB = PEL subsystem as mentioned above
- CCCC = SRC reason code

For `11` SRCs, it looks like: `1100RRRR`, where RRRR is the SRC reason code.

```json
"Type": "11"
```

### SRC Reason Code

This is the 4 character value in the latter half of the SRC ASCII string. It is
treated as a 2 byte hex value, such as 0x5678. For `BD` SRCs, the first byte is
the same as the first byte of the component ID field in the Private Header
section that represents the creator's component ID.

```json
"ReasonCode": "0x5544"
```

### SRC Symptom ID Fields

The symptom ID is in the Extended User Header section and is defined in the PEL
spec as the unique event signature string. It always starts with the ASCII
string. This field in the message registry allows one to choose which SRC words
to use in addition to the ASCII string field to form the symptom ID. All words
are separated by underscores. If not specified, the code will choose a default
format, which may depend on the SRC type.

For example: ["SRCWord3", "SRCWord9"] would be:
`<ASCII_STRING>_<SRCWord3>_<SRCWord9>`, which could look like:
`B181320_00000050_49000000`.

```json
"SymptomIDFields": ["SRCWord3", "SRCWord9"]
```

### SRC words 6 to 9

In a PEL, these SRC words are free format and can be filled in by the user as
desired. On the BMC, the source of these words is the AdditionalData fields in
the event log.
The message registry provides a way for the log creator to
specify which AdditionalData property field to get the data from, and also to
define what the SRC word means for use by parsers. If not specified, these SRC
words will be set to zero in the PEL.

```json
"Words6to9":
{
    "6":
    {
        "description": "Failing unit number",
        "AdditionalDataPropSource": "PS_NUM"
    }
}
```

### SRC Deconfig Flag

Bit 6 in hex word 5 of the SRC means that one or more called out resources have
been deconfigured, and this flag can be used to set that bit. The only other way
to set it is by indicating it when
[passing in the callouts via JSON](../README.md#callouts).

This is looked at by the software that creates the periodic PELs that indicate a
system is running with deconfigured hardware.

```json
"DeconfigFlag": true
```

### SRC Checkstop Flag

This is used to indicate the PEL is for a hardware checkstop, and causes bit 0
in hex word 5 of the SRC to be set.

```json
"CheckstopFlag": true
```

### Documentation Fields

The documentation fields are used by PEL parsers to display a human readable
description of a PEL. They are also the source for the Redfish event log
messages.

#### Message

This field is used by the BMC's PEL parser as the description of the error log.
It will also be used in Redfish event logs. It supports argument substitution
using the %1, %2, etc. placeholders, allowing any of the SRC user data words 6 -
9 to be displayed as part of the message. If the placeholders are used, then the
`MessageArgSources` property must be present to say which SRC words to use for
each placeholder.

```json
"Message": "Processor %1 had %2 errors"
```

#### MessageArgSources

This field is optional unless the Message field contains the %X placeholder
arguments, in which case it is required.
It is an array that says which SRC words to get the
placeholders from. In the example below, SRC word 6 would be used for %1, and
SRC word 7 for %2.

```json
"MessageArgSources":
[
    "SRCWord6", "SRCWord7"
]
```

#### Description

A short description of the error. This is required by the Redfish schema to
generate a Redfish message entry, but is not used in Redfish or PEL output.

```json
"Description": "A power fault"
```

#### Notes

This is an optional free format text field for keeping any notes for the
registry entry, as comments are not allowed in JSON. It is an array of strings
for easier readability of long fields.

```json
"Notes": [
    "This entry is for every type of power fault.",
    "There is probably a hardware failure."
]
```

### Callout Fields

The callout fields allow one to specify the PEL callouts (either a hardware FRU,
a symbolic FRU, or a maintenance procedure) in the entry for a particular error.
These callouts can vary based on system type, as well as a user specified
AdditionalData property field. Callouts will be added to the PEL in the order
they are listed in the JSON. If a callout is passed into the error, say with
CALLOUT_INVENTORY_PATH, then that callout will be added to the PEL before the
callouts in the registry.

There is room for up to 10 callouts in a PEL.

The callouts based on system type can be added in two ways, by using either a
key called `System` or by `Systems`.

The `System` key will accept the system name as a string and the user can add
the callouts specific to that system under the `System`.

If multiple systems have the same callouts, the `Systems` key can be used. The
`Systems` key accepts the system names as an array of strings, and the list of
callouts common to those systems can be listed under the key.
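
As a rough illustration of how the `System` and `Systems` keys could be
interpreted, here is a small Python sketch. `select_callout_list` is a
hypothetical helper written for this document, not the code the PEL daemon
uses. It assumes that every entry matching the system name contributes its
callouts in registry order, and that entries with neither key are used only
when no system-specific entry matches.

```python
def select_callout_list(callouts, system_name):
    """Return the callouts that apply to system_name, in registry order.

    Assumption: entries naming the system via 'System' or 'Systems' are
    merged; entries with neither key are a fallback for all other systems.
    """
    matched = []
    fallback = []
    for entry in callouts:
        if "System" in entry:
            names = [entry["System"]]
        elif "Systems" in entry:
            names = entry["Systems"]
        else:
            # No system key: only used if nothing system-specific matches.
            fallback.extend(entry["CalloutList"])
            continue
        if system_name in names:
            matched.extend(entry["CalloutList"])
    return matched or fallback


# A cut-down 'Callouts' array in the shape used by the registry.
registry_callouts = [
    {"Systems": ["system1", "system2"],
     "CalloutList": [{"Priority": "high", "LocCode": "P1-C1"}]},
    {"System": "system1",
     "CalloutList": [{"Priority": "low", "SymbolicFRU": "service_docs"}]},
    {"CalloutList": [{"Priority": "medium", "Procedure": "BMC0001"}]},
]

system1_callouts = select_callout_list(registry_callouts, "system1")
other_callouts = select_callout_list(registry_callouts, "some_other_system")
```

With this input, `system1` picks up the callouts from both of the entries that
name it, while an unlisted system gets only the procedure callout from the
entry without a system key.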

Available maintenance procedures are listed in the [parser][1] and in the
[source code][2].

[1]:
  https://github.com/ibm-openbmc/openpower-pel-parsers/blob/master/modules/calloutparsers/ocallouts/ocallouts.py
[2]:
  https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/pel_values.cpp

If a procedure is needed that doesn't exist yet, please contact the owner of
this code for instructions.

#### Callouts example based on the system type

```json
"Callouts":
[
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "high",
                "Procedure": "BMC0002"
            }
        ]
    }
]
```

The above example shows that on system `system1`, the FRU at location P1-C1 will
be called out with a priority of high, and the FRU at P1 with a priority of low.
On every other system, the maintenance procedure BMC0002 is called out.

#### Callouts example based on the Systems type

```json
"Callouts":
[
    {
        "Systems": ["system1", "system2"],
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "low",
                "SymbolicFRU": "service_docs"
            },
            {
                "Priority": "low",
                "SymbolicFRUTrusted": "air_mover",
                "UseInventoryLocCode": true
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "medium",
                "Procedure": "BMC0001"
            }
        ]
    }
]
```

The above example shows that on `system1`, the FRUs at locations P1-C1 and P1
and the symbolic FRUs service_docs and air_mover will be called out. For
`system2`, the FRUs at locations P1-C1 and P1 will be called out.
On every other system, the maintenance
procedure BMC0001 is called out.

#### Callouts example based on an AdditionalData field

```json
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        },
        {
            "ADValue": "1",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C6"
                        }
                    ]
                }
            ]
        }
    ]
}
```

This example shows that the callouts were selected based on the 'PROC_NUM'
AdditionalData field. When PROC_NUM was 0, the FRU at P1-C5 was called out. When
it was 1, P1-C6 was called out. Note that the same 'Callouts' array is used as
in the previous example, so these callouts can also depend on the system type.

If it's desired to use a different set of callouts when there isn't a match on
the AdditionalData field, one can use CalloutsWhenNoADMatch. In the following
example, the 'air_mover' callout will be added if 'PROC_NUM' isn't 0.
'CalloutsWhenNoADMatch' has the same schema as the 'Callouts' section.

```json
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        }
    ],
    "CalloutsWhenNoADMatch": [
        {
            "CalloutList": [
                {
                    "Priority": "high",
                    "SymbolicFRU": "air_mover"
                }
            ]
        }
    ]
}
```

#### CalloutType

This field can be used to modify the failing component type field in the callout
when the default doesn't fit:

```json
{
    "Priority": "high",
    "Procedure": "FIXIT22",
    "CalloutType": "config_procedure"
}
```

The defaults are:

- Normal hardware FRU: hardware_fru
- Symbolic FRU: symbolic_fru
- Procedure: maint_procedure

#### Symbolic FRU callouts with dynamic trusted location codes

A special case is when one wants to use a symbolic FRU callout with a trusted
location code, but the location code to use isn't known until runtime. This
means it can't be specified using the 'LocCode' key in the registry.

In this case, one should use the 'SymbolicFRUTrusted' key along with the
'UseInventoryLocCode' key, and then pass in the inventory item that has the
desired location code using the 'CALLOUT_INVENTORY_PATH' entry inside of the
AdditionalData property. The code will then look up the location code for that
passed in inventory FRU and place it in the symbolic FRU callout. The normal FRU
callout with that inventory item will not be created. The symbolic FRU must be
the first callout in the registry for this to work.

```json
{
    "Priority": "high",
    "SymbolicFRUTrusted": "AIR_MOVR",
    "UseInventoryLocCode": true
}
```

### Capturing the Journal

The PEL daemon can be told to capture pieces of the journal in PEL UserData
sections.
This could be useful for debugging problems where a BMC dump, which
would also contain the journal, isn't available.

The 'JournalCapture' field has two formats, one that will create one UserData
section with the previous N lines of the journal, and another that can capture
any number of journal snippets based on the journal's SYSLOG_IDENTIFIER field.

```json
"JournalCapture": {
    "NumLines": 30
}
```

```json
"JournalCapture":
{
    "Sections": [
        {
            "SyslogID": "phosphor-bmc-state-manager",
            "NumLines": 20
        },
        {
            "SyslogID": "phosphor-log-manager",
            "NumLines": 15
        }
    ]
}
```

The first example will capture the previous 30 lines from the journal into a
single UserData section.

The second example will create two UserData sections, the first with the most
recent 20 lines from phosphor-bmc-state-manager, and the second with 15 lines
from phosphor-log-manager.

If a UserData section would make the PEL exceed its maximum size of 16KB, it
will be dropped.

## Modifying and Testing

The general process for adding new entries to the message registry is:

1. Update message_registry.json to add the new errors.
2. If a new component ID is used (usually the first byte of the SRC reason
   code), document it in O_component_ids.json.
3. Validate the file. It must be valid JSON and obey the schema. The
   `validate_registry.py` script in `extensions/openpower-pels/registry/tools`
   will validate both, though it requires the python-jsonschema package to do
   the schema validation. This script is also run to validate the message
   registry as part of CI testing.

   ```sh
   ./tools/validate_registry.py -s schema/schema.json -r message_registry.json
   ```

4. One can test what PELs are generated from these new entries without writing
   any code to create the corresponding event logs:
   1. Copy the modified message_registry.json into `/etc/phosphor-logging/` on
      the BMC. That directory may need to be created.
   2. Use busctl to call the Create method to create an event log corresponding
      to the message registry entry under test.

      ```sh
      busctl call xyz.openbmc_project.Logging /xyz/openbmc_project/logging \
          xyz.openbmc_project.Logging.Create Create ssa{ss} \
          xyz.openbmc_project.Common.Error.Timeout \
          xyz.openbmc_project.Logging.Entry.Level.Error 1 "TIMEOUT_IN_MSEC" "5"
      ```

   3. Check the PEL that was created using peltool.
   4. When finished, delete the file from `/etc/phosphor-logging/`.
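
Step 2 of the process above is easy to forget, so a quick check for component
IDs that have not been documented yet may help. The Python sketch below is
only an illustration and is not part of `validate_registry.py`. Both helper
names are hypothetical. It assumes each entry is a dictionary carrying
`ComponentID`, `ReasonCode` and `Type` directly, as in the snippets in this
document, and that the component ID list maps hex strings such as `"0x5500"`
to names. The real files may nest or name these differently.

```python
def component_id_of(entry):
    """Return the 0xYY00 component ID of a registry entry as an integer.

    Assumption: 'ComponentID', 'ReasonCode' and 'Type' are hex/string
    fields directly on the entry dictionary.
    """
    if "ComponentID" in entry:
        return int(entry["ComponentID"], 16)
    # For BD SRCs (the default type) the component ID is the upper byte
    # of the reason code.
    if entry.get("Type", "BD") == "BD":
        return int(entry["ReasonCode"], 16) & 0xFF00
    raise ValueError(
        f"{entry.get('Name', '?')}: ComponentID is needed for non-BD SRCs"
    )


def undocumented_component_ids(entries, known_ids):
    """List component IDs used by entries but missing from known_ids.

    Assumption: known_ids maps hex strings like "0x5500" to component names.
    """
    known = {int(key, 16) for key in known_ids}
    missing = {component_id_of(entry) for entry in entries} - known
    return [f"0x{cid:04X}" for cid in sorted(missing)]


# Illustrative data only; these are not real registry entries.
example_entries = [
    {"Name": "xyz.openbmc_project.Power.Fault", "ReasonCode": "0x5544"},
    {"Name": "example.Error", "Type": "11",
     "ComponentID": "0x2700", "ReasonCode": "0x0001"},
]
example_known_ids = {"0x5500": "example power component"}

missing_ids = undocumented_component_ids(example_entries, example_known_ids)
```

Any ID reported by such a check would need an entry in O_component_ids.json
before the change is submitted.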