# Platform Event Log Message Registry

On the BMC, PELs are created from the standard event logs provided by
phosphor-logging using a message registry that provides the PEL related fields.
The message registry is a JSON file.

## Contents

- [Component IDs](#component-ids)
- [Message Registry](#message-registry-fields)
- [Modifying and Testing](#modifying-and-testing)

## Component IDs

A component ID is a 2 byte value of the form 0xYY00 used in a PEL to:

1. Provide the upper byte (the YY from above) of an SRC reason code in `BD`
   SRCs.
2. Reside in the section header of the Private Header PEL section to specify the
   error log creator's component ID.
3. Reside in the section header of the User Header section to specify the error
   log committer's component ID.
4. Reside in the section header in the User Data section to specify which parser
   to call to parse that section.

Component IDs are specified in the message registry either as the upper byte of
the SRC reason code field for `BD` SRCs, or in the standalone `ComponentID`
field.

Component IDs will be unique on a per-repository basis for errors unique to that
repository. When the same errors are created by multiple repositories, those
errors will all share the same component ID. The master list of component IDs is
[here](O_component_ids.json). That file can be used by PEL parsers to display a
name for the component ID. The 'O' in the name is the creator ID value for BMC
created PELs.

## Message Registry Fields

The message registry schema is [here](schema/schema.json), and the message
registry itself is [here](message_registry.json). The registry will be validated
against the schema either during a bitbake build or during CI, or eventually
possibly both.

In the message registry, there are fields for specifying:

### Name

This is the key into the message registry, and is the Message property of the
OpenBMC event log that the PEL is being created from.

```json
"Name": "xyz.openbmc_project.Power.Fault"
```

### Subsystem

This field is part of the PEL User Header section, and is used to specify the
subsystem pertaining to the error. It is an enumeration that maps to the actual
PEL value. If the subsystem isn't known ahead of time, it can be passed in at
the time of PEL creation using the 'PEL_SUBSYSTEM' AdditionalData field. In this
case, 'Subsystem' isn't required, though 'PossibleSubsystems' is.

```json
"Subsystem": "power_supply"
```

### PossibleSubsystems

This field is used by scripts that build documentation from the message registry
to know which subsystems are possible for an error when it can't be hardcoded
using the 'Subsystem' field. It is mutually exclusive with the 'Subsystem'
field.

```json
"PossibleSubsystems": ["memory", "processor"]
```

### Severity

This field is part of the PEL User Header section, and is used to specify the
PEL severity. It is an optional field. If it isn't specified, then the severity
of the OpenBMC event log will be converted into a PEL severity value.

It can either be the plain severity value, or an array of severity values that
are based on system type, where an entry without a system type will match
anything unless another entry has a matching system type.

```json
"Severity": "unrecoverable"
```

```json
"Severity":
[
    {
        "System": "system1",
        "SevValue": "recovered"
    },
    {
        "SevValue": "unrecoverable"
    }
]
```

The above example shows that on system 'system1' the severity will be recovered,
and on every other system it will be unrecoverable.
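
The matching rule for the array form can be illustrated with a short Python
sketch. This is not the code phosphor-logging uses. The function name
`resolve_severity` is hypothetical, and the sketch assumes that every array
entry carries its value in a `SevValue` key.

```python
def resolve_severity(severity, system):
    """Return the severity string that applies to `system`.

    `severity` is either a plain string or a list of entries with an
    optional "System" key and a "SevValue" key (an assumption of this sketch).
    """
    if isinstance(severity, str):
        return severity

    default = None
    for entry in severity:
        if "System" not in entry:
            # Entry without a system type: remember it as the fallback.
            default = entry["SevValue"]
        elif entry["System"] == system:
            # A matching system type takes precedence over the fallback.
            return entry["SevValue"]
    return default


severity = [
    {"System": "system1", "SevValue": "recovered"},
    {"SevValue": "unrecoverable"},
]
print(resolve_severity(severity, "system1"))  # recovered
print(resolve_severity(severity, "system2"))  # unrecoverable
```

The entry without a system type is only used once no other entry has matched,
which mirrors the "will match anything unless another entry has a matching
system type" wording above.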

### Mfg Severity

This is an optional field and is used to override the Severity field when a
specific manufacturing isolation mode is enabled. It has the same format as
Severity.

```json
"MfgSeverity": "unrecoverable"
```

### Event Scope

This field is part of the PEL User Header section, and is used to specify the
event scope, as defined by the PEL spec. It is optional and defaults to "entire
platform".

```json
"EventScope": "entire_platform"
```

### Event Type

This field is part of the PEL User Header section, and is used to specify the
event type, as defined by the PEL spec. It is optional and defaults to "not
applicable" for non-informational logs, and "misc_information_only" for
informational ones.

```json
"EventType": "na"
```

### Action Flags

This field is part of the PEL User Header section, and is used to specify the
PEL action flags, as defined by the PEL spec. It is an array of enumerations.

The action flags can usually be deduced from other PEL fields, such as the
severity or if there are any callouts. As such, this is an optional field and if
not supplied the code will fill them in based on those fields.

In fact, even if supplied here, the code may still modify them to ensure they
are correct. The rules used for this are
[here](../README.md#action-flags-and-event-type-rules).

```json
"ActionFlags": ["service_action", "report", "call_home"]
```

### Mfg Action Flags

This is an optional field and is used to override the Action Flags field when a
specific manufacturing isolation mode is enabled.

```json
"MfgActionFlags": ["service_action", "report", "call_home"]
```

### Component ID

This is the component ID of the PEL creator, in the form 0xYY00.
For `BD` SRCs,
this is an optional field and if not present the value will be taken from the
upper byte of the reason code. If present for `BD` SRCs, then this byte must
match the upper byte of the reason code.

```json
"ComponentID": "0x5500"
```

### SRC Type

This specifies the type of SRC to create. The type is the first 2 characters of
the 8 character ASCII string field of the PEL. The allowed types are `BD`, for
the standard OpenBMC error, and `11`, for power related errors. It is optional
and if not specified will default to `BD`.

Note: The ASCII string for BD SRCs looks like: `BDBBCCCC`, where:

- BD = SRC type
- BB = PEL subsystem as mentioned above
- CCCC = SRC reason code

For `11` SRCs, it looks like: `1100RRRR`, where RRRR is the SRC reason code.

```json
"Type": "11"
```

### SRC Reason Code

This is the 4 character value in the latter half of the SRC ASCII string. It is
treated as a 2 byte hex value, such as 0x5678. For `BD` SRCs, the first byte is
the same as the first byte of the component ID field in the Private Header
section that represents the creator's component ID.

```json
"ReasonCode": "0x5544"
```

### SRC Symptom ID Fields

The symptom ID is in the Extended User Header section and is defined in the PEL
spec as the unique event signature string. It always starts with the ASCII
string. This field in the message registry allows one to choose which SRC words
to use in addition to the ASCII string field to form the symptom ID. All words
are separated by underscores. If not specified, the code will choose a default
format, which may depend on the SRC type.

For example: ["SRCWord3", "SRCWord9"] would be: `<ASCII_STRING>_<SRCWord3>_<SRCWord9>`,
which could look like: `B181320_00000050_49000000`.
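
The way such a string is assembled can be sketched in Python. This is only an
illustration under stated assumptions, not the daemon's implementation: the
function name `build_symptom_id` is hypothetical, each SRC word is assumed to be
rendered as 8 uppercase hex digits, and the ASCII string `BD8D5544` in the usage
lines is a made-up value.

```python
def build_symptom_id(ascii_string, src_words, fields):
    """Join the SRC ASCII string and the selected SRC words with underscores.

    src_words maps an SRC word number (int) to its 32-bit value.
    fields is a list of names such as ["SRCWord3", "SRCWord9"].
    """
    prefix = "SRCWord"
    parts = [ascii_string]
    for field in fields:
        # "SRCWord3" -> 3
        word_number = int(field[len(prefix):])
        # Assumption: words are shown as zero-padded, 8 digit hex values.
        parts.append(f"{src_words[word_number]:08X}")
    return "_".join(parts)


words = {3: 0x00000050, 9: 0x49000000}
print(build_symptom_id("BD8D5544", words, ["SRCWord3", "SRCWord9"]))
# BD8D5544_00000050_49000000
```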

```json
"SymptomIDFields": ["SRCWord3", "SRCWord9"]
```

### SRC words 6 to 9

In a PEL, these SRC words are free format and can be filled in by the user as
desired. On the BMC, the source of these words is the AdditionalData fields in
the event log. The message registry provides a way for the log creator to
specify which AdditionalData property field to get the data from, and also to
define what the SRC word means for use by parsers. If not specified, these SRC
words will be set to zero in the PEL.

```json
"Words6to9":
{
    "6":
    {
        "description": "Failing unit number",
        "AdditionalDataPropSource": "PS_NUM"
    }
}
```

### SRC Deconfig Flag

Bit 6 in hex word 5 of the SRC means that one or more called out resources have
been deconfigured, and this flag can be used to set that bit. The only other way
to set it is by indicating it when
[passing in the callouts via JSON](../README.md#callouts).

This is looked at by the software that creates the periodic PELs that indicate a
system is running with deconfigured hardware.

```json
"DeconfigFlag": true
```

### SRC Checkstop Flag

This is used to indicate the PEL is for a hardware checkstop, and causes bit 0
in hex word 5 of the SRC to be set.

```json
"CheckstopFlag": true
```

### Documentation Fields

The documentation fields are used by PEL parsers to display a human readable
description of a PEL. They are also the source for the Redfish event log
messages.

#### Message

This field is used by the BMC's PEL parser as the description of the error log.
It will also be used in Redfish event logs. It supports argument substitution
using the %1, %2, etc placeholders allowing any of the SRC user data words 6 - 9
to be displayed as part of the message.
If the placeholders are used, then the
`MessageArgSources` property must be present to say which SRC words to use for
each placeholder.

```json
"Message": "Processor %1 had %2 errors"
```

#### MessageArgSources

This optional field is required when the Message field contains the %X
placeholder arguments. It is an array that says which SRC words to get the
placeholders from. In the example below, SRC word 6 would be used for %1, and
SRC word 7 for %2.

```json
"MessageArgSources":
[
    "SRCWord6", "SRCWord7"
]
```

#### Description

A short description of the error. This is required by the Redfish schema to
generate a Redfish message entry, but is not used in Redfish or PEL output.

```json
"Description": "A power fault"
```

#### Notes

This is an optional free format text field for keeping any notes for the
registry entry, as comments are not allowed in JSON. It is an array of strings
for easier readability of long fields.

```json
"Notes": [
    "This entry is for every type of power fault.",
    "There is probably a hardware failure."
]
```

### Callout Fields

The callout fields allow one to specify the PEL callouts (either a hardware FRU,
a symbolic FRU, or a maintenance procedure) in the entry for a particular error.
These callouts can vary based on system type, as well as a user specified
AdditionalData property field. Callouts will be added to the PEL in the order
they are listed in the JSON. If a callout is passed into the error, say with
CALLOUT_INVENTORY_PATH, then that callout will be added to the PEL before the
callouts in the registry.

There is room for up to 10 callouts in a PEL.

The callouts based on system type can be added in two ways, by using either a
key called `System` or by `Systems`.

The `System` key accepts the system name as a string, and the callouts specific
to that system are listed under it.

If multiple systems have the same callouts, the `Systems` key can be used
instead. It accepts the system names as an array of strings, and the callouts
common to those systems are listed under it.

Available maintenance procedures are listed [here][1] and in the source code
[here][2].

[1]:
  https://github.com/ibm-openbmc/openpower-pel-parsers/blob/master/modules/calloutparsers/ocallouts/ocallouts.py
[2]:
  https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/pel_values.cpp

If a procedure is needed that doesn't exist yet, please contact the owner of
this code for instructions.

#### Callouts example based on the system type

```json
"Callouts":
[
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "high",
                "Procedure": "BMC0002"
            }
        ]
    }
]
```

The above example shows that on system `system1`, the FRU at location P1-C1 will
be called out with a priority of high, and the FRU at P1 with a priority of low.
On every other system, the maintenance procedure BMC0002 is called out.
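
The selection by system type can be sketched in Python as follows. The function
name `select_callouts` is hypothetical and this is not the registry code itself.
The sketch assumes that every entry whose `System` or `Systems` key matches
contributes its callouts, and that entries without either key are only used when
nothing else matched.

```python
def select_callouts(callouts, system):
    """Return the list of callouts that apply to `system`."""
    matched = []
    fallback = []
    for entry in callouts:
        if "System" in entry:
            names = [entry["System"]]
        elif "Systems" in entry:
            names = entry["Systems"]
        else:
            # No system key: only used when no other entry matches.
            fallback.extend(entry["CalloutList"])
            continue
        if system in names:
            matched.extend(entry["CalloutList"])
    return matched if matched else fallback


callouts = [
    {
        "System": "system1",
        "CalloutList": [
            {"Priority": "high", "LocCode": "P1-C1"},
            {"Priority": "low", "LocCode": "P1"},
        ],
    },
    {"CalloutList": [{"Priority": "high", "Procedure": "BMC0002"}]},
]
print([c.get("LocCode") for c in select_callouts(callouts, "system1")])
# ['P1-C1', 'P1']
print([c.get("Procedure") for c in select_callouts(callouts, "system2")])
# ['BMC0002']
```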

#### Callouts example based on the Systems type

```json
"Callouts":
[
    {
        "Systems": ["system1", "system2"],
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "low",
                "SymbolicFRU": "service_docs"
            },
            {
                "Priority": "low",
                "SymbolicFRUTrusted": "air_mover",
                "UseInventoryLocCode": true
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "medium",
                "Procedure": "BMC0001"
            }
        ]
    }
]
```

The above example shows that on `system1`, the FRUs at locations P1-C1 and P1
and the symbolic FRUs service_docs and air_mover will be called out. On
`system2`, only the FRUs at locations P1-C1 and P1 will be called out. On every
other system, the maintenance procedure BMC0001 is called out.

#### Callouts example based on an AdditionalData field

```json
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        },
        {
            "ADValue": "1",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C6"
                        }
                    ]
                }
            ]
        }
    ]
}
```

This example shows callouts being selected based on the 'PROC_NUM'
AdditionalData field. When PROC_NUM is 0, the FRU at P1-C5 is called out. When
it is 1, P1-C6 is called out. Note that the same 'Callouts' array is used as
in the previous example, so these callouts can also depend on the system type.

If it's desired to use a different set of callouts when there isn't a match on
the AdditionalData field, one can use CalloutsWhenNoADMatch.
In the following
example, the 'air_mover' callout will be added if 'PROC_NUM' isn't 0.
'CalloutsWhenNoADMatch' has the same schema as the 'Callouts' section.

```json
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        }
    ],
    "CalloutsWhenNoADMatch": [
        {
            "CalloutList": [
                {
                    "Priority": "high",
                    "SymbolicFRU": "air_mover"
                }
            ]
        }
    ]
}
```

#### CalloutType

This field can be used to modify the failing component type field in the callout
when the default doesn't fit:

```json
{
    "Priority": "high",
    "Procedure": "FIXIT22",
    "CalloutType": "config_procedure"
}
```

The defaults are:

- Normal hardware FRU: hardware_fru
- Symbolic FRU: symbolic_fru
- Procedure: maint_procedure

#### Symbolic FRU callouts with dynamic trusted location codes

A special case is when one wants to use a symbolic FRU callout with a trusted
location code, but the location code to use isn't known until runtime. This
means it can't be specified using the 'LocCode' key in the registry.

In this case, one should use the 'SymbolicFRUTrusted' key along with the
'UseInventoryLocCode' key, and then pass in the inventory item that has the
desired location code using the 'CALLOUT_INVENTORY_PATH' entry inside of the
AdditionalData property. The code will then look up the location code for that
passed in inventory FRU and place it in the symbolic FRU callout. The normal FRU
callout with that inventory item will not be created. The symbolic FRU must be
the first callout in the registry for this to work.

```json
{
    "Priority": "high",
    "SymbolicFRUTrusted": "AIR_MOVR",
    "UseInventoryLocCode": true
}
```

### Capturing the Journal

The PEL daemon can be told to capture pieces of the journal in PEL UserData
sections. This could be useful for debugging problems where a BMC dump which
would also contain the journal isn't available.

The 'JournalCapture' field has two formats, one that will create one UserData
section with the previous N lines of the journal, and another that can capture
any number of journal snippets based on the journal's SYSLOG_IDENTIFIER field.

```json
"JournalCapture": {
    "NumLines": 30
}
```

```json
"JournalCapture":
{
    "Sections": [
        {
            "SyslogID": "phosphor-bmc-state-manager",
            "NumLines": 20
        },
        {
            "SyslogID": "phosphor-log-manager",
            "NumLines": 15
        }
    ]
}
```

The first example will capture the previous 30 lines from the journal into a
single UserData section.

The second example will create two UserData sections, the first with the most
recent 20 lines from phosphor-bmc-state-manager, and the second with 15 lines
from phosphor-log-manager.

If a UserData section would make the PEL exceed its maximum size of 16KB, it
will be dropped.

## Modifying and Testing

The general process for adding new entries to the message registry is:

1. Update message_registry.json to add the new errors.
2. If a new component ID is used (usually the first byte of the SRC reason
   code), document it in O_component_ids.json.
3. Validate the file. It must be valid JSON and obey the schema. The
   `validate_registry.py` script in `extensions/openpower-pels/registry/tools`
   will validate both, though it requires the python-jsonschema package to do
   the schema validation. This script is also run to validate the message
   registry as part of CI testing.

   ```sh
   ./tools/validate_registry.py -s schema/schema.json -r message_registry.json
   ```

4. One can test what PELs are generated from these new entries without writing
   any code to create the corresponding event logs:

   1. Copy the modified message_registry.json into `/etc/phosphor-logging/` on
      the BMC. That directory may need to be created.
   2. Use busctl to call the Create method to create an event log corresponding
      to the message registry entry under test.

      ```sh
      busctl call xyz.openbmc_project.Logging /xyz/openbmc_project/logging \
        xyz.openbmc_project.Logging.Create Create ssa{ss} \
        xyz.openbmc_project.Common.Error.Timeout \
        xyz.openbmc_project.Logging.Entry.Level.Error 1 "TIMEOUT_IN_MSEC" "5"
      ```

   3. Check the PEL that was created using peltool.
   4. When finished, delete the file from `/etc/phosphor-logging/`.
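
Two of the rules stated in this document can also be spot-checked by hand before
running the validator. The Python sketch below is not a replacement for
`validate_registry.py`. The function name `check_entry` is hypothetical, and the
sketch assumes a flattened entry in which `Type`, `ReasonCode`, `ComponentID`,
`Subsystem` and `PossibleSubsystems` are all top-level keys, which may not match
how the real registry nests them.

```python
def check_entry(entry):
    """Return a list of problems found in one (flattened) registry entry."""
    problems = []

    # 'Subsystem' and 'PossibleSubsystems' are mutually exclusive,
    # and one of the two has to be present.
    if ("Subsystem" in entry) == ("PossibleSubsystems" in entry):
        problems.append("need exactly one of Subsystem or PossibleSubsystems")

    # For BD SRCs (the default type), a ComponentID that is present must
    # match the upper byte of the reason code.
    if entry.get("Type", "BD") == "BD" and "ComponentID" in entry:
        reason_code = int(entry["ReasonCode"], 16)
        component_id = int(entry["ComponentID"], 16)
        if component_id != (reason_code & 0xFF00):
            problems.append("ComponentID does not match reason code upper byte")

    return problems


good = {"Subsystem": "power_supply", "ReasonCode": "0x5544", "ComponentID": "0x5500"}
bad = {"ReasonCode": "0x5544", "ComponentID": "0x1200"}
print(check_entry(good))  # []
print(len(check_entry(bad)))  # 2
```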