# Platform Event Log Message Registry
On the BMC, PELs are created from the standard event logs provided by
phosphor-logging using a message registry that provides the PEL related fields.
The message registry is a JSON file.

## Contents
* [Component IDs](#component-ids)
* [Message Registry](#message-registry-fields)
* [Modifying and Testing](#modifying-and-testing)

## Component IDs
A component ID is a 2 byte value of the form 0xYY00 used in a PEL to:
1. Provide the upper byte (the YY from above) of an SRC reason code in `BD`
   SRCs.
2. Reside in the section header of the Private Header PEL section to specify
   the error log creator's component ID.
3. Reside in the section header of the User Header section to specify the error
   log committer's component ID.
4. Reside in the section header in the User Data section to specify which
   parser to call to parse that section.

Component IDs are specified in the message registry either as the upper byte of
the SRC reason code field for `BD` SRCs, or in the standalone `ComponentID`
field.

Component IDs will be unique on a per-repository basis for errors unique to
that repository. When the same errors are created by multiple repositories,
those errors will all share the same component ID. The master list of
component IDs is [here](ComponentIDs.md).

## Message Registry Fields
The message registry schema is [here](schema/schema.json), and the message
registry itself is [here](message_registry.json). The schema will be validated
either during a bitbake build or during CI, or eventually possibly both.

In the message registry, there are fields for specifying:

### Name
This is the key into the message registry, and is the Message property
of the OpenBMC event log that the PEL is being created from.

```
"Name": "xyz.openbmc_project.Power.Fault"
```

### Subsystem
This field is part of the PEL User Header section, and is used to specify
the subsystem pertaining to the error. It is an enumeration that maps to the
actual PEL value. If the subsystem isn't known ahead of time, it can be passed
in at the time of PEL creation using the 'PEL\_SUBSYSTEM' AdditionalData field.
In this case, 'Subsystem' isn't required, though 'PossibleSubsystems' is.

```
"Subsystem": "power_supply"
```

### PossibleSubsystems
This field is used by scripts that build documentation from the message
registry to know which subsystems are possible for an error when it can't be
hardcoded using the 'Subsystem' field. It is mutually exclusive with the
'Subsystem' field.

```
"PossibleSubsystems": ["memory", "processor"]
```

### Severity
This field is part of the PEL User Header section, and is used to specify
the PEL severity. It is an optional field; if it isn't specified, the
severity of the OpenBMC event log will be converted into a PEL severity value.

It can either be the plain severity value, or an array of severity values that
are based on system type, where an entry without a system type will match
anything unless another entry has a matching system type.

```
"Severity": "unrecoverable"
```

```
"Severity":
[
    {
        "System": "system1",
        "SevValue": "recovered"
    },
    {
        "SevValue": "unrecoverable"
    }
]
```
The above example shows that on system 'system1' the severity will be
recovered, and on every other system it will be unrecoverable.

### Mfg Severity
This is an optional field and is used to override the Severity field when a
specific manufacturing isolation mode is enabled. It has the same format as
Severity.

```
"MfgSeverity": "unrecoverable"
```

### Event Scope
This field is part of the PEL User Header section, and is used to specify
the event scope, as defined by the PEL spec. It is optional and defaults to
"entire platform".

```
"EventScope": "entire_platform"
```

### Event Type
This field is part of the PEL User Header section, and is used to specify
the event type, as defined by the PEL spec. It is optional and defaults to
"not applicable" for non-informational logs, and "misc_information_only" for
informational ones.

```
"EventType": "na"
```

### Action Flags
This field is part of the PEL User Header section, and is used to specify the
PEL action flags, as defined by the PEL spec. It is an array of enumerations.

The action flags can usually be deduced from other PEL fields, such as the
severity or if there are any callouts. As such, this is an optional field and
if not supplied the code will fill them in based on those fields.

In fact, even if supplied here, the code may still modify them to ensure they
are correct. The rules used for this are
[here](../README.md#action-flags-and-event-type-rules).

```
"ActionFlags": ["service_action", "report", "call_home"]
```

### Mfg Action Flags
This is an optional field and is used to override the Action Flags field when a
specific manufacturing isolation mode is enabled.

```
"MfgActionFlags": ["service_action", "report", "call_home"]
```

### Component ID
This is the component ID of the PEL creator, in the form 0xYY00. For `BD`
SRCs, this is an optional field and if not present the value will be taken from
the upper byte of the reason code. If present for `BD` SRCs, then this byte
must match the upper byte of the reason code.

```
"ComponentID": "0x5500"
```

### SRC Type
This specifies the type of SRC to create.
The type is the first 2 characters
of the 8 character ASCII string field of the PEL. The allowed types are `BD`,
for the standard OpenBMC error, and `11`, for power related errors. It is
optional and if not specified will default to `BD`.

Note: The ASCII string for BD SRCs looks like: `BDBBCCCC`, where:
* BD = SRC type
* BB = PEL subsystem as mentioned above
* CCCC = SRC reason code

For `11` SRCs, it looks like: `1100RRRR`, where RRRR is the SRC reason code.

```
"Type": "11"
```

### SRC Reason Code
This is the 4 character value in the latter half of the SRC ASCII string. It
is treated as a 2 byte hex value, such as 0x5678. For `BD` SRCs, the first
byte is the same as the first byte of the component ID field in the Private
Header section that represents the creator's component ID.

```
"ReasonCode": "0x5544"
```

### SRC Symptom ID Fields
The symptom ID is in the Extended User Header section and is defined in the PEL
spec as the unique event signature string. It always starts with the ASCII
string. This field in the message registry allows one to choose which SRC words
to use in addition to the ASCII string field to form the symptom ID. All words
are separated by underscores. If not specified, the code will choose a default
format, which may depend on the SRC type.

For example: ["SRCWord3", "SRCWord9"] would be:
`<ASCII_STRING>_<SRCWord3>_<SRCWord9>`, which could look like:
`B181320_00000050_49000000`.

```
"SymptomIDFields": ["SRCWord3", "SRCWord9"]
```

### SRC words 6 to 9
In a PEL, these SRC words are free format and can be filled in by the user as
desired. On the BMC, the source of these words is the AdditionalData fields in
the event log.
The message registry provides a way for the log creator to
specify which AdditionalData property field to get the data from, and also to
define what the SRC word means for use by parsers. If not specified, these SRC
words will be set to zero in the PEL.

```
"Words6to9":
{
    "6":
    {
        "description": "Failing unit number",
        "AdditionalDataPropSource": "PS_NUM"
    }
}
```

### Documentation Fields
The documentation fields are used by PEL parsers to display a human readable
description of a PEL. They are also the source for the Redfish event log
messages.

#### Message
This field is used by the BMC's PEL parser as the description of the error log.
It will also be used in Redfish event logs. It supports argument substitution
using the %1, %2, etc placeholders allowing any of the SRC user data words 6 -
9 to be displayed as part of the message. If the placeholders are used, then
the `MessageArgSources` property must be present to say which SRC words to use
for each placeholder.

```
"Message": "Processor %1 had %2 errors"
```

#### MessageArgSources
This optional field is required when the Message field contains the %X
placeholder arguments. It is an array that says which SRC words to get the
placeholders from. In the example below, SRC word 6 would be used for %1, and
SRC word 7 for %2.

```
"MessageArgSources":
[
    "SRCWord6", "SRCWord7"
]
```

#### Description
A short description of the error. This is required by the Redfish schema to
generate a Redfish message entry, but is not used in Redfish or PEL output.

```
"Description": "A power fault"
```

#### Notes
This is an optional free format text field for keeping any notes for the
registry entry, as comments are not allowed in JSON. It is an array of strings
for easier readability of long fields.

```
"Notes": [
    "This entry is for every type of power fault.",
    "There is probably a hardware failure."
]
```

### Callout Fields
The callout fields allow one to specify the PEL callouts (either a hardware
FRU, a symbolic FRU, or a maintenance procedure) in the entry for a particular
error. These callouts can vary based on system type, as well as a user
specified AdditionalData property field. Callouts will be added to the PEL in
the order they are listed in the JSON. If a callout is passed into the error,
say with CALLOUT_INVENTORY_PATH, then that callout will be added to the PEL
before the callouts in the registry.

There is room for up to 10 callouts in a PEL.

#### Callouts example based on the system type

```
"Callouts":
[
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "high",
                "Procedure": "SVCDOCS"
            }
        ]
    }
]
```

The above example shows that on system 'system1', the FRU at location P1-C1
will be called out with a priority of high, and the FRU at P1 with a priority
of low. On every other system, the maintenance procedure SVCDOCS is called
out.

#### Callouts example based on an AdditionalData field

```
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        },
        {
            "ADValue": "1",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C6"
                        }
                    ]
                }
            ]
        }
    ]
}
```

This example shows that the callouts were selected based on the 'PROC_NUM'
AdditionalData field.
When PROC_NUM was 0, the FRU at P1-C5 was called out.
When it was 1, P1-C6 was called out. Note that the same 'Callouts' array is
used as in the previous example, so these callouts can also depend on the
system type.

#### CalloutType
This field can be used to modify the failing component type field in the
callout when the default doesn't fit:

```
{
    "Priority": "high",
    "Procedure": "FIXIT22",
    "CalloutType": "config_procedure"
}
```

The defaults are:
- Normal hardware FRU: hardware_fru
- Symbolic FRU: symbolic_fru
- Procedure: maint_procedure

#### Symbolic FRU callouts with dynamic trusted location codes

A special case is when one wants to use a symbolic FRU callout with a trusted
location code, but the location code to use isn't known until runtime. This
means it can't be specified using the 'LocCode' key in the registry.

In this case, one should use the 'SymbolicFRUTrusted' key along with the
'UseInventoryLocCode' key, and then pass in the inventory item that has the
desired location code using the 'CALLOUT_INVENTORY_PATH' entry inside of the
AdditionalData property. The code will then look up the location code for that
passed in inventory FRU and place it in the symbolic FRU callout. The normal
FRU callout with that inventory item will not be created. The symbolic FRU
must be the first callout in the registry for this to work.

```
{
    "Priority": "high",
    "SymbolicFRUTrusted": "AIR_MOVR",
    "UseInventoryLocCode": true
}
```

## Modifying and Testing

The general process for adding new entries to the message registry is:

1. Update message_registry.json to add the new errors.
2. If a new component ID is used (usually the first byte of the SRC reason
   code), document it in ComponentIDs.md.
3. Validate the file. It must be valid JSON and obey the schema.
   The `process_registry.py` script in `extensions/openpower-pels/registry/tools`
   will validate both, though it requires the python-jsonschema package to do
   the schema validation. This script is also run to validate the message
   registry as part of CI testing.

```
 ./tools/process_registry.py -v -s schema/schema.json -r message_registry.json
```

4. One can test what PELs are generated from these new entries without writing
   any code to create the corresponding event logs:
    1. Copy the modified message_registry.json into `/etc/phosphor-logging/` on
       the BMC. That directory may need to be created.
    2. Use busctl to call the Create method to create an event log
       corresponding to the message registry entry under test.

```
busctl call xyz.openbmc_project.Logging /xyz/openbmc_project/logging \
xyz.openbmc_project.Logging.Create Create ssa{ss} \
xyz.openbmc_project.Common.Error.Timeout \
xyz.openbmc_project.Logging.Entry.Level.Error 1 "TIMEOUT_IN_MSEC" "5"
```

    3. Check the PEL that was created using peltool.
    4. When finished, delete the file from `/etc/phosphor-logging/`.
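
As a complement to the validation in step 3, a few of the cross-field rules described in this document can be spot-checked by hand before running `process_registry.py`. The following is only a minimal sketch and is not part of the repository: the helper name `check_entry` is hypothetical, and it assumes an entry is a flat dictionary keyed by the field names exactly as they appear in the snippets above. The real registry file may nest these fields differently, so consult the schema before adapting it. It checks that exactly one of `Subsystem` and `PossibleSubsystems` is present, that a `ComponentID` on a `BD` SRC matches the upper byte of the reason code, and that every `%N` placeholder in `Message` has a `MessageArgSources` element.

```python
import re


def check_entry(entry):
    """Return a list of problems found in one registry entry.

    Hypothetical helper: assumes a flat dict using the field names
    shown in this document's examples, which may not match the
    nesting of the real registry file.
    """
    problems = []

    # 'Subsystem' and 'PossibleSubsystems' are mutually exclusive,
    # and one of the two has to be present.
    if ("Subsystem" in entry) == ("PossibleSubsystems" in entry):
        problems.append("need exactly one of Subsystem/PossibleSubsystems")

    # For BD SRCs (the default type), a ComponentID, if present, must
    # match the upper byte of the reason code.
    src_type = entry.get("Type", "BD")
    if src_type == "BD" and "ComponentID" in entry and "ReasonCode" in entry:
        reason = int(entry["ReasonCode"], 16)
        if int(entry["ComponentID"], 16) != (reason & 0xFF00):
            problems.append("ComponentID does not match ReasonCode upper byte")

    # Every %N placeholder in Message needs a MessageArgSources element.
    message = entry.get("Message", "")
    placeholders = {int(n) for n in re.findall(r"%(\d+)", message)}
    sources = entry.get("MessageArgSources", [])
    if placeholders and max(placeholders) > len(sources):
        problems.append("Message placeholder without a MessageArgSources entry")

    return problems


# Example entry assembled from the field snippets in this document.
entry = {
    "Name": "xyz.openbmc_project.Power.Fault",
    "Subsystem": "power_supply",
    "ComponentID": "0x5500",
    "ReasonCode": "0x5544",
    "Message": "Processor %1 had %2 errors",
    "MessageArgSources": ["SRCWord6", "SRCWord7"],
}
print(check_entry(entry))
```

An empty list means none of these three rules were violated. This does not replace the schema validation, which remains the authoritative check.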