# Platform Event Log Message Registry

On the BMC, PELs are created from the standard event logs provided by
phosphor-logging using a message registry that provides the PEL related fields.
The message registry is a JSON file.

## Contents

- [Component IDs](#component-ids)
- [Message Registry](#message-registry-fields)
- [Modifying and Testing](#modifying-and-testing)

## Component IDs

A component ID is a 2 byte value of the form 0xYY00 used in a PEL to:

1. Provide the upper byte (the YY from above) of an SRC reason code in `BD`
   SRCs.
2. Reside in the section header of the Private Header PEL section to specify the
   error log creator's component ID.
3. Reside in the section header of the User Header section to specify the error
   log committer's component ID.
4. Reside in the section header in the User Data section to specify which parser
   to call to parse that section.

Component IDs are specified in the message registry either as the upper byte of
the SRC reason code field for `BD` SRCs, or in the standalone `ComponentID`
field.

Component IDs will be unique on a per-repository basis for errors unique to that
repository. When the same errors are created by multiple repositories, those
errors will all share the same component ID. The master list of component IDs is
[here](O_component_ids.json). That file can be used by PEL parsers to display a
name for the component ID. The 'O' in the name is the creator ID value for BMC
created PELs.

## Message Registry Fields

The message registry schema is [here](schema/schema.json), and the message
registry itself is [here](message_registry.json). The registry will be validated
against the schema either during a bitbake build or during CI, and eventually
possibly both.

In the message registry, there are fields for specifying:

### Name

This is the key into the message registry, and is the Message property of the
OpenBMC event log that the PEL is being created from.

```json
"Name": "xyz.openbmc_project.Power.Fault"
```

### Subsystem

This field is part of the PEL User Header section, and is used to specify the
subsystem pertaining to the error. It is an enumeration that maps to the actual
PEL value. If the subsystem isn't known ahead of time, it can be passed in at
the time of PEL creation using the 'PEL_SUBSYSTEM' AdditionalData field. In this
case, 'Subsystem' isn't required, though 'PossibleSubsystems' is.

```json
"Subsystem": "power_supply"
```

### PossibleSubsystems

This field is used by scripts that build documentation from the message registry
to know which subsystems are possible for an error when it can't be hardcoded
using the 'Subsystem' field. It is mutually exclusive with the 'Subsystem'
field.

```json
"PossibleSubsystems": ["memory", "processor"]
```

### Severity

This field is part of the PEL User Header section, and is used to specify the
PEL severity. It is an optional field. If it isn't specified, the severity of
the OpenBMC event log will be converted into a PEL severity value.

It can either be the plain severity value, or an array of severity values that
are based on system type, where an entry without a system type will match
anything unless another entry has a matching system type.

```json
"Severity": "unrecoverable"
```

```json
"Severity":
[
    {
        "System": "system1",
        "SevValue": "recovered"
    },
    {
        "SevValue": "unrecoverable"
    }
]
```

The above example shows that on system 'system1' the severity will be recovered,
and on every other system it will be unrecoverable.
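
The matching rule for the array form can be sketched in Python as follows. This
is only an illustration of the rule described above, not the phosphor-logging
implementation. The function name `select_severity` is hypothetical, and the
sketch assumes that every array entry carries its value under `SevValue` and
that `None` is returned when nothing matches.

```python
def select_severity(severity, system):
    """Pick the severity for `system` from a registry 'Severity' value.

    `severity` is either a plain string or a list of objects with a
    'SevValue' key and an optional 'System' key (assumed layout).
    """
    if isinstance(severity, str):
        return severity

    default = None
    for entry in severity:
        if "System" in entry:
            if entry["System"] == system:
                # An entry that names this system takes precedence.
                return entry["SevValue"]
        elif default is None:
            # Remember the first entry without a system type as the fallback.
            default = entry["SevValue"]
    return default


severity = [
    {"System": "system1", "SevValue": "recovered"},
    {"SevValue": "unrecoverable"},
]
print(select_severity(severity, "system1"))  # recovered
print(select_severity(severity, "system2"))  # unrecoverable
```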

### Mfg Severity

This is an optional field and is used to override the Severity field when a
specific manufacturing isolation mode is enabled. It has the same format as
Severity.

```json
"MfgSeverity": "unrecoverable"
```

### Event Scope

This field is part of the PEL User Header section, and is used to specify the
event scope, as defined by the PEL spec. It is optional and defaults to "entire
platform".

```json
"EventScope": "entire_platform"
```

### Event Type

This field is part of the PEL User Header section, and is used to specify the
event type, as defined by the PEL spec. It is optional and defaults to "not
applicable" for non-informational logs, and "misc_information_only" for
informational ones.

```json
"EventType": "na"
```

### Action Flags

This field is part of the PEL User Header section, and is used to specify the
PEL action flags, as defined by the PEL spec. It is an array of enumerations.

The action flags can usually be deduced from other PEL fields, such as the
severity or if there are any callouts. As such, this is an optional field and if
not supplied the code will fill them in based on those fields.

In fact, even if supplied here, the code may still modify them to ensure they
are correct. The rules used for this are
[here](../README.md#action-flags-and-event-type-rules).

```json
"ActionFlags": ["service_action", "report", "call_home"]
```

### Mfg Action Flags

This is an optional field and is used to override the Action Flags field when a
specific manufacturing isolation mode is enabled.

```json
"MfgActionFlags": ["service_action", "report", "call_home"]
```

### Component ID

This is the component ID of the PEL creator, in the form 0xYY00. For `BD` SRCs,
this is an optional field and if not present the value will be taken from the
upper byte of the reason code. If present for `BD` SRCs, then this byte must
match the upper byte of the reason code.

```json
"ComponentID": "0x5500"
```
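
The relationship between the two fields for `BD` SRCs can be sketched in Python
as below. The helper name `effective_component_id` is hypothetical and the
sketch only illustrates the rule stated above; it is not taken from the
phosphor-logging sources.

```python
def effective_component_id(reason_code, component_id=None):
    """Return the component ID for a BD SRC as an integer.

    Both arguments are hex strings as they appear in the registry,
    e.g. "0x5544" and "0x5500".
    """
    # The component ID implied by the reason code is its upper byte: 0xYY00.
    implied = int(reason_code, 16) & 0xFF00
    if component_id is None:
        return implied

    explicit = int(component_id, 16)
    if (explicit & 0xFF00) != implied:
        raise ValueError(
            f"ComponentID {component_id} does not match the upper byte "
            f"of ReasonCode {reason_code}"
        )
    return explicit


print(hex(effective_component_id("0x5544")))            # 0x5500
print(hex(effective_component_id("0x5544", "0x5500")))  # 0x5500
```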

### SRC Type

This specifies the type of SRC to create. The type is the first 2 characters of
the 8 character ASCII string field of the PEL. The allowed types are `BD`, for
the standard OpenBMC error, and `11`, for power related errors. It is optional
and if not specified will default to `BD`.

Note: The ASCII string for BD SRCs looks like: `BDBBCCCC`, where:

- BD = SRC type
- BB = PEL subsystem as mentioned above
- CCCC = SRC reason code

For `11` SRCs, it looks like: `1100RRRR`, where RRRR is the SRC reason code.

```json
"Type": "11"
```
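
As a rough illustration of the two layouts, the following Python sketch
assembles the ASCII string from its parts. The function name
`src_ascii_string` is hypothetical, the subsystem is assumed to be passed in as
the already mapped numeric PEL value, and uppercase hex digits are assumed.

```python
def src_ascii_string(src_type, reason_code, subsystem=0):
    """Build the 8 character SRC ASCII string.

    reason_code is a hex string such as "0x5544". subsystem is the numeric
    PEL subsystem byte and is only used for BD SRCs.
    """
    code = int(reason_code, 16)
    if src_type == "BD":
        # BD + subsystem byte + reason code
        return f"BD{subsystem:02X}{code:04X}"
    if src_type == "11":
        # 11 + 00 + reason code
        return f"1100{code:04X}"
    raise ValueError(f"Unsupported SRC type: {src_type}")


# 0x61 is an arbitrary subsystem value chosen for the example.
print(src_ascii_string("BD", "0x5544", subsystem=0x61))  # BD615544
print(src_ascii_string("11", "0x5544"))                  # 11005544
```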

### SRC Reason Code

This is the 4 character value in the latter half of the SRC ASCII string. It is
treated as a 2 byte hex value, such as 0x5678. For `BD` SRCs, the first byte is
the same as the first byte of the component ID field in the Private Header
section that represents the creator's component ID.

```json
"ReasonCode": "0x5544"
```

### SRC Symptom ID Fields

The symptom ID is in the Extended User Header section and is defined in the PEL
spec as the unique event signature string. It always starts with the ASCII
string. This field in the message registry allows one to choose which SRC words
to use in addition to the ASCII string field to form the symptom ID. All words
are separated by underscores. If not specified, the code will choose a default
format, which may depend on the SRC type.

For example: ["SRCWord3", "SRCWord9"] would be:
`<ASCII_STRING>_<SRCWord3>_<SRCWord9>`, which could look like:
`B181320_00000050_49000000`.

```json
"SymptomIDFields": ["SRCWord3", "SRCWord9"]
```
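
The joining rule can be sketched in Python as follows. The function name
`symptom_id` and the dictionary of SRC words are hypothetical, and rendering
each word as 8 uppercase hex digits is an assumption based on the example
above.

```python
def symptom_id(ascii_string, src_words, fields):
    """Join the ASCII string and the selected SRC words with underscores.

    src_words maps an SRC word number (e.g. 3) to its 32 bit value.
    fields is a list such as ["SRCWord3", "SRCWord9"].
    """
    parts = [ascii_string]
    for field in fields:
        # "SRCWord3" -> 3
        number = int(field[len("SRCWord"):])
        parts.append(f"{src_words[number]:08X}")
    return "_".join(parts)


words = {3: 0x00000050, 9: 0x49000000}
print(symptom_id("BD615544", words, ["SRCWord3", "SRCWord9"]))
# BD615544_00000050_49000000
```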

### SRC words 6 to 9

In a PEL, these SRC words are free format and can be filled in by the user as
desired. On the BMC, the source of these words is the AdditionalData fields in
the event log. The message registry provides a way for the log creator to
specify which AdditionalData property field to get the data from, and also to
define what the SRC word means for use by parsers. If not specified, these SRC
words will be set to zero in the PEL.

```json
"Words6to9":
{
    "6":
    {
        "description": "Failing unit number",
        "AdditionalDataPropSource": "PS_NUM"
    }
}
```

### SRC Deconfig Flag

Bit 6 in hex word 5 of the SRC means that one or more called out resources have
been deconfigured, and this flag can be used to set that bit. The only other way
to set it is by indicating it when
[passing in the callouts via JSON](../README.md#callouts).

This is looked at by the software that creates the periodic PELs that indicate a
system is running with deconfigured hardware.

```json
"DeconfigFlag": true
```

### SRC Checkstop Flag

This is used to indicate the PEL is for a hardware checkstop, and causes bit 0
in hex word 5 of the SRC to be set.

```json
"CheckstopFlag": true
```

### Documentation Fields

The documentation fields are used by PEL parsers to display a human readable
description of a PEL. They are also the source for the Redfish event log
messages.

#### Message

This field is used by the BMC's PEL parser as the description of the error log.
It will also be used in Redfish event logs. It supports argument substitution
using the %1, %2, etc. placeholders, allowing any of the SRC user data words 6
to 9 to be displayed as part of the message. If the placeholders are used, then
the `MessageArgSources` property must be present to say which SRC words to use
for each placeholder.

```json
"Message": "Processor %1 had %2 errors"
```

#### MessageArgSources

This field is optional unless the Message field contains %X placeholder
arguments, in which case it is required. It is an array that says which SRC
words to get the placeholders from. In the example below, SRC word 6 would be
used for %1, and SRC word 7 for %2.

```json
"MessageArgSources":
[
    "SRCWord6", "SRCWord7"
]
```
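
A minimal Python sketch of the substitution is shown below. The function name
`format_message` is hypothetical, and how a parser actually renders each SRC
word (hex here) is an assumption; only the mapping from %N to the Nth
`MessageArgSources` entry comes from the description above.

```python
import re


def format_message(message, arg_sources, src_words):
    """Replace %1, %2, ... with the SRC words named in arg_sources.

    arg_sources is a list such as ["SRCWord6", "SRCWord7"].
    src_words maps an SRC word number (6 to 9) to its 32 bit value.
    """
    def substitute(match):
        # %1 refers to the first entry of arg_sources, %2 to the second, ...
        source = arg_sources[int(match.group(1)) - 1]
        word_number = int(source[len("SRCWord"):])
        return f"0x{src_words[word_number]:X}"

    return re.sub(r"%(\d+)", substitute, message)


print(format_message(
    "Processor %1 had %2 errors",
    ["SRCWord6", "SRCWord7"],
    {6: 0x1, 7: 0x10},
))
# Processor 0x1 had 0x10 errors
```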

#### Description

A short description of the error. This is required by the Redfish schema to
generate a Redfish message entry, but is not used in Redfish or PEL output.

```json
"Description": "A power fault"
```

#### Notes

This is an optional free format text field for keeping any notes for the
registry entry, as comments are not allowed in JSON. It is an array of strings
for easier readability of long fields.

```json
"Notes": [
    "This entry is for every type of power fault.",
    "There is probably a hardware failure."
]
```

### Callout Fields

The callout fields allow one to specify the PEL callouts (either a hardware FRU,
a symbolic FRU, or a maintenance procedure) in the entry for a particular error.
These callouts can vary based on system type, as well as a user specified
AdditionalData property field. Callouts will be added to the PEL in the order
they are listed in the JSON. If a callout is passed into the error, say with
CALLOUT_INVENTORY_PATH, then that callout will be added to the PEL before the
callouts in the registry.

There is room for up to 10 callouts in a PEL.

Callouts based on system type can be added in two ways: with the `System` key or
with the `Systems` key.

The `System` key accepts the system name as a string, and the callouts specific
to that system are listed in the same entry.

If multiple systems have the same callouts, the `Systems` key can be used. It
accepts the system names as an array of strings, and the callouts common to
those systems are listed in the same entry.

Available maintenance procedures are listed [here][1] and in the source code
[here][2].

[1]:
  https://github.com/ibm-openbmc/openpower-pel-parsers/blob/master/modules/calloutparsers/ocallouts/ocallouts.py
[2]:
  https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/pel_values.cpp

If a procedure is needed that doesn't exist yet, please contact the owner of
this code for instructions.

#### Callouts example based on the system type

```json
"Callouts":
[
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "high",
                "Procedure": "BMC0002"
            }
        ]
    }
]
```

The above example shows that on system `system1`, the FRU at location P1-C1 will
be called out with a priority of high, and the FRU at P1 with a priority of low.
On every other system, the maintenance procedure BMC0002 is called out.

#### Callouts example based on the Systems type

```json
"Callouts":
[
    {
        "Systems": ["system1", "system2"],
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "low",
                "SymbolicFRU": "service_docs"
            },
            {
                "Priority": "low",
                "SymbolicFRUTrusted": "air_mover",
                "UseInventoryLocCode": true
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "medium",
                "Procedure": "BMC0001"
            }
        ]
    }
]
```

The above example shows that on `system1`, the FRUs at locations P1-C1 and P1
and the symbolic FRUs service_docs and air_mover will be called out. On
`system2`, the FRUs at locations P1-C1 and P1 will be called out. On every other
system, the maintenance procedure BMC0001 is called out.
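
The selection behavior described by these two examples can be sketched in
Python as below. The rule used here is inferred from the example descriptions
and is not taken from the phosphor-logging code: every entry that names the
system contributes its callouts in order, and entries that name no system are
used only when nothing else matches. The function name `callouts_for_system` is
hypothetical.

```python
def callouts_for_system(callouts, system):
    """Return the list of callouts that apply to `system`."""
    matched = []
    fallback = []
    for entry in callouts:
        # Collect the system names from either key.
        names = list(entry.get("Systems", []))
        if "System" in entry:
            names.append(entry["System"])

        if not names:
            # No system type: only used when no other entry matches.
            fallback.extend(entry["CalloutList"])
        elif system in names:
            matched.extend(entry["CalloutList"])
    return matched if matched else fallback


callouts = [
    {"Systems": ["system1", "system2"],
     "CalloutList": [{"Priority": "high", "LocCode": "P1-C1"}]},
    {"System": "system1",
     "CalloutList": [{"Priority": "low", "SymbolicFRU": "service_docs"}]},
    {"CalloutList": [{"Priority": "medium", "Procedure": "BMC0001"}]},
]
print(len(callouts_for_system(callouts, "system1")))  # 2
print(len(callouts_for_system(callouts, "system2")))  # 1
print(callouts_for_system(callouts, "system3"))
# [{'Priority': 'medium', 'Procedure': 'BMC0001'}]
```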

#### Callouts example based on an AdditionalData field

```json
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        },
        {
            "ADValue": "1",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C6"
                        }
                    ]
                }
            ]
        }
    ]
}
```

This example shows that the callouts were selected based on the 'PROC_NUM'
AdditionalData field. When PROC_NUM was 0, the FRU at P1-C5 was called out. When
it was 1, P1-C6 was called out. Note that the same 'Callouts' array is used as
in the previous example, so these callouts can also depend on the system type.

If it's desired to use a different set of callouts when there isn't a match on
the AdditionalData field, one can use CalloutsWhenNoADMatch. In the following
example, the 'air_mover' callout will be added if 'PROC_NUM' isn't 0.
'CalloutsWhenNoADMatch' has the same schema as the 'Callouts' section.

```json
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        }
    ],
    "CalloutsWhenNoADMatch": [
        {
            "CalloutList": [
                {
                    "Priority": "high",
                    "SymbolicFRU": "air_mover"
                }
            ]
        }
    ]
}
```
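
The lookup can be sketched in Python as follows. The function name
`callouts_for_ad` is hypothetical, and returning an empty list when there is
neither a match nor a `CalloutsWhenNoADMatch` entry is an assumption. The
result is a 'Callouts' array, which may still vary by system type.

```python
def callouts_for_ad(callouts_using_ad, additional_data):
    """Select a 'Callouts' array based on an AdditionalData value."""
    value = additional_data.get(callouts_using_ad["ADName"])
    for candidate in callouts_using_ad["CalloutsWithTheirADValues"]:
        if candidate["ADValue"] == value:
            return candidate["Callouts"]
    # No ADValue matched: fall back to the optional default set.
    return callouts_using_ad.get("CalloutsWhenNoADMatch", [])


callouts_using_ad = {
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues": [
        {
            "ADValue": "0",
            "Callouts": [
                {"CalloutList": [{"Priority": "high", "LocCode": "P1-C5"}]}
            ],
        },
    ],
    "CalloutsWhenNoADMatch": [
        {"CalloutList": [{"Priority": "high", "SymbolicFRU": "air_mover"}]}
    ],
}
print(callouts_for_ad(callouts_using_ad, {"PROC_NUM": "0"}))
print(callouts_for_ad(callouts_using_ad, {"PROC_NUM": "3"}))
```

The first call selects the P1-C5 callout, and the second falls back to the
'air_mover' symbolic FRU.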

#### CalloutType

This field can be used to modify the failing component type field in the callout
when the default doesn't fit:

```json
{
    "Priority": "high",
    "Procedure": "FIXIT22",
    "CalloutType": "config_procedure"
}
```

The defaults are:

- Normal hardware FRU: hardware_fru
- Symbolic FRU: symbolic_fru
- Procedure: maint_procedure

#### Symbolic FRU callouts with dynamic trusted location codes

A special case is when one wants to use a symbolic FRU callout with a trusted
location code, but the location code to use isn't known until runtime. This
means it can't be specified using the 'LocCode' key in the registry.

In this case, one should use the 'SymbolicFRUTrusted' key along with the
'UseInventoryLocCode' key, and then pass in the inventory item that has the
desired location code using the 'CALLOUT_INVENTORY_PATH' entry inside of the
AdditionalData property. The code will then look up the location code for that
passed in inventory FRU and place it in the symbolic FRU callout. The normal FRU
callout with that inventory item will not be created. The symbolic FRU must be
the first callout in the registry for this to work.

```json
{
  "Priority": "high",
  "SymbolicFRUTrusted": "AIR_MOVR",
  "UseInventoryLocCode": true
}
```

### Capturing the Journal

The PEL daemon can be told to capture pieces of the journal in PEL UserData
sections. This could be useful for debugging problems where a BMC dump which
would also contain the journal isn't available.

The 'JournalCapture' field has two formats, one that will create one UserData
section with the previous N lines of the journal, and another that can capture
any number of journal snippets based on the journal's SYSLOG_IDENTIFIER field.

```json
"JournalCapture": {
    "NumLines": 30
}
```

```json
"JournalCapture":
{
    "Sections": [
        {
            "SyslogID": "phosphor-bmc-state-manager",
            "NumLines": 20
        },
        {
            "SyslogID": "phosphor-log-manager",
            "NumLines": 15
        }
    ]
}
```

The first example will capture the previous 30 lines from the journal into a
single UserData section.

The second example will create two UserData sections, the first with the most
recent 20 lines from phosphor-bmc-state-manager, and the second with 15 lines
from phosphor-log-manager.

If a UserData section would make the PEL exceed its maximum size of 16KB, it
will be dropped.

## Modifying and Testing

The general process for adding new entries to the message registry is:

1. Update message_registry.json to add the new errors.
2. If a new component ID is used (usually the first byte of the SRC reason
   code), document it in O_component_ids.json.
3. Validate the file. It must be valid JSON and obey the schema. The
   `validate_registry.py` script in `extensions/openpower-pels/registry/tools`
   will validate both, though it requires the python-jsonschema package to do
   the schema validation. This script is also run to validate the message
   registry as part of CI testing.

   ```sh
   ./tools/validate_registry.py -s schema/schema.json -r message_registry.json
   ```

4. One can test what PELs are generated from these new entries without writing
   any code to create the corresponding event logs:

   1. Copy the modified message_registry.json into `/etc/phosphor-logging/` on
      the BMC. That directory may need to be created.
   2. Use busctl to call the Create method to create an event log corresponding
      to the message registry entry under test.

      ```sh
      busctl call xyz.openbmc_project.Logging /xyz/openbmc_project/logging \
      xyz.openbmc_project.Logging.Create Create ssa{ss} \
      xyz.openbmc_project.Common.Error.Timeout \
      xyz.openbmc_project.Logging.Entry.Level.Error 1 "TIMEOUT_IN_MSEC" "5"
      ```

   3. Check the PEL that was created using peltool.
   4. When finished, delete the file from `/etc/phosphor-logging/`.