# Platform Event Log Message Registry
On the BMC, PELs are created from the standard event logs provided by
phosphor-logging using a message registry that provides the PEL related fields.
The message registry is a JSON file.

## Contents
* [Component IDs](#component-ids)
* [Message Registry](#message-registry-fields)
* [Modifying and Testing](#modifying-and-testing)

## Component IDs
A component ID is a 2 byte value of the form 0xYY00 used in a PEL to:
1. Provide the upper byte (the YY from above) of an SRC reason code in `BD`
   SRCs.
2. Reside in the section header of the Private Header PEL section to specify
   the error log creator's component ID.
3. Reside in the section header of the User Header section to specify the error
   log committer's component ID.
4. Reside in the section header in the User Data section to specify which
   parser to call to parse that section.

Component IDs are specified in the message registry either as the upper byte of
the SRC reason code field for `BD` SRCs, or in the standalone `ComponentID`
field.

Component IDs will be unique on a per-repository basis for errors unique to
that repository.  When the same errors are created by multiple repositories,
those errors will all share the same component ID.  The master list of
component IDs is [here](O_component_ids.json).  That file can be used by PEL
parsers to display a name for the component ID.  The 'O' in the name is the
creator ID value for BMC created PELs.

## Message Registry Fields
The message registry schema is [here](schema/schema.json), and the message
registry itself is [here](message_registry.json).  The registry will be
validated against the schema either during a bitbake build or during CI, or
eventually possibly both.

In the message registry, there are fields for specifying:

### Name
This is the key into the message registry, and is the Message property
of the OpenBMC event log that the PEL is being created from.

```
"Name": "xyz.openbmc_project.Power.Fault"
```

### Subsystem
This field is part of the PEL User Header section, and is used to specify
the subsystem pertaining to the error.  It is an enumeration that maps to the
actual PEL value.  If the subsystem isn't known ahead of time, it can be passed
in at the time of PEL creation using the 'PEL\_SUBSYSTEM' AdditionalData field.
In this case, 'Subsystem' isn't required, though 'PossibleSubsystems' is.

```
"Subsystem": "power_supply"
```

### PossibleSubsystems
This field is used by scripts that build documentation from the message
registry to know which subsystems are possible for an error when it can't be
hardcoded using the 'Subsystem' field.  It is mutually exclusive with the
'Subsystem' field.

```
"PossibleSubsystems": ["memory", "processor"]
```

### Severity
This field is part of the PEL User Header section, and is used to specify
the PEL severity.  It is an optional field; if it isn't specified, then the
severity of the OpenBMC event log will be converted into a PEL severity value.

It can either be the plain severity value, or an array of severity values that
are based on system type, where an entry without a system type will match
anything unless another entry has a matching system type.

```
"Severity": "unrecoverable"
```

```
"Severity":
[
    {
        "System": "system1",
        "SevValue": "recovered"
    },
    {
        "SevValue": "unrecoverable"
    }
]
```
The above example shows that on system 'system1' the severity will be
recovered, and on every other system it will be unrecoverable.

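The system-specific lookup described above can be sketched in Python as
follows.  This is only an illustration, not the phosphor-logging code: the
function name `resolve_severity` is hypothetical, and the entries are assumed
to use the `System` and `SevValue` keys.

```python
from typing import Optional, Union


def resolve_severity(severity: Union[str, list, None],
                     system: str) -> Optional[str]:
    """Pick the registry severity for `system` (hypothetical helper)."""
    if severity is None or isinstance(severity, str):
        # Plain value, or absent: the caller would then fall back to the
        # severity of the OpenBMC event log.
        return severity

    default = None
    for entry in severity:
        if entry.get("System") == system:
            return entry["SevValue"]     # a matching system type wins
        if "System" not in entry:
            default = entry["SevValue"]  # no system type: matches anything
    return default


entries = [
    {"System": "system1", "SevValue": "recovered"},
    {"SevValue": "unrecoverable"},
]
print(resolve_severity(entries, "system1"))  # recovered
print(resolve_severity(entries, "system2"))  # unrecoverable
```
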
### Mfg Severity
This is an optional field and is used to override the Severity field when a
specific manufacturing isolation mode is enabled.  It has the same format as
Severity.

```
"MfgSeverity": "unrecoverable"
```

### Event Scope
This field is part of the PEL User Header section, and is used to specify
the event scope, as defined by the PEL spec.  It is optional and defaults to
"entire platform".

```
"EventScope": "entire_platform"
```

### Event Type
This field is part of the PEL User Header section, and is used to specify
the event type, as defined by the PEL spec.  It is optional and defaults to
"not applicable" for non-informational logs, and "misc_information_only" for
informational ones.

```
"EventType": "na"
```

### Action Flags
This field is part of the PEL User Header section, and is used to specify the
PEL action flags, as defined by the PEL spec.  It is an array of enumerations.

The action flags can usually be deduced from other PEL fields, such as the
severity or if there are any callouts.  As such, this is an optional field and
if not supplied the code will fill them in based on those fields.

In fact, even if supplied here, the code may still modify them to ensure they
are correct.  The rules used for this are
[here](../README.md#action-flags-and-event-type-rules).

```
"ActionFlags": ["service_action", "report", "call_home"]
```

### Mfg Action Flags
This is an optional field and is used to override the Action Flags field when a
specific manufacturing isolation mode is enabled.

```
"MfgActionFlags": ["service_action", "report", "call_home"]
```

### Component ID
This is the component ID of the PEL creator, in the form 0xYY00.  For `BD`
SRCs, this is an optional field and if not present the value will be taken from
the upper byte of the reason code.  If present for `BD` SRCs, then its upper
byte must match the upper byte of the reason code.

```
"ComponentID": "0x5500"
```

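As a rough sketch of the rule above for `BD` SRCs, the following Python derives
the component ID from the reason code when `ComponentID` is absent and checks
the two against each other when it is present.  The function name
`creator_component_id` is hypothetical and is not part of phosphor-logging.

```python
from typing import Optional


def creator_component_id(reason_code: str,
                         component_id: Optional[str] = None) -> int:
    """Return the 0xYY00 creator component ID for a BD SRC registry entry."""
    derived = int(reason_code, 16) & 0xFF00  # upper byte of the reason code
    if component_id is None:
        return derived
    explicit = int(component_id, 16)
    if explicit != derived:
        raise ValueError(
            f"ComponentID {explicit:#06x} does not match "
            f"reason code {reason_code}")
    return explicit


print(hex(creator_component_id("0x5544")))            # 0x5500
print(hex(creator_component_id("0x5544", "0x5500")))  # 0x5500
```
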
### SRC Type
This specifies the type of SRC to create.  The type is the first 2 characters
of the 8 character ASCII string field of the PEL.  The allowed types are `BD`,
for the standard OpenBMC error, and `11`, for power related errors.  It is
optional and if not specified will default to `BD`.

Note: The ASCII string for BD SRCs looks like: `BDBBCCCC`, where:
* BD = SRC type
* BB = PEL subsystem as mentioned above
* CCCC = SRC reason code

For `11` SRCs, it looks like: `1100RRRR`, where RRRR is the SRC reason code.

```
"Type": "11"
```

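The two ASCII string layouts can be illustrated with a short Python sketch.
The function name `src_ascii_string` is hypothetical, and the subsystem byte
`0x61` used below is an arbitrary placeholder rather than a value taken from
the PEL spec.

```python
def src_ascii_string(src_type: str, reason_code: int,
                     subsystem: int = 0) -> str:
    """Build the 8 character SRC ASCII string (illustrative only)."""
    if src_type == "BD":
        # BDBBCCCC: type, PEL subsystem byte, reason code
        return f"BD{subsystem:02X}{reason_code:04X}"
    if src_type == "11":
        # 1100RRRR: type, two zero characters, reason code
        return f"1100{reason_code:04X}"
    raise ValueError(f"unsupported SRC type: {src_type}")


print(src_ascii_string("BD", 0x5544, subsystem=0x61))  # BD615544
print(src_ascii_string("11", 0x5678))                  # 11005678
```
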
### SRC Reason Code
This is the 4 character value in the latter half of the SRC ASCII string.  It
is treated as a 2 byte hex value, such as 0x5678.  For `BD` SRCs, the first
byte is the same as the first byte of the component ID field in the Private
Header section that represents the creator's component ID.

```
"ReasonCode": "0x5544"
```

### SRC Symptom ID Fields
The symptom ID is in the Extended User Header section and is defined in the PEL
spec as the unique event signature string.  It always starts with the ASCII
string.  This field in the message registry allows one to choose which SRC words
to use in addition to the ASCII string field to form the symptom ID. All words
are separated by underscores.  If not specified, the code will choose a default
format, which may depend on the SRC type.

For example: ["SRCWord3", "SRCWord9"] would be:
`<ASCII_STRING>_<SRCWord3>_<SRCWord9>`, which could look like:
`B181320_00000050_49000000`.

```
"SymptomIDFields": ["SRCWord3", "SRCWord9"]
```

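A minimal Python sketch of how such a symptom ID could be assembled is shown
below.  The function name `symptom_id` is hypothetical, the ASCII string is a
made-up value, and rendering each SRC word as 8 uppercase hex digits is an
assumption based on the example above.

```python
def symptom_id(ascii_string: str, src_words: dict, fields: list) -> str:
    """Join the ASCII string and the chosen SRC words with underscores."""
    parts = [ascii_string]
    for name in fields:
        word_num = int(name[len("SRCWord"):])        # "SRCWord3" -> 3
        parts.append(f"{src_words[word_num]:08X}")   # assumed word format
    return "_".join(parts)


words = {3: 0x00000050, 9: 0x49000000}
print(symptom_id("BD615544", words, ["SRCWord3", "SRCWord9"]))
# BD615544_00000050_49000000
```
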
### SRC words 6 to 9
In a PEL, these SRC words are free format and can be filled in by the user as
desired.  On the BMC, the source of these words is the AdditionalData fields in
the event log.  The message registry provides a way for the log creator to
specify which AdditionalData property field to get the data from, and also to
define what the SRC word means for use by parsers.  If not specified, these SRC
words will be set to zero in the PEL.

```
"Words6to9":
{
    "6":
    {
        "description": "Failing unit number",
        "AdditionalDataPropSource": "PS_NUM"
    }
}
```

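The mapping from AdditionalData to SRC words 6 to 9 might look roughly like
the Python sketch below.  The function name `build_words_6_to_9` is
hypothetical, and parsing the AdditionalData string with `int(value, 0)` and
masking it to 32 bits are assumptions made for illustration only.

```python
def build_words_6_to_9(words_6_to_9: dict, additional_data: dict) -> dict:
    """Fill SRC words 6-9 from AdditionalData; unspecified words stay zero."""
    words = {6: 0, 7: 0, 8: 0, 9: 0}
    for num, spec in words_6_to_9.items():
        value = additional_data.get(spec["AdditionalDataPropSource"])
        if value is not None:
            # Assumption: values are decimal or 0x-prefixed hex strings.
            words[int(num)] = int(value, 0) & 0xFFFFFFFF
    return words


registry_words = {
    "6": {
        "description": "Failing unit number",
        "AdditionalDataPropSource": "PS_NUM",
    }
}
print(build_words_6_to_9(registry_words, {"PS_NUM": "2"}))
# {6: 2, 7: 0, 8: 0, 9: 0}
```
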
### Documentation Fields
The documentation fields are used by PEL parsers to display a human readable
description of a PEL.  They are also the source for the Redfish event log
messages.

#### Message
This field is used by the BMC's PEL parser as the description of the error log.
It will also be used in Redfish event logs.  It supports argument substitution
using the %1, %2, etc. placeholders, allowing any of the SRC user data words 6
to 9 to be displayed as part of the message.  If the placeholders are used, then
the `MessageArgSources` property must be present to say which SRC words to use
for each placeholder.

```
"Message": "Processor %1 had %2 errors"
```

#### MessageArgSources
This optional field is required when the Message field contains the %X
placeholder arguments. It is an array that says which SRC words to get the
placeholders from.  In the example below, SRC word 6 would be used for %1, and
SRC word 7 for %2.

```
"MessageArgSources":
[
    "SRCWord6", "SRCWord7"
]
```

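The placeholder substitution can be sketched in Python as below.  The function
name `format_message` is hypothetical, and displaying each SRC word as a
`0x`-prefixed hex value is an assumption; a real parser may format the words
differently.

```python
import re


def format_message(message: str, arg_sources: list, src_words: dict) -> str:
    """Replace %1, %2, ... with the SRC words named in MessageArgSources."""
    def substitute(match):
        index = int(match.group(1)) - 1                # %1 -> arg_sources[0]
        word_num = int(arg_sources[index][len("SRCWord"):])
        return f"0x{src_words[word_num]:X}"            # assumed display format

    return re.sub(r"%(\d+)", substitute, message)


print(format_message("Processor %1 had %2 errors",
                     ["SRCWord6", "SRCWord7"],
                     {6: 0x1, 7: 0x20}))
# Processor 0x1 had 0x20 errors
```
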
#### Description
A short description of the error.  This is required by the Redfish schema to
generate a Redfish message entry, but is not used in Redfish or PEL output.

```
"Description": "A power fault"
```

#### Notes
This is an optional free format text field for keeping any notes for the
registry entry, as comments are not allowed in JSON.  It is an array of strings
for easier readability of long fields.

```
"Notes": [
    "This entry is for every type of power fault.",
    "There is probably a hardware failure."
]
```

### Callout Fields
The callout fields allow one to specify the PEL callouts (either a hardware
FRU, a symbolic FRU, or a maintenance procedure) in the entry for a particular
error.  These callouts can vary based on system type, as well as a user
specified AdditionalData property field.  Callouts will be added to the PEL in
the order they are listed in the JSON.  If a callout is passed into the error,
say with CALLOUT_INVENTORY_PATH, then that callout will be added to the PEL
before the callouts in the registry.

There is room for up to 10 callouts in a PEL.

#### Callouts example based on the system type

```
"Callouts":
[
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "high",
                "Procedure": "SVCDOCS"
            }
        ]
    }
]
```

The above example shows that on system 'system1', the FRU at location P1-C1
will be called out with a priority of high, and the FRU at P1 with a priority
of low.  On every other system, the maintenance procedure SVCDOCS is called
out.

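The selection and ordering rules for callouts can be sketched in Python as
below.  Both function names are hypothetical.  Dropping any callouts beyond
the tenth is an assumption, since the text only says a PEL has room for 10,
and the passed-in callout is shown as an opaque placeholder string.

```python
MAX_PEL_CALLOUTS = 10  # "room for up to 10 callouts in a PEL"


def registry_callouts(callouts: list, system: str) -> list:
    """Return the CalloutList for `system`, else the one with no 'System'."""
    default = []
    for entry in callouts:
        if entry.get("System") == system:
            return entry["CalloutList"]
        if "System" not in entry:
            default = entry["CalloutList"]
    return default


def pel_callouts(passed_in: list, callouts: list, system: str) -> list:
    """Passed-in callouts come first, then registry ones in JSON order."""
    combined = list(passed_in) + registry_callouts(callouts, system)
    return combined[:MAX_PEL_CALLOUTS]  # assumption: extras are dropped


callouts = [
    {"System": "system1",
     "CalloutList": [{"Priority": "high", "LocCode": "P1-C1"},
                     {"Priority": "low", "LocCode": "P1"}]},
    {"CalloutList": [{"Priority": "high", "Procedure": "SVCDOCS"}]},
]
for callout in pel_callouts(["<callout passed in by the caller>"],
                            callouts, "system1"):
    print(callout)
```
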
#### Callouts example based on an AdditionalData field

```
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        },
        {
            "ADValue": "1",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C6"
                        }
                    ]
                }
            ]
        }
    ]
}
```

This example shows that the callouts were selected based on the 'PROC_NUM'
AdditionalData field.  When PROC_NUM was 0, the FRU at P1-C5 was called out.
When it was 1, P1-C6 was called out.  Note that the same 'Callouts' array is
used as in the previous example, so these callouts can also depend on the
system type.

If it's desired to use a different set of callouts when there isn't a match
on the AdditionalData field, one can use CalloutsWhenNoADMatch.  In the
following example, the 'air_mover' callout will be added if 'PROC_NUM' isn't
0.  'CalloutsWhenNoADMatch' has the same schema as the 'Callouts' section.

```
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        }
    ],
    "CalloutsWhenNoADMatch": [
        {
            "CalloutList": [
                {
                    "Priority": "high",
                    "SymbolicFRU": "air_mover"
                }
            ]
        }
    ]
}
```

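The AdditionalData based selection, including the `CalloutsWhenNoADMatch`
fallback, can be sketched in Python as follows.  The function name
`callouts_from_ad` is hypothetical and this is not the phosphor-logging
implementation; it returns a 'Callouts' style array that would still need to
be resolved by system type.

```python
def callouts_from_ad(callouts_using_ad: dict, additional_data: dict) -> list:
    """Pick the 'Callouts' array whose ADValue matches the AdditionalData."""
    value = additional_data.get(callouts_using_ad["ADName"])
    for candidate in callouts_using_ad["CalloutsWithTheirADValues"]:
        if candidate["ADValue"] == value:
            return candidate["Callouts"]
    # No match: use the fallback if the registry entry provides one.
    return callouts_using_ad.get("CalloutsWhenNoADMatch", [])


entry = {
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues": [
        {"ADValue": "0",
         "Callouts": [{"CalloutList": [
             {"Priority": "high", "LocCode": "P1-C5"}]}]},
    ],
    "CalloutsWhenNoADMatch": [
        {"CalloutList": [{"Priority": "high", "SymbolicFRU": "air_mover"}]},
    ],
}
print(callouts_from_ad(entry, {"PROC_NUM": "0"}))  # the P1-C5 callouts
print(callouts_from_ad(entry, {"PROC_NUM": "3"}))  # the air_mover callouts
```
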
#### CalloutType
This field can be used to modify the failing component type field in the
callout when the default doesn't fit:

```
{
    "Priority": "high",
    "Procedure": "FIXIT22",
    "CalloutType": "config_procedure"
}
```

The defaults are:
- Normal hardware FRU: hardware_fru
- Symbolic FRU: symbolic_fru
- Procedure: maint_procedure

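The defaults listed above can be expressed as a small Python sketch.  The
function name `callout_type` is hypothetical, and deciding the kind of callout
from which keys are present in the JSON object is an assumption made for
illustration.

```python
def callout_type(callout: dict) -> str:
    """Return the explicit CalloutType, or the default for the callout kind."""
    if "CalloutType" in callout:
        return callout["CalloutType"]
    if "Procedure" in callout:
        return "maint_procedure"
    if "SymbolicFRU" in callout or "SymbolicFRUTrusted" in callout:
        return "symbolic_fru"
    return "hardware_fru"  # normal hardware FRU, e.g. a LocCode callout


print(callout_type({"Priority": "high", "LocCode": "P1-C1"}))
# hardware_fru
print(callout_type({"Priority": "high", "Procedure": "FIXIT22",
                    "CalloutType": "config_procedure"}))
# config_procedure
```
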
#### Symbolic FRU callouts with dynamic trusted location codes

A special case is when one wants to use a symbolic FRU callout with a trusted
location code, but the location code to use isn't known until runtime. This
means it can't be specified using the 'LocCode' key in the registry.

In this case, one should use the 'SymbolicFRUTrusted' key along with the
'UseInventoryLocCode' key, and then pass in the inventory item that has the
desired location code using the 'CALLOUT_INVENTORY_PATH' entry inside of the
AdditionalData property.  The code will then look up the location code for that
passed in inventory FRU and place it in the symbolic FRU callout.  The normal
FRU callout with that inventory item will not be created.  The symbolic FRU
must be the first callout in the registry for this to work.

```
{
    "Priority": "high",
    "SymbolicFRUTrusted": "AIR_MOVR",
    "UseInventoryLocCode": true
}
```

## Modifying and Testing

The general process for adding new entries to the message registry is:

1. Update message_registry.json to add the new errors.
2. If a new component ID is used (usually the first byte of the SRC reason
   code), document it in O_component_ids.json.
3. Validate the file. It must be valid JSON and obey the schema.  The
   `process_registry.py` script in `extensions/openpower-pels/registry/tools`
   will validate both, though it requires the python-jsonschema package to do
   the schema validation.  This script is also run to validate the message
   registry as part of CI testing.

   ```
   ./tools/process_registry.py -v -s schema/schema.json -r message_registry.json
   ```

4. One can test what PELs are generated from these new entries without writing
   any code to create the corresponding event logs:
    1. Copy the modified message_registry.json into `/etc/phosphor-logging/` on
       the BMC. That directory may need to be created.
    2. Use busctl to call the Create method to create an event log
       corresponding to the message registry entry under test.

       ```
       busctl call xyz.openbmc_project.Logging /xyz/openbmc_project/logging \
       xyz.openbmc_project.Logging.Create Create ssa{ss} \
       xyz.openbmc_project.Common.Error.Timeout \
       xyz.openbmc_project.Logging.Entry.Level.Error 1 "TIMEOUT_IN_MSEC" "5"
       ```

    3. Check the PEL that was created using peltool.
    4. When finished, delete the file from `/etc/phosphor-logging/`.
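
When testing several registry entries, it can help to generate the busctl
command line rather than typing it by hand.  The Python sketch below is only a
convenience illustration: the function name `busctl_create_command` is
hypothetical, and it simply reproduces the argument layout of the command
above, where the `a{ss}` array is passed as an entry count followed by
key/value pairs.

```python
import shlex


def busctl_create_command(message: str, severity: str,
                          additional_data: dict) -> str:
    """Build a shell-quoted busctl call to the Create method."""
    args = [
        "busctl", "call",
        "xyz.openbmc_project.Logging",
        "/xyz/openbmc_project/logging",
        "xyz.openbmc_project.Logging.Create", "Create",
        "ssa{ss}",                   # signature: string, string, dict
        message,
        severity,
        str(len(additional_data)),   # number of AdditionalData entries
    ]
    for key, value in additional_data.items():
        args += [key, value]
    return shlex.join(args)


print(busctl_create_command(
    "xyz.openbmc_project.Common.Error.Timeout",
    "xyz.openbmc_project.Logging.Entry.Level.Error",
    {"TIMEOUT_IN_MSEC": "5"}))
```

The printed command can then be run on the BMC in place of the hand-written
one.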
483