# Platform Event Log Message Registry

On the BMC, PELs are created from the standard event logs provided by
phosphor-logging using a message registry that provides the PEL related fields.
The message registry is a JSON file.

## Contents

- [Component IDs](#component-ids)
- [Message Registry Fields](#message-registry-fields)
- [Modifying and Testing](#modifying-and-testing)

## Component IDs

A component ID is a 2 byte value of the form 0xYY00 used in a PEL to:

1. Provide the upper byte (the YY from above) of an SRC reason code in `BD`
   SRCs.
2. Reside in the section header of the Private Header PEL section to specify the
   error log creator's component ID.
3. Reside in the section header of the User Header section to specify the error
   log committer's component ID.
4. Reside in the section header in the User Data section to specify which parser
   to call to parse that section.

Component IDs are specified in the message registry either as the upper byte of
the SRC reason code field for `BD` SRCs, or in the standalone `ComponentID`
field.

Component IDs will be unique on a per-repository basis for errors unique to that
repository. When the same errors are created by multiple repositories, those
errors will all share the same component ID. The master list of component IDs is
in [O_component_ids.json](O_component_ids.json). That file can be used by PEL
parsers to display a name for the component ID. The 'O' in the name is the
creator ID value for BMC created PELs.

## Message Registry Fields

The [message registry schema](schema/schema.json) and the
[message registry](message_registry.json) are available. The message registry
will be validated against the schema either during a bitbake build or during
CI, or eventually possibly both.

In the message registry, there are fields for specifying:

### Name

This is the key into the message registry, and is the Message property of the
OpenBMC event log that the PEL is being created from.

```json
"Name": "xyz.openbmc_project.Power.Fault"
```

### Subsystem

This field is part of the PEL User Header section, and is used to specify the
subsystem pertaining to the error. It is an enumeration that maps to the actual
PEL value. If the subsystem isn't known ahead of time, it can be passed in at
the time of PEL creation using the 'PEL_SUBSYSTEM' AdditionalData field. In this
case, 'Subsystem' isn't required, though 'PossibleSubsystems' is.

```json
"Subsystem": "power_supply"
```

### PossibleSubsystems

This field is used by scripts that build documentation from the message registry
to know which subsystems are possible for an error when it can't be hardcoded
using the 'Subsystem' field. It is mutually exclusive with the 'Subsystem'
field.

```json
"PossibleSubsystems": ["memory", "processor"]
```
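
The relationship between 'Subsystem' and 'PossibleSubsystems' can be checked
with a few lines of Python. This is only a sketch: `check_subsystem_fields` is a
hypothetical helper, not part of `validate_registry.py`, and it assumes each
registry entry is a dictionary with these keys at its top level.

```python
def check_subsystem_fields(entry):
    """Return a list of problems with the subsystem fields of one entry.

    Assumption: 'entry' is one registry entry loaded from JSON as a dict.
    """
    problems = []
    has_fixed = "Subsystem" in entry
    has_possible = "PossibleSubsystems" in entry

    # The two fields are mutually exclusive.
    if has_fixed and has_possible:
        problems.append("Subsystem and PossibleSubsystems are mutually exclusive")

    # When Subsystem is left out, PossibleSubsystems is required.
    if not has_fixed and not has_possible:
        problems.append("one of Subsystem or PossibleSubsystems is required")

    return problems
```

An empty list means the entry follows both rules described above.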

### Severity

This field is part of the PEL User Header section, and is used to specify the
PEL severity. It is an optional field; if it isn't specified, then the severity
of the OpenBMC event log will be converted into a PEL severity value.

It can either be the plain severity value, or an array of severity values that
are based on system type, where an entry without a system type will match
anything unless another entry has a matching system type.

```json
"Severity": "unrecoverable"
```

```json
"Severity":
[
  {
    "System": "system1",
    "SevValue": "recovered"
  },
  {
    "SevValue": "unrecoverable"
  }
]
```

The above example shows that on system 'system1' the severity will be recovered,
and on every other system it will be unrecoverable.
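
The matching rule can be sketched in Python as follows. `resolve_severity` is a
hypothetical helper written for illustration, not the PEL daemon's code, and it
assumes every array entry stores its value under `SevValue`.

```python
def resolve_severity(severity, system_name):
    """Pick the severity for system_name from a registry 'Severity' field.

    'severity' is either a plain string or a list of dicts, where an entry
    without a 'System' key acts as the fallback.
    """
    if isinstance(severity, str):
        return severity

    default = None
    for entry in severity:
        if "System" not in entry:
            # Remember the fallback, but keep looking for an exact match.
            default = entry["SevValue"]
        elif entry["System"] == system_name:
            return entry["SevValue"]
    return default
```

Returning `None` when nothing matches leaves it to the caller to fall back to
the severity converted from the OpenBMC event log.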

### Mfg Severity

This is an optional field and is used to override the Severity field when a
specific manufacturing isolation mode is enabled. It has the same format as
Severity.

```json
"MfgSeverity": "unrecoverable"
```

### Event Scope

This field is part of the PEL User Header section, and is used to specify the
event scope, as defined by the PEL spec. It is optional and defaults to "entire
platform".

```json
"EventScope": "entire_platform"
```

### Event Type

This field is part of the PEL User Header section, and is used to specify the
event type, as defined by the PEL spec. It is optional and defaults to "not
applicable" for non-informational logs, and "misc_information_only" for
informational ones.

```json
"EventType": "na"
```

### Action Flags

This field is part of the PEL User Header section, and is used to specify the
PEL action flags, as defined by the PEL spec. It is an array of enumerations.

The action flags can usually be deduced from other PEL fields, such as the
severity or if there are any callouts. As such, this is an optional field and if
not supplied the code will fill them in based on those fields.

In fact, even if supplied here, the code may still modify them to ensure they
are correct. The rules used for this are in the
[OpenPower PELs README](../README.md#action-flags-and-event-type-rules).

```json
"ActionFlags": ["service_action", "report", "call_home"]
```

### Mfg Action Flags

This is an optional field and is used to override the Action Flags field when a
specific manufacturing isolation mode is enabled.

```json
"MfgActionFlags": ["service_action", "report", "call_home"]
```

### Component ID

This is the component ID of the PEL creator, in the form 0xYY00. For `BD` SRCs,
this is an optional field and if not present the value will be taken from the
upper byte of the reason code. If present for `BD` SRCs, then this byte must
match the upper byte of the reason code.

```json
"ComponentID": "0x5500"
```
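
As a rough illustration of the rule for `BD` SRCs, the component ID can be
derived from, or checked against, the reason code by masking off the lower
byte. The helper names below are hypothetical and the values are the ones used
in this document's examples.

```python
def component_id_from_reason_code(reason_code):
    """Derive the 0xYY00 component ID from a reason code string like '0x5544'."""
    return "0x{:04X}".format(int(reason_code, 16) & 0xFF00)


def component_id_matches(component_id, reason_code):
    """True if the component ID has the 0xYY00 form and YY matches the
    upper byte of the reason code."""
    comp = int(component_id, 16)
    reason = int(reason_code, 16)
    return (comp & 0x00FF) == 0 and comp == (reason & 0xFF00)
```

For example, a reason code of `0x5544` pairs with a component ID of `0x5500`.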

### SRC Type

This specifies the type of SRC to create. The type is the first 2 characters of
the 8 character ASCII string field of the PEL. The allowed types are `BD`, for
the standard OpenBMC error, and `11`, for power related errors. It is optional
and if not specified will default to `BD`.

Note: The ASCII string for BD SRCs looks like: `BDBBCCCC`, where:

- BD = SRC type
- BB = PEL subsystem as mentioned above
- CCCC = SRC reason code

For `11` SRCs, it looks like: `1100RRRR`, where RRRR is the SRC reason code.

```json
"Type": "11"
```
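
The two layouts can be sketched in Python like this. `src_ascii_string` is a
hypothetical helper for illustration only, and the subsystem byte used in the
usage note (0x61) is a made-up value, not one taken from the PEL spec.

```python
def src_ascii_string(src_type, reason_code, subsystem=0):
    """Build the 8 character SRC ASCII string.

    src_type    -- "BD" or "11"
    reason_code -- 2 byte reason code as an integer, e.g. 0x5544
    subsystem   -- 1 byte PEL subsystem value, only used for BD SRCs
    """
    if src_type == "BD":
        # BDBBCCCC: type, subsystem byte, reason code
        return "BD{:02X}{:04X}".format(subsystem, reason_code)
    if src_type == "11":
        # 1100RRRR: type, two zeros, reason code
        return "1100{:04X}".format(reason_code)
    raise ValueError("unsupported SRC type: " + src_type)
```

With these assumptions, `src_ascii_string("BD", 0x5544, 0x61)` gives `BD615544`
and `src_ascii_string("11", 0x5678)` gives `11005678`.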

### SRC Reason Code

This is the 4 character value in the latter half of the SRC ASCII string. It is
treated as a 2 byte hex value, such as 0x5678. For `BD` SRCs, the first byte is
the same as the first byte of the component ID field in the Private Header
section that represents the creator's component ID.

```json
"ReasonCode": "0x5544"
```

### SRC Symptom ID Fields

The symptom ID is in the Extended User Header section and is defined in the PEL
spec as the unique event signature string. It always starts with the ASCII
string. This field in the message registry allows one to choose which SRC words
to use in addition to the ASCII string field to form the symptom ID. All words
are separated by underscores. If not specified, the code will choose a default
format, which may depend on the SRC type.

For example: ["SRCWord3", "SRCWord9"] would be:
`<ASCII_STRING>_<SRCWord3>_<SRCWord9>`, which could look like:
`B181320_00000050_49000000`.

```json
"SymptomIDFields": ["SRCWord3", "SRCWord9"]
```
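
A minimal sketch of how such a string could be assembled is shown below.
`build_symptom_id` is a hypothetical helper, and it assumes each SRC word is
rendered as 8 uppercase hex digits, which is consistent with the example above
but is not confirmed by the PEL code.

```python
def build_symptom_id(ascii_string, src_words, fields):
    """Join the ASCII string and the selected SRC words with underscores.

    src_words -- dict mapping SRC word number to its 32 bit integer value
    fields    -- list such as ["SRCWord3", "SRCWord9"]
    """
    parts = [ascii_string]
    for name in fields:
        # "SRCWord3" -> 3
        word_number = int(name[len("SRCWord"):])
        parts.append("{:08X}".format(src_words[word_number]))
    return "_".join(parts)
```

For instance, with an ASCII string of `BD615544` (made up for this example),
word 3 set to 0x50 and word 9 set to 0x49000000, the result would be
`BD615544_00000050_49000000`.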

### SRC words 6 to 9

In a PEL, these SRC words are free format and can be filled in by the user as
desired. On the BMC, the source of these words is the AdditionalData fields in
the event log. The message registry provides a way for the log creator to
specify which AdditionalData property field to get the data from, and also to
define what the SRC word means for use by parsers. If not specified, these SRC
words will be set to zero in the PEL.

```json
"Words6to9":
{
  "6":
  {
    "description": "Failing unit number",
    "AdditionalDataPropSource": "PS_NUM"
  }
}
```

### SRC Deconfig Flag

Bit 6 in hex word 5 of the SRC means that one or more called out resources have
been deconfigured, and this flag can be used to set that bit. The only other way
to set it is by indicating it when
[passing in the callouts via JSON](../README.md#callouts).

This is looked at by the software that creates the periodic PELs that indicate a
system is running with deconfigured hardware.

```json
"DeconfigFlag": true
```

### SRC Checkstop Flag

This is used to indicate the PEL is for a hardware checkstop, and causes bit 0
in hex word 5 of the SRC to be set.

```json
"CheckstopFlag": true
```

### Documentation Fields

The documentation fields are used by PEL parsers to display a human readable
description of a PEL. They are also the source for the Redfish event log
messages.

#### Message

This field is used by the BMC's PEL parser as the description of the error log.
It will also be used in Redfish event logs. It supports argument substitution
using the %1, %2, etc. placeholders, allowing any of the SRC user data words 6
to 9 to be displayed as part of the message. If the placeholders are used, then
the `MessageArgSources` property must be present to say which SRC words to use
for each placeholder.

```json
"Message": "Processor %1 had %2 errors"
```

#### MessageArgSources

This field is required when the Message field contains the %X placeholder
arguments, and is otherwise optional. It is an array that says which SRC words
to get the placeholders from. In the example below, SRC word 6 would be used for
%1, and SRC word 7 for %2.

```json
"MessageArgSources":
[
  "SRCWord6", "SRCWord7"
]
```
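
The substitution can be sketched as below. This is not the parser's actual
code: `format_message` is a hypothetical helper, and rendering each word as
`0x` followed by 8 hex digits is an assumption made for the example.

```python
import re


def format_message(message, arg_sources, src_words):
    """Replace %1, %2, ... in message with the SRC words named in arg_sources.

    arg_sources -- list such as ["SRCWord6", "SRCWord7"]; entry N-1 feeds %N
    src_words   -- dict mapping SRC word number to its integer value
    """

    def substitute(match):
        position = int(match.group(1)) - 1
        word_number = int(arg_sources[position][len("SRCWord"):])
        return "0x{:08X}".format(src_words[word_number])

    return re.sub(r"%(\d+)", substitute, message)
```

With `Message` and `MessageArgSources` as in the examples above, and SRC words
6 and 7 holding 2 and 5, the result would read
`Processor 0x00000002 had 0x00000005 errors`.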

#### Description

A short description of the error. This is required by the Redfish schema to
generate a Redfish message entry, but is not used in Redfish or PEL output.

```json
"Description": "A power fault"
```

#### Notes

This is an optional free format text field for keeping any notes for the
registry entry, as comments are not allowed in JSON. It is an array of strings
for easier readability of long fields.

```json
"Notes": [
  "This entry is for every type of power fault.",
  "There is probably a hardware failure."
]
```

### Callout Fields

The callout fields allow one to specify the PEL callouts (either a hardware FRU,
a symbolic FRU, or a maintenance procedure) in the entry for a particular error.
These callouts can vary based on system type, as well as a user specified
AdditionalData property field. Callouts will be added to the PEL in the order
they are listed in the JSON. If a callout is passed into the error, say with
CALLOUT_INVENTORY_PATH, then that callout will be added to the PEL before the
callouts in the registry.

There is room for up to 10 callouts in a PEL.

The callouts based on system type can be added in two ways, by using either the
`System` key or the `Systems` key.

The `System` key accepts the system name as a string, and the callouts specific
to that system are listed under it.

If multiple systems have the same callouts, the `Systems` key can be used
instead. It accepts the system names as an array of strings, and the callouts
common to those systems are listed under it.

Available maintenance procedures are listed in the [parser][1] and in the
[source code][2].

[1]:
  https://github.com/ibm-openbmc/openpower-pel-parsers/blob/master/modules/calloutparsers/ocallouts/ocallouts.py
[2]:
  https://github.com/openbmc/phosphor-logging/blob/master/extensions/openpower-pels/pel_values.cpp

If a procedure is needed that doesn't exist yet, please contact the owner of
this code for instructions.

#### Callouts example based on the system type

```json
"Callouts":
[
  {
    "System": "system1",
    "CalloutList":
    [
      {
        "Priority": "high",
        "LocCode": "P1-C1"
      },
      {
        "Priority": "low",
        "LocCode": "P1"
      }
    ]
  },
  {
    "CalloutList":
    [
      {
        "Priority": "high",
        "Procedure": "BMC0002"
      }
    ]
  }
]
```

The above example shows that on system `system1`, the FRU at location P1-C1 will
be called out with a priority of high, and the FRU at P1 with a priority of low.
On every other system, the maintenance procedure BMC0002 is called out.

#### Callouts example based on the Systems type

```json
"Callouts":
[
  {
    "Systems": ["system1", "system2"],
    "CalloutList":
    [
      {
        "Priority": "high",
        "LocCode": "P1-C1"
      },
      {
        "Priority": "low",
        "LocCode": "P1"
      }
    ]
  },
  {
    "System": "system1",
    "CalloutList":
    [
      {
        "Priority": "low",
        "SymbolicFRU": "service_docs"
      },
      {
        "Priority": "low",
        "SymbolicFRUTrusted": "air_mover",
        "UseInventoryLocCode": true
      }
    ]
  },
  {
    "CalloutList":
    [
      {
        "Priority": "medium",
        "Procedure": "BMC0001"
      }
    ]
  }
]
```

The above example shows that on `system1`, the FRUs at locations P1-C1 and P1
and the symbolic FRUs service_docs and air_mover will be called out. On
`system2`, only the FRUs at locations P1-C1 and P1 will be called out. On every
other system, the maintenance procedure BMC0001 is called out.
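
The selection behavior described by these two examples can be sketched in
Python as follows. `select_callouts` is a hypothetical helper whose logic is
inferred from the examples, not taken from the PEL daemon: every entry naming
the system contributes its callouts in order, and the entry without `System` or
`Systems` is used only when nothing else matched.

```python
def select_callouts(callouts, system_name):
    """Return the flattened callout list for system_name.

    callouts -- the registry 'Callouts' array loaded from JSON
    """
    matched = []
    default = []
    for entry in callouts:
        # Collect the system names this entry applies to, if any.
        names = list(entry.get("Systems", []))
        if "System" in entry:
            names.append(entry["System"])

        if not names:
            default.extend(entry["CalloutList"])
        elif system_name in names:
            matched.extend(entry["CalloutList"])

    # Fall back to the default entry only when no system-specific one matched.
    return matched if matched else default
```

Keeping the matched callouts in registry order mirrors the statement that
callouts are added to the PEL in the order they are listed in the JSON.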

#### Callouts example based on an AdditionalData field

```json
"CalloutsUsingAD":
{
  "ADName": "PROC_NUM",
  "CalloutsWithTheirADValues":
  [
    {
      "ADValue": "0",
      "Callouts":
      [
        {
          "CalloutList":
          [
            {
              "Priority": "high",
              "LocCode": "P1-C5"
            }
          ]
        }
      ]
    },
    {
      "ADValue": "1",
      "Callouts":
      [
        {
          "CalloutList":
          [
            {
              "Priority": "high",
              "LocCode": "P1-C6"
            }
          ]
        }
      ]
    }
  ]
}
```

This example shows callouts being selected based on the 'PROC_NUM'
AdditionalData field. When PROC_NUM is 0, the FRU at P1-C5 is called out. When
it is 1, P1-C6 is called out. Note that the same 'Callouts' array is used as in
the previous example, so these callouts can also depend on the system type.

If it's desired to use a different set of callouts when there isn't a match on
the AdditionalData field, one can use CalloutsWhenNoADMatch. In the following
example, the 'air_mover' callout will be added if 'PROC_NUM' isn't 0.
'CalloutsWhenNoADMatch' has the same schema as the 'Callouts' section.

```json
"CalloutsUsingAD":
{
  "ADName": "PROC_NUM",
  "CalloutsWithTheirADValues":
  [
    {
      "ADValue": "0",
      "Callouts":
      [
        {
          "CalloutList":
          [
            {
              "Priority": "high",
              "LocCode": "P1-C5"
            }
          ]
        }
      ]
    }
  ],
  "CalloutsWhenNoADMatch": [
    {
      "CalloutList": [
        {
          "Priority": "high",
          "SymbolicFRU": "air_mover"
        }
      ]
    }
  ]
}
```
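
A minimal sketch of this lookup, assuming the AdditionalData property has
already been turned into a Python dictionary of strings, could look like the
following. `select_callouts_using_ad` is a hypothetical name and the function
is illustrative rather than the daemon's implementation.

```python
def select_callouts_using_ad(callouts_using_ad, additional_data):
    """Return the 'Callouts' array chosen by the AdditionalData value.

    callouts_using_ad -- the registry 'CalloutsUsingAD' object
    additional_data   -- dict of AdditionalData keys to string values
    """
    ad_value = additional_data.get(callouts_using_ad["ADName"])

    for candidate in callouts_using_ad["CalloutsWithTheirADValues"]:
        if candidate["ADValue"] == ad_value:
            return candidate["Callouts"]

    # No ADValue matched: use the optional fallback, or nothing at all.
    return callouts_using_ad.get("CalloutsWhenNoADMatch", [])
```

The returned array has the same shape as a regular 'Callouts' array, so it can
still be narrowed down by system type afterwards.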

#### CalloutType

This field can be used to modify the failing component type field in the callout
when the default doesn't fit:

```json
{
  "Priority": "high",
  "Procedure": "FIXIT22",
  "CalloutType": "config_procedure"
}
```

The defaults are:

- Normal hardware FRU: hardware_fru
- Symbolic FRU: symbolic_fru
- Procedure: maint_procedure

#### Symbolic FRU callouts with dynamic trusted location codes

A special case is when one wants to use a symbolic FRU callout with a trusted
location code, but the location code to use isn't known until runtime. This
means it can't be specified using the 'LocCode' key in the registry.

In this case, one should use the 'SymbolicFRUTrusted' key along with the
'UseInventoryLocCode' key, and then pass in the inventory item that has the
desired location code using the 'CALLOUT_INVENTORY_PATH' entry inside of the
AdditionalData property. The code will then look up the location code for that
passed in inventory FRU and place it in the symbolic FRU callout. The normal FRU
callout with that inventory item will not be created. The symbolic FRU must be
the first callout in the registry for this to work.

```json
{
  "Priority": "high",
  "SymbolicFRUTrusted": "AIR_MOVR",
  "UseInventoryLocCode": true
}
```

### Capturing the Journal

The PEL daemon can be told to capture pieces of the journal in PEL UserData
sections. This could be useful for debugging problems where a BMC dump, which
would also contain the journal, isn't available.

The 'JournalCapture' field has two formats, one that will create one UserData
section with the previous N lines of the journal, and another that can capture
any number of journal snippets based on the journal's SYSLOG_IDENTIFIER field.

```json
"JournalCapture": {
  "NumLines": 30
}
```

```json
"JournalCapture":
{
  "Sections": [
    {
      "SyslogID": "phosphor-bmc-state-manager",
      "NumLines": 20
    },
    {
      "SyslogID": "phosphor-log-manager",
      "NumLines": 15
    }
  ]
}
```

The first example will capture the previous 30 lines from the journal into a
single UserData section.

The second example will create two UserData sections, the first with the most
recent 20 lines from phosphor-bmc-state-manager, and the second with 15 lines
from phosphor-log-manager.

If a UserData section would make the PEL exceed its maximum size of 16KB, it
will be dropped.
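
To preview roughly which journal lines a 'JournalCapture' entry would pick up,
one can translate it into `journalctl` invocations. The sketch below only
builds the argument lists; `journalctl_commands` is a hypothetical helper, and
the PEL daemon itself is not assumed to call `journalctl`. It relies on the
standard `-n` (number of lines) and `-t` (syslog identifier) options.

```python
def journalctl_commands(journal_capture):
    """Return one journalctl argument list per UserData section."""
    if "NumLines" in journal_capture:
        # Single section with the most recent N lines of the whole journal.
        return [["journalctl", "-n", str(journal_capture["NumLines"])]]

    commands = []
    for section in journal_capture.get("Sections", []):
        # One section per syslog identifier.
        commands.append(
            ["journalctl", "-t", section["SyslogID"],
             "-n", str(section["NumLines"])]
        )
    return commands
```

Each returned list can be passed to `subprocess.run` on a system that has
`journalctl` available.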

## Modifying and Testing

The general process for adding new entries to the message registry is:

1. Update message_registry.json to add the new errors.
2. If a new component ID is used (usually the first byte of the SRC reason
   code), document it in O_component_ids.json.
3. Validate the file. It must be valid JSON and obey the schema. The
   `validate_registry.py` script in `extensions/openpower-pels/registry/tools`
   will validate both, though it requires the python-jsonschema package to do
   the schema validation. This script is also run to validate the message
   registry as part of CI testing.

   ```sh
   ./tools/validate_registry.py -s schema/schema.json -r message_registry.json
   ```

4. One can test what PELs are generated from these new entries without writing
   any code to create the corresponding event logs:

   1. Copy the modified message_registry.json into `/etc/phosphor-logging/` on
      the BMC. That directory may need to be created.
   2. Use busctl to call the Create method to create an event log corresponding
      to the message registry entry under test.

      ```sh
      busctl call xyz.openbmc_project.Logging /xyz/openbmc_project/logging \
        xyz.openbmc_project.Logging.Create Create ssa{ss} \
        xyz.openbmc_project.Common.Error.Timeout \
        xyz.openbmc_project.Logging.Entry.Level.Error 1 "TIMEOUT_IN_MSEC" "5"
      ```

   3. Check the PEL that was created using peltool.
   4. When finished, delete the file from `/etc/phosphor-logging/`.