# Platform Event Log Message Registry
On the BMC, PELs are created from the standard event logs provided by
phosphor-logging using a message registry that provides the PEL-related fields.
The message registry is a JSON file.

## Contents
* [Component IDs](#component-ids)
* [Message Registry Fields](#message-registry-fields)
* [Modifying and Testing](#modifying-and-testing)

## Component IDs
A component ID is a 2-byte value of the form 0xYY00 used in a PEL to:
1. Provide the upper byte (the YY from above) of an SRC reason code in `BD`
   SRCs.
2. Reside in the section header of the Private Header PEL section to specify
   the error log creator's component ID.
3. Reside in the section header of the User Header section to specify the error
   log committer's component ID.
4. Reside in the section header in the User Data section to specify which
   parser to call to parse that section.

Component IDs are specified in the message registry either as the upper byte of
the SRC reason code field for `BD` SRCs, or in the standalone `ComponentID`
field.

Component IDs will be unique on a per-repository basis for errors unique to
that repository. When the same errors are created by multiple repositories,
those errors will all share the same component ID. The master list of
component IDs is [here](ComponentIDs.md).

## Message Registry Fields
The message registry schema is [here](schema/schema.json), and the message
registry itself is [here](message_registry.json). The registry will be
validated against the schema either during a bitbake build or during CI, and
possibly eventually in both.

In the message registry, there are fields for specifying:

### Name
This is the key into the message registry, and is the Message property
of the OpenBMC event log that the PEL is being created from.

```
"Name": "xyz.openbmc_project.Power.Fault"
```

### Subsystem
This field is part of the PEL User Header section, and is used to specify
the subsystem pertaining to the error. It is an enumeration that maps to the
actual PEL value.

```
"Subsystem": "power_supply"
```

### Severity
This field is part of the PEL User Header section, and is used to specify
the PEL severity. It is an optional field. If it isn't specified, the
severity of the OpenBMC event log will be converted into a PEL severity value.

It can either be the plain severity value, or an array of severity values that
are based on system type, where an entry without a system type will match
anything unless another entry has a matching system type.

```
"Severity": "unrecoverable"
```

```
"Severity":
[
    {
        "System": "system1",
        "SevValue": "recovered"
    },
    {
        "SevValue": "unrecoverable"
    }
]
```
The above example shows that on system 'system1' the severity will be
recovered, and on every other system it will be unrecoverable.
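
The matching rule described above can be sketched in Python. This is only an illustration of the rule, not the code the BMC uses: the function name `select_severity` is hypothetical, and the sketch assumes every array entry carries its value in `SevValue`.

```python
def select_severity(severity, system_type):
    """Pick the severity string that applies to the given system type."""
    if isinstance(severity, str):
        # Plain form: one value for every system.
        return severity

    fallback = None
    for entry in severity:
        if "System" not in entry:
            # Entry without a system type: used only if nothing else matches.
            if fallback is None:
                fallback = entry["SevValue"]
        elif entry["System"] == system_type:
            # An entry naming this system always wins.
            return entry["SevValue"]
    return fallback


severity_field = [
    {"System": "system1", "SevValue": "recovered"},
    {"SevValue": "unrecoverable"},
]

print(select_severity(severity_field, "system1"))  # recovered
print(select_severity(severity_field, "system2"))  # unrecoverable
```

Returning `None` when nothing matches leaves the caller free to fall back to the severity of the OpenBMC event log.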

### Mfg Severity
This is an optional field and is used to override the Severity field when a
specific manufacturing isolation mode is enabled. It has the same format as
Severity.

```
"MfgSeverity": "unrecoverable"
```

### Event Scope
This field is part of the PEL User Header section, and is used to specify
the event scope, as defined by the PEL spec. It is optional and defaults to
"entire platform".

```
"EventScope": "entire_platform"
```

### Event Type
This field is part of the PEL User Header section, and is used to specify
the event type, as defined by the PEL spec. It is optional and defaults to
"not applicable" for non-informational logs, and "misc_information_only" for
informational ones.

```
"EventType": "na"
```

### Action Flags
This field is part of the PEL User Header section, and is used to specify the
PEL action flags, as defined by the PEL spec. It is an array of enumerations.

The action flags can usually be deduced from other PEL fields, such as the
severity or whether there are any callouts. As such, this is an optional field,
and if it is not supplied the code will fill the flags in based on those
fields.

Even if the flags are supplied here, the code may still modify them to ensure
they are correct. The rules used for this are
[here](../README.md#action-flags-and-event-type-rules).

```
"ActionFlags": ["service_action", "report", "call_home"]
```

### Mfg Action Flags
This is an optional field and is used to override the Action Flags field when a
specific manufacturing isolation mode is enabled.

```
"MfgActionFlags": ["service_action", "report", "call_home"]
```

### Component ID
This is the component ID of the PEL creator, in the form 0xYY00. For `BD`
SRCs, this is an optional field, and if it is not present the value will be
taken from the upper byte of the reason code. If it is present for `BD` SRCs,
then its upper byte must match the upper byte of the reason code.

```
"ComponentID": "0x5500"
```
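
As a rough illustration of how the component ID relates to the reason code, the sketch below derives the 0xYY00 value and checks an explicit `ComponentID` against it. The function name is hypothetical, and treating a missing `ComponentID` on a non-`BD` SRC as an error is an assumption made for this example only.

```python
def creator_component_id(reason_code, src_type="BD", component_id=None):
    """Return the creator component ID as an integer of the form 0xYY00."""
    derived = int(reason_code, 16) & 0xFF00  # upper byte of the reason code

    if component_id is None:
        if src_type != "BD":
            # Assumption: only BD SRCs may omit the ComponentID field.
            raise ValueError("ComponentID is required for non-BD SRCs")
        return derived

    explicit = int(component_id, 16)
    if src_type == "BD" and explicit != derived:
        raise ValueError("ComponentID must match the reason code upper byte")
    return explicit


print(f"0x{creator_component_id('0x5544'):04X}")  # 0x5500
```

For a `BD` SRC with reason code 0x5544, the derived component ID is 0x5500, which is also the only explicit `ComponentID` value the check accepts.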

### SRC Type
This specifies the type of SRC to create. The type is the first 2 characters
of the 8 character ASCII string field of the PEL. The allowed types are `BD`,
for the standard OpenBMC error, and `11`, for power related errors. It is
optional and if not specified will default to `BD`.

Note: The ASCII string for `BD` SRCs looks like: `BDBBCCCC`, where:
* BD = SRC type
* BB = PEL subsystem as mentioned above
* CCCC = SRC reason code

For `11` SRCs, it looks like: `1100RRRR`, where RRRR is the SRC reason code.

```
"Type": "11"
```
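
The two ASCII string layouts can be sketched as follows. The function name `src_ascii_string` is hypothetical, and the subsystem byte 0x61 in the usage lines is an arbitrary value chosen for illustration, not a documented mapping for any subsystem name.

```python
def src_ascii_string(reason_code, src_type="BD", subsystem=0):
    """Build the 8 character SRC ASCII string from its parts."""
    if src_type == "BD":
        # BD + 2 hex digit subsystem + 4 hex digit reason code
        return f"BD{subsystem:02X}{reason_code:04X}"
    if src_type == "11":
        # 11 + 00 + 4 hex digit reason code
        return f"1100{reason_code:04X}"
    raise ValueError(f"unsupported SRC type: {src_type}")


print(src_ascii_string(0x5544, "BD", subsystem=0x61))  # BD615544
print(src_ascii_string(0x5678, "11"))                  # 11005678
```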

### SRC Reason Code
This is the 4 character value in the latter half of the SRC ASCII string. It
is treated as a 2 byte hex value, such as 0x5678. For `BD` SRCs, the first
byte is the same as the first byte of the component ID field in the Private
Header section that represents the creator's component ID.

```
"ReasonCode": "0x5544"
```

### SRC Symptom ID Fields
The symptom ID is in the Extended User Header section and is defined in the PEL
spec as the unique event signature string. It always starts with the ASCII
string. This field in the message registry allows one to choose which SRC words
to use in addition to the ASCII string field to form the symptom ID. All words
are separated by underscores. If not specified, the code will choose a default
format, which may depend on the SRC type.

For example: ["SRCWord3", "SRCWord9"] would be:
`<ASCII_STRING>_<SRCWord3>_<SRCWord9>`, which could look like:
`B181320_00000050_49000000`.

```
"SymptomIDFields": ["SRCWord3", "SRCWord9"]
```
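
A minimal sketch of how such a symptom ID could be assembled is shown below. The function name and the dictionary of SRC words are hypothetical, and rendering each word as 8 uppercase hex digits is an assumption based on the example string above.

```python
def symptom_id(ascii_string, src_words, fields):
    """Join the ASCII string and the chosen SRC words with underscores."""
    parts = [ascii_string]
    for name in fields:
        # Assumption: each SRC word is shown as 8 hex digits.
        parts.append(f"{src_words[name]:08X}")
    return "_".join(parts)


words = {"SRCWord3": 0x00000050, "SRCWord9": 0x49000000}
print(symptom_id("BD615544", words, ["SRCWord3", "SRCWord9"]))
# BD615544_00000050_49000000
```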

### SRC words 6 to 9
In a PEL, these SRC words are free format and can be filled in by the user as
desired. On the BMC, the source of these words is the AdditionalData fields in
the event log. The message registry provides a way for the log creator to
specify which AdditionalData property field to get the data from, and also to
define what the SRC word means for use by parsers. If not specified, these SRC
words will be set to zero in the PEL.

```
"Words6to9":
{
    "6":
    {
        "description": "Failing unit number",
        "AdditionalDataPropSource": "PS_NUM"
    }
}
```
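
The lookup from AdditionalData into SRC words 6 to 9 might be pictured like this. It is a sketch only: the function name is hypothetical, and the way the AdditionalData string is converted to a number (`int(raw, 0)`, truncated to 32 bits) is an assumption, since the conversion rules are not described here.

```python
def fill_words_6_to_9(words_6_to_9, additional_data):
    """Return {6: value, 7: value, 8: value, 9: value} for the SRC."""
    words = {}
    for number in ("6", "7", "8", "9"):
        value = 0  # words without a registry entry or data stay zero
        entry = words_6_to_9.get(number)
        if entry is not None:
            raw = additional_data.get(entry["AdditionalDataPropSource"])
            if raw is not None:
                # Assumption: decimal or 0x-prefixed text, kept to 32 bits.
                value = int(raw, 0) & 0xFFFFFFFF
        words[int(number)] = value
    return words


registry_words = {
    "6": {
        "description": "Failing unit number",
        "AdditionalDataPropSource": "PS_NUM",
    }
}

print(fill_words_6_to_9(registry_words, {"PS_NUM": "2"}))
# {6: 2, 7: 0, 8: 0, 9: 0}
```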

### SRC Power Fault flag
The SRC has a bit in it to indicate if the error is a power fault. This is an
optional field in the message registry and defaults to false.

```
"PowerFault": false
```

### Documentation Fields
The documentation fields are used by PEL parsers to display a human readable
description of a PEL. They are also the source for the Redfish event log
messages.

#### Message
This field is used by the BMC's PEL parser as the description of the error log.
It will also be used in Redfish event logs. It supports argument substitution
using the %1, %2, etc. placeholders, allowing any of the SRC user data words 6
to 9 to be displayed as part of the message. If the placeholders are used, then
the `MessageArgSources` property must be present to say which SRC words to use
for each placeholder.

```
"Message": "Processor %1 had %2 errors"
```

#### MessageArgSources
This field is required when the Message field contains %X placeholder
arguments, and is otherwise optional. It is an array that says which SRC words
to get the placeholder values from. In the example below, SRC word 6 would be
used for %1, and SRC word 7 for %2.

```
"MessageArgSources":
[
    "SRCWord6", "SRCWord7"
]
```
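
The substitution can be sketched as below. The function name is hypothetical, and showing the SRC word values in decimal is an assumption for this example, as the display format a real parser uses is not specified here.

```python
import re


def expand_message(message, arg_sources, src_words):
    """Replace %1, %2, ... with the SRC words named in arg_sources."""
    def replace(match):
        index = int(match.group(1)) - 1  # %1 maps to arg_sources[0]
        return str(src_words[arg_sources[index]])

    return re.sub(r"%(\d+)", replace, message)


print(expand_message(
    "Processor %1 had %2 errors",
    ["SRCWord6", "SRCWord7"],
    {"SRCWord6": 0, "SRCWord7": 3},
))
# Processor 0 had 3 errors
```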

#### Description
A short description of the error. This is required by the Redfish schema to
generate a Redfish message entry, but is not used in Redfish or PEL output.

```
"Description": "A power fault"
```

#### Notes
This is an optional free format text field for keeping any notes for the
registry entry, as comments are not allowed in JSON. It is an array of strings
for easier readability of long fields.

```
"Notes": [
    "This entry is for every type of power fault.",
    "There is probably a hardware failure."
]
```

### Callout Fields
The callout fields allow one to specify the PEL callouts (either a hardware
FRU, a symbolic FRU, or a maintenance procedure) in the entry for a particular
error. These callouts can vary based on system type, as well as a user
specified AdditionalData property field. Callouts will be added to the PEL in
the order they are listed in the JSON. If a callout is passed into the error,
say with CALLOUT_INVENTORY_PATH, then that callout will be added to the PEL
before the callouts in the registry.

There is room for up to 10 callouts in a PEL.

#### Callouts example based on the system type

```
"Callouts":
[
    {
        "System": "system1",
        "CalloutList":
        [
            {
                "Priority": "high",
                "LocCode": "P1-C1"
            },
            {
                "Priority": "low",
                "LocCode": "P1"
            }
        ]
    },
    {
        "CalloutList":
        [
            {
                "Priority": "high",
                "Procedure": "SVCDOCS"
            }
        ]
    }
]
```

The above example shows that on system 'system1', the FRU at location P1-C1
will be called out with a priority of high, and the FRU at P1 with a priority
of low. On every other system, the maintenance procedure SVCDOCS is called
out.
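
The selection in the example above can be sketched as follows. The function name `callouts_for_system` is hypothetical, and the sketch assumes an entry without a `System` key acts as the fallback for all systems that have no entry of their own.

```python
def callouts_for_system(callouts, system_type):
    """Return the CalloutList that applies to the given system type."""
    fallback = []
    for entry in callouts:
        if "System" not in entry:
            # Entry without a system type: used if no system matches.
            fallback = entry["CalloutList"]
        elif entry["System"] == system_type:
            return entry["CalloutList"]
    return fallback


callouts = [
    {
        "System": "system1",
        "CalloutList": [
            {"Priority": "high", "LocCode": "P1-C1"},
            {"Priority": "low", "LocCode": "P1"},
        ],
    },
    {"CalloutList": [{"Priority": "high", "Procedure": "SVCDOCS"}]},
]

for callout in callouts_for_system(callouts, "system1"):
    print(callout)
print(callouts_for_system(callouts, "system2"))
```

The returned list keeps the order from the JSON, which matches the order in which the callouts are added to the PEL.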

#### Callouts example based on an AdditionalData field

```
"CalloutsUsingAD":
{
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues":
    [
        {
            "ADValue": "0",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C5"
                        }
                    ]
                }
            ]
        },
        {
            "ADValue": "1",
            "Callouts":
            [
                {
                    "CalloutList":
                    [
                        {
                            "Priority": "high",
                            "LocCode": "P1-C6"
                        }
                    ]
                }
            ]
        }
    ]
}
```

This example shows callouts that are selected based on the 'PROC_NUM'
AdditionalData field. When PROC_NUM is 0, the FRU at P1-C5 is called out.
When it is 1, P1-C6 is called out. Note that the same 'Callouts' array is
used as in the previous example, so these callouts can also depend on the
system type.
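
A sketch of this two-level lookup, first by AdditionalData value and then by system type, is shown below. The function name is hypothetical, and returning an empty list when no `ADValue` matches is an assumption made for this example.

```python
def callouts_from_ad(callouts_using_ad, additional_data, system_type):
    """Select a CalloutList by AdditionalData value, then by system type."""
    ad_value = additional_data.get(callouts_using_ad["ADName"])
    for candidate in callouts_using_ad["CalloutsWithTheirADValues"]:
        if candidate["ADValue"] != ad_value:
            continue
        fallback = []
        for entry in candidate["Callouts"]:
            if "System" not in entry:
                fallback = entry["CalloutList"]  # applies to any system
            elif entry["System"] == system_type:
                return entry["CalloutList"]
        return fallback
    return []  # Assumption: no matching ADValue means no registry callouts


callouts_using_ad = {
    "ADName": "PROC_NUM",
    "CalloutsWithTheirADValues": [
        {
            "ADValue": "0",
            "Callouts": [
                {"CalloutList": [{"Priority": "high", "LocCode": "P1-C5"}]}
            ],
        },
        {
            "ADValue": "1",
            "Callouts": [
                {"CalloutList": [{"Priority": "high", "LocCode": "P1-C6"}]}
            ],
        },
    ],
}

print(callouts_from_ad(callouts_using_ad, {"PROC_NUM": "1"}, "system1"))
# [{'Priority': 'high', 'LocCode': 'P1-C6'}]
```

Note that `ADValue` is compared as a string, since AdditionalData values in the event log are strings.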

#### CalloutType
This field can be used to modify the failing component type field in the
callout when the default doesn't fit:

```
{
    "Priority": "high",
    "Procedure": "FIXIT22",
    "CalloutType": "config_procedure"
}
```

The defaults are:
- Normal hardware FRU: hardware_fru
- Symbolic FRU: symbolic_fru
- Procedure: maint_procedure

#### Symbolic FRU callouts with dynamic trusted location codes

A special case is when one wants to use a symbolic FRU callout with a trusted
location code, but the location code to use isn't known until runtime. This
means it can't be specified using the 'LocCode' key in the registry.

In this case, one should use the 'SymbolicFRUTrusted' key along with the
'UseInventoryLocCode' key, and then pass in the inventory item that has the
desired location code using the 'CALLOUT_INVENTORY_PATH' entry inside of the
AdditionalData property. The code will then look up the location code for that
passed in inventory FRU and place it in the symbolic FRU callout. The normal
FRU callout with that inventory item will not be created. The symbolic FRU
must be the first callout in the registry for this to work.

```
{
    "Priority": "high",
    "SymbolicFRUTrusted": "AIR_MOVR",
    "UseInventoryLocCode": true
}
```

## Modifying and Testing

The general process for adding new entries to the message registry is:

1. Update message_registry.json to add the new errors.
2. If a new component ID is used (usually the first byte of the SRC reason
   code), document it in ComponentIDs.md.
3. Validate the file. It must be valid JSON and obey the schema. The
   `process_registry.py` script in `extensions/openpower-pels/registry/tools`
   will validate both, though it requires the python-jsonschema package to do
   the schema validation. This script is also run to validate the message
   registry as part of CI testing.

   ```
   ./tools/process_registry.py -v -s schema/schema.json -r message_registry.json
   ```

4. One can test what PELs are generated from these new entries without writing
   any code to create the corresponding event logs:
    1. Copy the modified message_registry.json into `/etc/phosphor-logging/` on
       the BMC. That directory may need to be created.
    2. Use busctl to call the Create method to create an event log
       corresponding to the message registry entry under test.

       ```
       busctl call xyz.openbmc_project.Logging /xyz/openbmc_project/logging \
       xyz.openbmc_project.Logging.Create Create ssa{ss} \
       xyz.openbmc_project.Common.Error.Timeout \
       xyz.openbmc_project.Logging.Entry.Level.Error 1 "TIMEOUT_IN_MSEC" "5"
       ```

    3. Check the PEL that was created using peltool.
    4. When finished, delete the file from `/etc/phosphor-logging/`.