# OpenPower Platform Event Log (PEL) extension

This extension creates a PEL for every OpenBMC event log. An OpenBMC event can
instead point to an existing raw PEL, in which case that PEL is used rather
than a new one being created.

## Contents
* [Passing in data when creating PELs](#passing-pel-related-data-within-an-openbmc-event-log)
* [Default UserData sections for BMC created PELs](#default-userdata-sections-for-bmc-created-pels)
* [The PEL Message Registry](#the-pel-message-registry)
* [Callouts](#callouts)
* [Action Flags and Event Type Rules](#action-flags-and-event-type-rules)
* [D-Bus Interfaces](#d-bus-interfaces)
* [PEL Retention](#pel-retention)
* [Adding python3 modules for PEL UserData and SRC parsing](#adding-python3-modules-for-pel-userdata-and-src-parsing)
* [Fail Boot on Host Errors](#fail-boot-on-host-errors)

## Passing PEL related data within an OpenBMC event log

An error log creator can pass in data that is relevant to a PEL by using
certain keywords in the AdditionalData property of the event log.

### AdditionalData keywords

#### RAWPEL

This keyword is used to point to an existing PEL in a binary file that should
be associated with this event log.  The syntax is:
```
RAWPEL=<path to PEL File>
e.g.
RAWPEL="/tmp/pels/pel.5"
```
The code will assign its own error log ID to this PEL, and also update the
commit timestamp field to the current time.

#### POWER_THERMAL_CRITICAL_FAULT

This keyword is used to set the power fault bit in the PEL. The syntax is:
```
POWER_THERMAL_CRITICAL_FAULT=<FLAG>
e.g.
POWER_THERMAL_CRITICAL_FAULT=TRUE
```

Note that TRUE is the only value supported.

#### SEVERITY_DETAIL

This is used when the passed in event log severity determines the PEL
severity and a more granular PEL severity is needed beyond what the normal
event log to PEL severity conversion could give.

The syntax is:
```
SEVERITY_DETAIL=<SEVERITY_TYPE>
e.g.
SEVERITY_DETAIL=SYSTEM_TERM
```
Supported option:
- SYSTEM_TERM, changes the Severity value from 0x50 to 0x51

#### ESEL

This keyword's data contains a full PEL in string format.  This is how hostboot
sends down PELs when it is configured in IPMI communication mode.  The PEL is
handled just like the PEL obtained using the RAWPEL keyword.

The syntax is:

```
ESEL=
"00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30 01 00 33 00 00..."
```

Note that there are 16 bytes of IPMI SEL data before the PEL data starts.

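As a rough illustration of the ESEL layout described above, the following
Python sketch converts an ESEL string to bytes and skips the 16 bytes of IPMI
SEL data.  The function name `esel_to_pel` is hypothetical and the input is a
shortened sample value; this is not the extension's actual code.

```python
IPMI_SEL_HEADER_SIZE = 16  # bytes of IPMI SEL data that precede the PEL


def esel_to_pel(esel: str) -> bytes:
    """Convert an ESEL hex string into the raw PEL bytes it carries."""
    raw = bytes.fromhex(esel)  # whitespace between the byte pairs is skipped
    if len(raw) <= IPMI_SEL_HEADER_SIZE:
        raise ValueError("ESEL data is too short to contain a PEL")
    return raw[IPMI_SEL_HEADER_SIZE:]


# Shortened sample value: 16 SEL bytes followed by the first 4 PEL bytes.
esel = "00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30"
pel = esel_to_pel(esel)
```

In this sample the extracted data starts with the bytes 0x50 0x48, which are
the ASCII characters "PH".
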
#### _PID

The phosphor-logging daemon adds this keyword, which contains the
application's PID, automatically when the `commit` or `report` APIs are used
to create an event log, but not when the `Create` D-Bus method is used.  A
caller of the `Create` API that wants its PID captured in the PEL should pass
this keyword in itself.

This will be added to the PEL in a section of type User Data (UD), along with
the application name it corresponds to.

The syntax is:
```
_PID=<PID of application>
e.g.
_PID="12345"
```

#### CALLOUT_INVENTORY_PATH

This is used to pass in an inventory item to use as a callout.  See [here for
details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_PRIORITY

This can be used along with CALLOUT_INVENTORY_PATH to specify the priority of
that FRU callout. If not specified, the default priority is "H"/High Priority.

The possible values are:
- "H": High Priority
- "M": Medium Priority
- "L": Low Priority

See [here for details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

This is used to pass in a device path to create callouts from.  See [here for
details](#passing-callouts-in-with-the-additionaldata-property)

#### CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

This is used to pass in an I2C bus and address to create callouts from.  See
[here for details](#passing-callouts-in-with-the-additionaldata-property)

### FFDC Intended For UserData PEL sections

When one needs to add FFDC into the PEL UserData sections, the
`CreateWithFFDCFiles` D-Bus method on the `xyz.openbmc_project.Logging.Create`
interface must be used when creating a new event log. This method takes a list
of files to store in the PEL UserData sections.

That API is the same as the 'Create' one, except it has an additional
parameter:

```
std::vector<std::tuple<enum[FFDCFormat],
                       uint8_t,
                       uint8_t,
                       sdbusplus::message::unix_fd>>
```

Each entry in the vector contains a file descriptor for a file that will
be stored in a unique UserData section.  The tuple's arguments are:

- enum[FFDCFormat]: The data format type, the options are:
    - 'JSON'
        - The parser will use nlohmann::json's pretty print
    - 'CBOR'
        - The parser will use nlohmann::json's pretty print
    - 'Text'
        - The parser will output ASCII text
    - 'Custom'
        - The parser will hexdump the data, unless there is a parser registered
          for this component ID and subtype.
- uint8_t: subType
  - Useful for the 'Custom' type.  Not used with the other types.
- uint8_t: version
  - The version of the data.
  - Used for the 'Custom' type.
  - Not planning on using for JSON/CBOR unless a reason to do so appears.
- unix_fd: The file descriptor for the opened file that contains the
    contents.  The file descriptor can be closed and the file can be deleted if
    desired after the method call.

An example of saving JSON data to a file and getting its file descriptor is:

```
nlohmann::json json = ...;
auto jsonString = json.dump();
FILE* fp = fopen(filename, "w"); // check for nullptr before using fp
fwrite(jsonString.data(), 1, jsonString.size(), fp);
// Flush the stdio buffer so the data is in the file before the file
// descriptor is handed to the D-Bus method.
fflush(fp);
int fd = fileno(fp);
```

Alternatively, 'open()' can be used to obtain the file descriptor of the file.

Upon receiving this data, the PEL code will create UserData sections for each
entry in that vector with the following UserData fields:

- Section header component ID:
    - If the type field from the tuple is "Custom", use the component ID from
      the message registry.
    - Otherwise, set the component ID to the phosphor-logging component ID so
      that the parser knows to use the built in parsers (e.g. json) for the
      type.
- Section header subtype: The subtype field from the tuple.
- Section header version: The version field from the tuple.
- Section data: The data from the file.

If there is a peltool parser registered for the custom type (method is TBD),
that will be used by peltool to print the data, otherwise it will be hexdumped.

Before adding each of these UserData sections, a check will be done to see if
the PEL size will remain under the maximum size of 16KB.  If not, the UserData
section will be truncated down enough so that it will fit into the 16KB.

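The size check described above can be pictured with the following Python
sketch.  It is only an illustration under stated assumptions: the function
name is hypothetical, and the 8 byte section header size is an assumption
rather than something this document specifies.

```python
MAX_PEL_SIZE = 16 * 1024   # the 16KB maximum PEL size mentioned above
SECTION_HEADER_SIZE = 8    # assumed size of a PEL section header


def fitted_data_size(current_pel_size: int, data_size: int) -> int:
    """Return how many bytes of FFDC data fit into a new UserData section."""
    room = MAX_PEL_SIZE - current_pel_size - SECTION_HEADER_SIZE
    # Never return a negative size when the PEL is already full.
    return max(0, min(data_size, room))
```

A result smaller than `data_size` means the section data would be truncated
to that many bytes.
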
## Default UserData sections for BMC created PELs

The extension code that creates PELs will add these UserData sections to every
PEL:

- The AdditionalData property contents
  - If the AdditionalData property in the OpenBMC event log has anything in it,
    it will be saved in a UserData section as a JSON string.

- System information
  - This section contains various pieces of system information, such as the
    full code level and the BMC, chassis, and host state properties.

## The PEL Message Registry

The PEL message registry is used to create PELs from OpenBMC event logs.
Documentation can be found [here](registry/README.md).

## Callouts

A callout points to a FRU, a symbolic FRU, or an isolation procedure.  There
can be from zero to ten of them in each PEL, where they are located in the SRC
section.

There are a few different ways to add callouts to a PEL.  In all cases, the
callouts will be sorted from highest to lowest priority within the PEL after
they are added.

### Passing callouts in with the AdditionalData property

The PEL code can add callouts based on the values of special entries in the
AdditionalData event log property.  They are:

- CALLOUT_INVENTORY_PATH

    This keyword is used to call out a single FRU by passing in its D-Bus
    inventory path.  When the PEL code sees this, it will create a single FRU
    callout, using the VPD properties (location code, FN, CCIN) from that
    inventory item.  If that item is not a FRU itself and does not have a
    location code, the code will keep searching its parents until it finds one
    that is a FRU.

    The priority of the FRU callout will be high, unless the CALLOUT_PRIORITY
    keyword is also present and contains a different priority, in which case
    that priority will be used instead.  This can be useful when a maintenance
    procedure with a high priority callout is specified for this error in the
    message registry and the FRU callout needs to have a different priority.

    ```
    CALLOUT_INVENTORY_PATH=
    "/xyz/openbmc_project/inventory/system/chassis/motherboard"
    ```

- CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

    These keywords are required as a pair to indicate faulty device
    communication, usually detected by a failure accessing a device at that
    sysfs path.  The PEL code will use a data table generated by the MRW to map
    these device paths to FRU callout lists.  The errno value may influence the
    callout.

    I2C, FSI, FSI-I2C, and FSI-SPI paths are supported.

    ```
    CALLOUT_DEVICE_PATH="/sys/bus/i2c/devices/3-0069"
    CALLOUT_ERRNO="2"
    ```

- CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

    These 3 keywords can be used to call out a failing I2C device path when the
    full device path isn't known.  It is similar to CALLOUT_DEVICE_PATH in that
    it will use data tables generated by the MRW to create the callouts.

    CALLOUT_IIC_BUS is in the form "/dev/i2c-X" where X is the bus number, or
    just the bus number by itself.
    CALLOUT_IIC_ADDR is the 7-bit address, either as a decimal number or as a
    hex number if preceded with a "0x".

    ```
    CALLOUT_IIC_BUS="/dev/i2c-7"
    CALLOUT_IIC_ADDR="81"
    CALLOUT_ERRNO="62"
    ```

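The accepted forms of CALLOUT_IIC_BUS and CALLOUT_IIC_ADDR described above
could be parsed along the lines of this Python sketch.  The function names are
hypothetical, and the range check on the address is an added assumption that
follows from it being a 7-bit value.

```python
def parse_iic_bus(value: str) -> int:
    """Return the bus number from "/dev/i2c-X" or from a bare bus number."""
    prefix = "/dev/i2c-"
    if value.startswith(prefix):
        value = value[len(prefix):]
    return int(value, 10)


def parse_iic_addr(value: str) -> int:
    """Return the 7-bit address from a decimal or "0x"-prefixed hex string."""
    if value.lower().startswith("0x"):
        addr = int(value, 16)
    else:
        addr = int(value, 10)
    if not 0 <= addr <= 0x7F:
        raise ValueError(f"not a 7-bit I2C address: {value}")
    return addr
```

With these helpers, `"/dev/i2c-7"` and `"7"` both give bus 7, and `"81"` and
`"0x51"` both give the same address.
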
### Defining callouts in the message registry

Callouts can be completely defined inside that error's definition in the PEL
message registry.  This method allows the callouts to vary based on the system
type or on any AdditionalData item.

At a high level, this involves defining a callout section inside the registry
entry that contains the location codes or procedure names to use, along with
their priority.  If these can vary based on system type, the type provided by
the entity manager will be one of the keys.  If they can also depend on an
AdditionalData entry, then that will also be a key.

See the message registry [README](registry/README.md) and
[schema](registry/schema/schema.json) for the details.

### Using the message registry along with CALLOUT_ entries

If the message registry entry contains a callout definition and the event log
also contains one of the aforementioned CALLOUT keys in the AdditionalData
property, then the PEL code will first add the callouts stemming from the
CALLOUT items, followed by the callouts from the message registry.

### Specifying multiple callouts using JSON format FFDC files

Multiple callouts can be passed in by the creator at the time of PEL creation.
This is done by specifying them in a JSON file that is then passed in as an
[FFDC file](#ffdc-intended-for-userdata-pel-sections).  The JSON will still be
added into a PEL UserData section for debug.

To specify that an FFDC file contains callouts, the format value for that FFDC
entry must be set to JSON, and the subtype field must be set to 0xCA:

```
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::JSON,
    0xCA, // Callout subtype
    0x01, // Callout version, set to 0x01
    fd};
```

The JSON contains an array of callouts that must be in order of highest
priority to lowest, with a maximum of 10.  Any callouts after the 10th will
just be thrown away as there is no room for them in the PEL. The format looks
like:

```
[
    {
        // First callout
    },
    {
        // Second callout
    },
    {
        // Nth callout
    }
]
```

A callout entry can be a normal hardware callout, a maintenance procedure
callout, or a symbolic FRU callout.  Each callout must contain a Priority
field, where the possible values are:

* "H" = High
* "M" = Medium
* "A" = Medium Group A
* "B" = Medium Group B
* "C" = Medium Group C
* "L" = Low

Either unexpanded location codes or D-Bus inventory object paths can be used to
specify the called out part.  An unexpanded location code does not have the
system VPD information embedded in it, and the 'Ufcs-' prefix is optional (so
can be either Ufcs-P1 or just P1).

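A creator might assemble the callout JSON with something like the following
Python sketch before writing it to the FFDC file.  The helper name
`build_callout_json` is hypothetical, and the checks shown only cover the
Priority field and the 10 callout maximum described above; the caller is
still responsible for listing the callouts from highest to lowest priority.

```python
import json

VALID_PRIORITIES = {"H", "M", "A", "B", "C", "L"}
MAX_CALLOUTS = 10


def build_callout_json(callouts: list) -> str:
    """Check the callout entries and serialize them for a callout FFDC file."""
    for callout in callouts:
        if callout.get("Priority") not in VALID_PRIORITIES:
            raise ValueError(f"missing or invalid Priority: {callout}")
    # Entries past the 10th would not make it into the PEL anyway.
    return json.dumps(callouts[:MAX_CALLOUTS], indent=4)


callouts = [
    {"LocationCode": "P0-C1", "Priority": "H"},
    {"Procedure": "PRONAME", "Priority": "M"},
]
text = build_callout_json(callouts)
```
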
#### Normal hardware FRU callout

Normal hardware callouts must contain either the location code or inventory
path, and priority.  Even though the PEL code doesn't do any guarding or
deconfiguring itself, it needs to know if either of those things occurred as
there are status bits in the PEL to reflect them.  The Guarded and Deconfigured
fields are used for this.  Those fields are optional and if omitted then their
values will be false.

When the inventory path of a sub-FRU is passed in, the PEL code will put the
location code of the parent FRU into the callout.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H"
}

{
    "InventoryPath": "/xyz/openbmc_project/inventory/motherboard/cpu0/core5",
    "Priority": "H",
    "Deconfigured": true,
    "Guarded": true
}
```

MRUs (Manufacturing Replaceable Units) are 4 byte numbers that can optionally
be added to callouts to specify failing devices on a FRU.  These may be used
during the manufacturing test process, where there may be the ability to do
these replacements.  There can be up to 15 MRUs, each with its own priority,
embedded in a callout.  The possible priority values match the FRU priority
values.

Note that since JSON only supports numbers in decimal and not in hex, MRU IDs
will show up as decimal when visually inspecting the JSON.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [
        {
            "ID": 1234,
            "Priority": "H"
        },
        {
            "ID": 5678,
            "Priority": "H"
        }
    ]
}
```

#### Maintenance procedure callout

The LocationCode field is not used with procedure callouts.  Only the first 7
characters of the Procedure field will be used by the PEL.

```
{
    "Procedure": "PRONAME",
    "Priority": "H"
}
```

#### Symbolic FRU callout

Only the first 7 characters of the SymbolicFRU field will be used by the PEL.

If the TrustedLocationCode field is present and set to true, this means the
location code may be used to turn on service indicators, so the LocationCode
field is required.  If TrustedLocationCode is false or missing, then the
LocationCode field is optional.

```
{
    "TrustedLocationCode": true,
    "LocationCode": "P0-C1",
    "Priority": "H",
    "SymbolicFRU": "FRUNAME"
}
```

## `Action Flags` and `Event Type` Rules

The `Action Flags` and `Event Type` PEL fields are optional in the message
registry, and if not present the code will set them based on certain rules
laid out in the PEL spec.

These rules are:
1. Always set the `Report` flag, unless the `Do Not Report` flag is already on.
2. Always clear the `SP Call Home` flag, as that feature isn't supported.
3. If the severity is `Non-error Event`:
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
    - If the `Event Type` field is `Not Applicable`, change it to `Information
      Only`.
    - If the `Event Type` field is `Information Only` or `Tracing`, set the
      `Hidden` flag.
4. If the severity is `Recovered`:
    - Set the `Hidden` flag.
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
5. For all other severities:
    - Clear the `Hidden` flag.
    - Set the `Service Action` flag.
    - Set the `Call Home` flag.

Additional rules may be added in the future if necessary.

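The rules above can be read as the following Python sketch.  It models the
action flags as a set of flag names and uses the severity and event type
names from this section as plain strings; that representation and the
function name `apply_rules` are assumptions made for illustration, not the
extension's actual data types.

```python
def apply_rules(severity: str, event_type: str, flags: set) -> tuple:
    """Return (event_type, flags) after applying the default rules."""
    flags = set(flags)  # work on a copy

    # Rule 1: report unless told not to.
    if "Do Not Report" not in flags:
        flags.add("Report")
    # Rule 2: SP Call Home isn't supported.
    flags.discard("SP Call Home")

    if severity == "Non-error Event":  # Rule 3
        flags -= {"Service Action", "Call Home"}
        if event_type == "Not Applicable":
            event_type = "Information Only"
        if event_type in ("Information Only", "Tracing"):
            flags.add("Hidden")
    elif severity == "Recovered":  # Rule 4
        flags.add("Hidden")
        flags -= {"Service Action", "Call Home"}
    else:  # Rule 5
        flags.discard("Hidden")
        flags |= {"Service Action", "Call Home"}

    return event_type, flags
```

For example, a `Non-error Event` with an event type of `Not Applicable` and
no flags set ends up with an event type of `Information Only` and the
`Report` and `Hidden` flags.
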
## D-Bus Interfaces

See the org.open_power.Logging.PEL interface definition for the most
up-to-date information.

## PEL Retention

The PEL repository is allocated a set amount of space on the BMC.  When that
space gets close to being full, the code will remove a percentage of PELs to
make room for new ones.  In addition, the code will keep a cap on the total
number of PELs allowed.  Note that removing a PEL will also remove the
corresponding OpenBMC event log.

The disk capacity limit is set to 20MB, and the number limit is 3000.

The rules used to remove logs are listed below.  The checks will be run after a
PEL has been added and the method to create the PEL has returned to the caller,
i.e. run when control gets back to the event loop.

### Removal Algorithm

If the space used is at or below 95% of the allocated space and the number of
PELs is under the limit, nothing further needs to be done.  Otherwise, run all
5 of the following steps.  Each step deletes PELs only until its requirement
is met, and then it stops.

The steps are:

1. Remove BMC created informational PELs until they take up 15% or less of the
   allocated space.

2. Remove BMC created non-informational PELs until they take up 30% or less of
   the allocated space.

3. Remove non-BMC created informational PELs until they take up 15% or less of
   the allocated space.

4. Remove non-BMC created non-informational PELs until they take up 30% or less
   of the allocated space.

5. After the previous 4 steps are complete, if there are still more than the
   maximum number of PELs, remove PELs down to 80% of the maximum.

PELs with associated guard records will never be deleted.  Each step above
makes the following 4 passes, stopping as soon as its limit is reached:

Pass 1. Remove HMC acknowledged PELs.<br>
Pass 2. Remove OS acknowledged PELs.<br>
Pass 3. Remove PHYP acknowledged PELs.<br>
Pass 4. Remove all PELs.

After all these steps, the space used will be at most 90% (15% + 30% + 15% +
30%) of the allocated space.

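The steps and passes above can be modeled with the following Python sketch.
The `Pel` record and the `pels_to_remove` function are hypothetical, and the
sketch assumes the list is ordered oldest first so that older PELs are
removed first within a pass; the document does not state the order used
within a pass.

```python
from dataclasses import dataclass


@dataclass
class Pel:
    """Hypothetical stand-in for one PEL in the repository."""
    pel_id: int
    size: int
    bmc_created: bool
    informational: bool
    acked_by: str = ""     # "HMC", "OS", "PHYP", or "" if not acknowledged
    guarded: bool = False  # has an associated guard record


# The four passes each step makes, in order.
PASSES = (
    lambda p: p.acked_by == "HMC",
    lambda p: p.acked_by == "OS",
    lambda p: p.acked_by == "PHYP",
    lambda p: True,
)


def pels_to_remove(pels, capacity=20 * 1024 * 1024, max_pels=3000):
    """Return the IDs of the PELs to delete, in deletion order."""
    removed = []

    def remaining():
        return [p for p in pels if p.pel_id not in removed]

    def run_step(in_class, limit_met):
        for in_pass in PASSES:
            for pel in pels:
                if limit_met():
                    return
                if pel.guarded or pel.pel_id in removed:
                    continue
                if in_class(pel) and in_pass(pel):
                    removed.append(pel.pel_id)

    used = sum(p.size for p in pels)
    if used <= 0.95 * capacity and len(pels) <= max_pels:
        return removed

    # Steps 1-4: (BMC created?, informational?, fraction of capacity allowed)
    for bmc, info, fraction in ((True, True, 0.15), (True, False, 0.30),
                                (False, True, 0.15), (False, False, 0.30)):
        def in_class(p, bmc=bmc, info=info):
            return p.bmc_created == bmc and p.informational == info

        def limit_met(in_class=in_class, fraction=fraction):
            size = sum(p.size for p in remaining() if in_class(p))
            return size <= fraction * capacity

        run_step(in_class, limit_met)

    # Step 5: enforce the cap on the number of PELs.
    if len(remaining()) > max_pels:
        run_step(lambda p: True,
                 lambda: len(remaining()) <= 0.8 * max_pels)
    return removed
```

Passing a small `capacity` and `max_pels` makes it easy to try the steps out
on a handful of records.
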
## Adding python3 modules for PEL UserData and SRC parsing

In order to support python3 modules for the parsing of PEL User Data sections
and to decode SRC data, setuptools is used to import python3 packages from
external repos to be included in the OpenBMC image.

Sample layout for setuptools:
```
setup.py
src/usr/scom/plugins/ebmc/b0300.py
src/usr/i2c/plugins/ebmc/b0700.py
src/build/tools/ebmc/errludP_Helpers.py
```

`setup.py` is the build script for setuptools. It contains information about the
package (such as the name and version) as well as which code files to include.

The setup.py template to be used for eBMC User Data parsers:
```
import os.path
from setuptools import setup

# Update this dict with a new key/value pair for every component added
# Key: The package name to be installed as
# Value: The path containing the package's python modules
dirmap = {
    "b0300": "src/usr/scom/plugins/ebmc",
    "b0700": "src/usr/i2c/plugins/ebmc",
    "helpers": "src/build/tools/ebmc"
}

# All packages will be installed under 'udparsers' namespace
def get_package_name(dirmap_key):
    return "udparsers.{}".format(dirmap_key)

def get_package_dirent(dirmap_item):
    package_name = get_package_name(dirmap_item[0])
    package_dir = dirmap_item[1]
    return (package_name, package_dir)

def get_packages():
    return map(get_package_name, dirmap.keys())

def get_package_dirs():
    return map(get_package_dirent, dirmap.items())

setup(
        name="Hostboot",
        version="0.1",
        packages=list(get_packages()),
        package_dir=dict(get_package_dirs())
)
```
- User Data parser module
  - Module name: `xzzzz.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII) and `zzzz` is the 2 byte Component ID
    from the User Data section itself (in HEX). All should be converted to
    lowercase.
    - For example: `b0100.py` for Hostboot created UserData with CompID 0x0100
  - Function to provide: `parseUDToJson`
    - Argument list:
      1. (int) Sub-section type
      2. (int) Section version
      3. (memoryview): Data
    - Return data:
      1. (str) JSON string

  - Sample User Data parser module:
    ```
    import json
    def parseUDToJson(subType, ver, data):
        d = dict()
        ...
        # Parse and populate data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```
- SRC parser module
  - Module name: `xsrc.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII, converted to lowercase).
    - For example: `bsrc.py` for Hostboot generated SRCs
  - Function to provide: `parseSRCToJson`
    - Argument list:
      1. (str) Refcode ASCII string
      2. (str) Hexword 2
      3. (str) Hexword 3
      4. (str) Hexword 4
      5. (str) Hexword 5
      6. (str) Hexword 6
      7. (str) Hexword 7
      8. (str) Hexword 8
      9. (str) Hexword 9
    - Return data:
      1. (str) JSON string

  - Sample SRC parser module:
    ```
    import json
    def parseSRCToJson(ascii_str, word2, word3, word4, word5, word6, word7,
                       word8, word9):
        d = dict()
        ...
        # Decode SRC data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```

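The module naming rules above can be expressed as the following Python
sketch.  The helper names are hypothetical and only show how a module name
follows from the Creator Subsystem character and the Component ID; they are
not part of peltool.

```python
def ud_parser_module(creator_subsystem: str, component_id: int) -> str:
    """Return the User Data parser module name, without the .py suffix."""
    # Lowercase creator character followed by the 2 byte ID as 4 hex digits.
    return f"{creator_subsystem.lower()}{component_id:04x}"


def src_parser_module(creator_subsystem: str) -> str:
    """Return the SRC parser module name, without the .py suffix."""
    return f"{creator_subsystem.lower()}src"
```

For Hostboot (Creator Subsystem `B`) and Component ID 0x0100, these give
`b0100` and `bsrc`, matching the examples in the list above.
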
## Fail Boot on Host Errors

The fail boot on hw error [design][1] provides a function where a system owner
can tell the firmware to fail the boot of a system if a BMC phosphor-logging
event has a hardware callout in it.

When this fail boot on hardware error setting is enabled, the BMC is required
to fail the boot for **any** error from the host which satisfies the
following criteria:
- not SeverityType::nonError
- has a callout of any kind from the `FailingComponentType` structure

[1]: https://github.com/openbmc/docs/blob/master/designs/fail-boot-on-hw-error.md
