# OpenPower Platform Event Log (PEL) extension

This extension creates a PEL for every OpenBMC event log.  An event log can
instead point to an existing raw PEL, in which case that PEL is used rather
than a new one being created.

## Contents
* [Passing in data when creating PELs](#passing-pel-related-data-within-an-openbmc-event-log)
* [Default UserData sections for BMC created PELs](#default-userdata-sections-for-bmc-created-pels)
* [The PEL Message Registry](#the-pel-message-registry)
* [Callouts](#callouts)
* [Action Flags and Event Type Rules](#action-flags-and-event-type-rules)
* [D-Bus Interfaces](#d-bus-interfaces)
* [PEL Retention](#pel-retention)
* [Adding python3 modules for PEL UserData and SRC parsing](#adding-python3-modules-for-pel-userdata-and-src-parsing)
* [Fail Boot on Host Errors](#fail-boot-on-host-errors)

## Passing PEL related data within an OpenBMC event log

An error log creator can pass in data that is relevant to a PEL by using
certain keywords in the AdditionalData property of the event log.

### AdditionalData keywords

#### RAWPEL

This keyword is used to point to an existing PEL in a binary file that should
be associated with this event log.  The syntax is:
```
RAWPEL=<path to PEL File>
e.g.
RAWPEL="/tmp/pels/pel.5"
```
The code will assign its own error log ID to this PEL, and also update the
commit timestamp field to the current time.

#### ESEL

This keyword's data contains a full PEL in string format.  This is how hostboot
sends down PELs when it is configured in IPMI communication mode.  The PEL is
handled just like the PEL obtained using the RAWPEL keyword.

The syntax is:

```
ESEL=
"00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30 01 00 33 00 00..."
```

Note that there are 16 bytes of IPMI SEL data before the PEL data starts.
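
As a rough illustration of that layout, the Python sketch below converts an
ESEL string into raw PEL bytes by dropping the leading IPMI SEL bytes.  The
function name `esel_to_pel` and its `sel_header_len` parameter are hypothetical
and are not part of the extension code.

```python
def esel_to_pel(esel: str, sel_header_len: int = 16) -> bytes:
    """Convert an ESEL hex string into the raw PEL bytes it carries."""
    # bytes.fromhex() skips the spaces between the two-digit byte values.
    raw = bytes.fromhex(esel.strip().strip('"'))
    if len(raw) <= sel_header_len:
        raise ValueError("ESEL data is too short to contain a PEL")
    # Everything after the IPMI SEL data is the PEL itself.
    return raw[sel_header_len:]


sample = ("00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 "
          "50 48 00 30 01 00 33 00 00")
pel = esel_to_pel(sample)
```

With the truncated sample string used here, the returned data starts with the
bytes `50 48`.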

#### _PID

This keyword contains the application's PID.  The phosphor-logging daemon adds
it automatically when the `commit` or `report` APIs are used to create an
event log, but not when the `Create` D-Bus method is used.  A caller of the
`Create` method that wants its PID captured in the PEL should pass this
keyword in.

The PID will be added to the PEL in a section of type User Data (UD), along
with the name of the application it corresponds to.

The syntax is:
```
_PID=<PID of application>
e.g.
_PID="12345"
```

#### CALLOUT_INVENTORY_PATH

This is used to pass in an inventory item to use as a callout.  See [here for
details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

This is used to pass in a device path to create callouts from.  See [here for
details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

This is used to pass in an I2C bus and address to create callouts from.  See
[here for details](#passing-callouts-in-with-the-additionaldata-property).

### FFDC Intended For UserData PEL sections

When one needs to add FFDC into the PEL UserData sections, the
`CreateWithFFDCFiles` D-Bus method on the `xyz.openbmc_project.Logging.Create`
interface must be used when creating a new event log. This method takes a list
of files to store in the PEL UserData sections.

That method is the same as the `Create` one, except that it takes an
additional parameter:

```
std::vector<std::tuple<enum[FFDCFormat],
                       uint8_t,
                       uint8_t,
                       sdbusplus::message::unix_fd>>
```

Each entry in the vector contains a file descriptor for a file that will
be stored in a unique UserData section.  The tuple's arguments are:

- enum[FFDCFormat]: The data format type, the options are:
    - 'JSON'
        - The parser will use nlohmann::json's pretty print
    - 'CBOR'
        - The parser will use nlohmann::json's pretty print
    - 'Text'
        - The parser will output ASCII text
    - 'Custom'
        - The parser will hexdump the data, unless there is a parser registered
          for this component ID and subtype.
- uint8_t: subType
  - Useful for the 'Custom' type.  Not used with the other types.
- uint8_t: version
  - The version of the data.
  - Used for the 'Custom' type.
  - Not planning on using for JSON/CBOR unless a reason to do so appears.
- unix_fd: The file descriptor for the opened file that contains the
    contents.  The file descriptor can be closed and the file can be deleted if
    desired after the method call.

An example of saving JSON data to a file and getting its file descriptor is:

```
nlohmann::json json = ...;
auto jsonString = json.dump();

// Write the JSON text to the file
FILE* fp = fopen(filename, "w");
fwrite(jsonString.data(), 1, jsonString.size(), fp);

// Flush the stdio buffer so the data is in the file before the fd is used
fflush(fp);
int fd = fileno(fp);
```

Alternatively, 'open()' can be used to obtain the file descriptor of the file.

Upon receiving this data, the PEL code will create UserData sections for each
entry in that vector with the following UserData fields:

- Section header component ID:
    - If the type field from the tuple is 'Custom', use the component ID from
      the message registry.
    - Otherwise, set the component ID to the phosphor-logging component ID so
      that the parser knows to use the built-in parsers (e.g. json) for the
      type.
- Section header subtype: The subtype field from the tuple.
- Section header version: The version field from the tuple.
- Section data: The data from the file.

If there is a peltool parser registered for the custom type (method is TBD),
that will be used by peltool to print the data, otherwise it will be hexdumped.

Before adding each of these UserData sections, a check will be done to see if
the PEL size will remain under the maximum size of 16KB.  If not, the UserData
section will be truncated down enough so that it will fit into the 16KB.
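
The size check could look something like the following Python sketch.  The
8 byte section header size and the helper name `fit_user_data` are assumptions
made for illustration; the real code may apply further rules, such as padding,
that are ignored here.

```python
MAX_PEL_SIZE = 16 * 1024   # the 16KB limit described above
SECTION_HEADER_SIZE = 8    # assumed size of a PEL section header


def fit_user_data(current_pel_size: int, data: bytes) -> bytes:
    """Return data, truncated if needed so the PEL stays within the limit."""
    available = MAX_PEL_SIZE - current_pel_size - SECTION_HEADER_SIZE
    if available <= 0:
        return b""  # no room is left for this section's data
    return data[:available]
```

For example, if the PEL is already 16000 bytes, a 1000 byte FFDC file would be
cut down to the 376 bytes that still fit.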

## Default UserData sections for BMC created PELs

The extension code that creates PELs will add these UserData sections to every
PEL:

- The AdditionalData property contents
  - If the AdditionalData property in the OpenBMC event log has anything in it,
    it will be saved in a UserData section as a JSON string.

- System information
  - This section contains various pieces of system information, such as the
    full code level and the BMC, chassis, and host state properties.

## The PEL Message Registry

The PEL message registry is used to create PELs from OpenBMC event logs.
Documentation can be found [here](registry/README.md).

## Callouts

A callout points to a FRU, a symbolic FRU, or an isolation procedure.  Each
PEL can hold from zero to ten of them, located in the SRC section.

There are a few different ways to add callouts to a PEL:

### Passing callouts in with the AdditionalData property

The PEL code can add callouts based on the values of special entries in the
AdditionalData event log property.  They are:

- CALLOUT_INVENTORY_PATH

    This keyword is used to call out a single FRU by passing in its D-Bus
    inventory path.  When the PEL code sees this, it will create a single high
    priority FRU callout, using the VPD properties (location code, FN, CCIN)
    from that inventory item.  If that item is not a FRU itself and does not
    have a location code, the code will keep searching its parents until it
    finds one that is a FRU.

    ```
    CALLOUT_INVENTORY_PATH=
    "/xyz/openbmc_project/inventory/system/chassis/motherboard"
    ```

- CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

    These keywords are required as a pair to indicate faulty device
    communication, usually detected by a failure accessing a device at that
    sysfs path.  The PEL code will use a data table generated by the MRW to map
    these device paths to FRU callout lists.  The errno value may influence the
    callout.

    I2C, FSI, FSI-I2C, and FSI-SPI paths are supported.

    ```
    CALLOUT_DEVICE_PATH="/sys/bus/i2c/devices/3-0069"
    CALLOUT_ERRNO="2"
    ```

- CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

    These 3 keywords can be used to call out a failing I2C device when the
    full device path isn't known.  It is similar to CALLOUT_DEVICE_PATH in that
    it will use data tables generated by the MRW to create the callouts.

    CALLOUT_IIC_BUS is in the form "/dev/i2c-X" where X is the bus number, or
    just the bus number by itself.
    CALLOUT_IIC_ADDR is the 7 bit address, either as a decimal number or as a
    hex number if preceded with a "0x".

    ```
    CALLOUT_IIC_BUS="/dev/i2c-7"
    CALLOUT_IIC_ADDR="81"
    CALLOUT_ERRNO="62"
    ```
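
The accepted forms of the CALLOUT_IIC_BUS and CALLOUT_IIC_ADDR values can be
illustrated with a small Python sketch.  The helper names `parse_iic_bus` and
`parse_iic_addr` are hypothetical; this is only one way such values could be
interpreted, not the extension's actual parsing code.

```python
def parse_iic_bus(value: str) -> int:
    """Accept either "/dev/i2c-X" or just "X" and return the bus number."""
    prefix = "/dev/i2c-"
    if value.startswith(prefix):
        value = value[len(prefix):]
    return int(value)


def parse_iic_addr(value: str) -> int:
    """Accept a decimal string, or a hex string that starts with "0x"."""
    if value.lower().startswith("0x"):
        addr = int(value, 16)
    else:
        addr = int(value, 10)
    if not 0 <= addr <= 0x7F:
        raise ValueError("not a 7 bit I2C address: " + value)
    return addr
```

Under these rules `"81"` and `"0x51"` name the same address, and `"/dev/i2c-7"`
and `"7"` name the same bus.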

### Defining callouts in the message registry

Callouts can be completely defined inside that error's definition in the PEL
message registry.  This method allows the callouts to vary based on the system
type or on any AdditionalData item.

At a high level, this involves defining a callout section inside the registry
entry that contains the location codes or procedure names to use, along with
their priority.  If these can vary based on system type, the type provided by
the entity manager will be one of the keys.  If they can also depend on an
AdditionalData entry, then that will also be a key.

See the message registry [README](registry/README.md) and
[schema](registry/schema/schema.json) for the details.

### Using the message registry along with CALLOUT_ entries

If the message registry entry contains a callout definition and the event log
also contains one of the aforementioned CALLOUT keys in the AdditionalData
property, then the PEL code will first add the callouts stemming from the
CALLOUT items, followed by the callouts from the message registry.

### Specifying multiple callouts using JSON format FFDC files

Multiple callouts can be passed in by the creator at the time of PEL creation.
This is done by specifying them in a JSON file that is then passed in as an
[FFDC file](#ffdc-intended-for-userdata-pel-sections).  The JSON will still be
added into a PEL UserData section for debug.

To specify that an FFDC file contains callouts, the format value for that FFDC
entry must be set to JSON, and the subtype field must be set to 0xCA:

```
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::JSON,
    0xCA, // Callout subtype
    0x01, // Callout version, set to 0x01
    fd};
```

The JSON contains an array of callouts that must be in order of highest
priority to lowest, with a maximum of 10.  Any callouts after the 10th will
be discarded as there is no room for them in the PEL. The format looks like:

```
[
    {
        // First callout
    },
    {
        // Second callout
    },
    {
        // Nth callout
    }
]
```

A callout entry can be a normal hardware callout, a maintenance procedure
callout, or a symbolic FRU callout.  Each callout must contain a Priority
field, where the possible values are:

* "H" = High
* "M" = Medium
* "A" = Medium Group A
* "B" = Medium Group B
* "C" = Medium Group C
* "L" = Low

Either unexpanded location codes or D-Bus inventory object paths can be used to
specify the called out part.  An unexpanded location code does not have the
system VPD information embedded in it, and the 'Ufcs-' prefix is optional (so
can be either Ufcs-P1 or just P1).
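
A creator might sanity check its callout list before writing the JSON file.
The Python sketch below is one hypothetical way to do that: the helper names
are invented for illustration, and treating 'Ufcs-P1' and 'P1' as the same
code by stripping the prefix is an assumption about how the optional prefix
can be handled.

```python
VALID_PRIORITIES = {"H", "M", "A", "B", "C", "L"}
MAX_CALLOUTS = 10  # callouts after the 10th are discarded


def short_location_code(location_code):
    """Drop the optional 'Ufcs-' prefix from an unexpanded location code."""
    prefix = "Ufcs-"
    if location_code.startswith(prefix):
        return location_code[len(prefix):]
    return location_code


def prepare_callouts(callouts):
    """Check each entry's Priority and keep only the first MAX_CALLOUTS."""
    for index, callout in enumerate(callouts):
        if callout.get("Priority") not in VALID_PRIORITIES:
            raise ValueError(f"callout {index} has a missing or invalid Priority")
    return callouts[:MAX_CALLOUTS]
```

The list passed to `prepare_callouts` is expected to already be ordered from
highest to lowest priority, since only the leading entries are kept.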

#### Normal hardware FRU callout

Normal hardware callouts must contain either the location code or inventory
path, and priority.  Even though the PEL code doesn't do any guarding or
deconfiguring itself, it needs to know if either of those things occurred as
there are status bits in the PEL to reflect them.  The Guarded and Deconfigured
fields are used for this.  Those fields are optional and if omitted then their
values will be false.

When the inventory path of a sub-FRU is passed in, the PEL code will put the
location code of the parent FRU into the callout.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H"
}

{
    "InventoryPath": "/xyz/openbmc_project/inventory/motherboard/cpu0/core5",
    "Priority": "H",
    "Deconfigured": true,
    "Guarded": true
}
```

MRUs (Manufacturing Replaceable Units) are 4 byte numbers that can optionally
be added to callouts to specify failing devices on a FRU.  These may be used
during the manufacturing test process, where there may be the ability to do
these replacements.  There can be up to 15 MRUs, each with its own priority,
embedded in a callout.  The possible priority values match the FRU priority
values.

Note that since JSON only supports numbers in decimal and not in hex, MRU IDs
will show up as decimal when visually inspecting the JSON.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [
        {
            "ID": 1234,
            "Priority": "H"
        },
        {
            "ID": 5678,
            "Priority": "H"
        }
    ]
}
```
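
To make the decimal versus hex point concrete, the Python sketch below builds
the same callout with the MRU IDs written in hex and then serializes it.  The
helper `mru_ids_as_hex` is hypothetical and is only there to show how the IDs
could be displayed as 4 byte hex values again.

```python
import json

callout = {
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [
        {"ID": 0x4D2, "Priority": "H"},    # written to the JSON as 1234
        {"ID": 0x162E, "Priority": "H"},   # written to the JSON as 5678
    ],
}

# The serialized text only ever contains the decimal form of the IDs.
text = json.dumps(callout, indent=4)


def mru_ids_as_hex(callout_json: str):
    """Return the MRU IDs from a callout as 4 byte hex strings."""
    mrus = json.loads(callout_json).get("MRUs", [])
    return [f"0x{mru['ID']:08X}" for mru in mrus]
```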

#### Maintenance procedure callout

The LocationCode field is not used with procedure callouts.  Only the first 7
characters of the Procedure field will be used by the PEL.

```
{
    "Procedure": "PRONAME",
    "Priority": "H"
}
```

#### Symbolic FRU callout

Only the first 7 characters of the SymbolicFRU field will be used by the PEL.

If the TrustedLocationCode field is present and set to true, this means the
location code may be used to turn on service indicators, so the LocationCode
field is required.  If TrustedLocationCode is false or missing, then the
LocationCode field is optional.

```
{
    "TrustedLocationCode": true,
    "LocationCode": "P0-C1",
    "Priority": "H",
    "SymbolicFRU": "FRUNAME"
}
```

## `Action Flags` and `Event Type` Rules

The `Action Flags` and `Event Type` PEL fields are optional in the message
registry, and if not present the code will set them based on certain rules
laid out in the PEL spec.

These rules are:
1. Always set the `Report` flag, unless the `Do Not Report` flag is already on.
2. Always clear the `SP Call Home` flag, as that feature isn't supported.
3. If the severity is `Non-error Event`:
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
    - If the `Event Type` field is `Not Applicable`, change it to `Information
      Only`.
    - If the `Event Type` field is `Information Only` or `Tracing`, set the
      `Hidden` flag.
4. If the severity is `Recovered`:
    - Set the `Hidden` flag.
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
5. For all other severities:
    - Clear the `Hidden` flag.
    - Set the `Service Action` flag.
    - Set the `Call Home` flag.

Additional rules may be added in the future if necessary.
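
The rules can be read as a small decision procedure.  The Python sketch below
models the flags as a set of names and applies the five rules in the order
listed; the function name `apply_default_rules` and the string representation
of flags, severities, and event types are assumptions made for illustration,
not the extension's data model.

```python
def apply_default_rules(severity, event_type, flags):
    """Return the (event_type, flags) pair after applying the rules above."""
    flags = set(flags)

    # Rule 1: report unless told not to.  Rule 2: SP Call Home is unsupported.
    if "Do Not Report" not in flags:
        flags.add("Report")
    flags.discard("SP Call Home")

    if severity == "Non-error Event":      # Rule 3
        flags -= {"Service Action", "Call Home"}
        if event_type == "Not Applicable":
            event_type = "Information Only"
        if event_type in ("Information Only", "Tracing"):
            flags.add("Hidden")
    elif severity == "Recovered":          # Rule 4
        flags.add("Hidden")
        flags -= {"Service Action", "Call Home"}
    else:                                  # Rule 5: all other severities
        flags.discard("Hidden")
        flags |= {"Service Action", "Call Home"}

    return event_type, flags
```

Note that in rule 3 the event type is adjusted before the `Hidden` check, so a
`Not Applicable` event type on a non-error event ends up hidden as well.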

## D-Bus Interfaces

See the `org.open_power.Logging.PEL` interface definition for the most
up-to-date information.

## PEL Retention

The PEL repository is allocated a set amount of space on the BMC.  When that
space gets close to being full, the code will remove a percentage of PELs to
make room for new ones.  In addition, the code will keep a cap on the total
number of PELs allowed.  Note that removing a PEL will also remove the
corresponding OpenBMC event log.

The disk capacity limit is set to 20MB, and the number limit is 3000.

The rules used to remove logs are listed below.  The checks will be run after a
PEL has been added and the method to create the PEL has returned to the caller,
i.e. run when control gets back to the event loop.

### Removal Algorithm

If the size used is 95% or under of the allocated space and under the limit on
the number of PELs, nothing further needs to be done, otherwise continue and
run all 5 of the following steps.  Each step itself only deletes PELs until it
meets its requirement and then it stops.

The steps are:

1. Remove BMC created informational PELs until they take up 15% or less of the
   allocated space.

2. Remove BMC created non-informational PELs until they take up 30% or less of
   the allocated space.

3. Remove non-BMC created informational PELs until they take up 15% or less of
   the allocated space.

4. Remove non-BMC created non-informational PELs until they take up 30% or less
   of the allocated space.

5. After the previous 4 steps are complete, if there are still more than the
   maximum number of PELs, remove PELs down to 80% of the maximum.

PELs with associated guard records will never be deleted.  Each step above
makes the following 4 passes, stopping as soon as its limit is reached:

Pass 1. Remove HMC acknowledged PELs.<br>
Pass 2. Remove OS acknowledged PELs.<br>
Pass 3. Remove PHYP acknowledged PELs.<br>
Pass 4. Remove all PELs.

After all these steps, disk capacity will be at most 90% (15% + 30% + 15% +
30%).
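
The steps and passes above can be sketched in Python as follows.  This is a
simplified model rather than the repository code: the `Pel` class, its field
names, removing the oldest PELs first within a pass, and treating a count equal
to the maximum as acceptable are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pel:
    pel_id: int
    size: int              # bytes used in the repository
    bmc_created: bool
    informational: bool
    acked_by: str = ""     # "HMC", "OS", "PHYP", or "" if not acknowledged
    guarded: bool = False  # has an associated guard record


def _remove_until(pels, in_group, limit_reached):
    """Make the 4 passes over one group, stopping once the limit is met."""
    for acker in ("HMC", "OS", "PHYP", None):  # None is pass 4, all PELs
        for pel in list(pels):
            if limit_reached():
                return
            if pel.guarded or not in_group(pel):
                continue
            if acker is None or pel.acked_by == acker:
                pels.remove(pel)


def prune(pels, max_size, max_count):
    """Return the PELs that remain after running the removal steps."""
    pels = list(pels)  # assumed to be ordered oldest first
    used = sum(p.size for p in pels)
    if used <= 0.95 * max_size and len(pels) <= max_count:
        return pels

    # Steps 1-4: (BMC created?, informational?, share of allocated space)
    steps = [(True, True, 0.15), (True, False, 0.30),
             (False, True, 0.15), (False, False, 0.30)]
    for bmc, info, share in steps:
        def in_group(p, bmc=bmc, info=info):
            return p.bmc_created == bmc and p.informational == info

        def under_share(in_group=in_group, share=share):
            group_size = sum(p.size for p in pels if in_group(p))
            return group_size <= share * max_size

        _remove_until(pels, in_group, under_share)

    # Step 5: enforce the cap on the number of PELs
    if len(pels) > max_count:
        _remove_until(pels, lambda p: True,
                      lambda: len(pels) <= 0.8 * max_count)
    return pels
```

Guarded PELs are skipped in every pass, so a group made up only of guarded
PELs can stay above its share of the space.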

## Adding python3 modules for PEL UserData and SRC parsing

In order to support python3 modules for the parsing of PEL User Data sections
and to decode SRC data, setuptools is used to import python3 packages from
external repos to be included in the OpenBMC image.

Sample layout for setuptools:

```
setup.py
src/usr/scom/plugins/ebmc/b0300.py
src/usr/i2c/plugins/ebmc/b0700.py
src/build/tools/ebmc/errludP_Helpers.py
```

`setup.py` is the build script for setuptools. It contains information about the
package (such as the name and version) as well as which code files to include.

The setup.py template to be used for eBMC User Data parsers:
```
import os.path
from setuptools import setup

# Update this dict with a new key/value pair for every component added
# Key: The package name to be installed as
# Value: The path containing the package's python modules
dirmap = {
    "b0300": "src/usr/scom/plugins/ebmc",
    "b0700": "src/usr/i2c/plugins/ebmc",
    "helpers": "src/build/tools/ebmc"
}

# All packages will be installed under 'udparsers' namespace
def get_package_name(dirmap_key):
    return "udparsers.{}".format(dirmap_key)

def get_package_dirent(dirmap_item):
    package_name = get_package_name(dirmap_item[0])
    package_dir = dirmap_item[1]
    return (package_name, package_dir)

def get_packages():
    return map(get_package_name, dirmap.keys())

def get_package_dirs():
    return map(get_package_dirent, dirmap.items())

setup(
    name="Hostboot",
    version="0.1",
    packages=list(get_packages()),
    package_dir=dict(get_package_dirs())
)
```

- User Data parser module
  - Module name: `xzzzz.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII) and `zzzz` is the 2 byte Component ID
    from the User Data section itself (in HEX). All should be converted to
    lowercase.
    - For example: `b0100.py` for Hostboot created UserData with CompID 0x0100
  - Function to provide: `parseUDToJson`
    - Argument list:
      1. (int) Sub-section type
      2. (int) Section version
      3. (memoryview): Data
    - Return data:
      1. (str) JSON string

  - Sample User Data parser module:
    ```
    import json
    def parseUDToJson(subType, ver, data):
        d = dict()
        ...
        # Parse and populate data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```
- SRC parser module
  - Module name: `xsrc.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII, converted to lowercase).
    - For example: `bsrc.py` for Hostboot generated SRCs
  - Function to provide: `parseSRCToJson`
    - Argument list:
      1. (str) Refcode ASCII string
      2. (str) Hexword 2
      3. (str) Hexword 3
      4. (str) Hexword 4
      5. (str) Hexword 5
      6. (str) Hexword 6
      7. (str) Hexword 7
      8. (str) Hexword 8
      9. (str) Hexword 9
    - Return data:
      1. (str) JSON string

  - Sample SRC parser module:
    ```
    import json
    def parseSRCToJson(ascii_str, word2, word3, word4, word5, word6, word7,
                       word8, word9):
        d = dict()
        ...
        # Decode SRC data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```
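
The module naming rules above can be expressed as a short Python sketch.  The
helper names `ud_parser_module` and `src_parser_module` are hypothetical and
only illustrate how the file names are derived from the PEL fields.

```python
def ud_parser_module(creator_subsystem: str, component_id: int) -> str:
    """Build the User Data parser file name, e.g. 'b0100.py'."""
    # Creator Subsystem in lowercase ASCII, then the 2 byte Component ID
    # as 4 lowercase hex digits.
    return f"{creator_subsystem.lower()}{component_id:04x}.py"


def src_parser_module(creator_subsystem: str) -> str:
    """Build the SRC parser file name, e.g. 'bsrc.py'."""
    return f"{creator_subsystem.lower()}src.py"
```

For a Hostboot created PEL (Creator Subsystem `B`) with Component ID 0x0100,
these give `b0100.py` and `bsrc.py`, matching the examples in the list.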

## Fail Boot on Host Errors

The fail boot on hw error [design][1] provides a function where a system owner
can tell the firmware to fail the boot of a system if a BMC phosphor-logging
event has a hardware callout in it.

When this fail boot on hardware error setting is enabled, the BMC must fail
the boot for **any** error from the host which satisfies the following
criteria:
- not SeverityType::nonError
- has a callout of any kind from the `FailingComponentType` structure
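
Put together, the criteria amount to a simple predicate.  The Python sketch
below is illustrative only: the function name, its parameters, and the use of
the string `"nonError"` to stand for `SeverityType::nonError` are assumptions,
not the extension's actual interface.

```python
def should_fail_boot(setting_enabled, from_host, severity, callouts):
    """Return True if this error should cause the boot to be failed."""
    if not (setting_enabled and from_host):
        return False
    # The error must not be a non-error event and must have some callout.
    return severity != "nonError" and len(callouts) > 0
```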

[1]: https://github.com/openbmc/docs/blob/master/designs/fail-boot-on-hw-error.md
