# OpenPower Platform Event Log (PEL) extension

This extension creates a PEL for every OpenBMC event log. An event log can
also point to an existing raw PEL, in which case that PEL is used instead of
a new one being created.

## Contents
* [Passing in data when creating PELs](#passing-pel-related-data-within-an-openbmc-event-log)
* [Default UserData sections for BMC created PELs](#default-userdata-sections-for-bmc-created-pels)
* [The PEL Message Registry](#the-pel-message-registry)
* [Callouts](#callouts)
* [Action Flags and Event Type Rules](#action-flags-and-event-type-rules)
* [D-Bus Interfaces](#d-bus-interfaces)
* [PEL Retention](#pel-retention)
* [Adding python3 modules for PEL UserData and SRC parsing](#adding-python3-modules-for-pel-userdata-and-src-parsing)
* [Fail Boot on Host Errors](#fail-boot-on-host-errors)

## Passing PEL related data within an OpenBMC event log

An error log creator can pass in data that is relevant to a PEL by using
certain keywords in the AdditionalData property of the event log.

### AdditionalData keywords

#### RAWPEL

This keyword is used to point to an existing PEL in a binary file that should
be associated with this event log.  The syntax is:
```
RAWPEL=<path to PEL File>
e.g.
RAWPEL="/tmp/pels/pel.5"
```
The code will assign its own error log ID to this PEL, and also update the
commit timestamp field to the current time.

#### POWER_THERMAL_CRITICAL_FAULT

This keyword is used to set the power fault bit in the PEL. The syntax is:
```
POWER_THERMAL_CRITICAL_FAULT=<FLAG>
e.g.
POWER_THERMAL_CRITICAL_FAULT=TRUE
```

Note that TRUE is the only value supported.

#### ESEL

This keyword's data contains a full PEL in string format.  This is how hostboot
sends down PELs when it is configured in IPMI communication mode.  The PEL is
handled just like the PEL obtained using the RAWPEL keyword.

The syntax is:

```
ESEL=
"00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30 01 00 33 00 00..."
```

Note that there are 16 bytes of IPMI SEL data before the PEL data starts.
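
As an illustration of that layout, the following Python sketch converts an
ESEL string into bytes and drops the leading IPMI SEL data.  It is not the
extension's actual code; the function name `esel_to_pel` is hypothetical, and
the sample uses only the bytes visible in the truncated example above.

```python
IPMI_SEL_HEADER_SIZE = 16  # bytes of IPMI SEL data that precede the PEL


def esel_to_pel(esel: str) -> bytes:
    """Convert a space-separated hex string to bytes and drop the SEL data."""
    raw = bytes.fromhex(esel)  # fromhex() skips the spaces between bytes
    if len(raw) <= IPMI_SEL_HEADER_SIZE:
        raise ValueError("ESEL data is too short to contain a PEL")
    return raw[IPMI_SEL_HEADER_SIZE:]


# Only the bytes shown in the (truncated) example above
sample = ("00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 "
          "50 48 00 30 01 00 33 00 00")
pel = esel_to_pel(sample)
print(pel[:2])  # b'PH'
```

The first two PEL bytes in the sample, 0x50 0x48, are the ASCII characters
"PH".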

#### _PID

The phosphor-logging daemon automatically adds this keyword, which contains
the application's PID, when the `commit` or `report` APIs are used to create
an event log, but not when the `Create` D-Bus method is used.  A caller of
the `Create` method that wishes to have its PID captured in the PEL should
add this keyword itself.

The PID will be added to the PEL in a section of type User Data (UD), along
with the application name it corresponds to.

The syntax is:
```
_PID=<PID of application>
e.g.
_PID="12345"
```

#### CALLOUT_INVENTORY_PATH

This is used to pass in an inventory item to use as a callout.  See [here for
details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_PRIORITY

This can be used along with CALLOUT_INVENTORY_PATH to specify the priority of
that FRU callout. If not specified, the default priority is "H"/High Priority.

The possible values are:
- "H": High Priority
- "M": Medium Priority
- "L": Low Priority

See [here for details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

This is used to pass in a device path to create callouts from.  See [here for
details](#passing-callouts-in-with-the-additionaldata-property).

#### CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

This is used to pass in an I2C bus and address to create callouts from.  See
[here for details](#passing-callouts-in-with-the-additionaldata-property).

### FFDC Intended For UserData PEL sections

When one needs to add FFDC into the PEL UserData sections, the
`CreateWithFFDCFiles` D-Bus method on the `xyz.openbmc_project.Logging.Create`
interface must be used when creating a new event log. This method takes a list
of files to store in the PEL UserData sections.

That method is the same as the `Create` method, except that it has an
additional parameter:

```
std::vector<std::tuple<enum[FFDCFormat],
                       uint8_t,
                       uint8_t,
                       sdbusplus::message::unix_fd>>
```

Each entry in the vector contains a file descriptor for a file that will
be stored in a unique UserData section.  The tuple's arguments are:

- enum[FFDCFormat]: The data format type, the options are:
    - 'JSON'
        - The parser will use nlohmann::json's pretty print
    - 'CBOR'
        - The parser will use nlohmann::json's pretty print
    - 'Text'
        - The parser will output ASCII text
    - 'Custom'
        - The parser will hexdump the data, unless there is a parser registered
          for this component ID and subtype.
- uint8_t: subType
  - Useful for the 'Custom' type.  Not used with the other types.
- uint8_t: version
  - The version of the data.
  - Used for the 'Custom' type.
  - There are no plans to use it for JSON/CBOR unless a reason to do so
    appears.
- unix_fd: The file descriptor for the opened file that contains the
    contents.  The file descriptor can be closed and the file can be deleted if
    desired after the method call.

An example of saving JSON data to a file and getting its file descriptor is:

```
nlohmann::json json = ...;
auto jsonString = json.dump();
// Write the serialized JSON to a file (error handling omitted)
FILE* fp = fopen(filename, "w");
fwrite(jsonString.data(), 1, jsonString.size(), fp);
// Flush stdio's buffer so the data is visible through the file descriptor
fflush(fp);
int fd = fileno(fp);
```

Alternatively, 'open()' can be used to obtain the file descriptor of the file.

Upon receiving this data, the PEL code will create UserData sections for each
entry in that vector with the following UserData fields:

- Section header component ID:
    - If the type field from the tuple is 'Custom', use the component ID from
      the message registry.
    - Otherwise, set the component ID to the phosphor-logging component ID so
      that the parser knows to use the built in parsers (e.g. json) for the
      type.
- Section header subtype: The subtype field from the tuple.
- Section header version: The version field from the tuple.
- Section data: The data from the file.

If there is a peltool parser registered for the custom type (method is TBD),
peltool will use it to print the data; otherwise the data will be hexdumped.

Before adding each of these UserData sections, a check will be done to see if
the PEL size will remain under the maximum size of 16KB.  If not, the UserData
section will be truncated enough that the PEL fits within the 16KB.
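
The size check can be pictured with the following Python sketch.  It is only
an approximation under stated assumptions: the function name `fit_section` is
hypothetical, the 8 byte section header size is an assumption, and any padding
or alignment that the real code applies is ignored.

```python
MAX_PEL_SIZE = 16 * 1024   # the 16KB maximum PEL size stated above
SECTION_HEADER_SIZE = 8    # assumed size in bytes of a section header


def fit_section(current_pel_size: int, data: bytes) -> bytes:
    """Return the data, truncated if needed, that still fits in the PEL."""
    available = MAX_PEL_SIZE - current_pel_size - SECTION_HEADER_SIZE
    if available <= 0:
        return b""  # no room is left for any section data
    return data[:available]


# A PEL that is already 16000 bytes long only has room for part of the data
truncated = fit_section(16000, bytes(1000))
print(len(truncated))  # 376
```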

## Default UserData sections for BMC created PELs

The extension code that creates PELs will add these UserData sections to every
PEL:

- The AdditionalData property contents
  - If the AdditionalData property in the OpenBMC event log has anything in it,
    it will be saved in a UserData section as a JSON string.

- System information
  - This section contains various pieces of system information, such as the
    full code level and the BMC, chassis, and host state properties.

## The PEL Message Registry

The PEL message registry is used to create PELs from OpenBMC event logs.
Documentation can be found [here](registry/README.md).

## Callouts

A callout points to a FRU, a symbolic FRU, or an isolation procedure.  Each PEL
can contain from zero to ten callouts, which are located in the SRC section.

There are a few different ways to add callouts to a PEL.  In all cases, the
callouts will be sorted from highest to lowest priority within the PEL after
they are added.

### Passing callouts in with the AdditionalData property

The PEL code can add callouts based on the values of special entries in the
AdditionalData event log property.  They are:

- CALLOUT_INVENTORY_PATH

    This keyword is used to call out a single FRU by passing in its D-Bus
    inventory path.  When the PEL code sees this, it will create a single FRU
    callout, using the VPD properties (location code, FN, CCIN) from that
    inventory item.  If that item is not a FRU itself and does not have a
    location code, the PEL code will keep searching its parents until it finds
    one that is a FRU.

    The priority of the FRU callout will be high, unless the CALLOUT_PRIORITY
    keyword is also present and contains a different priority, in which case
    that priority will be used instead.  This can be useful when a maintenance
    procedure with a high priority callout is specified for this error in the
    message registry and the FRU callout needs to have a different priority.

    ```
    CALLOUT_INVENTORY_PATH=
    "/xyz/openbmc_project/inventory/system/chassis/motherboard"
    ```

- CALLOUT_DEVICE_PATH with CALLOUT_ERRNO

    These keywords are required as a pair to indicate faulty device
    communication, usually detected by a failure accessing a device at that
    sysfs path.  The PEL code will use a data table generated by the MRW to map
    these device paths to FRU callout lists.  The errno value may influence the
    callout.

    I2C, FSI, FSI-I2C, and FSI-SPI paths are supported.

    ```
    CALLOUT_DEVICE_PATH="/sys/bus/i2c/devices/3-0069"
    CALLOUT_ERRNO="2"
    ```

- CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO

    These 3 keywords can be used to call out a failing I2C device when the
    full device path isn't known.  This method is similar to
    CALLOUT_DEVICE_PATH in that it will use data tables generated by the MRW
    to create the callouts.

    CALLOUT_IIC_BUS is in the form "/dev/i2c-X" where X is the bus number, or
    just the bus number by itself.

    CALLOUT_IIC_ADDR is the 7 bit address, given either as a decimal number or
    as a hex number if preceded with "0x".

    ```
    CALLOUT_IIC_BUS="/dev/i2c-7"
    CALLOUT_IIC_ADDR="81"
    CALLOUT_ERRNO="62"
    ```
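
The accepted CALLOUT_IIC_BUS and CALLOUT_IIC_ADDR forms described above could
be parsed along these lines.  This is a sketch rather than the PEL code's
implementation, and the helper names `parse_i2c_bus` and `parse_i2c_addr` are
hypothetical.

```python
I2C_DEV_PREFIX = "/dev/i2c-"


def parse_i2c_bus(value: str) -> int:
    """Accept "/dev/i2c-X" or just "X" and return the bus number X."""
    if value.startswith(I2C_DEV_PREFIX):
        value = value[len(I2C_DEV_PREFIX):]
    return int(value)


def parse_i2c_addr(value: str) -> int:
    """Accept a decimal ("81") or "0x" prefixed hex ("0x51") 7 bit address."""
    if value.lower().startswith("0x"):
        addr = int(value, 16)
    else:
        addr = int(value, 10)
    if not 0 <= addr <= 0x7F:
        raise ValueError(f"{value} is not a 7 bit address")
    return addr


print(parse_i2c_bus("/dev/i2c-7"), parse_i2c_addr("81"))  # 7 81
```

With these helpers, "81" and "0x51" refer to the same device address.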

### Defining callouts in the message registry

Callouts can be completely defined inside that error's definition in the PEL
message registry.  This method allows the callouts to vary based on the system
type or on any AdditionalData item.

At a high level, this involves defining a callout section inside the registry
entry that contains the location codes or procedure names to use, along with
their priority.  If these can vary based on system type, the type provided by
the entity manager will be one of the keys.  If they can also depend on an
AdditionalData entry, then that will also be a key.

See the message registry [README](registry/README.md) and
[schema](registry/schema/schema.json) for the details.

### Using the message registry along with CALLOUT_ entries

If the message registry entry contains a callout definition and the event log
also contains one of the aforementioned CALLOUT keys in the AdditionalData
property, then the PEL code will first add the callouts stemming from the
CALLOUT items, followed by the callouts from the message registry.

### Specifying multiple callouts using JSON format FFDC files

Multiple callouts can be passed in by the creator at the time of PEL creation.
This is done by specifying them in a JSON file that is then passed in as an
[FFDC file](#ffdc-intended-for-userdata-pel-sections).  The JSON will still be
added into a PEL UserData section for debug.

To specify that an FFDC file contains callouts, the format value for that FFDC
entry must be set to JSON, and the subtype field must be set to 0xCA:

```
using FFDC = std::tuple<CreateIface::FFDCFormat,
                        uint8_t,
                        uint8_t,
                        sdbusplus::message::unix_fd>;

FFDC ffdc{
    CreateIface::FFDCFormat::JSON,
    0xCA, // Callout subtype
    0x01, // Callout version, set to 0x01
    fd};
```

The JSON contains an array of callouts that must be in order of highest
priority to lowest, with a maximum of 10.  Any callouts after the 10th are
discarded, as there is no room for them in the PEL. The format looks like:

```
[
    {
        // First callout
    },
    {
        // Second callout
    },
    {
        // Nth callout
    }
]
```

A callout entry can be a normal hardware callout, a maintenance procedure
callout, or a symbolic FRU callout.  Each callout must contain a Priority
field, where the possible values are:

* "H" = High
* "M" = Medium
* "A" = Medium Group A
* "B" = Medium Group B
* "C" = Medium Group C
* "L" = Low

Either unexpanded location codes or D-Bus inventory object paths can be used to
specify the called out part.  An unexpanded location code does not have the
system VPD information embedded in it, and the 'Ufcs-' prefix is optional (so
it can be either Ufcs-P1 or just P1).
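
A creator might assemble and check the callout array before writing it to the
FFDC file, roughly as in this Python sketch.  The helper
`build_callout_json` is hypothetical, and the field names are the ones
documented in the subsections that follow.  The sketch only checks the
Priority field and applies the limit of 10; it does not reorder the entries,
so the caller still has to supply them from highest to lowest priority.

```python
import json

VALID_PRIORITIES = {"H", "M", "A", "B", "C", "L"}
MAX_CALLOUTS = 10  # callouts after the 10th would be discarded anyway


def build_callout_json(callouts: list) -> str:
    """Check each callout's Priority and serialize at most 10 callouts."""
    for callout in callouts:
        if callout.get("Priority") not in VALID_PRIORITIES:
            raise ValueError(f"invalid or missing Priority: {callout}")
    return json.dumps(callouts[:MAX_CALLOUTS])


callout_json = build_callout_json([
    {"LocationCode": "P0-C1", "Priority": "H"},
    {"Procedure": "PRONAME", "Priority": "L"},
])
print(callout_json)
```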

#### Normal hardware FRU callout

Normal hardware callouts must contain either the location code or inventory
path, and priority.  Even though the PEL code doesn't do any guarding or
deconfiguring itself, it needs to know if either of those things occurred as
there are status bits in the PEL to reflect them.  The Guarded and Deconfigured
fields are used for this.  Those fields are optional and if omitted then their
values will be false.

When the inventory path of a sub-FRU is passed in, the PEL code will put the
location code of the parent FRU into the callout.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H"
}

{
    "InventoryPath": "/xyz/openbmc_project/inventory/motherboard/cpu0/core5",
    "Priority": "H",
    "Deconfigured": true,
    "Guarded": true
}
```

MRUs (Manufacturing Replaceable Units) are 4 byte numbers that can optionally
be added to callouts to specify failing devices on a FRU.  These may be used
during the manufacturing test process, where there may be the ability to do
these replacements.  There can be up to 15 MRUs, each with its own priority,
embedded in a callout.  The possible priority values match the FRU priority
values.

Note that since JSON only supports numbers in decimal and not in hex, MRU IDs
will show up as decimal when visually inspecting the JSON.

```
{
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [
        {
            "ID": 1234,
            "Priority": "H"
        },
        {
            "ID": 5678,
            "Priority": "H"
        }
    ]
}
```
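
The decimal representation can be seen in this short Python sketch, which
writes MRU IDs given in hex into a callout and then converts them back for
display.  The MRU ID values are made up for illustration.

```python
import json

# Hypothetical 4 byte MRU IDs, written in hex in the source code
mru_ids = [0x00010002, 0x00010003]

callout = {
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [{"ID": mru_id, "Priority": "H"} for mru_id in mru_ids],
}

# The serialized JSON shows the IDs in decimal (65538 and 65539)
text = json.dumps(callout)
print(text)

# Converting back to 8 digit hex when inspecting the JSON
decoded = [f"0x{mru['ID']:08X}" for mru in json.loads(text)["MRUs"]]
print(decoded)  # ['0x00010002', '0x00010003']
```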

#### Maintenance procedure callout

The LocationCode field is not used with procedure callouts.  Only the first 7
characters of the Procedure field will be used by the PEL.

```
{
    "Procedure": "PRONAME",
    "Priority": "H"
}
```

#### Symbolic FRU callout

Only the first 7 characters of the SymbolicFRU field will be used by the PEL.

If the TrustedLocationCode field is present and set to true, this means the
location code may be used to turn on service indicators, so the LocationCode
field is required.  If TrustedLocationCode is false or missing, then the
LocationCode field is optional.

```
{
    "TrustedLocationCode": true,
    "LocationCode": "P0-C1",
    "Priority": "H",
    "SymbolicFRU": "FRUNAME"
}
```

## `Action Flags` and `Event Type` Rules

The `Action Flags` and `Event Type` PEL fields are optional in the message
registry, and if not present the code will set them based on certain rules
laid out in the PEL spec.

These rules are:
1. Always set the `Report` flag, unless the `Do Not Report` flag is already on.
2. Always clear the `SP Call Home` flag, as that feature isn't supported.
3. If the severity is `Non-error Event`:
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
    - If the `Event Type` field is `Not Applicable`, change it to `Information
      Only`.
    - If the `Event Type` field is `Information Only` or `Tracing`, set the
      `Hidden` flag.
4. If the severity is `Recovered`:
    - Set the `Hidden` flag.
    - Clear the `Service Action` flag.
    - Clear the `Call Home` flag.
5. For all other severities:
    - Clear the `Hidden` flag.
    - Set the `Service Action` flag.
    - Set the `Call Home` flag.

Additional rules may be added in the future if necessary.
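
The rules above can be sketched in Python as follows.  This is an
illustration, not the extension's code: the function name `apply_rules` is
hypothetical, the action flags are modeled as a set of flag names, and
"Unrecoverable" merely stands in for any severity other than
`Non-error Event` and `Recovered`.

```python
def apply_rules(severity: str, event_type: str, flags: set) -> tuple:
    """Return the (flags, event_type) that result from applying the rules."""
    flags = set(flags)  # work on a copy

    # Rule 1: report unless the creator asked not to
    if "Do Not Report" not in flags:
        flags.add("Report")

    # Rule 2: SP Call Home is never supported
    flags.discard("SP Call Home")

    if severity == "Non-error Event":        # Rule 3
        flags -= {"Service Action", "Call Home"}
        if event_type == "Not Applicable":
            event_type = "Information Only"
        if event_type in ("Information Only", "Tracing"):
            flags.add("Hidden")
    elif severity == "Recovered":            # Rule 4
        flags.add("Hidden")
        flags -= {"Service Action", "Call Home"}
    else:                                    # Rule 5
        flags.discard("Hidden")
        flags |= {"Service Action", "Call Home"}

    return flags, event_type


result = apply_rules("Unrecoverable", "Not Applicable", {"Hidden"})
print(result)
```

For the placeholder severity, the `Hidden` flag is cleared and the `Report`,
`Service Action`, and `Call Home` flags are set.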

## D-Bus Interfaces

See the org.open_power.Logging.PEL interface definition for the most up-to-date
information.

## PEL Retention

The PEL repository is allocated a set amount of space on the BMC.  When that
space gets close to being full, the code will remove a percentage of PELs to
make room for new ones.  In addition, the code will keep a cap on the total
number of PELs allowed.  Note that removing a PEL will also remove the
corresponding OpenBMC event log.

The disk capacity limit is set to 20MB, and the number limit is 3000.

The rules used to remove logs are listed below.  The checks will be run after a
PEL has been added and the method to create the PEL has returned to the caller,
i.e. run when control gets back to the event loop.

### Removal Algorithm

If the space used is 95% or less of the allocated space and the number of PELs
is under the limit, nothing further needs to be done.  Otherwise, run all 5 of
the following steps.  Each step only deletes PELs until it meets its
requirement, and then it stops.

The steps are:

1. Remove BMC created informational PELs until they take up 15% or less of the
   allocated space.

2. Remove BMC created non-informational PELs until they take up 30% or less of
   the allocated space.

3. Remove non-BMC created informational PELs until they take up 15% or less of
   the allocated space.

4. Remove non-BMC created non-informational PELs until they take up 30% or less
   of the allocated space.

5. After the previous 4 steps are complete, if there are still more than the
   maximum number of PELs, remove PELs down to 80% of the maximum.

PELs with associated guard records will never be deleted.  Each step above
makes the following 4 passes, stopping as soon as its limit is reached:

Pass 1. Remove HMC acknowledged PELs.<br>
Pass 2. Remove OS acknowledged PELs.<br>
Pass 3. Remove PHYP acknowledged PELs.<br>
Pass 4. Remove all PELs.

After all these steps, the space used will be at most 90% of the allocated
space (15% + 30% + 15% + 30%).
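
The thresholds involved can be written out as in the Python sketch below.  It
covers only the trigger check and the arithmetic, not the deletion passes, and
all names in it are hypothetical.  It reads "under the limit" literally, so a
repository that holds exactly 3000 PELs is treated as needing pruning.

```python
MAX_REPO_SIZE = 20 * 1024 * 1024  # the 20MB disk capacity limit
MAX_PEL_COUNT = 3000              # the limit on the number of PELs

# Space limit for each of the first 4 steps, as a percentage of the
# allocated space
STEP_LIMITS = [
    ("BMC informational", 15),
    ("BMC non-informational", 30),
    ("non-BMC informational", 15),
    ("non-BMC non-informational", 30),
]


def needs_pruning(size_used: int, pel_count: int) -> bool:
    """True when more than 95% of the space is used or the count limit is hit."""
    over_size = size_used * 100 > 95 * MAX_REPO_SIZE
    return over_size or pel_count >= MAX_PEL_COUNT


# Upper bound on the space used after steps 1 to 4
worst_case_percent = sum(limit for _, limit in STEP_LIMITS)

# Step 5 removes PELs down to 80% of the maximum count
count_target = MAX_PEL_COUNT * 80 // 100

print(worst_case_percent, count_target)  # 90 2400
```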

## Adding python3 modules for PEL UserData and SRC parsing

To support python3 modules that parse PEL User Data sections and decode SRC
data, setuptools is used to import python3 packages from external repos so that
they can be included in the OpenBMC image.

Sample layout for setuptools:

```
setup.py
src/usr/scom/plugins/ebmc/b0300.py
src/usr/i2c/plugins/ebmc/b0700.py
src/build/tools/ebmc/errludP_Helpers.py
```

`setup.py` is the build script for setuptools. It contains information about the
package (such as the name and version) as well as which code files to include.

The setup.py template to be used for eBMC User Data parsers:
```
import os.path
from setuptools import setup

# Update this dict with a new key/value pair for every component added
# Key: The package name to be installed as
# Value: The path containing the package's python modules
dirmap = {
    "b0300": "src/usr/scom/plugins/ebmc",
    "b0700": "src/usr/i2c/plugins/ebmc",
    "helpers": "src/build/tools/ebmc"
}

# All packages will be installed under 'udparsers' namespace
def get_package_name(dirmap_key):
    return "udparsers.{}".format(dirmap_key)

def get_package_dirent(dirmap_item):
    package_name = get_package_name(dirmap_item[0])
    package_dir = dirmap_item[1]
    return (package_name, package_dir)

def get_packages():
    return map(get_package_name, dirmap.keys())

def get_package_dirs():
    return map(get_package_dirent, dirmap.items())

setup(
    name="Hostboot",
    version="0.1",
    packages=list(get_packages()),
    package_dir=dict(get_package_dirs())
)
```
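
To make the template's effect concrete, this standalone Python snippet
computes the same `packages` and `package_dir` values that the helper
functions above pass to `setup()`, using the sample `dirmap`.  It does not
call setuptools.

```python
dirmap = {
    "b0300": "src/usr/scom/plugins/ebmc",
    "b0700": "src/usr/i2c/plugins/ebmc",
    "helpers": "src/build/tools/ebmc",
}

# Same results as list(get_packages()) and dict(get_package_dirs())
packages = [f"udparsers.{key}" for key in dirmap]
package_dir = {f"udparsers.{key}": path for key, path in dirmap.items()}

print(packages)
# ['udparsers.b0300', 'udparsers.b0700', 'udparsers.helpers']
print(package_dir["udparsers.helpers"])
# src/build/tools/ebmc
```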
- User Data parser module
  - Module name: `xzzzz.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII) and `zzzz` is the 2 byte Component ID
    from the User Data section itself (in HEX). All should be converted to
    lowercase.
    - For example: `b0100.py` for Hostboot created UserData with CompID 0x0100
  - Function to provide: `parseUDToJson`
    - Argument list:
      1. (int) Sub-section type
      2. (int) Section version
      3. (memoryview) Data
    - Return data:
      1. (str) JSON string

  - Sample User Data parser module:
    ```
    import json

    def parseUDToJson(subType, ver, data):
        d = dict()
        ...
        # Parse and populate data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```
- SRC parser module
  - Module name: `xsrc.py`, where `x` is the Creator Subsystem from the
    Private Header section (in ASCII, converted to lowercase).
    - For example: `bsrc.py` for Hostboot generated SRCs
  - Function to provide: `parseSRCToJson`
    - Argument list:
      1. (str) Refcode ASCII string
      2. (str) Hexword 2
      3. (str) Hexword 3
      4. (str) Hexword 4
      5. (str) Hexword 5
      6. (str) Hexword 6
      7. (str) Hexword 7
      8. (str) Hexword 8
      9. (str) Hexword 9
    - Return data:
      1. (str) JSON string

  - Sample SRC parser module:
    ```
    import json

    def parseSRCToJson(ascii_str, word2, word3, word4, word5, word6, word7,
                       word8, word9):
        d = dict()
        ...
        # Decode SRC data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```
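
The module naming rules above can be expressed as two small Python helpers.
The function names are hypothetical and are only meant to show how the
Creator Subsystem character and the Component ID combine into a file name.

```python
def ud_parser_module(creator: str, component_id: int) -> str:
    """Build `xzzzz.py` from the creator character and 2 byte component ID."""
    return f"{creator.lower()}{component_id:04x}.py"


def src_parser_module(creator: str) -> str:
    """Build `xsrc.py` from the creator character."""
    return f"{creator.lower()}src.py"


print(ud_parser_module("B", 0x0100))  # b0100.py
print(src_parser_module("B"))         # bsrc.py
```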

## Fail Boot on Host Errors

The fail boot on hardware error [design][1] provides a function where a system
owner can tell the firmware to fail the boot of a system if a BMC
phosphor-logging event has a hardware callout in it.

When this fail boot on hardware error setting is enabled, the BMC is required
to fail the boot for **any** error from the host that satisfies the following
criteria:
- not SeverityType::nonError
- has a callout of any kind from the `FailingComponentType` structure

[1]: https://github.com/openbmc/docs/blob/master/designs/fail-boot-on-hw-error.md