1# OpenPower Platform Event Log (PEL) extension
2
This extension creates a PEL for every OpenBMC event log. Alternatively, an
event log can point to an existing raw PEL, in which case that PEL is used
instead of a new one being created.
6
7## Contents
8
9- [Passing in data when creating PELs](#passing-pel-related-data-within-an-openbmc-event-log)
10- [Default UserData sections for BMC created PELs](#default-userdata-sections-for-bmc-created-pels)
11- [The PEL Message Registry](#the-pel-message-registry)
12- [Callouts](#callouts)
13- [Action Flags and Event Type Rules](#action-flags-and-event-type-rules)
14- [D-Bus Interfaces](#d-bus-interfaces)
15- [PEL Retention](#pel-retention)
16- [Adding python3 modules for PEL UserData and SRC parsing](#adding-python3-modules-for-pel-userdata-and-src-parsing)
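- [Fail Boot on Host Errors](#fail-boot-on-host-errors)
- [Self Boot Engine(SBE) First Failure Data Capture(FFDC) Support](#self-boot-enginesbe-first-failure-data-captureffdc-support)
- [PEL Archiving](#pel-archiving)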
18
19## Passing PEL related data within an OpenBMC event log
20
21An error log creator can pass in data that is relevant to a PEL by using certain
22keywords in the AdditionalData property of the event log.
23
24### AdditionalData keywords
25
26#### RAWPEL
27
28This keyword is used to point to an existing PEL in a binary file that should be
29associated with this event log. The syntax is:
30
31```ascii
32RAWPEL=<path to PEL File>
33e.g.
34RAWPEL="/tmp/pels/pel.5"
35```
36
37The code will assign its own error log ID to this PEL, and also update the
38commit timestamp field to the current time.
39
40#### POWER_THERMAL_CRITICAL_FAULT
41
42This keyword is used to set the power fault bit in PEL. The syntax is:
43
44```ascii
45POWER_THERMAL_CRITICAL_FAULT=<FLAG>
46e.g.
47POWER_THERMAL_CRITICAL_FAULT=TRUE
48```
49
50Note that TRUE is the only value supported.
51
52#### SEVERITY_DETAIL
53
54This is used when the passed in event log severity determines the PEL severity
55and a more granular PEL severity is needed beyond what the normal event log to
56PEL severity conversion could give.
57
58The syntax is:
59
60```ascii
61SEVERITY_DETAIL=<SEVERITY_TYPE>
62e.g.
63SEVERITY_DETAIL=SYSTEM_TERM
64```
65
Supported option:

- SYSTEM_TERM: changes the Severity value from 0x50 to 0x51
69
70#### ESEL
71
72This keyword's data contains a full PEL in string format. This is how hostboot
73sends down PELs when it is configured in IPMI communication mode. The PEL is
74handled just like the PEL obtained using the RAWPEL keyword.
75
76The syntax is:
77
78```ascii
79ESEL=
80"00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 50 48 00 30 01 00 33 00 00..."
81```
82
83Note that there are 16 bytes of IPMI SEL data before the PEL data starts.
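
As a rough illustration of that layout, the following Python sketch converts an
ESEL string into the PEL bytes it carries by skipping the 16 bytes of IPMI SEL
data. The function name `esel_to_pel` is hypothetical and is not part of the
extension; the input is only the visible part of the truncated example above.

```python
IPMI_SEL_HEADER_SIZE = 16  # bytes of IPMI SEL data that precede the PEL


def esel_to_pel(esel: str) -> bytes:
    """Convert an ESEL hex string into the raw PEL bytes it carries."""
    raw = bytes.fromhex(esel)  # fromhex() skips the spaces between bytes
    if len(raw) <= IPMI_SEL_HEADER_SIZE:
        raise ValueError("ESEL data is too short to contain a PEL")
    return raw[IPMI_SEL_HEADER_SIZE:]


# Only the bytes shown in the example above; the real data is longer.
esel = (
    "00 00 df 00 00 00 00 20 00 04 12 01 6f aa 00 00 "
    "50 48 00 30 01 00 33 00 00"
)
pel = esel_to_pel(esel)
print(pel[:2])  # b'PH'
```

The first two bytes after the SEL data are 0x50 0x48, the ASCII characters
"PH".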
84
85#### \_PID
86
This keyword contains the application's PID. The phosphor-logging daemon adds it
automatically when the `commit` or `report` APIs are used to create an event
log, but not when the `Create` D-Bus method is used. A caller of the `Create`
method that wants its PID captured in the PEL should pass this keyword in.
91
92This will be added to the PEL in a section of type User Data (UD), along with
93the application name it corresponds to.
94
95The syntax is:
96
97```ascii
98_PID=<PID of application>
99e.g.
100_PID="12345"
101```
102
103#### CALLOUT_INVENTORY_PATH
104
This is used to pass in an inventory item to use as a callout. See
[here for details](#passing-callouts-in-with-the-additionaldata-property).
107
108#### CALLOUT_PRIORITY
109
110This can be used along with CALLOUT_INVENTORY_PATH to specify the priority of
111that FRU callout. If not specified, the default priority is "H"/High Priority.
112
113The possible values are:
114
115- "H": High Priority
116- "M": Medium Priority
117- "L": Low Priority
118
119See [here for details](#passing-callouts-in-with-the-additionaldata-property)
120
121#### CALLOUT_DEVICE_PATH with CALLOUT_ERRNO
122
123This is used to pass in a device path to create callouts from. See
124[here for details](#passing-callouts-in-with-the-additionaldata-property)
125
126#### CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO
127
128This is used to pass in an I2C bus and address to create callouts from. See
129[here for details](#passing-callouts-in-with-the-additionaldata-property)
130
131#### PEL_SUBSYSTEM
132
This keyword is used to pass in the subsystem that should be associated with
this event log. The syntax is `PEL_SUBSYSTEM=<subsystem value in hex>`, e.g.
`PEL_SUBSYSTEM=0x20`.
136
137### FFDC Intended For UserData PEL sections
138
139When one needs to add FFDC into the PEL UserData sections, the
140`CreateWithFFDCFiles` D-Bus method on the `xyz.openbmc_project.Logging.Create`
141interface must be used when creating a new event log. This method takes a list
142of files to store in the PEL UserData sections.
143
That method is the same as `Create`, except that it takes an additional
parameter:
145
146```cpp
147std::vector<std::tuple<enum[FFDCFormat],
148                       uint8_t,
149                       uint8_t,
150                       sdbusplus::message::unix_fd>>
151```
152
153Each entry in the vector contains a file descriptor for a file that will be
154stored in a unique UserData section. The tuple's arguments are:
155
156- enum[FFDCFormat]: The data format type, the options are:
157  - 'JSON'
    - The parser will use nlohmann::json's pretty print
  - 'CBOR'
    - The parser will use nlohmann::json's pretty print
161  - 'Text'
162    - The parser will output ASCII text
163  - 'Custom'
164    - The parser will hexdump the data, unless there is a parser registered for
165      this component ID and subtype.
- uint8_t: subType
  - Useful for the 'Custom' type. Not used with the other types.
- uint8_t: version
  - The version of the data.
  - Used for the 'Custom' type.
  - Not planning on using for JSON/CBOR unless a reason to do so appears.
- unix_fd: The file descriptor for the opened file that contains the contents.
  The file descriptor can be closed and the file can be deleted if desired after
  the method call.
175
176An example of saving JSON data to a file and getting its file descriptor is:
177
178```cpp
179nlohmann::json json = ...;
180auto jsonString = json.dump();
181FILE* fp = fopen(filename, "w");
fwrite(jsonString.data(), 1, jsonString.size(), fp);
fflush(fp); // Flush stdio buffers so the data is in the file before the fd is used
int fd = fileno(fp);
184```
185
186Alternatively, 'open()' can be used to obtain the file descriptor of the file.
187
188Upon receiving this data, the PEL code will create UserData sections for each
189entry in that vector with the following UserData fields:
190
191- Section header component ID:
192  - If the type field from the tuple is "custom", use the component ID from the
193    message registry.
194  - Otherwise, set the component ID to the phosphor-logging component ID so that
195    the parser knows to use the built in parsers (e.g. json) for the type.
196- Section header subtype: The subtype field from the tuple.
197- Section header version: The version field from the tuple.
198- Section data: The data from the file.
199
200If there is a peltool parser registered for the custom type (method is TBD),
201that will be used by peltool to print the data, otherwise it will be hexdumped.
202
203Before adding each of these UserData sections, a check will be done to see if
204the PEL size will remain under the maximum size of 16KB. If not, the UserData
205section will be truncated down enough so that it will fit into the 16KB.
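
The size check can be pictured with the short Python sketch below. It is only an
approximation under stated assumptions: the function name `fit_user_data` is
hypothetical, and the real code may also account for section headers and
alignment, which are ignored here.

```python
MAX_PEL_SIZE = 16 * 1024  # 16KB maximum PEL size


def fit_user_data(current_pel_size: int, section_size: int) -> int:
    """Return how many bytes of a new UserData section fit in the PEL."""
    remaining = max(0, MAX_PEL_SIZE - current_pel_size)
    return min(section_size, remaining)


print(fit_user_data(1000, 500))    # 500, fits without truncation
print(fit_user_data(16000, 1000))  # 384, truncated to the space left
```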
206
207## Default UserData sections for BMC created PELs
208
209The extension code that creates PELs will add these UserData sections to every
210PEL:
211
212- The AdditionalData property contents
213
214  - If the AdditionalData property in the OpenBMC event log has anything in it,
215    it will be saved in a UserData section as a JSON string.
216
217- System information
218  - This section contains various pieces of system information, such as the full
219    code level and the BMC, chassis, and host state properties.
220
221## The PEL Message Registry
222
223The PEL message registry is used to create PELs from OpenBMC event logs.
224Documentation can be found [here](registry/README.md).
225
226## Callouts
227
228A callout points to a FRU, a symbolic FRU, or an isolation procedure. There can
229be from zero to ten of them in each PEL, where they are located in the SRC
230section.
231
232There are a few different ways to add callouts to a PEL. In all cases, the
233callouts will be sorted from highest to lowest priority within the PEL after
234they are added.
235
236### Passing callouts in with the AdditionalData property
237
238The PEL code can add callouts based on the values of special entries in the
239AdditionalData event log property. They are:
240
241- CALLOUT_INVENTORY_PATH
242
243  This keyword is used to call out a single FRU by passing in its D-Bus
244  inventory path. When the PEL code sees this, it will create a single FRU
245  callout, using the VPD properties (location code, FN, CCIN) from that
  inventory item. If that item is not a FRU itself and does not have a location
  code, the code will keep searching its parents until it finds one that does.
248
249  The priority of the FRU callout will be high, unless the CALLOUT_PRIORITY
250  keyword is also present and contains a different priority in which case it
251  will be used instead. This can be useful when a maintenance procedure with a
252  high priority callout is specified for this error in the message registry and
253  the FRU callout needs to have a different priority.
254
255```ascii
256  CALLOUT_INVENTORY_PATH=
257  "/xyz/openbmc_project/inventory/system/chassis/motherboard"
258```
259
260- CALLOUT_DEVICE_PATH with CALLOUT_ERRNO
261
262  These keywords are required as a pair to indicate faulty device communication,
263  usually detected by a failure accessing a device at that sysfs path. The PEL
264  code will use a data table generated by the MRW to map these device paths to
265  FRU callout lists. The errno value may influence the callout.
266
267  I2C, FSI, FSI-I2C, and FSI-SPI paths are supported.
268
269```ascii
270  CALLOUT_DEVICE_PATH="/sys/bus/i2c/devices/3-0069"
271  CALLOUT_ERRNO="2"
272```
273
274- CALLOUT_IIC_BUS with CALLOUT_IIC_ADDR and CALLOUT_ERRNO
275
  These 3 keywords can be used to call out a failing I2C device when the full
  device path isn't known. This is similar to CALLOUT_DEVICE_PATH in that it
  uses data tables generated by the MRW to create the callouts.

  CALLOUT_IIC_BUS is either in the form "/dev/i2c-X", where X is the bus number,
  or just the bus number by itself. CALLOUT_IIC_ADDR is the 7 bit address, given
  either in decimal or, if preceded by "0x", in hex.
283
284```ascii
285  CALLOUT_IIC_BUS="/dev/i2c-7"
286  CALLOUT_IIC_ADDR="81"
287  CALLOUT_ERRNO=62
288```
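
To make the accepted CALLOUT_IIC_BUS and CALLOUT_IIC_ADDR forms concrete, here
is a minimal Python sketch of how such values could be interpreted. The helper
names `parse_iic_bus` and `parse_iic_addr` are hypothetical and do not come from
the PEL code.

```python
def parse_iic_bus(value: str) -> int:
    """Accept either "/dev/i2c-X" or just the bus number "X"."""
    prefix = "/dev/i2c-"
    if value.startswith(prefix):
        value = value[len(prefix):]
    return int(value)


def parse_iic_addr(value: str) -> int:
    """Accept a 7 bit address in decimal, or in hex when prefixed by 0x."""
    if value.lower().startswith("0x"):
        addr = int(value, 16)
    else:
        addr = int(value)
    if not 0 <= addr <= 0x7F:
        raise ValueError(f"not a 7 bit address: {value}")
    return addr


print(parse_iic_bus("/dev/i2c-7"), parse_iic_bus("7"))  # 7 7
print(parse_iic_addr("81"), parse_iic_addr("0x51"))     # 81 81
```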
289
290### Defining callouts in the message registry
291
292Callouts can be completely defined inside that error's definition in the PEL
293message registry. This method allows the callouts to vary based on the system
294type or on any AdditionalData item.
295
At a high level, this involves defining a callout section inside the registry
entry that contains the location codes or procedure names to use, along with
their priority. If these can vary based on system type, the type provided by the
entity manager will be one of the keys. If they can also depend on an
AdditionalData entry, then that will also be a key.
301
302See the message registry [README](registry/README.md) and
303[schema](registry/schema/schema.json) for the details.
304
305### Using the message registry along with CALLOUT\_ entries
306
If the message registry entry contains a callout definition and the event log
also contains one of the aforementioned CALLOUT keys in the AdditionalData
property, then the PEL code will first add the callouts stemming from the
CALLOUT items, followed by the callouts from the message registry.
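
The combined behavior (CALLOUT items first, then registry callouts, then the
priority sort applied to all callouts) might look like the Python sketch below.
It assumes the priority order H, M, A, B, C, L and a stable sort that keeps the
insertion order for equal priorities; both are assumptions for illustration, as
are the function name and the dictionary layout.

```python
# Assumed order from highest to lowest priority.
PRIORITY_ORDER = ["H", "M", "A", "B", "C", "L"]


def merge_callouts(from_additional_data, from_registry):
    """Add the CALLOUT-derived entries first, then the registry ones,
    and sort the result from highest to lowest priority."""
    combined = list(from_additional_data) + list(from_registry)
    # list.sort() is stable, so equal priorities keep their insertion order.
    combined.sort(key=lambda c: PRIORITY_ORDER.index(c["Priority"]))
    return combined


additional = [{"LocationCode": "P0-C1", "Priority": "M"}]
registry = [
    {"Procedure": "PRONAME", "Priority": "H"},
    {"LocationCode": "P0-C2", "Priority": "M"},
]
for callout in merge_callouts(additional, registry):
    print(callout)
```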
311
312### Specifying multiple callouts using JSON format FFDC files
313
314Multiple callouts can be passed in by the creator at the time of PEL creation.
315This is done by specifying them in a JSON file that is then passed in as an
316[FFDC file](#ffdc-intended-for-userdata-pel-sections). The JSON will still be
317added into a PEL UserData section for debug.
318
319To specify that an FFDC file contains callouts, the format value for that FFDC
320entry must be set to JSON, and the subtype field must be set to 0xCA:
321
322```cpp
323using FFDC = std::tuple<CreateIface::FFDCFormat,
324                        uint8_t,
325                        uint8_t,
326                        sdbusplus::message::unix_fd>;
327
328FFDC ffdc{
329    CreateIface::FFDCFormat::JSON,
330    0xCA, // Callout subtype
331    0x01, // Callout version, set to 0x01
332    fd};
333```
334
The JSON contains an array of callouts that must be in order of highest priority
to lowest, with a maximum of 10. Any callouts after the 10th are discarded, as
there is no room for them in the PEL. The format looks like:
338
339```jsonl
340[
341    {
342        // First callout
343    },
344    {
345        // Second callout
346    },
347    {
348        // Nth callout
349    }
350]
351```
352
353A callout entry can be a normal hardware callout, a maintenance procedure
354callout, or a symbolic FRU callout. Each callout must contain a Priority field,
355where the possible values are:
356
357- "H" = High
358- "M" = Medium
359- "A" = Medium Group A
360- "B" = Medium Group B
361- "C" = Medium Group C
362- "L" = Low
363
364Either unexpanded location codes or D-Bus inventory object paths can be used to
365specify the called out part. An unexpanded location code does not have the
366system VPD information embedded in it, and the 'Ufcs-' prefix is optional (so
367can be either Ufcs-P1 or just P1).
368
369#### Normal hardware FRU callout
370
371Normal hardware callouts must contain either the location code or inventory
372path, and priority. Even though the PEL code doesn't do any guarding or
373deconfiguring itself, it needs to know if either of those things occurred as
374there are status bits in the PEL to reflect them. The Guarded and Deconfigured
375fields are used for this. Those fields are optional and if omitted then their
376values will be false.
377
378When the inventory path of a sub-FRU is passed in, the PEL code will put the
379location code of the parent FRU into the callout.
380
381```jsonl
382{
383    "LocationCode": "P0-C1",
384    "Priority": "H"
385}
386
387{
388    "InventoryPath": "/xyz/openbmc_project/inventory/motherboard/cpu0/core5",
389    "Priority": "H",
390    "Deconfigured": true,
391    "Guarded": true
392}
393
394```
395
396MRUs (Manufacturing Replaceable Units) are 4 byte numbers that can optionally be
397added to callouts to specify failing devices on a FRU. These may be used during
398the manufacturing test process, where there may be the ability to do these
399replacements. There can be up to 15 MRUs, each with its own priority, embedded
400in a callout. The possible priority values match the FRU priority values.
401
402Note that since JSON only supports numbers in decimal and not in hex, MRU IDs
403will show up as decimal when visually inspecting the JSON.
404
405```jsonl
406{
407  "LocationCode": "P0-C1",
408  "Priority": "H",
409  "MRUs": [
410    {
411      "ID": 1234,
412      "Priority": "H"
413    },
414    {
415      "ID": 5678,
416      "Priority": "H"
417    }
418  ]
419}
420```
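
The decimal representation can be seen with a small Python example. The MRU ID
value 0x00010203 is made up for illustration.

```python
import json

callout = {
    "LocationCode": "P0-C1",
    "Priority": "H",
    "MRUs": [{"ID": 0x00010203, "Priority": "H"}],  # illustrative MRU ID
}

text = json.dumps(callout)
print(text)  # the MRU ID appears as the decimal number 66051

# Converting back to hex when reading the JSON:
mru_id = json.loads(text)["MRUs"][0]["ID"]
print(f"0x{mru_id:08X}")  # 0x00010203
```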
421
422#### Maintenance procedure callout
423
424The LocationCode field is not used with procedure callouts. Only the first 7
425characters of the Procedure field will be used by the PEL.
426
427```jsonl
428{
429  "Procedure": "PRONAME",
430  "Priority": "H"
431}
432```
433
434#### Symbolic FRU callout
435
436Only the first seven characters of the SymbolicFRU field will be used by the
437PEL.
438
439If the TrustedLocationCode field is present and set to true, this means the
440location code may be used to turn on service indicators, so the LocationCode
441field is required. If TrustedLocationCode is false or missing, then the
442LocationCode field is optional.
443
444```jsonl
445{
446  "TrustedLocationCode": true,
  "LocationCode": "P0-C1",
448  "Priority": "H",
449  "SymbolicFRU": "FRUNAME"
450}
451```
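
The rules for the callout JSON described in this section can be summarized in a
Python sketch like the one below. It is not the PEL code's actual validation;
the function name `normalize_callouts` and the choice to raise `ValueError` are
assumptions made for the example.

```python
VALID_PRIORITIES = {"H", "M", "A", "B", "C", "L"}
MAX_CALLOUTS = 10  # callouts after the 10th are dropped
MAX_NAME_LEN = 7   # characters used from Procedure and SymbolicFRU


def normalize_callouts(callouts):
    """Apply the documented limits to a list of callout dictionaries."""
    result = []
    for callout in callouts[:MAX_CALLOUTS]:
        entry = dict(callout)
        if entry.get("Priority") not in VALID_PRIORITIES:
            raise ValueError("callout needs a valid Priority field")
        for key in ("Procedure", "SymbolicFRU"):
            if key in entry:
                entry[key] = entry[key][:MAX_NAME_LEN]
        if entry.get("TrustedLocationCode") and "LocationCode" not in entry:
            raise ValueError("TrustedLocationCode requires LocationCode")
        result.append(entry)
    return result


callouts = [{"Procedure": "PROCEDURENAME", "Priority": "H"}]
print(normalize_callouts(callouts))
# [{'Procedure': 'PROCEDU', 'Priority': 'H'}]
```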
452
453## `Action Flags` and `Event Type` Rules
454
The `Action Flags` and `Event Type` PEL fields are optional in the message
registry, and if not present the code will set them based on certain rules laid
out in the PEL spec.
458
459These rules are:
460
4611. Always set the `Report` flag, unless the `Do Not Report` flag is already on.
4622. Always clear the `SP Call Home` flag, as that feature isn't supported.
4633. If the severity is `Non-error Event`:
464   - Clear the `Service Action` flag.
465   - Clear the `Call Home` flag.
466   - If the `Event Type` field is `Not Applicable`, change it to
467     `Information Only`.
468   - If the `Event Type` field is `Information Only` or `Tracing`, set the
469     `Hidden` flag.
4704. If the severity is `Recovered`:
471   - Set the `Hidden` flag.
472   - Clear the `Service Action` flag.
473   - Clear the `Call Home` flag.
4745. For all other severities:
475   - Clear the `Hidden` flag.
476   - Set the `Service Action` flag.
477   - Set the `Call Home` flag.
478
479Additional rules may be added in the future if necessary.
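
The rules above can be expressed as a short Python sketch. Flags are modeled as
a set of strings named after the flags in the list, and severities and event
types as plain strings; this representation and the function name `apply_rules`
are assumptions for illustration, not how the PEL code stores these fields.

```python
def apply_rules(severity, event_type, flags):
    """Return the (event_type, flags) pair after applying rules 1-5."""
    flags = set(flags)

    # Rule 1: always report, unless reporting was explicitly turned off.
    if "Do Not Report" not in flags:
        flags.add("Report")

    # Rule 2: SP Call Home isn't supported.
    flags.discard("SP Call Home")

    if severity == "Non-error Event":  # Rule 3
        flags -= {"Service Action", "Call Home"}
        if event_type == "Not Applicable":
            event_type = "Information Only"
        if event_type in ("Information Only", "Tracing"):
            flags.add("Hidden")
    elif severity == "Recovered":  # Rule 4
        flags.add("Hidden")
        flags -= {"Service Action", "Call Home"}
    else:  # Rule 5
        flags.discard("Hidden")
        flags |= {"Service Action", "Call Home"}

    return event_type, flags


print(apply_rules("Non-error Event", "Not Applicable", set()))
print(apply_rules("Unrecoverable", "Not Applicable", {"SP Call Home"}))
```

The first call yields the `Information Only` event type with the `Report` and
`Hidden` flags set; the second yields `Report`, `Service Action`, and
`Call Home`.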
480
481## D-Bus Interfaces
482
483See the org.open_power.Logging.PEL interface definition for the most up to date
484information.
485
486## PEL Retention
487
488The PEL repository is allocated a set amount of space on the BMC. When that
489space gets close to being full, the code will remove a percentage of PELs to
490make room for new ones. In addition, the code will keep a cap on the total
491number of PELs allowed. Note that removing a PEL will also remove the
492corresponding OpenBMC event log.
493
494The disk capacity limit is set to 20MB, and the number limit is 3000.
495
496The rules used to remove logs are listed below. The checks will be run after a
497PEL has been added and the method to create the PEL has returned to the caller,
498i.e. run when control gets back to the event loop.
499
500### Removal Algorithm
501
If the space used is at or below 95% of the allocated space and the number of
PELs is under the limit, nothing further needs to be done. Otherwise, all 5 of
the following steps are run. Each step deletes PELs only until its requirement
is met and then stops.
506
507The steps are:
508
5091. Remove BMC created informational PELs until they take up 15% or less of the
510   allocated space.
511
5122. Remove BMC created non-informational PELs until they take up 30% or less of
513   the allocated space.
514
5153. Remove non-BMC created informational PELs until they take up 15% or less of
516   the allocated space.
517
5184. Remove non-BMC created non-informational PELs until they take up 30% or less
519   of the allocated space.
520
5215. After the previous 4 steps are complete, if there are still more than the
522   maximum number of PELs, remove PELs down to 80% of the maximum.
523
524PELs with associated guard records will never be deleted. Each step above makes
525the following 4 passes, stopping as soon as its limit is reached:
526
527- Pass 1. Remove HMC acknowledged PELs.
528- Pass 2. Remove OS acknowledged PELs.
529- Pass 3. Remove PHYP acknowledged PELs.
530- Pass 4. Remove all PELs.
531
After all these steps, disk usage will be at most 90% (15% + 30% + 15% + 30%) of
the allocated space.
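
The four passes that each step makes can be sketched in Python as follows. This
is a simplified model of a single step for one category of PELs: the `Pel`
class, its field names, and walking the PELs in list order are all assumptions
made for the example, not the repository's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Pel:
    pel_id: int
    size: int              # size in bytes
    ack: str = ""          # "hmc", "os", "phyp", or "" if not acknowledged
    guarded: bool = False  # has an associated guard record


def prune_category(pels, limit_bytes):
    """Return the IDs to remove so the category fits within limit_bytes."""
    total = sum(p.size for p in pels)
    removed = []
    passes = [
        lambda p: p.ack == "hmc",   # Pass 1
        lambda p: p.ack == "os",    # Pass 2
        lambda p: p.ack == "phyp",  # Pass 3
        lambda p: True,             # Pass 4
    ]
    for matches in passes:
        for pel in pels:
            if total <= limit_bytes:
                return removed  # stop as soon as the limit is reached
            if pel.guarded or pel.pel_id in removed or not matches(pel):
                continue
            removed.append(pel.pel_id)
            total -= pel.size
    return removed


pels = [
    Pel(1, 40),
    Pel(2, 40, ack="os"),
    Pel(3, 40, ack="hmc"),
    Pel(4, 40, guarded=True),
]
print(prune_category(pels, 80))  # [3, 2]
```

The HMC acknowledged PEL is removed first, then the OS acknowledged one, at
which point the limit is met. The guarded PEL is never removed.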
534
535## Adding python3 modules for PEL UserData and SRC parsing
536
537In order to support python3 modules for the parsing of PEL User Data sections
538and to decode SRC data, setuptools is used to import python3 packages from
539external repos to be included in the OpenBMC image.
540
541Sample layout for setuptools:
542
- `setup.py`
- `src/usr/scom/plugins/ebmc/b0300.py`
- `src/usr/i2c/plugins/ebmc/b0700.py`
- `src/build/tools/ebmc/errludP_Helpers.py`
545
546`setup.py` is the build script for setuptools. It contains information about the
547package (such as the name and version) as well as which code files to include.
548
549The setup.py template to be used for eBMC User Data parsers:
550
551```python3
552import os.path
553from setuptools import setup
554
# Update this dict with a new key/value pair for every component added
556# Key: The package name to be installed as
557# Value: The path containing the package's python modules
558dirmap = {
559    "b0300": "src/usr/scom/plugins/ebmc",
560    "b0700": "src/usr/i2c/plugins/ebmc",
561    "helpers": "src/build/tools/ebmc"
562}
563
564# All packages will be installed under 'udparsers' namespace
565def get_package_name(dirmap_key):
566    return "udparsers.{}".format(dirmap_key)
567
568def get_package_dirent(dirmap_item):
569    package_name = get_package_name(dirmap_item[0])
570    package_dir = dirmap_item[1]
571    return (package_name, package_dir)
572
573def get_packages():
574    return map(get_package_name, dirmap.keys())
575
576def get_package_dirs():
577    return map(get_package_dirent, dirmap.items())
578
579setup(
580        name="Hostboot",
581        version="0.1",
582        packages=list(get_packages()),
583        package_dir=dict(get_package_dirs())
584)
585```
586
587- User Data parser module
588
589  - Module name: `xzzzz.py`, where `x` is the Creator Subsystem from the Private
590    Header section (in ASCII) and `zzzz` is the 2 byte Component ID from the
591    User Data section itself (in HEX). All should be converted to lowercase.
592    - For example: `b0100.py` for Hostboot created UserData with CompID 0x0100
593  - Function to provide: `parseUDToJson`
594
595    - Argument list:
596      1. (int) Sub-section type
597      2. (int) Section version
598      3. (memoryview): Data
599    - Return data:
600      1. (str) JSON string
601
602  - Sample User Data parser module:
603
    ```python3
    import json
    def parseUDToJson(subType, ver, data):
        d = dict()
        ...
        # Parse and populate data into dictionary
        ...
        jsonStr = json.dumps(d)
        return jsonStr
    ```
614
615- SRC parser module
616
617  - Module name: `xsrc.py`, where `x` is the Creator Subsystem from the Private
618    Header section (in ASCII, converted to lowercase).
619    - For example: `bsrc.py` for Hostboot generated SRCs
620  - Function to provide: `parseSRCToJson`
621
622    - Argument list:
623      1. (str) Refcode ASCII string
624      2. (str) Hexword 2
625      3. (str) Hexword 3
626      4. (str) Hexword 4
627      5. (str) Hexword 5
628      6. (str) Hexword 6
629      7. (str) Hexword 7
630      8. (str) Hexword 8
631      9. (str) Hexword 9
632    - Return data:
633      1. (str) JSON string
634
635  - Sample SRC parser module:
636
637    ```python3
638    import json
639    def parseSRCToJson(ascii_str, word2, word3, word4, word5, word6, word7, \
640                       word8, word9):
641        d = dict({'A': 1, 'B': 2})
642        ...
643        # Decode SRC data into dictionary
644        ...
645        jsonStr = json.dumps(d)
646        return jsonStr
647    ```
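
As a concrete illustration of the naming rule and the `parseUDToJson`
signature, the sketch below derives a User Data parser module name and parses a
made-up section layout (a 4 byte big-endian counter followed by ASCII text). The
helper `ud_parser_module_name` and the data layout are hypothetical.

```python
import json


def ud_parser_module_name(creator_subsystem: str, component_id: int) -> str:
    """Build the 'xzzzz' module name, all lowercase, without the .py suffix."""
    return f"{creator_subsystem}{component_id:04x}".lower()


def parseUDToJson(subType, ver, data):
    # Hypothetical layout: 4 byte big-endian counter, then ASCII text.
    d = {
        "Counter": int.from_bytes(data[0:4], "big"),
        "Text": bytes(data[4:]).decode("ascii"),
    }
    return json.dumps(d)


print(ud_parser_module_name("B", 0x0100))  # b0100
print(parseUDToJson(1, 1, memoryview(b"\x00\x00\x00\x2aboot")))
# {"Counter": 42, "Text": "boot"}
```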
648
649## Fail Boot on Host Errors
650
651The fail boot on hw error [design][1] provides a function where a system owner
652can tell the firmware to fail the boot of a system if a BMC phosphor-logging
653event has a hardware callout in it.
654
655It is required that when this fail boot on hardware error setting is enabled,
656that the BMC fail the boot for **any** error from the host which satisfies the
657following criteria:
658
659- not SeverityType::nonError
660- has a callout of any kind from the `FailingComponentType` structure
661
662## Self Boot Engine(SBE) First Failure Data Capture(FFDC) Support
663
When an SBE chip-op fails, the SBE creates FFDC in a custom data format. The SBE
FFDC consists of different packets, which can include trace and user data
related to SBE internal failures as well as hardware procedure failure FFDC
created by the FAPI infrastructure. The PEL code processes the SBE FFDC packets
that the FAPI infrastructure created for hardware procedure failures, and adds
callouts and UserData sections based on that processing. For failures that are
not FAPI based, it just keeps the raw FFDC data in a UserData section so that
SBE parser plugins can process it.
672
The `CreatePELWithFFDCFiles` D-Bus method on the `org.open_power.Logging.PEL`
interface must be used when creating a new event log.
675
676To specify that an FFDC file contains SBE FFDC, the format value for that FFDC
677entry must be set to "custom", and the subtype field must be set to 0xCB:
678
679```cpp
680using FFDC = std::tuple<CreateIface::FFDCFormat,
681                        uint8_t,
682                        uint8_t,
683                        sdbusplus::message::unix_fd>;
684
685FFDC ffdc{
    CreateIface::FFDCFormat::Custom,
    0xCB, // SBE FFDC subtype
    0x01, // SBE FFDC version, set to 0x01
    fd};
690```
691
The "SRC6" keyword in the AdditionalData property should be populated as
follows:
693
694- [0:15] chip position (hex)
695- [16:23] command class (hex)
696- [24:31] command (hex)
697
For example, for GetSCOM:
699
700SRC6="0002A201"
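
A minimal Python sketch of packing these three fields into the SRC6 string is
shown below. The function name `build_src6` is hypothetical, and reading the
example value as chip position 0x0002, command class 0xA2, and command 0x01 is
inferred from the bit layout above.

```python
def build_src6(chip_position: int, command_class: int, command: int) -> str:
    """Pack the fields into the 8 hex digit SRC6 string."""
    if not 0 <= chip_position <= 0xFFFF:
        raise ValueError("chip position must fit in 16 bits")
    if not 0 <= command_class <= 0xFF or not 0 <= command <= 0xFF:
        raise ValueError("command class and command must fit in 8 bits")
    return f"{chip_position:04X}{command_class:02X}{command:02X}"


print(build_src6(0x0002, 0xA2, 0x01))  # 0002A201
```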
701
Note: The "phal" build-time configure option must be set to "enabled" to enable
this feature.
704
705## PEL Archiving
706
When an OpenBMC event log is deleted, its corresponding PEL is moved to an
archive folder. These archived PELs are available in BMC dumps. The archive path
is `/var/lib/phosphor-logging/extensions/pels/logs/archive`.

Key points:

- PELs whose corresponding event logs have been deleted are available in the
  archive folder.
- The archive folder size is tracked along with the logs folder size, and if the
  combined size exceeds the warning size, all archived PELs are deleted.
- Archived PELs can be viewed using peltool with the `--archive` flag.
- If a PEL is deleted using peltool, it is not archived.
719
720[1]:
721  https://github.com/openbmc/docs/blob/master/designs/fail-boot-on-hw-error.md
722