# BIOS->BMC SMM Error Logging Queue Daemon

Author:

- Brandon Kim / brandonkim@google.com / @brandonk

Other contributors:

- Marco Cruz-Heredia / mcruzheredia@google.com

Created: Mar 15, 2022

## Problem Description

We've identified use cases where the BIOS goes into System Management Mode
(SMM) to provide error logs to the BMC. Because of the time constraint the BIOS
is under in SMM, these messages must be sent as quickly as possible, without a
handshake / ack back from the BMC. The goal of the daemon proposed here is to
implement a circular buffer over a shared BIOS->BMC buffer that the BIOS can
write to in a fire-and-forget manner.

## Background and References

There are various ways of communicating between the BMC and the BIOS, but only
a few don't require a handshake and let the data persist in shared memory.
These are listed in the "Alternatives Considered" section.

Different BMC vendors support different methods such as Shared Memory (SHM, via
LPC / eSPI) and P2A or PCI Mailbox, but the existing daemons that utilize them
do so over IPMI blob to communicate where and how much data has been transferred
(see [phosphor-ipmi-flash](https://github.com/openbmc/phosphor-ipmi-flash) and
[libmctp/astlpc](https://github.com/openbmc/libmctp/blob/master/docs/bindings/vendor-ibm-astlpc.md)).

## Requirements

The fundamental requirements for this daemon are listed as follows:

1. The BMC shall initialize the shared buffer in a way that the BIOS can
   recognize when it can write to the buffer
2. After initialization, the BIOS shall not have to wait for an ack back from
   the BMC before any writes to the shared buffer (**no synchronization**)
3. The BIOS shall be the main writer to the shared buffer, with the BMC mainly
   reading the payloads, only writing to the buffer to update the header
4. The BMC shall read new payloads from the shared buffer for further processing
5. The BIOS must be able to write a payload (~1KB) to the buffer within 50µs

The shared buffer will be as big as the protocol allows for a given BMC platform
(for example, 16KB for Nuvoton's PCI Mailbox on the NPCM 7xx), and each payload
is estimated to be less than 1KB.

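As a rough sizing check, the sketch below estimates how many maximum-size
payloads fit in the queue region and what write throughput the 50µs requirement
implies. The 1KB UE region size is a hypothetical stand-in for the undecided
`TBD1` value in the header layout under "Proposed Design", the function names
are illustrative only, and the estimate ignores any per-log header overhead.

```python
HEADER_SIZE = 0x30       # fixed header size from the layout in "Proposed Design"
BUFFER_SIZE = 16 * 1024  # Nuvoton NPCM 7xx PCI Mailbox example
PAYLOAD_SIZE = 1024      # upper estimate for a single payload
UE_REGION_SIZE = 1024    # hypothetical: the document leaves this as TBD1


def max_queued_payloads(buffer_size, ue_region_size, payload_size=PAYLOAD_SIZE):
    """Number of full-size payloads that fit in the normal error log region."""
    queue_bytes = buffer_size - HEADER_SIZE - ue_region_size
    return queue_bytes // payload_size


def required_throughput(payload_size=PAYLOAD_SIZE, deadline_s=50e-6):
    """Bytes per second the BIOS must sustain to meet the write deadline."""
    return payload_size / deadline_s
```

Under these assumptions only a little over a dozen unread payloads can be
outstanding at once, which is why the polling interval matters.
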
This daemon assumes that no other traffic will communicate through the given
protocol. The circular buffer and its header will provide some protection
against corruption, but it should not be relied upon.

## Proposed Design

The implementation of interfacing with the shared buffer will very closely
follow [phosphor-ipmi-flash](https://github.com/openbmc/phosphor-ipmi-flash). In
the future, it may be wise to extract the PCI Mailbox, P2A and LPC code into
separate libraries shared between `phosphor-ipmi-flash` and this daemon to
reduce duplication of code.

Taken from Marco's (mcruzheredia@google.com) internal design document for the
circular buffer, the data structure of its header will look like the following:

| Name | Size | Offset | Written by | Description |
| ----------------------------------- | -------------------------------- | ----------- | ------------ | ----------- |
| BMC Interface Version | 4 bytes | 0x0 | BMC at init | Allows the BIOS to determine if it is compatible with the BMC |
| BIOS Interface Version | 4 bytes | 0x4 | BIOS at init | Allows the BMC to determine if it is compatible with the BIOS |
| Magic Number | 16 bytes | 0x8 | BMC at init | Magic number to set the state of the queue as described below. Written by BMC once the memory region is ready to be used. Must be checked by BIOS before logging. BMC can change this number when it suspects data corruption to prevent BIOS from writing anything during reinitialization |
| Queue size | 3 bytes | 0x18 | BMC at init | Indicates the size of the region allocated for the circular queue. Written by BMC on init only, should not change during runtime. **This includes the size of the header and UE region size** |
| Uncorrectable Error region size | 2 bytes | 0x1b | BMC at init | Indicates the size of the region reserved for Uncorrectable Error (UE) logs. Written by BMC on init only, should not change during runtime |
| BMC flags | 4 bytes | 0x1d | BMC | <ul><li>BIT0 - BMC UE reserved region “switch”<ul><li>Toggled when BMC reads a UE from the reserved region.</li></ul></li><li>BIT1 - Overflow<ul><li>Lets BIOS know BMC has seen the overflow incident</li><li>Toggled when BMC acks the overflow incident</li></ul></li><li>BIT2 - BMC_READY<ul><li>BMC sets this bit once it has received any initialization information it needs to get from the BIOS before it’s ready to receive logs.</li></ul></li></ul> |
| BMC read pointer | 3 bytes | 0x21 | BMC | Used to allow the BIOS to detect when the BMC was unable to read the previous error logs in time to prevent the circular buffer from overflowing. |
| Padding | 4 bytes | 0x24 | Reserved | Padding for 8 byte alignment |
| BIOS flags | 4 bytes | 0x28 | BIOS | <ul><li>BIT0 - BIOS UE reserved region “switch”<ul><li>Toggled when BIOS writes a UE to the reserved region.</li></ul></li><li>BIT1 - Overflow<ul><li>Lets the BMC know that it missed an error log</li><li>Toggled when BIOS sees overflow and not already overflowed</li></ul></li><li>BIT2 - Incomplete Initialization<ul><li>Set when BIOS has attempted to initialize but did not see BMC ack back with `BMC_READY` bit in BMC flags</li></ul></li></ul> |
| BIOS write pointer | 3 bytes | 0x2c | BIOS | Indicates where the next log will be written by BIOS. Used to tell BMC when it should read a new log |
| Padding | 1 byte | 0x2f | Reserved | Padding for 8 byte alignment |
| Uncorrectable Error reserved region | TBD1 | 0x30 | BIOS | Reserved region only for UE logs. This region is only used if the rest of the buffer is going to overflow and there is no unread UE log already in the region. |
| Error Logs from BIOS | Size of the Buffer - 0x30 - TBD1 | 0x30 + TBD1 | BIOS | Logs vary by type, so each log will self-describe with a header. This region will fill up the rest of the buffer |

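To make the offsets in the table concrete, here is a minimal sketch that decodes
the fixed 0x30-byte header with Python's `struct` module. It is a model for
illustration, not the daemon's implementation: the little-endian byte order, the
`QueueHeader` type and the `parse_header` helper are all assumptions that do not
come from the design itself.

```python
import struct
from typing import NamedTuple

# "<" selects little-endian with no implicit padding (byte order is an
# assumption). The 3-byte fields are read as raw bytes and converted below;
# "4x" and "x" skip the two padding fields at 0x24 and 0x2f.
HEADER_FORMAT = "<II16s3sHI3s4xI3sx"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 0x30, matching the table


class QueueHeader(NamedTuple):
    bmc_interface_version: int   # offset 0x0
    bios_interface_version: int  # offset 0x4
    magic_number: bytes          # offset 0x8
    queue_size: int              # offset 0x18
    ue_region_size: int          # offset 0x1b
    bmc_flags: int               # offset 0x1d
    bmc_read_pointer: int        # offset 0x21
    bios_flags: int              # offset 0x28
    bios_write_pointer: int      # offset 0x2c


def _u24(raw: bytes) -> int:
    """Convert a 3-byte little-endian field to an integer."""
    return int.from_bytes(raw, "little")


def parse_header(buf) -> QueueHeader:
    """Decode the header found at the start of the shared buffer."""
    (bmc_ver, bios_ver, magic, queue_size, ue_size,
     bmc_flags, bmc_rp, bios_flags, bios_wp) = struct.unpack_from(
        HEADER_FORMAT, buf, 0)
    return QueueHeader(bmc_ver, bios_ver, magic, _u24(queue_size), ue_size,
                       bmc_flags, _u24(bmc_rp), bios_flags, _u24(bios_wp))
```

Because several fields sit at unaligned offsets (for example the BMC flags at
0x1d), any real implementation would need a packed layout or byte-wise access
rather than naturally aligned struct members.
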
### Initialization

This daemon will first initialize the shared buffer by writing zeros to the
whole buffer, then initialize the header's other `BMC at init` fields before
writing the `Magic Number`. Once the `Magic Number` is written, the BIOS will
assume that the shared buffer has been properly initialized and will be able to
start writing entries to it.

If any further initialization between the BIOS and the BMC is required, the BMC
needs to set the `BMC_READY` bit in the BMC flags once that initialization
completes. If the BIOS does not see the flag set, the BIOS shall set the
`Incomplete Initialization` flag to notify the BMC to reinitialize the buffer.

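The ordering described above (zero the buffer, fill in the init fields, write
the `Magic Number` last) can be sketched as follows. This is an illustrative
model over a `bytearray`, not the daemon's code: the magic value is a
hypothetical placeholder, little-endian byte order is assumed, and the helper
names are invented for this example.

```python
import struct

MAGIC_OFFSET = 0x8
MAGIC_SIZE = 16
QUEUE_SIZE_OFFSET = 0x18
UE_SIZE_OFFSET = 0x1B
BMC_FLAGS_OFFSET = 0x1D
BMC_READY = 1 << 2  # BIT2 of the BMC flags

# Hypothetical placeholder; this document does not define the real value.
EXAMPLE_MAGIC = bytes(range(1, 17))


def initialize_buffer(buf, bmc_version, ue_region_size, magic=EXAMPLE_MAGIC):
    """BMC side: prepare the shared buffer, publishing the magic number last."""
    size = len(buf)
    buf[:] = bytes(size)                          # 1. zero the whole buffer
    struct.pack_into("<I", buf, 0x0, bmc_version)  # 2. "BMC at init" fields
    buf[QUEUE_SIZE_OFFSET:QUEUE_SIZE_OFFSET + 3] = size.to_bytes(3, "little")
    struct.pack_into("<H", buf, UE_SIZE_OFFSET, ue_region_size)
    buf[MAGIC_OFFSET:MAGIC_OFFSET + MAGIC_SIZE] = magic  # 3. magic goes last


def set_bmc_ready(buf):
    """BMC side: signal that any further initialization has completed."""
    (flags,) = struct.unpack_from("<I", buf, BMC_FLAGS_OFFSET)
    struct.pack_into("<I", buf, BMC_FLAGS_OFFSET, flags | BMC_READY)


def bios_may_write(buf, magic=EXAMPLE_MAGIC):
    """BIOS side: the buffer is usable only once the expected magic is present."""
    return bytes(buf[MAGIC_OFFSET:MAGIC_OFFSET + MAGIC_SIZE]) == magic
```

Writing the magic number last means a BIOS that checks it never observes a
half-initialized header, which is what lets the BIOS skip any handshake.
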
### Reading and Processing

This daemon will poll the buffer at a set interval (the exact number will be
configurable, as the processing time and performance of different platforms may
require different polling rates). Once a new payload is detected, it will be
processed by a library that can also be chosen and configured at compile-time.

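One polling step can be modeled as below. The sketch assumes that the BMC read
pointer and BIOS write pointer are byte offsets relative to the start of the
error log region and that unread data may wrap around the end of that region;
the document does not spell out either detail, and `read_pending` is a
hypothetical helper name.

```python
from typing import Tuple


def read_pending(queue: bytes, read_ptr: int, write_ptr: int) -> Tuple[bytes, int]:
    """Return the unread bytes and the read pointer value to publish next.

    `queue` holds only the error log region (header and UE region excluded).
    """
    if read_ptr == write_ptr:
        # Nothing new since the last poll.
        return b"", read_ptr
    if read_ptr < write_ptr:
        # Unread data is one contiguous run.
        data = queue[read_ptr:write_ptr]
    else:
        # The writer wrapped: read to the end, then continue from the start.
        data = queue[read_ptr:] + queue[:write_ptr]
    return bytes(data), write_ptr
```

After handing the returned bytes to the processing library, the BMC would write
the new read pointer back to the header so the BIOS can detect how much space
has been freed.
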
Note that the Uncorrectable Error (UE) logs have a reserved region because they
contain critical information that we don't want to lose and should be
prioritized over normal error logs. This reserved region will be used to log a
UE only if an overflow of the normal error log queue is imminent and the BMC
has acked, using BIT0 of the `BMC flags`, that any preexisting UE log in this
region has already been read.

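The BIOS-side decision this implies can be sketched as follows. It rests on one
interpretation of the two BIT0 "switch" flags: since the BIOS toggles its bit
when it writes a UE and the BMC toggles its bit when it reads one, the reserved
region is treated as free when the two bits are equal. That reading, along with
the function names and return labels, is an assumption made for illustration.

```python
UE_SWITCH = 1 << 0  # BIT0 in both the BMC flags and the BIOS flags


def ue_region_is_free(bios_flags: int, bmc_flags: int) -> bool:
    """True when every UE written to the reserved region has been read."""
    return (bios_flags & UE_SWITCH) == (bmc_flags & UE_SWITCH)


def choose_destination(is_ue: bool, payload_len: int, queue_free_bytes: int,
                       bios_flags: int, bmc_flags: int) -> str:
    """Pick where the BIOS would place a log entry.

    Returns "queue", "ue_region" or "overflow" (illustrative labels only).
    """
    if payload_len <= queue_free_bytes:
        return "queue"      # the normal circular queue still has room
    if is_ue and ue_region_is_free(bios_flags, bmc_flags):
        return "ue_region"  # queue would overflow, but the UE slot is free
    return "overflow"       # nothing can be stored; report via the Overflow bit
```
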
An example of a processing library (and something we would like to push in our
initial version of this daemon) would be an RDE decoder for processing a subset
of Redfish Device Enablement (RDE) commands and decoding their attached Binary
Encoded JSON (BEJ) payloads.

## Alternatives Considered

- IPMI was considered, but it did not meet our speed requirement of writing a
  1KB entry in about 50 microseconds.
  - For reference, an initial PCI Mailbox performance measurement showed that a
    1KB entry write took roughly 10 microseconds.
- LPC / eSPI was also considered, but our BMC's SHM buffer was limited to 4KB,
  which was not enough for our use case.
- `libmctp` and MCTP PCIe VDM were considered.
  - `libmctp`'s current implementation relies on LPC as the transport binding
    and IPMI KCS for synchronization. LPC, as discussed, does not fit our
    current need, and synchronization does not work for our use case.
  - We may use MCTP PCIe VDM on our future platforms once we have more resources
    with expertise on both the BMC and the BIOS side, which we lack for our
    current project timeline.

## Impacts

Reading from the buffer and processing it may hinder performance of the BMC,
especially if the polling rate is set too high.

### Organizational

This design will require 2 repositories:

- bios-bmc-smm-error-logger
  - This repository will implement the daemon described in this document
  - Proposed maintainers: wltu@google.com, brandonkim@google.com
- libbej
  - This repository will follow the PLDM RDE specification as much as possible
    for RDE BEJ decoding (initially, encoding may come in the future) and will
    host a library written in C
  - Proposed maintainers: wltu@google.com, brandonkim@google.com

## Testing

Unit tests will cover each part of the daemon, mainly:

- Initialization
- Circular buffer processing
- Decoding / Processing library