# Intel IPMI Platform Events parsing
In many cases, manufacturer-specific IPMI Platform Events are stored in binary
form in the System Event Log, which makes it difficult to understand the
platform state. This document specifies a solution for presenting
manufacturer-specific IPMI Platform Events in a human-readable form by defining
a generic framework for parsing and defining new messages in an easy and
scalable way. Events originating from the Intel Management Engine (ME) are used
as a case study. The general design of the solution is followed by a detailed
description of a tailored-down implementation for OpenBMC.

## Glossary
- **IPMI** - Intelligent Platform Management Interface; standardized binary
  protocol of communication between endpoints in a datacenter `[1]`
- **Platform Event** - specific type of IPMI binary payload, used for encoding
  and sending asynchronous one-way messages to a recipient `[1]`, section 29.3
- **ME** - Intel Management Engine, autonomous subsystem used for remote
  datacenter management `[5]`
- **Redfish** - modern datacenter management protocol, built around the REST
  protocol and JSON format `[2]`
- **OpenBMC** - open-source BMC implementation with a Redfish-oriented
  interface `[3]`


## Problem statement
IPMI is designed to be a compact and efficient binary format for data exchanged
between entities in a datacenter. The recipient is responsible for receiving
the data, analyzing and parsing it, and translating the binary representation
to a human-readable format. IPMI Platform Events are one type of these
messages, used to inform the recipient about the occurrence of a particular
well-defined situation.

Some IPMI Platform Events are standardized, described in the specification,
and already have an open-source implementation `[6]`; however, they cover only
part of the spectrum. The increasing complexity of datacenter systems has
multiplied the possible sources of events, which are defined by
manufacturer-specific extensions to platform event data. One of these sources
is Intel ME, which can deliver information about its own state of operation
and, in some cases, notify about certain erroneous system-wide conditions, such
as interface errors.

These OEM-specific messages lack support in existing open-source
implementations. They require manual, documentation-based `[5]` implementation,
which has historically been the source of many interpretation errors. Any
document update requires manual code modification according to the specific
changes, which is neither efficient nor scalable. Furthermore, the
documentation is not always clear on event severity or possible resolution
actions.

## Solution
A generic, OEM-agnostic algorithm is proposed to achieve human-readable output
for a binary IPMI Platform Event.

In general, each event consists of a predefined payload:
```ascii
[GeneratorID][SensorNumber][EventType][EventData[2]]
```
where:
- `GeneratorID` - used to determine the source of the event,
- `SensorNumber` - generator-specific unique sensor number,
- `EventType` - sensor-specific group of events,
- `EventData` - array with detailed event data.

Each consecutive event field narrows down the domain of event interpretations,
starting with `GeneratorID` at the top and ending with `EventData` at the
bottom of a `decision tree`. Software should be able to determine the meaning
of the event by applying a `divide and conquer` approach to a predefined list
of well-known event definitions. Note that such a decision tree might also be
needed to break down `EventData` itself, as is the case in many OEM-specific
IPMI implementations.

The implementation should therefore be a series of filters with increasing
specialization on each level.
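
The narrowing of interpretations can be pictured as a chain of nested lookups,
one per payload field. The snippet below is only a minimal sketch under stated
assumptions, not the OpenBMC implementation: `PlatformEvent`, `EventTree`,
`knownEvents` and `resolveEventId` are hypothetical names introduced for
illustration, and the single table entry reuses the values of this document's
Intel ME flash wear-out example.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory form of the payload described above.
struct PlatformEvent
{
    uint8_t generatorId;
    uint8_t sensorNumber;
    uint8_t eventType;
    std::array<uint8_t, 2> eventData;
};

// Each level of the decision tree is a map keyed by the next payload field.
using EventDataLevel = std::map<uint8_t, std::string>;    // EventData[0] -> EventId
using EventTypeLevel = std::map<uint8_t, EventDataLevel>; // keyed by EventType
using SensorLevel = std::map<uint8_t, EventTypeLevel>;    // keyed by SensorNumber
using EventTree = std::map<uint8_t, SensorLevel>;         // keyed by GeneratorID

// Illustrative table holding one known event (assumed values, see lead-in).
inline const EventTree& knownEvents()
{
    static const EventTree tree = {
        {0x2C, {{0x17, {{0x00, {{0x0A, "FlashWearoutWarning"}}}}}}}};
    return tree;
}

// Walks the tree field by field; every lookup narrows the candidates further.
std::optional<std::string> resolveEventId(const PlatformEvent& event)
{
    const EventTree& tree = knownEvents();

    const auto generator = tree.find(event.generatorId);
    if (generator == tree.end())
    {
        return std::nullopt;
    }
    const auto sensor = generator->second.find(event.sensorNumber);
    if (sensor == generator->second.end())
    {
        return std::nullopt;
    }
    const auto type = sensor->second.find(event.eventType);
    if (type == sensor->second.end())
    {
        return std::nullopt;
    }
    const auto data = type->second.find(event.eventData[0]);
    if (data == type->second.end())
    {
        return std::nullopt;
    }
    return data->second;
}
```

A lookup that fails at any level yields `std::nullopt`, which corresponds to an
event with no known interpretation.
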
A recursive algorithm for this looks as follows:
```ascii
     +-------------+              *Step 1*
     | +---------+ |      +-----------------------+
     | |Currently| |      |Analyze and choose     |
+--->| |analyzed +------->|proper 'subtree' parser|
|    | |chunk    | |      +-----------------------+
|    | +---------+ |
|    | +---------+ |              *Step 2*
|    | |Remainder| |      +-----------------------+     +---------+
|    | |         +------->|'Cut' the remainder    +---->|Remainder+---+
|    | |         | |      |and go back to Step 1  |     +---------+   |
|    | +---------+ |      +-----------------------+                   |
|    +-------------+                                                  |
|                                                                     |
+---------------------------------------------------------------------+
```
The described process is repeated until there is nothing left to break down and
a single unique event interpretation (an `EventId`) is determined.

Not all event data is a decision point: certain chunks of data should be kept
as-is or formatted in a certain way, to be included in the human-readable
`Message`. Parser operation should therefore also include logic for extracting
`Parameters` during the traversal process.

Both the `EventId` and an optional collection of `Parameters` are then used as
input for a lookup mechanism that generates the final `Event Message`.
Each message consists of the following entries:
- `EventId` - associated unique event,
- `Severity` - determines how severely this particular event might affect usual
  datacenter operation,
- `Resolution` - suggested steps to mitigate a possible problem,
- `Message` - human-readable message, possibly with predefined placeholders for
  `Parameters`.

### Example
An example of such a message parsing process is shown below:
```ascii
+-------------+
|[GeneratorId]|
|0x2C (ME)    |
+------+------+
       |
+------v---------+
|[SensorNumber]  |
|0x17 (ME Health)|
+------+---------+
       |
+------v---------+
|[EventType]     |
|0x00 (FW Status)|
+------+---------+
       |
+------v-------------------+        +-------------------------------------------+
|[EventData[0]]            |        |ParsedEvent                                |
|0x0A (FlashWearoutWarning)+-----+  +-------------------------------------------+
+------+-------------------+     +->|'EventId' = FlashWearoutWarning            |
       |                      +---->|'Parameters' = [ toDecimal(EventData[1]) ] |
+------v----------+           |     +-------------------------------------------+
|[EventData[1]]   |           |
|0x## (Percentage)+-----------+
+-----------------+
```
The determined `ParsedEvent` might then be passed to a lookup mechanism,
which contains human-readable information for each `EventId`:
```ascii
+------------------------------------------------+
|+------------------------------------------------+
||+------------------------------------------------+
||| EventId: FlashWearoutWarning                   |
||| Severity: Warning                              |
||| Resolution: No immediate repair action needed  |
||| Message: Warning threshold for number of flash |
|||          operations has been exceeded. Current |
|||          percentage of write operations        |
+||          capacity: %1                          |
 +|                                                |
  +------------------------------------------------+
```

## Solution in OpenBMC
The proposed algorithm is delivered as part of the open-source OpenBMC project
`[3]`. As this software stack is built with a micro-service architecture in
mind, the implementation had to be divided into multiple parts:
- IPMI Platform Event payload unpacking (`[7]`)
  - `openbmc/intel-ipmi-oem/src/sensorcommands.cpp`
  - `openbmc/intel-ipmi-oem/src/ipmi_to_redfish_hooks.cpp`
- Intel ME event parsing
  - `openbmc/intel-ipmi-oem/src/me_to_redfish_hooks.cpp`
- Detected events storage (`[4]`)
  - `systemd journal`
- Human-readable message lookup (`[2], [8]`)
  - `MessageRegistry in bmcweb`
    - `openbmc/bmcweb/redfish-core/include/registries/openbmc_message_registry.hpp`

### OpenBMC flow
#### Event arrival
1. The IPMI driver notifies `intel-ipmi-oem` about an incoming `Platform Event`
   (NetFn=0x4, Cmd=0x2)
   - The proper command handler in `intel-ipmi-oem/src/sensorcommands.cpp`
     is notified
2. The message is forwarded to `intel-ipmi-oem/src/ipmi_to_redfish_hooks.cpp`
   as a call to `sel::checkRedfishHooks`
   - `sel::checkRedfishHooks` analyzes the data; `BIOS` events are handled
     in-place, while `ME` events are delegated to
     `intel-ipmi-oem/src/me_to_redfish_hooks.cpp`
3. `me::messageHook` is called with the payload. The parsing algorithm
   determines the final `EventId` and `Parameters`
   - `me::utils::storeRedfishEvent(EventId, Parameters)` is called;
     it stores the event securely in the `system journal`

#### Platform Event payload parsing
Each IPMI Platform Event is parsed using the aforementioned `me::messageHook`
handler. The implementation of the proposed algorithm is as follows:

##### 1. Determine EventType
Based on `EventType`, the designated handler is called.
```cpp
namespace me {
static bool messageHook(const SELData& selData, std::string& eventId,
                        std::vector<std::string>& parameters)
{
    const HealthEventType healthEventType =
        static_cast<HealthEventType>(selData.offset);

    // Dispatch to the handler dedicated to the detected event type
    switch (healthEventType)
    {
        case HealthEventType::FirmwareStatus:
            return fw_status::messageHook(selData, eventId, parameters);

        case HealthEventType::SmbusLinkFailure:
            return smbus_failure::messageHook(selData, eventId, parameters);
    }
    // Unknown event type: nothing was parsed
    return false;
}
} // namespace me
```
##### 2. Call designated handler
Example of a handler for `FirmwareStatus`, tailored down to the essential
distinctive use cases:
```cpp
namespace fw_status {
// Maps EventData[1] to specified message
static const boost::container::flat_map<uint8_t, std::string>
    manufacturingError = {
        {0x00, "Generic error"},
        {0x01, "Wrong or missing VSCC table"}};

static bool messageHook(const SELData& selData, std::string& eventId,
                        std::vector<std::string>& parameters)
{
    // Maps EventData[0] to either a resolution or further action
    static const boost::container::flat_map<
        uint8_t,
        std::pair<std::string, std::optional<std::variant<utils::ParserFunc,
                                                          utils::MessageMap>>>>
        eventMap = {
            // EventData[0]=0
            // > MessageId=MERecoveryGpioForced
            {0x00, {"MERecoveryGpioForced", {}}},

            // EventData[0]=3
            // > call specific handler to determine MessageId and Parameters
            {0x03, {{}, flash_state::messageHook}},

            // EventData[0]=7
            // > MessageId=MEManufacturingError
            // > Use manufacturingError map to translate EventData[1] to string
            //   and add it to Parameters collection
            {0x07, {"MEManufacturingError", manufacturingError}},

            // EventData[0]=9
            // > MessageId=MEFirmwareException
            // > Use a function to log specified byte of payload as Parameter
            //   in chosen format. Here it stores the second byte in hex format.
            {0x09, {"MEFirmwareException", utils::logByteHex<2>}}};

    return utils::genericMessageHook(eventMap, selData, eventId, parameters);
}
} // namespace fw_status
```

##### 3. Store parsed log in system
The cascading calls of functions, logging utilities and map resolutions
populate both `std::string& eventId` and
`std::vector<std::string>& parameters`. This data is then used to form a valid
system log entry, which is stored in the system journal.
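
The body of `utils::genericMessageHook` is not reproduced in this document. As
a rough illustration of how an event map of this shape could be consumed, the
sketch below resolves `EventData[0]` and then either stops, delegates to a more
specialized parser, or translates `EventData[1]` through a message map. All
names in it (`SelData`, `ParserFunc`, `MessageMap`, `Action`, `EventMap`,
`genericHookSketch`) are simplified stand-ins assumed for this example, and
standard library containers are used instead of Boost; it should not be read as
the actual `intel-ipmi-oem` code.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Simplified stand-in for the SEL payload; only EventData is modeled here.
struct SelData
{
    std::array<uint8_t, 2> eventData;
};

using ParserFunc = std::function<bool(const SelData&, std::string&,
                                      std::vector<std::string>&)>;
using MessageMap = std::map<uint8_t, std::string>;
using Action = std::optional<std::variant<ParserFunc, MessageMap>>;
using EventMap = std::map<uint8_t, std::pair<std::string, Action>>;

// Resolves EventData[0] through the map, then runs the optional follow-up.
bool genericHookSketch(const EventMap& eventMap, const SelData& selData,
                       std::string& eventId,
                       std::vector<std::string>& parameters)
{
    const auto entry = eventMap.find(selData.eventData[0]);
    if (entry == eventMap.end())
    {
        return false; // EventData[0] is not a known event
    }

    const auto& [id, action] = entry->second;
    eventId = id;
    if (!action)
    {
        // No follow-up action: the event is fully described by its id
        return !eventId.empty();
    }

    if (const auto* parser = std::get_if<ParserFunc>(&*action))
    {
        // Delegate to a more specialized parser (next level of the tree)
        return (*parser)(selData, eventId, parameters);
    }

    // Otherwise translate EventData[1] to text and keep it as a parameter
    const auto& messages = std::get<MessageMap>(*action);
    const auto message = messages.find(selData.eventData[1]);
    if (message == messages.end())
    {
        return false;
    }
    parameters.push_back(message->second);
    return true;
}
```

Keeping the follow-up action in a `std::variant` lets one table describe
terminal events, nested parsers and parameter translations side by side, so
adding an event means adding a table row rather than new control flow.
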

#### Event data listing
Event data is accessible as `Redfish` resources in two places:
- `MessageRegistry` - stores all event 'metadata'
  (severity, resolution notes, messageId)
- `EventLog` - lists all detected events in the system in processed,
  human-readable form

##### MessageRegistry
The implementation of the `bmcweb` [MessageRegistry](http://redfish.dmtf.org/schemas/v1/MessageRegistry.json)
contents can be found at `openbmc/bmcweb/redfish-core/include/registries/openbmc_message_registry.hpp`.

**Intel-specific events have a proper prefix in their MessageId: either 'BIOS' or 'ME'.**

It can be read by the user by calling `GET` on the Redfish resource
`/redfish/v1/Registries/OpenBMC/OpenBMC`. It contains a JSON array of entries
in standard Redfish format, for example:
```json
"MEFlashWearOutWarning": {
    "Description": "Indicates that Intel ME has reached certain threshold of flash write operations.",
    "Message": "Warning threshold for number of flash operations has been exceeded. Current percentage of write operations capacity: %1",
    "NumberOfArgs": 1,
    "ParamTypes": [
        "number"
    ],
    "Resolution": "No immediate repair action needed.",
    "Severity": "Warning"
}
```

##### EventLog
The system-wide [EventLog](http://redfish.dmtf.org/schemas/v1/LogService.json)
is implemented in `bmcweb` at `openbmc/bmcweb/redfish-core/lib/log_services.hpp`.

It can be read by the user by calling `GET` on the Redfish resource
`/redfish/v1/Systems/system/LogServices/EventLog`. It contains a JSON array
of log entries in standard Redfish format, for example:
```json
{
    "@odata.id": "/redfish/v1/Systems/system/LogServices/EventLog/Entries/37331",
    "@odata.type": "#LogEntry.v1_4_0.LogEntry",
    "Created": "1970-01-01T10:22:11+00:00",
    "EntryType": "Event",
    "Id": "37331",
    "Message": "Warning threshold for number of flash operations has been exceeded. Current percentage of write operations capacity: 50",
    "MessageArgs": [
        "50"
    ],
    "MessageId": "OpenBMC.0.1.MEFlashWearOutWarning",
    "Name": "System Event Log Entry",
    "Severity": "Warning"
}
```

## References
1. [IPMI Specification v2.0](https://www.intel.pl/content/www/pl/pl/products/docs/servers/ipmi/ipmi-second-gen-interface-spec-v2-rev1-1.html)
2. [DMTF Redfish Schema Guide](https://www.dmtf.org/sites/default/files/standards/documents/DSP2046_2019.3.pdf)
3. [OpenBMC](https://github.com/openbmc)
4. [OpenBMC Redfish Event logging](https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md)
5. [Intel ME External Interfaces Specification](https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/intel-power-node-manager-v3-spec.pdf)
6. [ipmitool](https://github.com/ipmitool/ipmitool)
7. [OpenBMC Intel IPMI support](https://github.com/openbmc/intel-ipmi-oem)
8. [OpenBMC BMCWeb](https://github.com/openbmc/bmcweb)