# Intel IPMI Platform Events parsing

In many cases, manufacturer-specific IPMI Platform Events are stored in binary
form in the System Event Log, making it very difficult to understand the
platform state. This document specifies a solution for presenting
manufacturer-specific IPMI Platform Events in a human-readable form by defining
a generic framework for parsing and defining new messages in an easy and
scalable way. Events originating from the Intel Management Engine (ME) are used
as a case study. The general design of the solution is followed by a detailed
description of a tailored-down implementation for OpenBMC.

## Glossary

- **IPMI** - Intelligent Platform Management Interface; standardized binary
  protocol of communication between endpoints in a datacenter `[1]`
- **Platform Event** - specific type of IPMI binary payload, used for encoding
  and sending asynchronous one-way messages to a recipient `[1]-29.3`
- **ME** - Intel Management Engine, autonomous subsystem used for remote
  datacenter management `[5]`
- **Redfish** - modern datacenter management protocol, built around a REST
  interface and JSON format `[2]`
- **OpenBMC** - open-source BMC implementation with Redfish-oriented interface
  `[3]`

## Problem statement

IPMI is designed to be a compact and efficient binary format for data exchanged
between entities in a datacenter. The recipient is responsible for receiving
the data, analyzing and parsing it, and translating the binary representation
into a human-readable format. IPMI Platform Events are one type of these
messages, used to inform the recipient about the occurrence of a particular,
well-defined situation.

Some IPMI Platform Events are standardized and described in the specification
and already have an open-source implementation `[6]`; however, this is only
part of the spectrum.
The increasing complexity of datacenter systems has multiplied the possible
sources of events, which are defined by manufacturer-specific extensions to
platform event data. One of these sources is Intel ME, which is able to deliver
information about its own state of operation and in some cases notify about
certain erroneous system-wide conditions, like interface errors.

These OEM-specific messages lack support in existing open-source
implementations. They require manual, documentation-based `[5]` implementation,
which has historically been a source of many interpretation errors. Any document
update requires manual code modification according to the specific changes,
which is neither efficient nor scalable. Furthermore, the documentation is not
always clear on event severity or possible resolution actions.

## Solution

A generic, OEM-agnostic algorithm is proposed to achieve human-readable output
for a binary IPMI Platform Event.

In general, each event consists of a predefined payload:

```ascii
[GeneratorID][SensorNumber][EventType][EventData[2]]
```

where:

- `GeneratorID` - used to determine source of the event,
- `SensorNumber` - generator-specific unique sensor number,
- `EventType` - sensor-specific group of events,
- `EventData` - array with detailed event data.

Each consecutive event field narrows down the domain of event interpretations,
starting with `GeneratorID` at the top and ending with `EventData` at the
bottom of a `decision tree`. Software should be able to determine the meaning of
the event by using the `divide and conquer` approach against a predefined list
of well-known event definitions. Note that such a decision tree might also be
needed to break down `EventData`, as is the case in many OEM-specific IPMI
implementations.

The implementation should therefore be a series of filters with increasing
specialization on each level.
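
As a rough illustration of such a filter chain, the sketch below walks a
payload byte by byte through a small decision tree. It is a minimal sketch
under stated assumptions, not the OpenBMC implementation: `Node`, `ParsedEvent`,
`parse` and `exampleTree` are hypothetical names introduced here, and the byte
values are borrowed from the Intel ME example used in this document.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: the resolved event and its extracted parameters.
struct ParsedEvent
{
    std::string eventId;
    std::vector<std::string> parameters;
};

// Hypothetical decision-tree node. Each node matches one payload byte.
// A node without children is a leaf and resolves the event.
struct Node
{
    std::uint8_t key;           // byte value this node matches
    std::string eventId;        // meaningful for leaves only
    bool captureNextByte;       // leaf only: keep the next byte as a parameter
    std::vector<Node> children; // filters for the next payload byte
};

// Step 1: choose the subtree matching the current byte.
// Step 2: recurse on the remainder of the payload.
std::optional<ParsedEvent> parse(const std::vector<Node>& level,
                                 const std::vector<std::uint8_t>& payload,
                                 std::size_t pos = 0)
{
    if (pos >= payload.size())
    {
        return std::nullopt;
    }
    for (const Node& node : level)
    {
        if (node.key != payload[pos])
        {
            continue;
        }
        if (node.children.empty())
        {
            ParsedEvent event{node.eventId, {}};
            if (node.captureNextByte && pos + 1 < payload.size())
            {
                // Not a decision point: keep the byte, formatted as decimal.
                event.parameters.push_back(std::to_string(payload[pos + 1]));
            }
            return event;
        }
        return parse(node.children, payload, pos + 1);
    }
    return std::nullopt; // no known definition matches this payload
}

// Tree covering a single path: GeneratorID 0x2C, SensorNumber 0x17,
// EventType 0x00, EventData[0] 0x0A.
std::vector<Node> exampleTree()
{
    Node flashWearout{0x0A, "FlashWearoutWarning", true, {}};
    Node fwStatus{0x00, "", false, {flashWearout}};
    Node meHealth{0x17, "", false, {fwStatus}};
    Node me{0x2C, "", false, {meHealth}};
    return {me};
}
```

In this sketch, `parse(exampleTree(), {0x2C, 0x17, 0x00, 0x0A, 0x32})` resolves
to `FlashWearoutWarning` with the single parameter `"50"`, while a payload with
an unknown `SensorNumber` yields no event.
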

A recursive algorithm for this looks like the following:

```ascii
     +-------------+           +*Step 1*               +
     | +---------+ |           |                       |
     | |Currently| |           |Analyze and choose     |
+----> |analyzed +------------>+proper 'subtree' parser|
|    | |chunk    | |           |                       |
|    | +---------+ |           +                       +         +---------+
|    | +---------+ |                                             |Remainder|
|    | |Remainder| |                                             |         |
|    | |         | |           +*Step 2*               +         |         |
|    | |         | |           |                       |         |         |
|    | |         +---------------------------------------------->+         +---+
|    | |         | |           |'Cut' the remainder    |         |         |   |
|    | |         | |           |and go back to Step 1  |         |         |   |
|    | |         | |           +                       +         |         |   |
|    | +---------+ |                                             |         |   |
|    +-------------+                                             +---------+   |
|                                                                              |
|                                                                              |
+------------------------------------------------------------------------------+
```

The described process is repeated until there is nothing left to break down
and a single unique event interpretation (an `EventId`) is determined.

Not all event data is a decision point: certain chunks of data should be kept
as-is or formatted in a certain way, to be introduced in the human-readable
`Message`. Parser operation should therefore also include logic for extracting
`Parameters` during the traversal process.

Effectively, both the `EventId` and an optional collection of `Parameters`
should then be used as input for a lookup mechanism to generate the final
`Event Message`. Each message consists of the following entries:

- `EventId` - associated unique event,
- `Severity` - determines how severely this particular event might affect usual
  datacenter operation,
- `Resolution` - suggested steps to mitigate possible problem,
- `Message` - human-readable message, possibly with predefined placeholders for
  `Parameters`.

### Example

An example of such a message parsing process is shown below:

```ascii
        +-------------+
        |[GeneratorId]|
        |0x2C (ME)    |
        +------+------+
               |
        +------v---------+
        |[SensorNumber]  |
. . . . |0x17 (ME Health)|
        +------+---------+
               |
        +------v---------+
        |[EventType]     |
. . . . |0x00 (FW Status)|
        +------+---------+
               |
        +------v-------------------+
        |[EventData[0]]            |           +-------------------------------------------+
. . . . |0x0A (FlashWearoutWarning)+------+    |ParsedEvent|                               |
        +------+-------------------+      |    +-----------+                               |
               |                          +---->'EventId' = FlashWearoutWarning            |
        +------v----------+               +---->'Parameters' = [ toDecimal(EventData[1]) ] |
        |[EventData[1]]   |               |    |                                           |
        |0x## (Percentage)+---------------+    +-------------------------------------------+
        +-----------------+
```

The determined `ParsedEvent` can then be passed to the lookup mechanism, which
contains human-readable information for each `EventId`:

```ascii
+------------------------------------------------+
|+------------------------------------------------+
||+------------------------------------------------+
||| EventId: FlashWearoutWarning                   |
||| Severity: Warning                              |
||| Resolution: No immediate repair action needed  |
||| Message: Warning threshold for number of flash |
|||          operations has been exceeded. Current |
|||          percentage of write operations        |
+||          capacity: %1                          |
 +|                                                |
  +------------------------------------------------+
```

## Solution in OpenBMC

The proposed algorithm is delivered as part of the open-source OpenBMC project
`[3]`.
As this software stack is built with a micro-service architecture in mind, the
implementation had to be divided into multiple parts:

- IPMI Platform Event payload unpacking (`[7]`)
  - `openbmc/intel-ipmi-oem/src/sensorcommands.cpp`
  - `openbmc/intel-ipmi-oem/src/ipmi_to_redfish_hooks.cpp`
- Intel ME event parsing
  - `openbmc/intel-ipmi-oem/src/me_to_redfish_hooks.cpp`
- Detected events storage (`[4]`)
  - `systemd journal`
- Human-readable message lookup (`[2], [8]`)
  - `MessageRegistry in bmcweb`
    - `openbmc/bmcweb/redfish-core/include/registries/openbmc_message_registry.hpp`

### OpenBMC flow

#### Event arrival

1. The IPMI driver notifies `intel-ipmi-oem` about an incoming `Platform Event`
   (NetFn=0x4, Cmd=0x2)
   - The proper command handler in `intel-ipmi-oem/src/sensorcommands.cpp` is
     notified
2. The message is forwarded to
   `intel-ipmi-oem/src/ipmi_to_redfish_hooks.cpp` as a call to
   `sel::checkRedfishHooks`
   - `sel::checkRedfishHooks` analyzes the data; `BIOS` events are handled
     in-place, while `ME` events are delegated to
     `intel-ipmi-oem/src/me_to_redfish_hooks.cpp`
3. `me::messageHook` is called with the payload. The parsing algorithm
   determines the final `EventId` and `Parameters`
   - `me::utils::storeRedfishEvent(EventId, Parameters)` is called; it stores
     the event securely in the `system journal`

#### Platform Event payload parsing

Each IPMI Platform Event is parsed using the aforementioned `me::messageHook`
handler. The implementation of the proposed algorithm is the following:

##### 1. Determine EventType

Based on `EventType`, the designated handler is called.

```cpp
namespace me {
static bool messageHook(const SELData& selData, std::string& eventId,
                        std::vector<std::string>& parameters)
{
    const HealthEventType healthEventType =
        static_cast<HealthEventType>(selData.offset);

    // Dispatch to the handler designated for this event type
    switch (healthEventType)
    {
        case HealthEventType::FirmwareStatus:
            return fw_status::messageHook(selData, eventId, parameters);

        case HealthEventType::SmbusLinkFailure:
            return smbus_failure::messageHook(selData, eventId, parameters);
    }
    // Unknown event type: nothing was parsed
    return false;
}
} // namespace me
```

##### 2. Call designated handler

Example of a handler for `FirmwareStatus`, tailored down to the essential
distinctive use cases:

```cpp
namespace fw_status {
// Maps EventData[1] to specified message
static const boost::container::flat_map<uint8_t, std::string>
    manufacturingError = {
        {0x00, "Generic error"},
        {0x01, "Wrong or missing VSCC table"}};

static bool messageHook(const SELData& selData, std::string& eventId,
                        std::vector<std::string>& parameters)
{
    // Maps EventData[0] to either a resolution or further action
    static const boost::container::flat_map<
        uint8_t,
        std::pair<std::string, std::optional<std::variant<utils::ParserFunc,
                                                          utils::MessageMap>>>>
        eventMap = {
            // EventData[0]=0
            // > MessageId=MERecoveryGpioForced
            {0x00, {"MERecoveryGpioForced", {}}},

            // EventData[0]=3
            // > call specific handler to determine MessageId and Parameters
            {0x03, {{}, flash_state::messageHook}},

            // EventData[0]=7
            // > MessageId=MEManufacturingError
            // > Use manufacturingError map to translate EventData[1] to string
            //   and add it to Parameters collection
            {0x07, {"MEManufacturingError", manufacturingError}},

            // EventData[0]=9
            // > MessageId=MEFirmwareException
            // > Use a function to log specified byte of payload as Parameter
            //   in chosen format. Here it stores the 2nd byte in hex format.
            {0x09, {"MEFirmwareException", utils::logByteHex<2>}}};

    return utils::genericMessageHook(eventMap, selData, eventId, parameters);
}
} // namespace fw_status
```

##### 3. Store parsed log in system

Cascading calls of functions, logging utilities and map resolutions populate
both `std::string& eventId` and `std::vector<std::string>& parameters`. This
data is then used to form a valid system log entry, which is stored in the
system journal.

#### Event data listing

Event data is accessible as `Redfish` resources in two places:

- `MessageRegistry` - stores all event 'metadata' (severity, resolution notes,
  messageId)
- `EventLog` - lists all detected events in the system in processed,
  human-readable form

##### MessageRegistry

The implementation of the `bmcweb`
[MessageRegistry](http://redfish.dmtf.org/schemas/v1/MessageRegistry.json)
contents can be found at
`openbmc/bmcweb/redfish-core/include/registries/openbmc_message_registry.hpp`.

**Intel-specific events have a proper prefix in MessageId: either 'BIOS' or
'ME'.**

It can be read by the user by calling `GET` on the Redfish resource
`/redfish/v1/Registries/OpenBMC/OpenBMC`. It contains a JSON array of entries in
standard Redfish format, like so:

```json
"MEFlashWearOutWarning": {
    "Description": "Indicates that Intel ME has reached certain threshold of flash write operations.",
    "Message": "Warning threshold for number of flash operations has been exceeded. Current percentage of write operations capacity: %1",
    "NumberOfArgs": 1,
    "ParamTypes": [
        "number"
    ],
    "Resolution": "No immediate repair action needed.",
    "Severity": "Warning"
}
```

##### EventLog

The system-wide [EventLog](http://redfish.dmtf.org/schemas/v1/LogService.json)
is implemented in `bmcweb` at
`openbmc/bmcweb/redfish-core/lib/log_services.hpp`.

It can be read by the user by calling `GET` on the Redfish resource
`/redfish/v1/Systems/system/LogServices/EventLog`. It contains a JSON array of
log entries in standard Redfish format, like so:

```json
{
  "@odata.id": "/redfish/v1/Systems/system/LogServices/EventLog/Entries/37331",
  "@odata.type": "#LogEntry.v1_4_0.LogEntry",
  "Created": "1970-01-01T10:22:11+00:00",
  "EntryType": "Event",
  "Id": "37331",
  "Message": "Warning threshold for number of flash operations has been exceeded. Current percentage of write operations capacity: 50",
  "MessageArgs": ["50"],
  "MessageId": "OpenBMC.0.1.MEFlashWearOutWarning",
  "Name": "System Event Log Entry",
  "Severity": "Warning"
}
```

## References

1. [IPMI Specification v2.0](https://www.intel.pl/content/www/pl/pl/products/docs/servers/ipmi/ipmi-second-gen-interface-spec-v2-rev1-1.html)
2. [DMTF Redfish Schema Guide](https://www.dmtf.org/sites/default/files/standards/documents/DSP2046_2019.3.pdf)
3. [OpenBMC](https://github.com/openbmc)
4. [OpenBMC Redfish Event logging](https://github.com/openbmc/docs/blob/master/architecture/redfish-logging-in-bmcweb.md)
5. [Intel ME External Interfaces Specification](https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/intel-power-node-manager-v3-spec.pdf)
6. [ipmitool](https://github.com/ipmitool/ipmitool)
7. [OpenBMC Intel IPMI support](https://github.com/openbmc/intel-ipmi-oem)
8. [OpenBMC BMCWeb](https://github.com/openbmc/bmcweb)