# PLDM stack on OpenBMC

Author: Deepak Kodihalli <dkodihal@linux.vnet.ibm.com> <dkodihal>

Created: 2019-01-22

## Problem Description

On OpenBMC, in-band IPMI is currently the primary industry-standard means of
communication between the BMC and the Host firmware. We've started hitting some
inherent limitations of IPMI on OpenPOWER servers: a limited number of sensors,
and a lack of a generic control mechanism (sensors are a generic monitoring
mechanism) are the major ones. There is a need to improve upon the communication
protocol, but at the same time inventing a custom protocol is undesirable.

This design aims to employ the Platform Level Data Model (PLDM), a standard
application layer communication protocol defined by the DMTF. PLDM draws inputs
from IPMI, but it overcomes most of the latter's limitations. PLDM is also
designed to run on standard transport protocols, e.g. MCTP (also designed by
the DMTF). MCTP provides a common transport layer over several physical
channels, by defining hardware bindings. The solution of PLDM over MCTP also
helps overcome some of the limitations of the hardware channels that IPMI uses.

PLDM's purpose is to enable all sorts of "inside the box communication": BMC -
Host, BMC - BMC, BMC - Network Controller, and BMC - Other (e.g. sensor)
devices.

## Background and References

PLDM is designed to be an effective interface and data model that provides
efficient access to low-level platform inventory, monitoring, control, event,
and data/parameters transfer functions. For example, temperature, voltage, or
fan sensors can have a PLDM representation that can be used to monitor and
control the platform using a set of PLDM messages. PLDM defines data
representations and commands that abstract the platform management hardware.
PLDM groups commands under broader functions, and defines separate
specifications for each of these functions (also called PLDM "Types"). The
currently defined Types (and corresponding specs) are: PLDM base (with
associated IDs and states specs), BIOS, FRU, Platform monitoring and control,
Firmware Update, and SMBIOS. All these specifications are available at:

https://www.dmtf.org/standards/pmci

Some of the reasons PLDM sounds promising (some of these are advantages over
IPMI):

- Common in-band communication protocol.

- Already existing PLDM Type specifications that cover the most common
  communication requirements. Up to 64 PLDM Types can be defined (the last one
  is OEM). At the moment, 6 are defined. Each PLDM Type can house up to 256
  PLDM commands.

- PLDM sensors are 2 bytes in length.

- PLDM introduces the concept of effecters - a control mechanism. Both sensors
  and effecters are associated with entities (similar to IPMI, entities can be
  physical or logical), where sensors are a mechanism for monitoring and
  effecters are a mechanism for control. Effecters can be numeric or state
  based. PLDM defines commonly used entities and their IDs, but there are 8K
  slots available to define OEM entities.

- A very active PLDM related working group in the DMTF.

The plan is to run PLDM over MCTP. MCTP is defined in a spec of its own, and a
proposal on the MCTP design is in discussion already. There's going to be an
intermediate PLDM over MCTP binding layer, which lets us send PLDM messages over
MCTP. This is defined in a spec of its own, and the design for this binding will
be proposed separately.

## Requirements

How different BMC applications make use of PLDM messages is outside the scope of
this requirements doc.
The requirements listed here are related to the PLDM
protocol stack and the request/response model:

- Marshalling and unmarshalling of PLDM messages, defined in various PLDM Type
  specs, must be implemented. This can of course be staged based on the need of
  specific Types and functions. Since this is just encoding and decoding PLDM
  messages, this can be a library that could be shared between the BMC and
  other firmware stacks. The specifics of each PLDM Type (such as FRU table
  structures, sensor PDR structures, etc.) are implemented by this lib.

- Mapping PLDM concepts to native OpenBMC concepts must be implemented. For
  example: mapping PLDM sensors to phosphor-hwmon hosted D-Bus objects, mapping
  PLDM FRU data to D-Bus objects hosted by phosphor-inventory-manager, etc. The
  mapping shouldn't be restricted to D-Bus alone (meaning it shouldn't be
  necessary to put objects on the Bus just to serve PLDM requests, a problem
  that exists with phosphor-host-ipmid today). Essentially these are platform
  specific PLDM message handlers.

- The BMC should be able to act as a PLDM responder as well as a PLDM requester.
  As a PLDM requester, the BMC can monitor/control other devices. As a PLDM
  responder, the BMC can react to PLDM messages directed to it via requesters in
  the platform.

- As a PLDM requester, the BMC must be able to discover other PLDM enabled
  components in the platform.

- As a PLDM requester, the BMC must be able to send simultaneous messages to
  different responders.

- As a PLDM requester, the BMC must be able to handle out of order responses.

- As a PLDM responder, the BMC may simultaneously respond to messages from
  different requesters, but the spec doesn't mandate this. In other words the
  responder could be single-threaded.

- It should be possible to plug in OEM PLDM Types/functions into the PLDM
  stack.
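To make the marshalling requirement concrete, here is a minimal sketch of packing and unpacking the common PLDM message header. The field layout (Rq bit, 5-bit instance id, 6-bit PLDM Type, command code) follows DSP0240; the struct and function names are illustrative, not libpldm's actual API:

```cpp
#include <array>
#include <cstdint>

// Illustrative decoded view of the 3-byte PLDM message header (DSP0240).
struct PldmHeader
{
    bool request;       // Rq bit: 1 = request, 0 = response
    uint8_t instanceId; // 5 bits, unique per outstanding request
    uint8_t type;       // PLDM Type (6 bits), e.g. 0 = PLDM base
    uint8_t command;    // command code within the Type
};

// Pack the three header bytes; the 2 header-version bits are left at 0b00.
std::array<uint8_t, 3> encodeHeader(const PldmHeader& h)
{
    return {
        static_cast<uint8_t>((h.request ? 0x80 : 0x00) | (h.instanceId & 0x1F)),
        static_cast<uint8_t>(h.type & 0x3F),
        h.command,
    };
}

PldmHeader decodeHeader(const std::array<uint8_t, 3>& b)
{
    return {
        (b[0] & 0x80) != 0,
        static_cast<uint8_t>(b[0] & 0x1F),
        static_cast<uint8_t>(b[1] & 0x3F),
        b[2],
    };
}
```

Type-specific payloads (FRU tables, PDRs, etc.) would follow the header and be encoded/decoded by per-Type functions in the same library.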
## Proposed Design

This document covers the architectural, interface, and design details. It
provides recommendations for implementations, but implementation details are
outside the scope of this document.

The design aims at having a single PLDM daemon serve both the requester and
responder functions, and having transport specific endpoints to communicate on
different channels.

The design enables concurrency aspects of the requester and responder functions,
but the goal is to employ asynchronous IO and event loops, instead of multiple
threads, wherever possible.

The following are high level structural elements of the design:

### PLDM encode/decode libraries

This library would take a PLDM message, decode it and extract the different
fields of the message. Conversely, given a PLDM Type, command code, and the
command's data fields, it would make a PLDM message. The thought is to design
this as a common library, that can be used by the BMC and other firmware stacks,
because it's the encode/decode and protocol piece (and not the handling of a
message).

### PLDM provider libraries

These libraries would implement the platform specific handling of incoming PLDM
requests (basically helping with the PLDM responder implementation, see next
bullet point), so for instance they would query D-Bus objects (or even something
like a JSON file) to fetch platform specific information to respond to the PLDM
message. They would link with the encode/decode lib.

It should be possible to plug in a provider library that lets someone add
functionality for new PLDM (standard as well as OEM) Types. The libraries would
implement a "register" API to plug in handlers for specific PLDM messages.
Something like:

```
template <typename Handler, typename... Args>
auto registerHandler(uint8_t type, uint8_t command, Handler handler);
```

This allows for a strongly-typed C++ handler registration scheme. It would also
be possible to validate the parameters passed to the handler at compile time.

### Request/Response Model

The PLDM daemon links with the encode/decode and provider libs. The daemon would
have to implement the following functions:

#### Receiver/Responder

The receiver wakes up on getting notified of incoming PLDM messages (via D-Bus
signal or callback from the transport layer) from a remote PLDM device. If the
message type is "Request" it would route them to a PLDM provider library. Via
the library, asynchronous D-Bus calls (using sdbusplus-asio) would be made, so
that the receiver can register a handler for the D-Bus response, instead of
having to wait for the D-Bus response. This way it can go back to listening for
incoming PLDM messages.

In the D-Bus response handler, the receiver will send out the PLDM response
message via the transport's send message API. If the transport's send message
API blocks for a considerably long duration, then it would have to be run in a
thread of its own.

If the incoming PLDM message is of type "Response", then the receiver emits a
D-Bus signal pointing to the response message. Any time the message is too large
to fit in a D-Bus payload, the message is written to a file, and a read-only
file descriptor pointing to that file is contained in the D-Bus signal.

#### Requester

Designing the BMC as a PLDM requester is interesting. We haven't had this with
IPMI, because the BMC was typically an IPMI server. PLDM requester functions
will be spread across multiple OpenBMC applications (instead of a single big
requester app) - based on the responder they're talking to and the high level
function they implement.
For example, there could be an app that lets the BMC
upgrade firmware for other devices using PLDM - this would be a generic app in
the sense that the same set of commands might have to be run irrespective of the
device on the other side. There could also be an app that does fan control on a
remote device, based on sensors from that device and algorithms specific to that
device.

##### Proposed requester design

A requester app/flow comprises the following:

- Linkage with a PLDM encode/decode library, to be able to pack PLDM requests
  and unpack PLDM responses.

- A D-Bus API to generate a unique PLDM instance id. The id needs to be unique
  across all outgoing PLDM messages (from potentially different processes). This
  needs to be on D-Bus because the id needs to be unique across PLDM requester
  app processes.

- A requester client API that provides blocking and non-blocking functions to
  transfer a PLDM request message and to receive the corresponding response
  message, over MCTP (the blocking send() will return a PLDM response). This
  will be a thin wrapper over the socket API provided by the MCTP demux daemon.
  It will provide APIs for common tasks so that the same need not be
  re-implemented in each PLDM requester app. This set of APIs will be built into
  the encode/decode library (so libpldm would house encode/decode APIs, and,
  based on a compile time flag, the requester APIs as well). A PLDM requester
  app can choose not to use the client requester APIs, and instead can directly
  talk to the MCTP demux daemon.
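The blocking flavor of the requester client API above might look roughly like the sketch below. The class and method names are illustrative, not libpldm's actual interface, and the MCTP demux socket is abstracted behind an injected callback so the flow is self-contained; a real implementation would write to and read from the daemon's Unix socket instead:

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for the transport: takes an MCTP EID and a raw PLDM message,
// returns the raw response. In practice this would be the demux daemon's
// socket send/recv; a callback keeps this sketch testable without MCTP.
using Transport = std::function<std::vector<uint8_t>(
    uint8_t /*eid*/, const std::vector<uint8_t>& /*msg*/)>;

class Requester
{
  public:
    explicit Requester(Transport t) : transport(std::move(t)) {}

    // Blocking send: transmit the request and return the matching PLDM
    // response, checking that the response echoes our instance id
    // (low 5 bits of the first header byte).
    std::vector<uint8_t> sendRecv(uint8_t eid,
                                  const std::vector<uint8_t>& request)
    {
        auto response = transport(eid, request);
        if (!response.empty() && !request.empty() &&
            (response[0] & 0x1F) != (request[0] & 0x1F))
        {
            return {}; // instance id mismatch: not the response to us
        }
        return response;
    }

  private:
    Transport transport;
};
```

The non-blocking variant would instead hand the caller the socket fd (as in the flow diagram below this section) so the app can watch it from its own event loop.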
##### Proposed requester design - flow diagrams

a) With blocking API

```
+---------------+               +----------------+            +----------------+               +-----------------+
|BMC requester/ |               |PLDM requester  |            |PLDM responder  |               |PLDM Daemon      |
|client app     |               |lib (part of    |            |                |               |                 |
|               |               |libpldm)        |            |                |               |                 |
+-------+-------+               +-------+--------+            +--------+-------+               +---------+-------+
        |                               |                              |                                 |
        |App starts                     |                              |                                 |
        |                               |                              |                                 |
        +------------------------------->setup connection with         |                                 |
        |init(non_block=false)          |MCTP daemon                   |                                 |
        |                               |                              |                                 |
        +<-------+return_code+----------+                              |                                 |
        |                               |                              |                                 |
        |                               |                              |                                 |
        |                               |                              |                                 |
        +------------------------------>+                              |                                 |
        |encode_pldm_cmd(cmd code, args)|                              |                                 |
        |                               |                              |                                 |
        +<----+returns pldm_msg+--------+                              |                                 |
        |                               |                              |                                 |
        |                               |                              |                                 |
        |----------------------------------------------------------------------------------------------->|
        |DBus.getPLDMInstanceId()       |                              |                                 |
        |                               |                              |                                 |
        |<-------------------------returns PLDM instance id----------------------------------------------|
        |                               |                              |                                 |
        +------------------------------>+                              |                                 |
        |send_msg(mctp_eids, pldm_msg)  +----------------------------->+                                 |
        |                               |write msg to MCTP socket      |                                 |
        |                               +----------------------------->+                                 |
        |                               |call blocking recv() on socket|                                 |
        |                               |                              |                                 |
        |                               +<-+returns pldm_response+-----+                                 |
        |                               |                              |                                 |
        |                               +----+                         |                                 |
        |                               |    | verify eids, instance id|                                 |
        |                               +<---+                         |                                 |
        |                               |                              |                                 |
        +<--+returns pldm_response+-----+                              |                                 |
        |                               |                              |                                 |
        |                               |                              |                                 |
        |                               |                              |                                 |
        +------------------------------>+                              |                                 |
        |decode_pldm_cmd(pldm_resp,     |                              |                                 |
        |  output args)                 |                              |                                 |
        |                               |                              |                                 |
        +------------------------------>+                              |                                 |
        |close_connection()             |                              |                                 |
        +                               +                              +                                 +
```

b) With non-blocking API

```
+---------------+               +----------------+            +----------------+             +---------------+
|BMC requester/ |               |PLDM requester  |            |PLDM responder  |             |PLDM daemon    |
|client app     |               |lib (part of    |            |                |             |               |
|               |               |libpldm)        |            |                |             |               |
+-------+-------+               +-------+--------+            +--------+-------+             +--------+------+
        |                               |                              |                              |
        |App starts                     |                              |                              |
        |                               |                              |                              |
        +------------------------------->setup connection with         |                              |
        |init(non_block=true            |MCTP daemon                   |                              |
        | int* o_mctp_fd)               |                              |                              |
        |                               |                              |                              |
        +<-------+return_code+----------+                              |                              |
        |                               |                              |                              |
        |                               |                              |                              |
        |                               |                              |                              |
        +------------------------------>+                              |                              |
        |encode_pldm_cmd(cmd code, args)|                              |                              |
        |                               |                              |                              |
        +<----+returns pldm_msg+--------+                              |                              |
        |                               |                              |                              |
        |-------------------------------------------------------------------------------------------->|
        |DBus.getPLDMInstanceId()       |                              |                              |
        |                               |                              |                              |
        |<-------------------------returns PLDM instance id-------------------------------------------|
        |                               |                              |                              |
        |                               |                              |                              |
        +------------------------------>+                              |                              |
        |send_msg(eids, pldm_msg,       +----------------------------->+                              |
        | non_block=true)               |write msg to MCTP socket      |                              |
        |                               +<---+return_code+-------------+                              |
        +<-+returns rc, doesn't block+--+                              |                              |
        |                               |                              |                              |
        +------+                        |                              |                              |
        |      |Add EPOLLIN on mctp_fd  |                              |                              |
        |      |to self.event_loop      |                              |                              |
        +<-----+                        |                              |                              |
        |                               |                              +                              |
        +<----------------------+PLDM response msg written to mctp_fd+-+                              |
        |                               |                              +                              |
        +------+EPOLLIN on mctp_fd      |                              |                              |
        |      |received                |                              |                              |
        |      |                        |                              |                              |
        +<-----+                        |                              |                              |
        |                               |                              |                              |
        +------------------------------>+                              |                              |
        |decode_pldm_cmd(pldm_response) |                              |                              |
        |                               |                              |                              |
        +------------------------------>+                              |                              |
        |close_connection()             |                              |                              |
        +                               +                              +                              +
```

##### Alternative to the proposed requester design

a) Define D-Bus interfaces to send and receive PLDM messages:

```
method sendPLDM(uint8 mctp_eid, uint8 msg[])

signal recvPLDM(uint8 mctp_eid, uint8 pldm_instance_id, uint8 msg[])
```

PLDM requester apps can then invoke the above method, and handle the above
signal.
While this
simplifies things for the user, it has two disadvantages:

- the app implementing such an interface could be a single point of failure,
  plus sending messages concurrently would be a challenge.
- the message payload could be large (several pages), and copying the same for
  D-Bus transfers might be undesirable.

### Multiple transport channels

The PLDM daemon might have to talk to remote PLDM devices via different
channels. While a level of abstraction might be provided by MCTP, the PLDM
daemon would have to implement a D-Bus interface to target a specific transport
channel, so that requester apps on the BMC can send messages over that
transport. Also, it should be possible to plug in platform specific D-Bus
objects that implement an interface to target a platform specific transport.

### Processing PLDM FRU information sent down by the host firmware

Note: while this is specific to the host-BMC communication, most of this might
apply to processing PLDM FRU information received from a device connected to the
BMC as well.

The requirement is for the BMC to consume PLDM FRU information received from the
host firmware and then have the same exposed via Redfish. An example can be the
host firmware sending down processor and core information via PLDM FRU commands,
and the BMC making this information available via the Processor and
ProcessorCollection schemas.

This design is built around the pldmd and entity-manager applications on the
BMC:

- The pldmd asks the host firmware's PLDM stack for the host's FRU record table,
  by sending it the PLDM GetFRURecordTable command. The pldmd should send this
  command if the host indicates support for the PLDM FRU spec. The pldmd
  receives a PLDM FRU record table from the host firmware
  (www.dmtf.org/sites/default/files/standards/documents/DSP0257_1.0.0.pdf).
  The daemon parses the FRU record table and hosts raw PLDM FRU information on
  D-Bus. It will house the PLDM FRU properties for a certain FRU under an
  xyz.openbmc_project.Inventory.Source.PLDM.FRU D-Bus interface, and house the
  PLDM entity info extracted from the FRU record set PDR under an
  xyz.openbmc_project.Source.PLDM.Entity interface.

- Configurations can be written for entity-manager to probe an interface like
  xyz.openbmc_project.Inventory.Source.PLDM.FRU, and create FRU inventory D-Bus
  objects. Inventory interfaces from the xyz.openbmc_project.Inventory
  namespace can be applied on these objects, by converting PLDM FRU property
  values into xyz.openbmc_project.Inventory.Decorator.Asset property values,
  such as Part Number and Serial Number, in the entity-manager configuration
  file. Bmcweb can find these FRU inventory objects based on D-Bus interfaces,
  as it does today.

## Alternatives Considered

Continue using IPMI, but start making more use of OEM extensions to suit the
requirements of new platforms. However, given that the IPMI standard is no
longer under active development, we would likely end up with a large amount of
platform-specific customisations. This also does not solve the hardware channel
issues in a standard manner. On OpenPOWER hardware at least, we've started to
hit some of the limitations of IPMI (for example, we have need for >255
sensors).

## Impacts

Development would be required to implement the PLDM protocol, the
request/response model, and platform specific handling. Low level design is
required to implement the protocol specifics of each of the PLDM Types. Such low
level design is not included in this proposal.

Design and development needs to involve the firmware stacks of management
controllers and management devices of a platform management subsystem.
## Testing

Testing can be done without having to depend on the underlying transport layer.

The responder function can be tested by mocking a requester and the transport
layer: this would essentially test the protocol handling and platform specific
handling. The requester function can be tested by mocking a responder: this
would test the instance id handling and the send/receive functions.

APIs from the shared libraries can be tested via fuzzing.
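As one illustration of the requester-side testing above, the instance id handling can be exercised with a plain unit test. The allocator below is a hedged sketch of just the core 5-bit id-pool logic (the real generator would sit behind D-Bus and track ids per responder); a test would assert that ids are unique while outstanding, that the 32-id space exhausts cleanly, and that released ids become reusable:

```cpp
#include <bitset>
#include <cstdint>
#include <optional>

// Illustrative unit under test: a pool of the 32 possible PLDM instance
// ids (the header's instance id field is 5 bits wide).
class InstanceIdPool
{
  public:
    // Returns the lowest free id, or nullopt if all 32 are outstanding.
    std::optional<uint8_t> allocate()
    {
        for (uint8_t id = 0; id < 32; ++id)
        {
            if (!used[id])
            {
                used[id] = true;
                return id;
            }
        }
        return std::nullopt;
    }

    // Marks an id free again once its response has arrived (or timed out).
    void release(uint8_t id)
    {
        used[id % 32] = false;
    }

  private:
    std::bitset<32> used;
};
```

The same mocking approach extends to the send/receive functions by substituting a fake responder for the MCTP socket.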