# Using io_uring in BMCs for asynchronous sensor reads

Author: Jerry Zhu ([jerryzhu@google.com](mailto:jerryzhu@google.com))

Other contributors:
- Brandon Kim ([brandonkim@google.com](mailto:brandonkim@google.com), brandonk)
- William A. Kennington III ([wak@google.com](mailto:wak@google.com), wak)

Created: June 9, 2021

## Problem Description

OpenBMC currently performs I2C sensor reads that may take longer than desired.
These IO operations are synchronous and may therefore block other functions
such as IPMI. This project will go through the OpenBMC repositories that may
have this drawback (specifically
[phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)) and add an
asynchronous interface using the new io_uring library.

## Background and References

io_uring is a new asynchronous I/O interface for Linux (added in the 5.1
kernel; 5.10 is preferred). It is an upgrade from the previous asynchronous IO
interface, AIO, whose limitations affect its use for sensor reads in OpenBMC.

[brandonkim@google.com](mailto:brandonkim@google.com) has previously created a
method in the phosphor-hwmon repository that prevents sensors from blocking all
other sensor reads and D-Bus if they do not report failures quickly enough
([link to change](https://gerrit.openbmc.org/c/openbmc/phosphor-hwmon/+/24337)).
Internal Google BMC efforts have also focused on introducing the io_uring
library into their code.

## Requirements

The asynchronous sensor reads built on io_uring must maintain the same
accuracy as the current synchronous reads in each of the daemons.
Potential OpenBMC repositories that will benefit from this library include:

* [phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)
* [phosphor-nvme](https://github.com/openbmc/phosphor-nvme)
* [dbus-sensors](https://github.com/openbmc/dbus-sensors)
* any other appropriate repository

The focus of this project is to add asynchronous sensor reads to the
phosphor-hwmon repository, which is easier to implement than adding
asynchronous sensor reads to dbus-sensors.

Users will need the ability to choose between this new asynchronous method of
reading sensors and the traditional synchronous method. In addition, the
performance improvement from using the new io_uring library will need to be
measured for each daemon.

## Proposed Design

In the phosphor-hwmon repository, the primary files that will require
modification are
[sensor.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.hpp)
and
[mainloop.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.hpp),
as well as the addition of a caching layer for the results of the sensor reads.

Currently, the `read()` function in mainloop.cpp, which reads hwmon sysfs
entries, iterates through all sensors and calls `_ioAccess->read(...)` for each
one; this operation is potentially blocking.

The refactor will keep this loop over all sensors but make the read operation
non-blocking by using an io_uring wrapper. A caching layer will store the read
results and will be the main access point for obtaining sensor reads in
mainloop.cpp.
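
A minimal sketch of how such a caching layer could sit between the sensor loop
and an io_uring wrapper is shown here. It is not the phosphor-hwmon
implementation: `ReadResult`, `ReadCache`, and `FakeRing` are hypothetical
names, the retry budget is an assumed value, and `FakeRing` is an in-memory
stand-in for the real io_uring wrapper so that the example compiles with only
the C++17 standard library.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical per-sensor state; field names are illustrative only.
struct ReadResult
{
    int fd = -1;                  // descriptor from open(); unused in this sketch
    int retriesRemaining = 3;     // assumed retry budget
    std::optional<int64_t> value; // last completed reading, if any
};

// Stand-in for an io_uring wrapper (assumption): submit() only queues a
// request and reap() returns a finished one if available. Neither blocks.
struct FakeRing
{
    struct Completion
    {
        std::string path;
        int64_t value;
    };

    std::deque<std::string> submitted;
    std::deque<Completion> completed;

    void submit(const std::string& path)
    {
        submitted.push_back(path);
    }

    // Demo helper: pretend the kernel finished the oldest request.
    void finishOldest(int64_t value)
    {
        assert(!submitted.empty());
        completed.push_back({submitted.front(), value});
        submitted.pop_front();
    }

    std::optional<Completion> reap()
    {
        if (completed.empty())
        {
            return std::nullopt;
        }
        Completion c = completed.front();
        completed.pop_front();
        return c;
    }
};

class ReadCache
{
  public:
    explicit ReadCache(FakeRing& ring) : ring_(ring) {}

    // Return the cached value and request a refresh unless one is pending.
    std::optional<int64_t> get(const std::string& path)
    {
        if (pending_.insert(path).second)
        {
            ring_.submit(path);
        }
        return cache_[path].value;
    }

    // Drain finished requests without blocking and update the cache.
    void processCompletions()
    {
        while (auto c = ring_.reap())
        {
            cache_[c->path].value = c->value;
            pending_.erase(c->path);
        }
    }

  private:
    FakeRing& ring_;
    std::unordered_map<std::string, ReadResult> cache_;
    std::unordered_set<std::string> pending_;
};
```

In this sketch the sensor loop would call `processCompletions()` once per
iteration and then `get()` for each sensor path; the first `get()` for a sensor
returns an empty value because no read has completed yet.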

```
               Interface Layer
+--------------------------------------------+
|                                            |
|   +------------+         +-------------+   |
|   |            |         |             |   |
|   |  Redfish   |         |    IPMI     |   |
|   |            |         |             |   |
|   +-----+------+         +-------+-----+   |
|         ^                        ^         |
+---------|------------------------|---------+
          |                        |
          v                        v
+---------+------------------------+---------+
|                                            |
|                    DBus                    |
|                                            |
+---------^------------------------^---------+
          |                        |
  +-------v-------+       +--------v-------+
  |               |       |                |
  |phosphor-hwmon |       |  dbus-sensors  |
  |               |       |                |
  +-------^-------+       +--------^-------+
          | <--------------------- | <------- caching layer at this level
 +--------v------------------------v--------+
 |                                          |
 |               Linux kernel               |
 |                                          |
 +----------^---------------------^---------+
            |                     |
       +----v-----+         +-----v----+
       |          |         |          |
       |i2c sensor|         |i2c sensor|
       |          |         |          |
       +----------+         +----------+
```

For compatibility reasons, a flag variable (most likely placed in each hwmon
sensor's .conf file) will let users choose whether to use the new io_uring
implementation.

## Detailed Design

The read cache is implemented using an `unordered_map` of {sensor hwmon path:
read result}. The read result is a struct that keeps track of any information
needed for processing the read values and handling errors, such as the open
file descriptor returned by the `open()` system call and the number of retries
remaining for reading this sensor when errors occur.

Each call to access the read value of a particular sensor in the read cache
will not only return the cached value but will also submit an SQE (submission
queue entry) to io_uring for that sensor; this SQE acts as a read request
that will be sent to the kernel.
The implementation maintains a set of sensors with pending submissions so that
multiple SQEs for the same sensor are not submitted and do not overlap; a
sensor's entry is cleared when its read result returns successfully in a CQE
(completion queue event). The CQE is then processed, and its information
updates the cache map.

The asynchronous nature of this implementation comes from sending all possible
SQE requests at once, a non-blocking operation, instead of being blocked by
slow sensor reads as in the synchronous implementation. The kernel will process
these requests, and before the next iteration of sensor reads the cache will
attempt to process any returned CQEs, which is also a non-blocking operation.

Simply put, an access to some "Sensor A" in the read cache will create an
underlying read request that makes a best effort to update the value of
"Sensor A" before the next time the sensor read loop (which currently runs
every 1 s by default) gets the value of "Sensor A" through the cache.

## Alternatives Considered

Linux does have a native asynchronous IO interface, simply dubbed AIO; however,
it has a number of limitations. The biggest is that AIO only supports true
asynchronous IO for un-buffered reads. Furthermore, there are a number of ways
that the IO submission can end up blocking, for example if metadata is
required to perform IO. Additionally, the memory costs of AIO are higher than
those of io_uring.

For these primary reasons, the native AIO library will not be considered for
this implementation of asynchronous reads.

## Impacts

This project would initially impact all OpenBMC developers of
openbmc/phosphor-hwmon. It has improved the latency of phosphor-hwmon;
throughput has also been shown to increase (note that throughput profiling
was more arbitrary than latency profiling).
These performance changes will have to be measured in further detail across
different machines.

There will be no security impact.

## Testing

The change will use the gTest framework to ensure that the original
functionality of the code in the modified repository stays the same.