# Using io_uring in BMCs for asynchronous sensor reads

Author: Jerry Zhu ([jerryzhu@google.com](mailto:jerryzhu@google.com))

Primary assignee: Brandon Kim
([brandonkim@google.com](mailto:brandonkim@google.com), brandonk)

Other contributors: William A. Kennington III
([wak@google.com](mailto:wak@google.com), wak)

Created: June 9, 2021

## Problem Description

Currently, OpenBMC has code that performs I2C reads for sensors that may take
longer than desired. These IO operations are synchronous and can therefore
block other functions such as IPMI. This project will involve going through
OpenBMC repositories that may currently have this drawback (specifically
[phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)) and adding an
asynchronous interface using the new io_uring library.

## Background and References

io_uring is a new asynchronous I/O interface for Linux (added in kernel 5.1;
5.10 is preferred). It supersedes Linux's earlier asynchronous IO interface,
AIO, which has limitations when used for sensor reads in OpenBMC.

[brandonkim@google.com](mailto:brandonkim@google.com) has previously created a
method in the phosphor-hwmon repository that prevents sensors from blocking all
other sensor reads and D-Bus when they do not report failures quickly enough
([link to change](https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-hwmon/+/24337)).
Internal Google BMC efforts have also focused on introducing the io_uring
library into Google's BMC code.

## Requirements

Asynchronous sensor reads using io_uring must maintain the same accuracy as the
current synchronous reads in each daemon.
Potential OpenBMC repositories that will benefit from this library include:

* [phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)
* [phosphor-nvme](https://github.com/openbmc/phosphor-nvme)
* [dbus-sensors](https://github.com/openbmc/dbus-sensors)
* any other appropriate repository

The focus of this project is to add asynchronous sensor reads to the
phosphor-hwmon repository, which is easier than adding them to dbus-sensors.

Users will need to be able to choose between this new asynchronous method of
reading sensors and the traditional synchronous method. In addition, the
performance improvement from using the new io_uring library will need to be
measured for each daemon.

## Proposed Design

In the phosphor-hwmon repository, the primary files that require modification
are
[sensor.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.hpp)
and
[mainloop.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.hpp).
A caching layer for the sensor read results will also be added.

Currently, the `read()` function in mainloop.cpp, which reads hwmon sysfs
entries, iterates through all sensors and calls `_ioAccess->read(...)` for each
one. This operation can block.

The refactor will keep this loop over all sensors but make the read operation
non-blocking by using an io_uring wrapper. A caching layer will store the read
results and will be the main access point for obtaining sensor reads in
mainloop.cpp.
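The caching layer can be illustrated with a minimal C++17 sketch. It assumes the io_uring calls sit behind a hypothetical `AsyncReader` interface, so only the cache logic from the Detailed Design is shown. That logic has three parts: a map from hwmon path to a read-result struct, a set of in-flight sensors that prevents duplicate submissions, and a completion pass that updates the map. `AsyncReader`, `Completion`, `ReadResult`, `ReadCache`, and `FakeReader` are illustrative names, not phosphor-hwmon code. The retry policy is also an assumption: the cache keeps serving the stale value until retries run out. A real backend might implement `submitRead()` with liburing's `io_uring_get_sqe()`, `io_uring_prep_read()`, and `io_uring_submit()`, and `reapCompleted()` with `io_uring_peek_cqe()` and `io_uring_cqe_seen()`. A real CQE's `res` field holds a byte count or a negative errno, so the read buffer would still need to be parsed into a value.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// A finished read request; a simplified stand-in for an io_uring CQE.
struct Completion {
    std::string path; // hwmon sysfs path the request was for
    long value;       // parsed sensor value, meaningful when error == 0
    int error;        // 0 on success, otherwise an errno value
};

// Hypothetical io_uring wrapper. Neither call may block: submitRead() queues
// a request and reapCompleted() returns only requests that have finished.
class AsyncReader {
  public:
    virtual ~AsyncReader() = default;
    virtual void submitRead(const std::string& path) = 0;
    virtual std::vector<Completion> reapCompleted() = 0;
};

// Per-sensor cache entry (a real one might also hold the open() fd).
struct ReadResult {
    int retriesLeft = 0;       // failures tolerated before the value is dropped
    int lastError = 0;         // errno of the most recent failure, 0 if none
    std::optional<long> value; // last successfully read value
};

class ReadCache {
  public:
    ReadCache(AsyncReader& reader, int maxRetries)
        : reader_(reader), maxRetries_(maxRetries) {}

    // Returns the cached value and queues a refresh, unless a request for
    // this sensor is already in flight.
    std::optional<long> access(const std::string& path) {
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            ReadResult fresh;
            fresh.retriesLeft = maxRetries_;
            it = cache_.emplace(path, fresh).first;
        }
        if (pending_.insert(path).second) { // false if already in flight
            reader_.submitRead(path);
        }
        return it->second.value;
    }

    // Applies finished reads to the cache; call once per loop iteration.
    void processCompletions() {
        for (const Completion& c : reader_.reapCompleted()) {
            pending_.erase(c.path); // the next access may resubmit
            auto it = cache_.find(c.path);
            if (it == cache_.end()) continue;
            ReadResult& r = it->second;
            if (c.error == 0) {
                r.value = c.value;
                r.lastError = 0;
                r.retriesLeft = maxRetries_;
            } else {
                r.lastError = c.error;
                if (r.retriesLeft > 0) {
                    --r.retriesLeft; // keep serving the stale value
                } else {
                    r.value.reset(); // retries exhausted: report no value
                }
            }
        }
    }

    const ReadResult& entry(const std::string& path) const {
        return cache_.at(path);
    }

  private:
    AsyncReader& reader_;
    int maxRetries_;
    std::unordered_map<std::string, ReadResult> cache_;
    std::unordered_set<std::string> pending_;
};

// Fake backend for illustration: records submissions, returns queued results.
class FakeReader : public AsyncReader {
  public:
    void submitRead(const std::string& path) override {
        submitted.push_back(path);
    }
    std::vector<Completion> reapCompleted() override {
        std::vector<Completion> out;
        out.swap(ready);
        return out;
    }
    std::vector<std::string> submitted;
    std::vector<Completion> ready;
};
```

In mainloop.cpp, each iteration would call `processCompletions()` once and then `access()` for every sensor. Because `access()` only queues a read and never waits for it, a slow sensor delays only its own value by a loop iteration. It does not stall the whole loop.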
```
               Interface Layer
+--------------------------------------------+
|                                            |
|   +------------+         +-------------+   |
|   |            |         |             |   |
|   |  Redfish   |         |    IPMI     |   |
|   |            |         |             |   |
|   +-----+------+         +-------+-----+   |
|         ^                        ^         |
+---------|------------------------|---------+
          |                        |
          v                        v
+---------+------------------------+---------+
|                                            |
|                    DBus                    |
|                                            |
+---------^------------------------^---------+
          |                        |
  +-------v-------+       +--------v-------+
  |               |       |                |
  |phosphor-hwmon |       |  dbus-sensors  |
  |               |       |                |
  +-------^-------+       +--------^-------+
          | <--------------------- |  <------- caching layer at this level
 +--------v------------------------v--------+
 |                                          |
 |               Linux kernel               |
 |                                          |
 +----------^---------------------^---------+
            |                     |
       +----v-----+         +-----v----+
       |          |         |          |
       |i2c sensor|         |i2c sensor|
       |          |         |          |
       +----------+         +----------+
```

For compatibility, users will be able to choose whether to use the new
io_uring implementation through a flag variable (most likely placed in the
.conf file of each hwmon sensor).

## Detailed Design

The read cache is implemented as an `unordered_map` of {sensor hwmon path:
read result}. The read result is a struct that tracks the information needed
to process read values and handle errors. Examples include the file
descriptor returned by the `open()` system call and the number of retries
remaining for the sensor when reads fail.

Each access to a sensor's value in the read cache not only returns the cached
value but also submits an SQE (submission queue entry) to io_uring for that
sensor. This SQE acts as a read request that is sent to the kernel.
The implementation maintains a set of sensors with outstanding submissions.
This prevents multiple overlapping SQEs from being submitted for the same
sensor. A sensor's entry is cleared from the set when its read result returns
as a CQE (completion queue entry). The CQE is then processed, and its
information updates the cache map.

The implementation is asynchronous because it submits all possible SQE
requests at once, which does not block. The synchronous implementation, by
contrast, is blocked by each slow sensor read. The kernel processes these
requests, and before the next iteration of sensor reads, the cache attempts to
process any returned CQEs, which is also non-blocking.

Simply put, accessing some "Sensor A" in the read cache creates an underlying
read request. That request makes a best effort to update the value of
"Sensor A" before the sensor read loop (currently 1 s by default) next gets
the value of "Sensor A" through the cache.

## Alternatives Considered

Linux does have a native asynchronous IO interface, simply called AIO.
However, it has a number of limitations. The biggest is that AIO only supports
truly asynchronous IO for unbuffered (`O_DIRECT`) reads. Furthermore, IO
submission can end up blocking in a number of ways, for example when metadata
is required to perform the IO. AIO's memory costs are also higher than those
of io_uring.

For these reasons, native AIO will not be used for this implementation of
asynchronous reads.

## Impacts

Initially, this project will affect all OpenBMC developers working on
openbmc/phosphor-hwmon. It has improved the latency of phosphor-hwmon, and
throughput has also been shown to increase. Note that the throughput profiling
was less systematic than the latency profiling.
These performance changes still need to be measured in more detail across
different machines.

There will be no security impact.

## Testing

The change will use the gTest framework to ensure that the original
functionality of the modified repository's code stays the same.