# Using io_uring in BMCs for asynchronous sensor reads

Author: Jerry Zhu ([jerryzhu@google.com](mailto:jerryzhu@google.com))

Primary assignee: Brandon Kim
([brandonkim@google.com](mailto:brandonkim@google.com), brandonk)

Other contributors: William A. Kennington III
([wak@google.com](mailto:wak@google.com), wak)

Created: June 9, 2021

## Problem Description

OpenBMC currently has code that performs I2C sensor reads that may take longer
than desired. These IO operations are synchronous, so a slow read can block
other functions such as IPMI. This project will go through the OpenBMC
repositories that may have this drawback (specifically
[phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)) and add an
asynchronous interface using the new io_uring library.

## Background and References

io_uring is a new asynchronous I/O interface for Linux (added in the 5.1
kernel; 5.10 is preferred). It is an improvement over the previous asynchronous
IO interface, AIO, whose limitations restrict its usefulness for sensor reads
in OpenBMC.

[brandonkim@google.com](mailto:brandonkim@google.com) previously added a
mechanism to the phosphor-hwmon repository that prevents a sensor from blocking
all other sensor reads and D-Bus when it does not report failures quickly
enough
([link to change](https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-hwmon/+/24337)).
Internal Google BMC efforts have also focused on introducing the io_uring
library into their code.

## Requirements

Asynchronous sensor reads done through io_uring must maintain the same accuracy
as the current synchronous reads in each of the daemons. OpenBMC repositories
that could benefit from this library include:

*   [phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)
*   [phosphor-nvme](https://github.com/openbmc/phosphor-nvme)
*   [dbus-sensors](https://github.com/openbmc/dbus-sensors)
*   any other appropriate repository

This project focuses on adding asynchronous sensor reads to the phosphor-hwmon
repository, which is easier than adding them to dbus-sensors.

Users must be able to choose between the new asynchronous method of reading
sensors and the traditional synchronous method. In addition, the performance
improvement from using the io_uring library will need to be measured for each
daemon.

## Proposed Design

In the phosphor-hwmon repository, the primary files that will require
modification are
[sensor.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.hpp)
and
[mainloop.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.hpp).
A caching layer for the results of the sensor reads will also be added.

In mainloop.cpp, the `read()` function, which reads hwmon sysfs entries,
currently iterates through all sensors and calls `_ioAccess->read(...)` for
each one; this call can block.

The refactor will keep this loop over all sensors but make the read operation
non-blocking by using an io_uring wrapper. A caching layer will store the read
results and will be the main access point for obtaining sensor readings in
mainloop.cpp.

```
               Interface Layer
+--------------------------------------------+
|                                            |
|   +------------+         +-------------+   |
|   |            |         |             |   |
|   |   Redfish  |         |     IPMI    |   |
|   |            |         |             |   |
|   +-----+------+         +-------+-----+   |
|         ^                        ^         |
+---------|------------------------|---------+
          |                        |
          v                        v
+---------+------------------------+---------+
|                                            |
|                  DBus                      |
|                                            |
+---------^------------------------^---------+
          |                        |
  +-------v-------+       +--------v-------+
  |               |       |                |
  |phosphor-hwmon |       |  dbus-sensors  |
  |               |       |                |
  +-------^-------+       +--------^-------+
          | <--------------------- | <------- caching layer at this level
 +--------v------------------------v--------+
 |                                          |
 |               Linux kernel               |
 |                                          |
 +----------^---------------------^---------+
            |                     |
       +----v-----+         +-----v----+
       |          |         |          |
       |i2c sensor|         |i2c sensor|
       |          |         |          |
       +----------+         +----------+
```

A flag variable (most likely placed in the .conf file of each hwmon sensor)
will let users choose whether to use the new io_uring implementation or, for
compatibility reasons, keep the existing synchronous reads.
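
As a minimal sketch of how such a flag could be read, assuming the .conf
values reach the daemon as environment variables. The name `USE_IO_URING` and
both helper functions are hypothetical; this design does not fix a flag name or
value format.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical flag name; the design does not specify one.
constexpr const char* kAsyncReadFlag = "USE_IO_URING";

// Interprets a raw flag value. Anything other than an explicit "1", "true"
// or "yes" selects the synchronous path, so configurations that never set
// the flag behave exactly as before.
bool parseAsyncFlag(const char* raw)
{
    if (raw == nullptr)
    {
        return false;
    }
    const std::string value(raw);
    return value == "1" || value == "true" || value == "yes";
}

// Reads the flag from the process environment (an assumption about how the
// .conf contents are delivered to the daemon).
bool asyncReadsEnabled()
{
    return parseAsyncFlag(std::getenv(kAsyncReadFlag));
}
```

Defaulting to the synchronous path when the flag is unset is one way to keep
existing sensor configurations unchanged.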

## Detailed Design

The read cache is implemented as an `unordered_map` of {sensor hwmon path:
read result}. The read result is a struct that holds the information needed to
process read values and handle errors, such as the open file descriptor
returned by the `open()` system call and the number of retries remaining for
the sensor when errors occur.
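
The shape of such a cache could look roughly like the sketch below. The struct
fields, the `applyResult` helper, and the retry policy (reset the budget on
success, drop the cached value once retries run out) are all illustrative
assumptions, not the actual phosphor-hwmon types. The one borrowed convention
is that an io_uring completion reports a non-negative result on success and a
negated errno on failure.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical per-sensor record; field names are illustrative only.
struct ReadResult
{
    int fd = -1;                       // descriptor returned by open()
    int retriesRemaining = 0;          // retries left after a failed read
    std::optional<std::int64_t> value; // last good reading, if any
    int lastError = 0;                 // errno of the most recent failure
};

// Keyed by the sensor's hwmon sysfs path.
using ReadCache = std::unordered_map<std::string, ReadResult>;

// Folds one completed read into a cache entry. `res` is >= 0 on success and
// a negated errno on failure. Returns true if the caller should resubmit
// the read for this sensor.
bool applyResult(ReadResult& entry, int res, std::int64_t parsedValue,
                 int maxRetries)
{
    if (res >= 0)
    {
        entry.value = parsedValue;
        entry.retriesRemaining = maxRetries; // fresh budget after a success
        entry.lastError = 0;
        return false;
    }

    entry.lastError = -res;
    if (entry.retriesRemaining > 0)
    {
        --entry.retriesRemaining; // keep serving the old value while retrying
        return true;
    }

    entry.value.reset(); // retries exhausted: stop reporting a stale value
    return false;
}
```

Keeping the retry count next to the value lets the completion handler decide
on its own whether to resubmit, without consulting the main loop.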

Each access to a particular sensor's value in the read cache not only returns
the cached value but also submits an SQE (submission queue entry) to io_uring
for that sensor; this SQE acts as a read request sent to the kernel. The
implementation maintains a set of sensors with outstanding submissions so that
multiple SQEs for the same sensor are never in flight at once; a sensor's entry
is removed from the set upon successful return of its read result as a CQE
(completion queue event). The CQE is then processed, and its information
updates the cache map.
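
The bookkeeping described above can be sketched without a kernel dependency
by standing in a fake for the io_uring wrapper. `FakeRing`, `AsyncReadCache`,
and their methods are hypothetical names; a real wrapper would likely build on
liburing calls such as `io_uring_get_sqe`, `io_uring_prep_read`,
`io_uring_submit`, and `io_uring_peek_cqe` instead of the in-memory queues
used here.

```cpp
#include <cstdint>
#include <deque>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Stand-in for the io_uring wrapper: it only records requests so that the
// cache logic can be exercised in isolation.
struct FakeRing
{
    using Completion = std::pair<std::string, std::int64_t>;

    std::deque<std::string> pending; // submitted, not yet completed
    std::deque<Completion> completed; // finished, not yet consumed

    void submit(const std::string& path) { pending.push_back(path); }

    // Test hook: pretend the kernel finished the oldest pending request.
    void complete(std::int64_t value)
    {
        if (pending.empty())
        {
            return;
        }
        completed.emplace_back(pending.front(), value);
        pending.pop_front();
    }

    // Non-blocking: returns nothing when no completion is ready.
    std::optional<Completion> poll()
    {
        if (completed.empty())
        {
            return std::nullopt;
        }
        Completion c = completed.front();
        completed.pop_front();
        return c;
    }
};

class AsyncReadCache
{
  public:
    explicit AsyncReadCache(FakeRing& ring) : ring_(ring) {}

    // Returns the cached value (if any) and requests a refresh, unless a
    // request for this sensor is already in flight.
    std::optional<std::int64_t> get(const std::string& path)
    {
        if (inFlight_.insert(path).second)
        {
            ring_.submit(path);
        }
        auto it = values_.find(path);
        if (it == values_.end())
        {
            return std::nullopt;
        }
        return it->second;
    }

    // Drains every ready completion into the cache without blocking.
    void processCompletions()
    {
        while (auto c = ring_.poll())
        {
            values_[c->first] = c->second;
            inFlight_.erase(c->first);
        }
    }

  private:
    FakeRing& ring_;
    std::unordered_map<std::string, std::int64_t> values_;
    std::unordered_set<std::string> inFlight_;
};
```

In this sketch, two back-to-back `get()` calls for the same path produce a
single submission; the second submission only happens after
`processCompletions()` has consumed the first result.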

The asynchronous behavior comes from submitting all pending SQEs at once, which
is a non-blocking operation, instead of waiting on each slow sensor read in
turn as the synchronous implementation does. The kernel processes these
requests, and before the next iteration of sensor reads the cache attempts to
process any returned CQEs, which is also a non-blocking operation.

Simply put, an access to some "Sensor A" in the read cache creates an
underlying read request that makes a best effort to update the value of
"Sensor A" before the sensor read loop (which runs every 1 s by default) next
gets the value of "Sensor A" through the cache.

## Alternatives Considered

Linux does have a native asynchronous IO interface, simply dubbed AIO; however,
it has a number of limitations. The biggest is that AIO only supports truly
asynchronous IO for unbuffered (`O_DIRECT`) reads. Furthermore, there are a
number of ways that an IO submission can end up blocking, for example if
metadata is required to perform the IO. Additionally, the memory costs of AIO
are higher than those of io_uring.

For these reasons, the native AIO library will not be considered for this
implementation of asynchronous reads.

## Impacts

This project would initially impact all OpenBMC developers working on
openbmc/phosphor-hwmon. Profiling so far shows improved latency in
phosphor-hwmon and increased throughput (note that the throughput profiling
was more arbitrary than the latency profiling). These performance changes will
have to be measured in further detail across different machines.

There will be no security impact.

## Testing

The change will use the gTest framework to ensure that the original
functionality of the code in the modified repository stays the same.