# Using io_uring in BMCs for asynchronous sensor reads

Author: Jerry Zhu ([jerryzhu@google.com](mailto:jerryzhu@google.com))

Other contributors:

- Brandon Kim ([brandonkim@google.com](mailto:brandonkim@google.com), brandonk)
- William A. Kennington III ([wak@google.com](mailto:wak@google.com), wak)

Created: June 9, 2021

## Problem Description

Currently, OpenBMC has code that performs I2C reads for sensors, and these reads
may take longer than desired. The IO operations are synchronous, and therefore
may block other functions such as IPMI. This project will involve going through
OpenBMC repositories that may currently have this drawback (specifically
[phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)) and adding an
asynchronous interface using the new io_uring library.

## Background and References

io_uring is a new asynchronous I/O interface for Linux (added in the 5.1 kernel;
5.10 is preferred). It is an upgrade from the previous asynchronous IO
interface, AIO, which had limitations in the context of sensor reads for
OpenBMC.

[brandonkim@google.com](mailto:brandonkim@google.com) has previously created a
method in the phosphor-hwmon repository that prevents sensors which do not
report failures quickly enough from blocking all other sensor reads and D-Bus
([link to change](https://gerrit.openbmc.org/c/openbmc/phosphor-hwmon/+/24337)).
Internal Google BMC efforts have also focused on adopting the io_uring library.

## Requirements

The asynchronous sensor reads implemented with io_uring will need to maintain
the same accuracy as the current synchronous reads in each of the daemons.
Potential OpenBMC repositories that will benefit from this library include:

- [phosphor-hwmon](https://github.com/openbmc/phosphor-hwmon)
- [phosphor-nvme](https://github.com/openbmc/phosphor-nvme)
- [dbus-sensors](https://github.com/openbmc/dbus-sensors)
- any other appropriate repository

The focus of this project is to add asynchronous sensor reads to the
phosphor-hwmon repository, which is easier to implement than adding asynchronous
sensor reads into dbus-sensors.

Users will need the ability to choose whether they want to use this new
asynchronous method of reading sensors or stay with the traditional synchronous
method. In addition, the performance improvement from using the new io_uring
library will need to be measured for each daemon.

## Proposed Design

In the phosphor-hwmon repository, the primary files that will require
modification are
[sensor.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/sensor.hpp)
and
[mainloop.cpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.cpp)/[.hpp](https://github.com/openbmc/phosphor-hwmon/blob/master/mainloop.hpp),
as well as the addition of a caching layer for the results from the sensor
reads.

Currently, the `read()` function in mainloop.cpp, which reads hwmon sysfs
entries, iterates through all sensors and calls `_ioAccess->read(...)` for each
one; this operation is potentially blocking.

The refactor will keep this loop over all sensors, but will make the read
operation non-blocking by using an io_uring wrapper. A caching layer will store
the read results and will be the main access point for obtaining sensor reads in
mainloop.cpp.

```
               Interface Layer
+--------------------------------------------+
|                                            |
|   +------------+         +-------------+   |
|   |            |         |             |   |
|   |   Redfish  |         |     IPMI    |   |
|   |            |         |             |   |
|   +-----+------+         +-------+-----+   |
|         ^                        ^         |
+---------|------------------------|---------+
          |                        |
          v                        v
+---------+------------------------+---------+
|                                            |
|                  DBus                      |
|                                            |
+---------^------------------------^---------+
          |                        |
  +-------v-------+       +--------v-------+
  |               |       |                |
  |phosphor-hwmon |       |  dbus-sensors  |
  |               |       |                |
  +-------^-------+       +--------^-------+
          | <--------------------- | <------- caching layer at this level
 +--------v------------------------v--------+
 |                                          |
 |               Linux kernel               |
 |                                          |
 +----------^---------------------^---------+
            |                     |
       +----v-----+         +-----v----+
       |          |         |          |
       |i2c sensor|         |i2c sensor|
       |          |         |          |
       +----------+         +----------+

```

For compatibility reasons, a flag variable (most likely placed in the .conf file
of each hwmon sensor) will let users choose whether or not to use this new
io_uring implementation.
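
As a rough illustration of such an opt-in flag, the sketch below maps a
configuration value to a read mode. It assumes that the values in a sensor's
.conf file reach the daemon as environment variables; the variable name
`ASYNC_READ` and the accepted values are hypothetical and are not defined by
this design.

```cpp
#include <cstdlib>
#include <string>

// Read path selected for one hwmon instance.
enum class ReadMode
{
    Synchronous,
    IoUring
};

// Map the raw value of the flag to a read mode. Unset or unrecognised values
// fall back to the synchronous path, so existing configurations keep their
// current behaviour.
inline ReadMode parseReadMode(const char* raw)
{
    if (raw == nullptr)
    {
        return ReadMode::Synchronous;
    }
    const std::string value(raw);
    if (value == "1" || value == "true")
    {
        return ReadMode::IoUring;
    }
    return ReadMode::Synchronous;
}

// Look the flag up in the process environment. "ASYNC_READ" is a
// hypothetical name used only for this example.
inline ReadMode readModeFromEnv()
{
    return parseReadMode(std::getenv("ASYNC_READ"));
}
```

Defaulting to the synchronous path means that a sensor whose .conf file does not
mention the flag behaves exactly as it does today.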

## Detailed Design

The read cache is implemented using an `unordered_map` of {sensor hwmon path:
read result}. The read result is a struct that keeps track of the information
needed for processing the read values and handling errors, such as the open file
descriptor from the `open()` system call and the number of retries remaining for
reading this sensor when errors occur.
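
A minimal sketch of this structure might look as follows. The field and function
names are illustrative assumptions; the design only specifies that the struct
holds the file descriptor, the remaining retries, and similar state.

```cpp
#include <string>
#include <unordered_map>

// Per-sensor bookkeeping kept in the read cache (hypothetical field names).
struct ReadResult
{
    int fd = -1;         // descriptor returned by open() on the sysfs entry
    long long value = 0; // last value that was read successfully
    int retriesLeft = 0; // retries remaining after a read error
    int lastErrno = 0;   // 0 if the most recent read succeeded
};

// Read cache keyed by the sensor's hwmon sysfs path.
using ReadCache = std::unordered_map<std::string, ReadResult>;

// Store a successful read and refill the retry budget.
inline void recordValue(ReadCache& cache, const std::string& path,
                        long long value, int maxRetries)
{
    ReadResult& entry = cache[path];
    entry.value = value;
    entry.lastErrno = 0;
    entry.retriesLeft = maxRetries;
}

// Store a failed read: keep the last good value and consume one retry.
inline void recordError(ReadCache& cache, const std::string& path, int err)
{
    ReadResult& entry = cache[path];
    entry.lastErrno = err;
    if (entry.retriesLeft > 0)
    {
        --entry.retriesLeft;
    }
}
```

Keeping the last good value on error is one possible policy; the design does not
state what the cache should return while retries are in progress.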

Each call to access the read value of a particular sensor in the read cache will
not only return the cached value but will also submit an SQE (submission queue
entry) to io_uring for that sensor; this SQE acts as a read request that will be
sent to the kernel. The implementation maintains a set of sensors with pending
submissions so that multiple SQEs for the same sensor do not get submitted and
overlap; a sensor's entry in the set is cleared once its read result is
successfully returned in a CQE (completion queue event). The CQE is then
processed, and its information updates the cache map.
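
The de-duplication described above can be sketched without a real ring. In the
example below, `FakeRing` is a hypothetical stand-in for the io_uring wrapper
that only records which read requests were queued; in a liburing-based wrapper,
that is roughly where an SQE would be prepared and submitted. The class and
method names are assumptions made for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Stand-in for the io_uring wrapper: records queued read requests only.
struct FakeRing
{
    std::vector<std::string> submitted;

    void submitRead(const std::string& path)
    {
        submitted.push_back(path);
    }
};

class SensorReadCache
{
  public:
    explicit SensorReadCache(FakeRing& ring) : ring_(ring) {}

    // Return the cached value and make sure one read request is pending.
    long long get(const std::string& path)
    {
        // insert().second is true only if the path was not already pending,
        // so at most one request per sensor is in flight at a time.
        if (pending_.insert(path).second)
        {
            ring_.submitRead(path);
        }
        return values_[path]; // 0 until the first completion arrives
    }

    // Called once the completion for `path` has been reaped.
    void complete(const std::string& path, long long value)
    {
        values_[path] = value;
        pending_.erase(path);
    }

  private:
    FakeRing& ring_;
    std::unordered_map<std::string, long long> values_;
    std::unordered_set<std::string> pending_;
};
```

Calling `get()` twice for the same sensor before its completion arrives queues
only one request; after `complete()` runs, the next `get()` returns the new
value and queues a fresh request.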

The asynchronous nature of this implementation comes from submitting all
possible SQE requests at once, which is a non-blocking operation, instead of
blocking on each slow sensor read as the synchronous implementation does. The
kernel will process these requests, and before the next iteration of sensor
reads the cache will attempt to process any returned CQEs, which is also a
non-blocking operation.

Simply put, an access to some "Sensor A" in the read cache will create an
underlying read request that makes a best effort to update the value of "Sensor
A" before the next time the sensor read loop (which currently runs every 1 s by
default) gets the value of "Sensor A" through the cache.
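
To make the best-effort behaviour concrete, the following sketch simulates two
loop iterations with one fast and one slow sensor. `FakeKernel` and `Poller` are
hypothetical names: the fake kernel stands in for io_uring and lets the example
decide which reads have finished, so this models the control flow rather than
the real implementation.

```cpp
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A completed read: which sensor it belongs to and the value that was read.
using Completion = std::pair<std::string, long long>;

// Simulated kernel side. submit() queues a request; finish() moves one
// request to the completion queue, as if that sensor had answered.
struct FakeKernel
{
    std::vector<std::string> inFlight;
    std::deque<Completion> completions;

    void submit(const std::string& path)
    {
        inFlight.push_back(path);
    }

    void finish(const std::string& path, long long value)
    {
        for (auto it = inFlight.begin(); it != inFlight.end(); ++it)
        {
            if (*it == path)
            {
                inFlight.erase(it);
                completions.emplace_back(path, value);
                return;
            }
        }
    }
};

struct Poller
{
    explicit Poller(FakeKernel& k) : kernel(k) {}

    // Non-blocking: consume only the completions that are already available.
    void drainCompletions()
    {
        while (!kernel.completions.empty())
        {
            const Completion done = kernel.completions.front();
            kernel.completions.pop_front();
            cache[done.first] = done.second;
            pending[done.first] = false;
        }
    }

    // One iteration of the sensor read loop: return the cached values and
    // queue a read for every sensor that has none in flight.
    std::map<std::string, long long>
        iterate(const std::vector<std::string>& sensors)
    {
        drainCompletions();
        std::map<std::string, long long> snapshot;
        for (const auto& path : sensors)
        {
            snapshot[path] = cache[path];
            if (!pending[path])
            {
                kernel.submit(path);
                pending[path] = true;
            }
        }
        return snapshot;
    }

    FakeKernel& kernel;
    std::map<std::string, long long> cache; // last known value per sensor
    std::map<std::string, bool> pending;    // true while a read is in flight
};
```

If only the fast sensor completes between two calls to `iterate()`, the second
call returns its new value while the slow sensor still reports its previous
cached value, and no second request is queued for the slow sensor.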

## Alternatives Considered

Linux does have a native asynchronous IO interface, simply dubbed AIO; however,
it has a number of limitations. The biggest limitation of AIO is that it only
supports true asynchronous IO for un-buffered reads. Furthermore, there are a
number of ways that the IO submission can end up blocking, for example if
metadata is required to perform IO. Additionally, the memory costs of AIO are
higher than those of io_uring.

For these primary reasons, the native AIO library will not be considered for
this implementation of asynchronous reads.

## Impacts

Initially, this project would impact all OpenBMC developers working on
openbmc/phosphor-hwmon. The change has been shown to improve the latency of
phosphor-hwmon; throughput has also been shown to increase (note that throughput
profiling was more arbitrary than latency profiling). These performance changes
will need to be measured in further detail across different machines.

There will be no security impact.

## Testing

The change will use the gTest framework to ensure that the original
functionality of the code in the modified repository stays the same.