xref: /openbmc/docs/designs/external-sensor.md (revision 61475153)
1# ExternalSensor in dbus-sensors
2
3Author: Josh Lehan[^1]
4
5Other contributors: Ed Tanous, Peter Lundgren, Alex Qiu
6
7Created: March 19, 2021
8
9## Introduction
10
11In OpenBMC, the _dbus-sensors_[^2] package contains a suite of sensor daemons.
12Each daemon monitors a particular type of sensor. This document provides
13rationale and motivation for adding _ExternalSensor_, another sensor daemon, and
14gives some example usages of it.
15
16## Motivation
17
18There are 10 existing sensor daemons in _dbus-sensors_. Why add another sensor
19daemon?
20
21*   Most of the existing sensor daemons are tied to one particular physical
22    quantity they are measuring, such as temperature, and are hardcoded as such.
23    An externally-updated sensor has no such limitation, and should be flexible
24    enough to measure any physical quantity currently supported by OpenBMC.
25
26*   Essentially all of the existing sensor daemons obtain the sensor values they
27    publish to D-Bus by reading from local hardware (typically by reading from
28    virtual files provided by the _hwmon_[^3] subsystem of the Linux kernel).
29    None of the daemons are currently designed with the intention of accepting
30    values pushed in from an external source. Although there is some debugging
31    functionality to add this feature to other sensor daemons[^25], it is not
32    the primary purpose for which they were designed.
33
34*   Even if the debugging functionality of an existing daemon were to be used,
35    the daemon would still need a valid configuration tied to recognized
36    hardware, as detected by _entity-manager_[^4], in order for the daemon to
37    properly initialize itself and participate in the OpenBMC software stack.
38
39*   For the same reason it is desirable for existing sensor daemons to detect
40    and properly indicate failures of their underlying hardware, it is desirable
41    for _ExternalSensor_ to detect and properly indicate loss of timely sensor
42    updates from their external source. This is a new feature, and does not
43    cleanly fit into the architecture of any existing sensor daemon, thus a new
44    daemon is the correct choice for this behavior.
45
46For these reasons, _ExternalSensor_ has been added[^5], as the eleventh sensor
47daemon in _dbus-sensors_.
48
49## Design
50
51After some discussion, a proof-of-concept _HostSensor_[^6] was published. This
52was a stub, but it revealed the minimal implementation that would still be
53capable of fully initializing and participating in the OpenBMC software stack.
54_ExternalSensor_ was formed by using this example _HostSensor_, and also one of
55the simplest existing sensor daemons, _HwmonTempSensor_[^7], as references to
56build upon.
57
58As written, after validating parameters during initialization, there is
59essentially no work for _ExternalSensor_ to do. The main loop is mostly idle,
60remaining blocked in the Boost ASIO[^8] library, handling D-Bus requests as they
61come in. This utilizes the functionality in the underlying _Sensor_[^9] class,
62which already contains the D-Bus hooks necessary to receive values from the
63external source.
64
65An example external source is the IPMI service[^10], receiving values from the
66host via the IPMI "Set Sensor Reading" command[^11]. _ExternalSensor_ is
67intended to be source-agnostic, so it does not matter if this is IPMI or
68Redfish[^12] or something else in the future, as long as they are received
69similarly over D-Bus.
70
71### Timeout
72
73The timeout feature is the primary feature which distinguishes _ExternalSensor_
74from other sensor daemons. Once an external source starts providing updates, the
75external source is expected to continue to provide timely updates. Each update
76will be properly published onto D-Bus, in the usual way done by all sensor
77daemons, as a floating-point value.
78
79A timer is used, the same Boost ASIO[^13] timer mechanism used by other sensor
80daemons to poll their hardware, but in this case, is used to manage how long it
81has been since the last known good external update. When the timer expires, the
82sensor value will be deemed stale, and will be replaced with floating-point
83quiet _NaN_[^14].
84
85### NaN
86
87The advantage of floating-point _NaN_ is that it is a drop-in replacement for
88the valid floating-point value of the sensor. A subtle difference of the earlier
89OpenBMC sensor "Value" schema change, from integer to floating-point, is that
90the field is essentially now nullable. Instead of having to arbitrarily choose
91an arbitrary integer value to indicate "not valid", such as -1 or 9999 or
92whatever, floating-point explicitly has _NaN_ to indicate this. So, there is no
93possibility of confusion that this will be mistaken for a valid sensor value, as
94_NaN_ is literally _not a number_, and thus can not be misparsed as a valid
95sensor reading. It thus saves having to add a second field to reliably indicate
96validity, which would break the existing schema[^15].
97
98An alternative to using _NaN_ for staleness indication would have been to use a
99timestamp, which would introduce the complication of having to parse and compare
100timestamps within OpenBMC, and all the subtle difficulties thereof[^16]. What's
101more, adding a second field might require a second D-Bus message to update, and
102D-Bus messages are computationally expensive[^17] and should be used sparingly.
103Periodic things like sensors, which send out regular updates, could easily lead
104to frequent D-Bus traffic and thus should be kept as minimal as practical. And
105finally, changing the Value schema would cause a large blast radius, both in
106design and in code, necessitating a large refactoring effort well beyond the
107scope of what is needed for _ExternalSensor_.
108
109### Configuration
110
111Configuring a sensor for use with _ExternalSensor_ should be done in the usual
112way[^18] that is done for use with other sensor daemons, namely, a JSON
113dictionary that is an element of the "Exposes" array within a JSON configuration
114file to be read by _entity-manager_. In that JSON dictionary, the valid names
115are listed below. All of these are mandatory parameters, unless mentioned as
116optional. For fields listed as "Numeric" below, this means that it can be either
117integer or valid floating-point.
118
119*   "Name": String. The sensor name, which this sensor will be known as. A
120    mandatory component of the `entity-manager` configuration, and the resulting
121    D-Bus object path.
122
123*   "Units": String. This parameter is unique to _ExternalSensor_. As
124    _ExternalSensor_ is not tied to any particular physical hardware, it can
125    measure any physical quantity supported by OpenBMC. This string will be
126    translated to another string via a lookup table[^19], and forms another
127    mandatory component of the D-Bus object path.
128
129*   "MinValue": Numeric. The minimum valid value for this sensor. Although not
130    used by _ExternalSensor_ directly, it is a valuable hint for services such
131    as IPMI, which need to know the minimum and maximum valid sensor values in
132    order to scale their reporting range accurately. As _ExternalSensor_ is not
133    tied to one particular physical quantity, there is no suitable default value
134    for minimum and maximum. Thus, unlike other sensor daemons where this
135    parameter is optional, in _ExternalSensor_ it is mandatory.
136
137*   "MaxValue": Numeric. The maximum valid value for this sensor. It is treated
138    similarly to "MinValue".
139
140*   "Timeout": Numeric. This parameter is unique to _ExternalSensor_. It is the
141    timeout value, in seconds. If this amount of time elapses with no new
142    updates received over D-Bus from the external source, this sensor will be
143    deemed stale. The value of this sensor will be replaced with floating-point
144    _NaN_, as described above. This field is optional. If not given, the timeout
145    feature will be disabled for this sensor (so it will never be deemed stale).
146
147*   "Type": String. Must be exactly "ExternalSensor". This string is used by
148    _ExternalSensor_ to obtain configuration information from _entity-manager_
149    during initialization. This string is what differentiates JSON stanzas
150    intended for _ExternalSensor_ versus JSON stanzas intended for other
151    _dbus-sensors_ sensor daemons.
152
153*   "Thresholds": JSON dictionary. This field is optional. It is passed through
154    to the main _Sensor_ class during initialization, similar to other sensor
155    daemons. Other than that, it is not used by _ExternalSensor_.
156
157*   "PowerState": String. This field is optional. Similarly to "Thresholds", it
158    is passed through to the main _Sensor_ class during initialization.
159
160Here is an example. The sensor created by this stanza will form this object
161path: /xyz/openbmc_project/sensors/temperature/HostDevTemp
162
163```
164        {
165            "Name": "HostDevTemp",
166            "Units": "DegreesC",
167            "MinValue": -16.0,
168            "MaxValue": 111.5,
169            "Timeout": 4.0,
170            "Type": "ExternalSensor"
171        },
172```
173
174There can be multiple _ExternalSensor_ sensors in the configuration. There is no
175set limit on the number of sensors, except what is supported by a service such
176as IPMI.
177
178## Implementation
179
180As it stands now, _ExternalSensor_ is up and running[^20]. However, the timeout
181feature was originally implemented at the IPMI layer. Upon further
182investigation, it was found that IPMI was the wrong place for this feature, and
183that it should be moved within _ExternalSensor_ itself[^21]. It was originally
184thought that the timeout feature would be a useful enhancement available to all
185IPMI sensors, however, expected usage of almost all external sensor updates is a
186one-shot adjustment (for example, somebody wishes to change a voltage regulator
187setting, or fan speed setting). In this case, the timeout feature would not only
188not be necessary, it would get in the way and require additional coding[^22] to
189compensate for the unexpected _NaN_ value. Only sensors intended for use with
190_ExternalSensor_ are expected to receive continuous periodic updates from an
191external source, so it makes sense to move this timeout feature into
192_ExternalSensor_. This change also has the advantage of making _ExternalSensor_
193not dependent on IPMI as the only source of external updates.
194
195A challenge of generalizing the timeout feature into _ExternalSensor_, however,
196was that the existing _Sensor_ base class did not currently allow its existing
197D-Bus setter hook to be customized. This feature was straightforward to
198add[^23]. One limitation was that the existing _Sensor_ class, by design,
199dropped updates that duplicated the existing sensor value. For use with
200_ExternalSensor_, we want to recognize all updates received, even duplicates, as
201they are important to pet the watchdog, to avoid inadvertently triggering the
202timeout feature. However, it is still important to avoid needlessly sending the
203D-Bus _PropertiesChanged_ event for duplicate readings.
204
205The timeout value was originally a compiled-in constant. If _ExternalSensor_ is
206to succeed as a general-purpose tool, this must be configurable. It was
207straightforward to add another configurable parameter[^24] to accept this
208timeout value, as shown in "Parameters" above.
209
210The hardest task of all, however, was getting it accepted upstream. If you are
211reading this, then most likely, it was successful!
212
213## Footnotes
214
215[^1]: https://gerrit.openbmc.org/q/owner:krellan%2540google.com
216[^2]: https://github.com/openbmc/dbus-sensors/blob/master/README.md
217[^3]: https://www.kernel.org/doc/html/latest/hwmon/index.html
218[^4]: https://github.com/openbmc/entity-manager/blob/master/README.md
219[^5]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206
220[^6]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/35476
221[^7]: https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempMain.cpp
222[^8]: https://think-async.com/Asio/
223[^9]: https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp
224[^10]: https://github.com/openbmc/docs/blob/master/architecture/ipmi-architecture.md
225[^11]: https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.html
226[^12]: https://www.dmtf.org/standards/redfish
227[^13]: https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/overview/timers.html
228[^14]: https://anniecherkaev.com/the-secret-life-of-nan
229[^15]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Sensor/Value.interface.yaml
230[^16]: https://cr.yp.to/proto/utctai.html
231[^17]: https://github.com/openbmc/openbmc/issues/1892
232[^18]: https://github.com/openbmc/entity-manager/blob/master/docs/my_first_sensors.md
233[^19]: https://github.com/openbmc/dbus-sensors/blob/master/src/SensorPaths.cpp
234[^20]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206
235[^21]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41398
236[^22]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/39294
237[^23]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41394
238[^24]: https://gerrit.openbmc.org/c/openbmc/entity-manager/+/41397
239[^25]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/16177
240