# ExternalSensor in dbus-sensors

Author: Josh Lehan[^1]

Other contributors: Ed Tanous, Peter Lundgren, Alex Qiu

Created: March 19, 2021

## Introduction

In OpenBMC, the _dbus-sensors_[^2] package contains a suite of sensor daemons.
Each daemon monitors a particular type of sensor. This document provides
rationale and motivation for adding _ExternalSensor_, another sensor daemon, and
gives some example usages of it.

## Motivation

There are 10 existing sensor daemons in _dbus-sensors_. Why add another sensor
daemon?

- Most of the existing sensor daemons are tied to the one particular physical
  quantity they measure, such as temperature, and are hardcoded as such. An
  externally-updated sensor has no such limitation, and should be flexible
  enough to measure any physical quantity currently supported by OpenBMC.

- Essentially all of the existing sensor daemons obtain the sensor values they
  publish to D-Bus by reading from local hardware (typically by reading from
  virtual files provided by the _hwmon_[^3] subsystem of the Linux kernel). None
  of the daemons is designed to accept values pushed in from an external
  source. Although some debugging functionality exists to add this feature to
  other sensor daemons[^25], it is not the primary purpose for which they were
  designed.

- Even if the debugging functionality of an existing daemon were used, the
  daemon would still need a valid configuration tied to recognized hardware, as
  detected by _entity-manager_[^4], in order to properly initialize itself and
  participate in the OpenBMC software stack.

- For the same reason that existing sensor daemons should detect and properly
  indicate failures of their underlying hardware, _ExternalSensor_ should detect
  and properly indicate the loss of timely sensor updates from its external
  source. This is a new feature that does not fit cleanly into the architecture
  of any existing sensor daemon, so a new daemon is the correct choice for this
  behavior.

For these reasons, _ExternalSensor_ has been added[^5], as the eleventh sensor
daemon in _dbus-sensors_.

## Design

After some discussion, a proof-of-concept _HostSensor_[^6] was published. This
was a stub, but it revealed the minimal implementation that would still be
capable of fully initializing and participating in the OpenBMC software stack.
_ExternalSensor_ was built using this example _HostSensor_, together with one of
the simplest existing sensor daemons, _HwmonTempSensor_[^7], as references.

As written, after validating parameters during initialization, there is
essentially no work for _ExternalSensor_ to do. The main loop is mostly idle,
remaining blocked in the Boost ASIO[^8] library and handling D-Bus requests as
they come in. This relies on the functionality in the underlying _Sensor_[^9]
class, which already contains the D-Bus hooks necessary to receive values from
the external source.

An example external source is the IPMI service[^10], which receives values from
the host via the IPMI "Set Sensor Reading" command[^11]. _ExternalSensor_ is
intended to be source-agnostic: it does not matter whether the values come from
IPMI, Redfish[^12], or some future source, as long as they are delivered the
same way over D-Bus.

### Timeout

The timeout is the primary feature that distinguishes _ExternalSensor_ from
other sensor daemons. Once an external source starts providing updates, it is
expected to continue providing timely updates. Each update is published onto
D-Bus as a floating-point value, in the usual way done by all sensor daemons.

A timer is used for this: the same Boost ASIO[^13] timer mechanism that other
sensor daemons use to poll their hardware, but here it tracks how long it has
been since the last known good external update. When the timer expires, the
sensor value is deemed stale and is replaced with floating-point quiet
_NaN_[^14].
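
The staleness rule can be sketched in a few lines of C++. This is a minimal, hypothetical illustration rather than the dbus-sensors code: the class name `StalenessTracker` is invented, the caller passes in the current time instead of arming a Boost ASIO timer (so the logic can be exercised without an event loop), and reporting NaN before the first update ever arrives is an assumption.

```cpp
#include <chrono>
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical sketch (not the dbus-sensors implementation): decides whether
// the last externally supplied value is still fresh. The real daemon uses a
// Boost ASIO timer; here the caller passes "now" so no event loop is needed.
class StalenessTracker
{
  public:
    using Clock = std::chrono::steady_clock;
    using Seconds = std::chrono::duration<double>;

    // std::nullopt disables the timeout, so the value never goes stale.
    explicit StalenessTracker(std::optional<Seconds> timeoutSeconds) :
        timeout(timeoutSeconds)
    {}

    // Record a reading pushed in by the external source.
    void update(double newValue, Clock::time_point now)
    {
        value = newValue;
        lastUpdate = now;
    }

    // The value to publish: the last reading, or quiet NaN once stale.
    double published(Clock::time_point now) const
    {
        // Assumption: nothing valid is reported before the first update.
        if (!lastUpdate)
        {
            return std::numeric_limits<double>::quiet_NaN();
        }
        if (timeout && (now - *lastUpdate) > *timeout)
        {
            return std::numeric_limits<double>::quiet_NaN();
        }
        return value;
    }

    bool isFresh(Clock::time_point now) const
    {
        return !std::isnan(published(now));
    }

  private:
    std::optional<Seconds> timeout;
    std::optional<Clock::time_point> lastUpdate;
    double value = std::numeric_limits<double>::quiet_NaN();
};
```

With a 4-second timeout, as in the example configuration later in this document, a reading is published unchanged for up to 4 seconds after it arrives and is then replaced with NaN until the next update.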

### NaN

The advantage of floating-point _NaN_ is that it is a drop-in replacement for
the valid floating-point value of the sensor. A subtle consequence of the
earlier change to the OpenBMC sensor "Value" schema, from integer to
floating-point, is that the field is now effectively nullable. Instead of
arbitrarily choosing an integer value to indicate "not valid", such as -1 or
9999, floating-point has _NaN_ to indicate this explicitly. There is no
possibility of this being mistaken for a valid sensor value: _NaN_ is literally
_not a number_, and thus cannot be misparsed as a valid sensor reading. This
also avoids having to add a second field to reliably indicate validity, which
would break the existing schema[^15].
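
The difference between an integer-style sentinel and _NaN_ can be shown with a short C++ sketch. The helper names and the -1 sentinel are hypothetical, chosen only for illustration. Note that _NaN_ compares unequal to every value, including itself, so a consumer should test for it with `std::isnan` rather than with `==`.

```cpp
#include <cmath>
#include <limits>

// Hypothetical integer-era convention: -1 means "not valid".
constexpr double kLegacyInvalid = -1.0;

// Sentinel check: cannot distinguish "invalid" from a genuine -1.0 reading,
// such as a temperature of -1 degrees C.
inline bool isValidBySentinel(double reading)
{
    return reading != kLegacyInvalid;
}

// NaN check: NaN is never a real measurement, so there is no ambiguity.
// std::isnan is needed because NaN != NaN under IEEE 754 comparison rules.
inline bool isValidReading(double reading)
{
    return !std::isnan(reading);
}

// The value a stale sensor would publish.
inline double staleReading()
{
    return std::numeric_limits<double>::quiet_NaN();
}
```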

An alternative to using _NaN_ for staleness indication would have been to use a
timestamp, which would introduce the complication of having to parse and compare
timestamps within OpenBMC, and all the subtle difficulties thereof[^16]. What's
more, adding a second field might require a second D-Bus message to update, and
D-Bus messages are computationally expensive[^17] and should be used sparingly.
Periodic sources such as sensors, which send out regular updates, can easily
generate frequent D-Bus traffic, so that traffic should be kept as minimal as
practical. Finally, changing the Value schema would have a large blast radius,
both in design and in code, necessitating a refactoring effort well beyond the
scope of what is needed for _ExternalSensor_.

### Configuration

A sensor for use with _ExternalSensor_ should be configured in the same way[^18]
as for other sensor daemons: a JSON dictionary that is an element of the
"Exposes" array within a JSON configuration file read by _entity-manager_. The
valid keys in that dictionary are listed below. All of them are mandatory unless
marked as optional. For fields listed as "Numeric", the value can be either an
integer or a valid floating-point number.

- "Name": String. The name by which this sensor will be known. A mandatory
  component of the _entity-manager_ configuration, and of the resulting D-Bus
  object path.

- "Units": String. This parameter is unique to _ExternalSensor_. As
  _ExternalSensor_ is not tied to any particular physical hardware, it can
  measure any physical quantity supported by OpenBMC. This string is translated
  to another string via a lookup table[^19], which forms another mandatory
  component of the D-Bus object path.

- "MinValue": Numeric. The minimum valid value for this sensor. Although not
  used by _ExternalSensor_ directly, it is a valuable hint for services such as
  IPMI, which need to know the minimum and maximum valid sensor values in order
  to scale their reporting range accurately. As _ExternalSensor_ is not tied to
  one particular physical quantity, there is no suitable default value for
  minimum and maximum. Thus, unlike other sensor daemons where this parameter is
  optional, in _ExternalSensor_ it is mandatory.

- "MaxValue": Numeric. The maximum valid value for this sensor. It is treated
  similarly to "MinValue".

- "Timeout": Numeric. This parameter is unique to _ExternalSensor_. It is the
  timeout value, in seconds. If this amount of time elapses with no new updates
  received over D-Bus from the external source, this sensor will be deemed
  stale, and its value will be replaced with floating-point _NaN_, as described
  above. This field is optional. If not given, the timeout feature is disabled
  for this sensor (so it will never be deemed stale).

- "Type": String. Must be exactly "ExternalSensor". This string is used by
  _ExternalSensor_ to obtain configuration information from _entity-manager_
  during initialization. It is what differentiates JSON stanzas intended for
  _ExternalSensor_ from JSON stanzas intended for other _dbus-sensors_ sensor
  daemons.

- "Thresholds": JSON dictionary. This field is optional. It is passed through to
  the main _Sensor_ class during initialization, similar to other sensor
  daemons. Other than that, it is not used by _ExternalSensor_.

- "PowerState": String. This field is optional. Similarly to "Thresholds", it is
  passed through to the main _Sensor_ class during initialization.
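
The rules above (mandatory fields, the exact "Type" match, "Numeric" meaning integer or floating-point, and an absent "Timeout" disabling staleness) can be captured in a small validation sketch. This is a hypothetical illustration: the real daemon receives its configuration from _entity-manager_ over D-Bus, whereas here a `std::map` of `std::variant` values stands in for an already-parsed stanza, and all type and function names are invented. "Thresholds" and "PowerState" are omitted because they are only passed through.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-in for one already-parsed "Exposes" stanza.
using FieldValue = std::variant<std::int64_t, double, std::string>;
using Stanza = std::map<std::string, FieldValue>;

struct ExternalSensorConfig
{
    std::string name;
    std::string units;
    double minValue;
    double maxValue;
    std::optional<double> timeoutSeconds; // nullopt: never deemed stale
};

// "Numeric" accepts either an integer or a floating-point value.
inline std::optional<double> getNumber(const Stanza& s, const std::string& key)
{
    auto it = s.find(key);
    if (it == s.end())
    {
        return std::nullopt;
    }
    if (const auto* i = std::get_if<std::int64_t>(&it->second))
    {
        return static_cast<double>(*i);
    }
    if (const auto* d = std::get_if<double>(&it->second))
    {
        return *d;
    }
    return std::nullopt; // present, but not numeric
}

inline std::optional<std::string> getString(const Stanza& s,
                                            const std::string& key)
{
    auto it = s.find(key);
    if (it == s.end())
    {
        return std::nullopt;
    }
    if (const auto* str = std::get_if<std::string>(&it->second))
    {
        return *str;
    }
    return std::nullopt;
}

// Returns nullopt if the stanza is not for ExternalSensor, or if a mandatory
// field is missing or has the wrong type.
inline std::optional<ExternalSensorConfig> parseExternalSensor(const Stanza& s)
{
    auto type = getString(s, "Type");
    if (!type || *type != "ExternalSensor")
    {
        return std::nullopt; // stanza belongs to another daemon
    }
    auto name = getString(s, "Name");
    auto units = getString(s, "Units");
    auto minValue = getNumber(s, "MinValue");
    auto maxValue = getNumber(s, "MaxValue");
    if (!name || !units || !minValue || !maxValue)
    {
        return std::nullopt;
    }
    ExternalSensorConfig cfg{*name, *units, *minValue, *maxValue, std::nullopt};
    if (s.count("Timeout") != 0)
    {
        cfg.timeoutSeconds = getNumber(s, "Timeout");
        if (!cfg.timeoutSeconds)
        {
            return std::nullopt; // optional, but must be numeric if present
        }
    }
    return cfg;
}
```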

Here is an example. The sensor created by this stanza will have the object path
`/xyz/openbmc_project/sensors/temperature/HostDevTemp`.

```json
{
  "Name": "HostDevTemp",
  "Units": "DegreesC",
  "MinValue": -16.0,
  "MaxValue": 111.5,
  "Timeout": 4.0,
  "Type": "ExternalSensor"
}
```
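
The object path in this example is assembled from the translated "Units" string and the "Name" string. A minimal sketch follows; the function names are hypothetical, and the lookup table is a deliberately small illustrative subset, so the table in `SensorPaths.cpp`[^19] remains the authoritative mapping.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: a small illustrative subset of the "Units" lookup
// table. The authoritative table is in dbus-sensors (SensorPaths.cpp).
inline std::optional<std::string> unitsToPathSegment(const std::string& units)
{
    if (units == "DegreesC")
    {
        return "temperature";
    }
    if (units == "Volts")
    {
        return "voltage";
    }
    if (units == "RPMS")
    {
        return "fan_tach";
    }
    return std::nullopt; // unknown units: no valid object path
}

// Hypothetical helper: builds the sensor's D-Bus object path.
inline std::optional<std::string> sensorObjectPath(const std::string& units,
                                                   const std::string& name)
{
    auto segment = unitsToPathSegment(units);
    if (!segment)
    {
        return std::nullopt;
    }
    return "/xyz/openbmc_project/sensors/" + *segment + "/" + name;
}
```

D-Bus object path elements may only contain ASCII letters, digits, and underscores, so a sensor name containing other characters (such as spaces) would need escaping first; this sketch does not handle that.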

There can be multiple _ExternalSensor_ sensors in the configuration. There is no
set limit on the number of sensors, other than what a consuming service such as
IPMI can support.

## Implementation

As it stands now, _ExternalSensor_ is up and running[^20]. However, the timeout
feature was originally implemented at the IPMI layer. Upon further
investigation, it was found that IPMI was the wrong place for this feature, and
that it should be moved into _ExternalSensor_ itself[^21]. It was originally
thought that the timeout feature would be a useful enhancement available to all
IPMI sensors; however, almost all external sensor updates are expected to be
one-shot adjustments (for example, somebody wishes to change a voltage regulator
setting or a fan speed setting). In that case, the timeout feature would not
only be unnecessary, it would also get in the way and require additional
coding[^22] to compensate for the unexpected _NaN_ value. Only sensors intended
for use with _ExternalSensor_ are expected to receive continuous periodic
updates from an external source, so it makes sense to move the timeout feature
into _ExternalSensor_. This change also has the advantage of making
_ExternalSensor_ no longer dependent on IPMI as its only source of external
updates.

A challenge of generalizing the timeout feature into _ExternalSensor_, however,
was that the existing _Sensor_ base class did not allow its D-Bus setter hook to
be customized. This feature was straightforward to add[^23]. One limitation was
that the existing _Sensor_ class, by design, dropped updates that duplicated the
existing sensor value. For use with _ExternalSensor_, we want to recognize all
updates received, even duplicates, because each one pets the watchdog and keeps
the timeout from being triggered inadvertently. However, it is still important
to avoid needlessly sending the D-Bus _PropertiesChanged_ event for duplicate
readings.
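
A customized setter with this behavior might look like the following hypothetical sketch. `ExternalValueSetter` and its `notify` callback are invented names: the callback stands in for emitting _PropertiesChanged_, whereas the actual hook is part of the _Sensor_ base class[^9] and is wired to D-Bus rather than to a `std::function`.

```cpp
#include <chrono>
#include <functional>
#include <limits>
#include <utility>

// Hypothetical sketch of a customized setter hook. Every external write
// refreshes the staleness deadline ("pets the watchdog"), but the change
// notification is sent only when the value actually changes.
class ExternalValueSetter
{
  public:
    using Clock = std::chrono::steady_clock;

    explicit ExternalValueSetter(std::function<void(double)> notifyChanged) :
        notify(std::move(notifyChanged))
    {}

    void onExternalWrite(double newValue, Clock::time_point now)
    {
        // Always record the arrival time, even for duplicate values.
        lastUpdate = now;

        // A NaN (stale) current value compares unequal to everything, so the
        // first write after staleness always triggers a notification.
        if (newValue != value)
        {
            value = newValue;
            notify(newValue);
        }
    }

    Clock::time_point lastUpdateTime() const
    {
        return lastUpdate;
    }

  private:
    std::function<void(double)> notify;
    Clock::time_point lastUpdate{};
    double value = std::numeric_limits<double>::quiet_NaN();
};
```

Separating "an update arrived" from "the value changed" is what lets duplicate readings keep the sensor fresh without generating extra D-Bus signals.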

The timeout value was originally a compiled-in constant. If _ExternalSensor_ is
to succeed as a general-purpose tool, it must be configurable. It was
straightforward to add another configurable parameter[^24] to accept this
timeout value, as described in "Configuration" above.

The hardest task of all, however, was getting it accepted upstream. If you are
reading this, then most likely, it was successful!

## Footnotes

[^1]: https://gerrit.openbmc.org/q/owner:krellan%2540google.com
[^2]: https://github.com/openbmc/dbus-sensors/blob/master/README.md
[^3]: https://www.kernel.org/doc/html/latest/hwmon/index.html
[^4]: https://github.com/openbmc/entity-manager/blob/master/README.md
[^5]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206
[^6]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/35476
[^7]: https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempMain.cpp
[^8]: https://think-async.com/Asio/
[^9]: https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp
[^10]: https://github.com/openbmc/docs/blob/master/architecture/ipmi-architecture.md
[^11]: https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.html
[^12]: https://www.dmtf.org/standards/redfish
[^13]: https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/overview/timers.html
[^14]: https://anniecherkaev.com/the-secret-life-of-nan
[^15]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Sensor/Value.interface.yaml
[^16]: https://cr.yp.to/proto/utctai.html
[^17]: https://github.com/openbmc/openbmc/issues/1892
[^18]: https://github.com/openbmc/entity-manager/blob/master/docs/my_first_sensors.md
[^19]: https://github.com/openbmc/dbus-sensors/blob/master/src/SensorPaths.cpp
[^20]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206
[^21]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41398
[^22]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/39294
[^23]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41394
[^24]: https://gerrit.openbmc.org/c/openbmc/entity-manager/+/41397
[^25]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/16177