# ExternalSensor in dbus-sensors

Author: Josh Lehan[1]

Other contributors: Ed Tanous, Peter Lundgren, Alex Qiu

Created: March 19, 2021

## Introduction

In OpenBMC, the _dbus-sensors_[2] package contains a suite of sensor daemons.
Each daemon monitors a particular type of sensor. This document provides
rationale and motivation for adding _ExternalSensor_, another sensor daemon, and
gives some example usages of it.

## Motivation

There are 10 existing sensor daemons in _dbus-sensors_. Why add another sensor
daemon?

- Most of the existing sensor daemons are tied to one particular physical
  quantity they are measuring, such as temperature, and are hardcoded as such.
  An externally-updated sensor has no such limitation, and should be flexible
  enough to measure any physical quantity currently supported by OpenBMC.

- Essentially all of the existing sensor daemons obtain the sensor values they
  publish to D-Bus by reading from local hardware (typically by reading from
  virtual files provided by the _hwmon_[3] subsystem of the Linux kernel). None
  of the daemons are currently designed with the intention of accepting values
  pushed in from an external source. Although there is some debugging
  functionality to add this feature to other sensor daemons[25], it is not the
  primary purpose for which they were designed.

- Even if the debugging functionality of an existing daemon were to be used, the
  daemon would still need a valid configuration tied to recognized hardware, as
  detected by _entity-manager_[4], in order for the daemon to properly
  initialize itself and participate in the OpenBMC software stack.

- Just as existing sensor daemons should detect and properly indicate failures
  of their underlying hardware, _ExternalSensor_ should detect and properly
  indicate the loss of timely sensor updates from its external source. This is a
  new feature that does not fit cleanly into the architecture of any existing
  sensor daemon, so a new daemon is the correct choice for this behavior.

For these reasons, _ExternalSensor_ has been added[5] as the eleventh sensor
daemon in _dbus-sensors_.

## Design

After some discussion, a proof-of-concept _HostSensor_[6] was published. This
was a stub, but it revealed the minimal implementation that would still be
capable of fully initializing and participating in the OpenBMC software stack.
_ExternalSensor_ was formed by using this example _HostSensor_, and also one of
the simplest existing sensor daemons, _HwmonTempSensor_[7], as references to
build upon.

As written, after validating parameters during initialization, there is
essentially no work for _ExternalSensor_ to do. The main loop is mostly idle,
remaining blocked in the Boost ASIO[8] library, handling D-Bus requests as they
come in. This relies on the functionality in the underlying _Sensor_[9] class,
which already contains the D-Bus hooks necessary to receive values from the
external source.

An example external source is the IPMI service[10], receiving values from the
host via the IPMI "Set Sensor Reading" command[11]. _ExternalSensor_ is intended
to be source-agnostic, so it does not matter whether the source is IPMI,
Redfish[12], or something else in the future, as long as the values are
received similarly over D-Bus.

### Timeout

The timeout feature is the primary feature which distinguishes _ExternalSensor_
from other sensor daemons. Once an external source starts providing updates, the
external source is expected to continue to provide timely updates. Each update
will be properly published onto D-Bus, in the usual way done by all sensor
daemons, as a floating-point value.

A timer tracks how long it has been since the last known good external update.
It uses the same Boost ASIO[13] timer mechanism that other sensor daemons use to
poll their hardware. When the timer expires, the sensor value is deemed stale
and is replaced with a floating-point quiet _NaN_[14].

### NaN

The advantage of floating-point _NaN_ is that it is a drop-in replacement for
the valid floating-point value of the sensor. A subtle consequence of the
earlier OpenBMC sensor "Value" schema change, from integer to floating-point, is
that the field is now essentially nullable. Instead of having to choose an
arbitrary integer value to indicate "not valid", such as -1 or 9999,
floating-point explicitly has _NaN_ for this purpose. There is no possibility
that it will be mistaken for a valid sensor value, as _NaN_ is literally _not a
number_ and thus cannot be misparsed as a valid sensor reading. This saves
having to add a second field to reliably indicate validity, which would break
the existing schema[15].

An alternative to using _NaN_ for staleness indication would have been to use a
timestamp, which would introduce the complication of having to parse and compare
timestamps within OpenBMC, and all the subtle difficulties thereof[16]. What's
more, adding a second field might require a second D-Bus message to update, and
D-Bus messages are computationally expensive[17] and should be used sparingly.
Sensors send out regular updates, which can easily lead to frequent D-Bus
traffic, so each update should be kept as small as practical. Finally, changing
the Value schema would have a large blast radius, both in design and in code,
necessitating a large refactoring effort well beyond the scope of what is needed
for _ExternalSensor_.
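From a consumer's point of view, treating _NaN_ as "no valid reading" needs
only a single check on the existing value. The sketch below is illustrative;
`isValidReading` and `describeReading` are hypothetical helper names, not part
of any OpenBMC library.

```cpp
#include <cmath>
#include <string>

// Hypothetical consumer-side check: NaN means "no valid reading".
inline bool isValidReading(double value)
{
    // NaN compares unequal to everything, including itself, so test with
    // std::isnan rather than operator==.
    return !std::isnan(value);
}

// Hypothetical formatter: prints the number, or a placeholder when stale.
inline std::string describeReading(double value)
{
    return isValidReading(value) ? std::to_string(value) : "unavailable";
}
```

No second validity field or timestamp comparison is involved; the same `double`
carries both the reading and its validity.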

### Configuration

Configuring a sensor for use with _ExternalSensor_ should be done in the usual
way[18] that is done for use with other sensor daemons, namely, a JSON
dictionary that is an element of the "Exposes" array within a JSON configuration
file to be read by _entity-manager_. The valid keys in that JSON dictionary are
listed below. All of these are mandatory parameters, unless mentioned as
optional. Fields listed as "Numeric" accept either an integer or a valid
floating-point value.

- "Name": String. The name by which this sensor will be known. It is a
  mandatory component of the _entity-manager_ configuration and of the
  resulting D-Bus object path.

- "Units": String. This parameter is unique to _ExternalSensor_. As
  _ExternalSensor_ is not tied to any particular physical hardware, it can
  measure any physical quantity supported by OpenBMC. This string will be
  translated to another string via a lookup table[19], and forms another
  mandatory component of the D-Bus object path.

- "MinValue": Numeric. The minimum valid value for this sensor. Although not
  used by _ExternalSensor_ directly, it is a valuable hint for services such as
  IPMI, which need to know the minimum and maximum valid sensor values in order
  to scale their reporting range accurately. As _ExternalSensor_ is not tied to
  one particular physical quantity, there is no suitable default value for
  minimum and maximum. Thus, unlike other sensor daemons where this parameter is
  optional, in _ExternalSensor_ it is mandatory.

- "MaxValue": Numeric. The maximum valid value for this sensor. It is treated
  similarly to "MinValue".

- "Timeout": Numeric. This parameter is unique to _ExternalSensor_. It is the
  timeout value, in seconds. If this amount of time elapses with no new updates
  received over D-Bus from the external source, this sensor will be deemed
  stale. The value of this sensor will be replaced with floating-point _NaN_, as
  described above. This field is optional. If not given, the timeout feature
  will be disabled for this sensor (so it will never be deemed stale).

- "Type": String. Must be exactly "ExternalSensor". This string is used by
  _ExternalSensor_ to obtain configuration information from _entity-manager_
  during initialization. This string is what differentiates JSON stanzas
  intended for _ExternalSensor_ versus JSON stanzas intended for other
  _dbus-sensors_ sensor daemons.

- "Thresholds": JSON dictionary. This field is optional. It is passed through to
  the main _Sensor_ class during initialization, similar to other sensor
  daemons. Other than that, it is not used by _ExternalSensor_.

- "PowerState": String. This field is optional. Similarly to "Thresholds", it is
  passed through to the main _Sensor_ class during initialization.
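The mandatory and optional rules in the list above can be expressed as a small
validation routine. This is a sketch under stated assumptions, not the daemon's
parsing code: the `ExternalSensorConfig` struct and `validateConfig` function
are hypothetical, the JSON is assumed to have been parsed already, a missing
"MinValue" or "MaxValue" is modelled as _NaN_, and rejecting a non-positive
"Timeout" is an assumption rather than something this document specifies.
"Thresholds" and "PowerState" are omitted because they are passed through
unchanged.

```cpp
#include <cmath>
#include <limits>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical in-memory form of one "Exposes" entry, after JSON parsing.
struct ExternalSensorConfig
{
    std::string name;
    std::string units;
    // NaN stands for "not provided", since there is no suitable default.
    double minValue = std::numeric_limits<double>::quiet_NaN();
    double maxValue = std::numeric_limits<double>::quiet_NaN();
    std::optional<double> timeoutSeconds; // empty: timeout feature disabled
    std::string type;
};

// Throws std::invalid_argument when a mandatory field is missing or invalid.
inline void validateConfig(const ExternalSensorConfig& cfg)
{
    if (cfg.type != "ExternalSensor")
    {
        throw std::invalid_argument("Type must be exactly ExternalSensor");
    }
    if (cfg.name.empty() || cfg.units.empty())
    {
        throw std::invalid_argument("Name and Units are mandatory");
    }
    if (!std::isfinite(cfg.minValue) || !std::isfinite(cfg.maxValue))
    {
        throw std::invalid_argument("MinValue and MaxValue are mandatory");
    }
    // Assumption: a timeout, when present, must be a positive number.
    if (cfg.timeoutSeconds && !(*cfg.timeoutSeconds > 0.0))
    {
        throw std::invalid_argument("Timeout, when given, must be positive");
    }
}
```

Leaving `timeoutSeconds` empty passes validation, matching the rule that a
sensor without "Timeout" is never deemed stale.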

Here is an example. The sensor created by this stanza will form this object
path: `/xyz/openbmc_project/sensors/temperature/HostDevTemp`

```json
{
  "Name": "HostDevTemp",
  "Units": "DegreesC",
  "MinValue": -16.0,
  "MaxValue": 111.5,
  "Timeout": 4.0,
  "Type": "ExternalSensor"
}
```
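The object path in the example above is assembled from "Units" and "Name". The
sketch below shows the idea with hypothetical function names. Only the
"DegreesC" to "temperature" mapping is confirmed by this document; the other
table entries are illustrative assumptions, and the authoritative table is the
one referenced in footnote [19].

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup from the "Units" string to a path component.
inline std::optional<std::string>
    pathComponentForUnits(const std::string& units)
{
    static const std::map<std::string, std::string> table = {
        {"DegreesC", "temperature"}, // confirmed by the example above
        {"Volts", "voltage"},        // assumed entry, for illustration
        {"Amperes", "current"},      // assumed entry, for illustration
        {"Watts", "power"},          // assumed entry, for illustration
    };
    auto it = table.find(units);
    if (it == table.end())
    {
        return std::nullopt;
    }
    return it->second;
}

// Builds the sensor object path, or returns nothing for unknown units.
inline std::optional<std::string> sensorObjectPath(const std::string& units,
                                                   const std::string& name)
{
    auto component = pathComponentForUnits(units);
    if (!component)
    {
        return std::nullopt;
    }
    return "/xyz/openbmc_project/sensors/" + *component + "/" + name;
}
```

Calling `sensorObjectPath("DegreesC", "HostDevTemp")` yields the path shown
above, while an unrecognized units string yields no path at all.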

There can be multiple _ExternalSensor_ sensors in the configuration. There is no
set limit on the number of sensors, except what is supported by a service such
as IPMI.
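To illustrate why "MinValue" and "MaxValue" matter to a service such as IPMI,
the sketch below maps a reading onto a single byte using a plain linear scale.
This is a simplification made for illustration only: the function names are
hypothetical, and a real IPMI implementation derives its own conversion
coefficients rather than using this exact formula.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <optional>

// Hypothetical simplified scaling of a reading into an 8-bit raw value.
// Returns nothing for NaN (stale) readings or an unusable range.
inline std::optional<std::uint8_t> toRawByte(double value, double minValue,
                                             double maxValue)
{
    if (std::isnan(value) || !(minValue < maxValue))
    {
        return std::nullopt;
    }
    double fraction = (value - minValue) / (maxValue - minValue);
    fraction = std::clamp(fraction, 0.0, 1.0); // saturate out-of-range input
    return static_cast<std::uint8_t>(std::lround(fraction * 255.0));
}

// Inverse mapping: the reading represented by a given raw byte.
inline double fromRawByte(std::uint8_t raw, double minValue, double maxValue)
{
    return minValue +
           (static_cast<double>(raw) / 255.0) * (maxValue - minValue);
}
```

Under this simplified scheme, the example range of -16.0 to 111.5 spans 127.5
units across 255 steps, so each raw step represents 0.5 degrees. A wider range
would give coarser steps, which is why no single default range suits every
physical quantity.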

## Implementation

As it stands now, _ExternalSensor_ is up and running[20]. However, the timeout
feature was originally implemented at the IPMI layer. Upon further
investigation, it was found that IPMI was the wrong place for this feature, and
that it should be moved within _ExternalSensor_ itself[21]. It was originally
thought that the timeout feature would be a useful enhancement available to all
IPMI sensors. However, the expected usage of almost all external sensor updates
is a one-shot adjustment (for example, somebody wishes to change a voltage
regulator setting, or fan speed setting). In this case, the timeout feature
would not only be unnecessary, it would get in the way and require additional
coding[22] to compensate for the unexpected _NaN_ value. Only sensors intended
for use with _ExternalSensor_ are expected to receive continuous periodic
updates from an external source, so it makes sense to move this timeout feature
into _ExternalSensor_. This change also has the advantage of making
_ExternalSensor_ not dependent on IPMI as the only source of external updates.

A challenge of generalizing the timeout feature into _ExternalSensor_, however,
was that the existing _Sensor_ base class did not allow its existing D-Bus
setter hook to be customized. This feature was straightforward to add[23]. One
limitation was that the existing _Sensor_ class, by design, dropped updates that
duplicated the existing sensor value. For use with _ExternalSensor_, all
received updates must be recognized, even duplicates, because each one pets the
watchdog and so avoids inadvertently triggering the timeout feature. However, it
is still important to avoid needlessly sending the D-Bus _PropertiesChanged_
signal for duplicate readings.
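The two requirements above, counting every write for the watchdog while
signaling only real changes, can be sketched as follows. The
`ExternalValueSetter` class is hypothetical and uses a counter in place of an
actual D-Bus signal; it is not the customized setter hook from the real
_Sensor_ class.

```cpp
#include <chrono>
#include <cmath>
#include <limits>

// Hypothetical setter: every write refreshes the watchdog timestamp, but only
// a changed value counts as a PropertiesChanged-style notification.
class ExternalValueSetter
{
  public:
    using Clock = std::chrono::steady_clock;

    // Returns true when a change notification would be emitted.
    bool onExternalWrite(double newValue, Clock::time_point now)
    {
        lastWrite = now; // duplicates still pet the watchdog
        if (sameReading(value, newValue))
        {
            return false; // duplicate reading: no notification
        }
        value = newValue;
        ++changeSignals;
        return true;
    }

    Clock::time_point lastWriteTime() const
    {
        return lastWrite;
    }

    unsigned changeSignalCount() const
    {
        return changeSignals;
    }

  private:
    // Two NaN values are treated as the same reading, so repeated stale
    // values would not generate repeated notifications.
    static bool sameReading(double a, double b)
    {
        return (std::isnan(a) && std::isnan(b)) || a == b;
    }

    Clock::time_point lastWrite{};
    unsigned changeSignals = 0;
    double value = std::numeric_limits<double>::quiet_NaN();
};
```

Writing 25.0 twice in a row advances `lastWriteTime()` both times but raises
`changeSignalCount()` only once.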

The timeout value was originally a compiled-in constant. If _ExternalSensor_ is
to succeed as a general-purpose tool, this must be configurable. It was
straightforward to add another configurable parameter[24] to accept this timeout
value, as shown in "Configuration" above.

The hardest task of all, however, was getting it accepted upstream. If you are
reading this, then most likely, it was successful!

## Footnotes

[1]: https://gerrit.openbmc.org/q/owner:krellan%2540google.com
[2]: https://github.com/openbmc/dbus-sensors/blob/master/README.md
[3]: https://www.kernel.org/doc/html/latest/hwmon/index.html
[4]: https://github.com/openbmc/entity-manager/blob/master/README.md
[5]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206
[6]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/35476
[7]: https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempMain.cpp
[8]: https://think-async.com/Asio/
[9]: https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp
[10]:
  https://github.com/openbmc/docs/blob/master/architecture/ipmi-architecture.md
[11]:
  https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.html
[12]: https://www.dmtf.org/standards/redfish
[13]:
  https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/overview/timers.html
[14]: https://anniecherkaev.com/the-secret-life-of-nan
[15]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Sensor/Value.interface.yaml
[16]: https://cr.yp.to/proto/utctai.html
[17]: https://github.com/openbmc/openbmc/issues/1892
[18]:
  https://github.com/openbmc/entity-manager/blob/master/docs/my_first_sensors.md
[19]: https://github.com/openbmc/dbus-sensors/blob/master/src/SensorPaths.cpp
[20]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206
[21]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41398
[22]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/39294
[23]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41394
[24]: https://gerrit.openbmc.org/c/openbmc/entity-manager/+/41397
[25]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/16177