# ExternalSensor in dbus-sensors

Author: Josh Lehan[^1]

Other contributors: Ed Tanous, Peter Lundgren, Alex Qiu

Created: March 19, 2021

## Introduction

In OpenBMC, the _dbus-sensors_[^2] package contains a suite of sensor daemons.
Each daemon monitors a particular type of sensor. This document provides
rationale and motivation for adding _ExternalSensor_, another sensor daemon, and
gives some example usages of it.

## Motivation

There are 10 existing sensor daemons in _dbus-sensors_. Why add another sensor
daemon?

- Most of the existing sensor daemons are tied to one particular physical
  quantity they are measuring, such as temperature, and are hardcoded as such.
  An externally-updated sensor has no such limitation, and should be flexible
  enough to measure any physical quantity currently supported by OpenBMC.

- Essentially all of the existing sensor daemons obtain the sensor values they
  publish to D-Bus by reading from local hardware (typically by reading from
  virtual files provided by the _hwmon_[^3] subsystem of the Linux kernel). None
  of the daemons are currently designed with the intention of accepting values
  pushed in from an external source. Although there is some debugging
  functionality to add this feature to other sensor daemons[^25], it is not the
  primary purpose for which they were designed.

- Even if the debugging functionality of an existing daemon were to be used, the
  daemon would still need a valid configuration tied to recognized hardware, as
  detected by _entity-manager_[^4], in order for the daemon to properly
  initialize itself and participate in the OpenBMC software stack.

- For the same reason it is desirable for existing sensor daemons to detect and
  properly indicate failures of their underlying hardware, it is desirable for
  _ExternalSensor_ to detect and properly indicate loss of timely sensor updates
  from its external source. This is a new feature, and does not cleanly fit
  into the architecture of any existing sensor daemon, thus a new daemon is the
  correct choice for this behavior.

For these reasons, _ExternalSensor_ has been added[^5], as the eleventh sensor
daemon in _dbus-sensors_.

## Design

After some discussion, a proof-of-concept _HostSensor_[^6] was published. This
was a stub, but it revealed the minimal implementation that would still be
capable of fully initializing and participating in the OpenBMC software stack.
_ExternalSensor_ was formed by using this example _HostSensor_, and also one of
the simplest existing sensor daemons, _HwmonTempSensor_[^7], as references to
build upon.

As written, after validating parameters during initialization, there is
essentially no work for _ExternalSensor_ to do. The main loop is mostly idle,
remaining blocked in the Boost ASIO[^8] library, handling D-Bus requests as they
come in. This utilizes the functionality in the underlying _Sensor_[^9] class,
which already contains the D-Bus hooks necessary to receive values from the
external source.

An example external source is the IPMI service[^10], receiving values from the
host via the IPMI "Set Sensor Reading" command[^11]. _ExternalSensor_ is
intended to be source-agnostic, so it does not matter if this is IPMI or
Redfish[^12] or something else in the future, as long as they are received
similarly over D-Bus.

### Timeout

The timeout feature is the primary feature which distinguishes _ExternalSensor_
from other sensor daemons. Once an external source starts providing updates, the
external source is expected to continue to provide timely updates. Each update
will be properly published onto D-Bus, in the usual way done by all sensor
daemons, as a floating-point value.

A timer tracks how long it has been since the last known good external update.
It is the same Boost ASIO[^13] timer mechanism that other sensor daemons use to
poll their hardware. When the timer expires, the sensor value will be deemed
stale, and will be replaced with floating-point quiet _NaN_[^14].

### NaN

The advantage of floating-point _NaN_ is that it is a drop-in replacement for
the valid floating-point value of the sensor. A subtle consequence of the
earlier OpenBMC sensor "Value" schema change, from integer to floating-point, is
that the field is now essentially nullable. Instead of having to choose an
arbitrary integer value to indicate "not valid", such as -1 or 9999,
floating-point explicitly has _NaN_ to indicate this. So, there is no
possibility that this will be mistaken for a valid sensor value, as _NaN_ is
literally _not a number_, and thus cannot be misparsed as a valid sensor
reading. It thus saves having to add a second field to reliably indicate
validity, which would break the existing schema[^15].
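
As a small illustration of how a consumer might treat the value, the sketch
below checks for _NaN_ before using a reading. The `describeReading` helper is
hypothetical and exists only for this example; the one detail worth noting is
that _NaN_ compares unequal to everything, including itself, so the check has to
use `std::isnan` rather than an equality test.

```cpp
#include <cmath>
#include <string>

// Hypothetical consumer-side helper: no separate validity flag is needed,
// because the reading itself says whether it is usable.
std::string describeReading(double value)
{
    // NaN == NaN is false, so comparing against a NaN constant would never
    // match; std::isnan() is the reliable test.
    if (std::isnan(value))
    {
        return "unavailable";
    }
    return std::to_string(value);
}
```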

An alternative to using _NaN_ for staleness indication would have been to use a
timestamp, which would introduce the complication of having to parse and compare
timestamps within OpenBMC, and all the subtle difficulties thereof[^16]. What's
more, adding a second field might require a second D-Bus message to update, and
D-Bus messages are computationally expensive[^17] and should be used sparingly.
Periodic things like sensors, which send out regular updates, could easily lead
to frequent D-Bus traffic and thus should be kept as minimal as practical. And
finally, changing the Value schema would cause a large blast radius, both in
design and in code, necessitating a large refactoring effort well beyond the
scope of what is needed for _ExternalSensor_.

### Configuration

Configuring a sensor for use with _ExternalSensor_ should be done in the usual
way[^18] that is done for use with other sensor daemons, namely, a JSON
dictionary that is an element of the "Exposes" array within a JSON configuration
file to be read by _entity-manager_. In that JSON dictionary, the valid names
are listed below. All of these are mandatory parameters, unless mentioned as
optional. For fields listed as "Numeric" below, this means that it can be either
integer or valid floating-point.

- "Name": String. The sensor name, which this sensor will be known as. A
  mandatory component of the `entity-manager` configuration, and the resulting
  D-Bus object path.

- "Units": String. This parameter is unique to _ExternalSensor_. As
  _ExternalSensor_ is not tied to any particular physical hardware, it can
  measure any physical quantity supported by OpenBMC. This string will be
  translated to another string via a lookup table[^19], and forms another
  mandatory component of the D-Bus object path.

- "MinValue": Numeric. The minimum valid value for this sensor. Although not
  used by _ExternalSensor_ directly, it is a valuable hint for services such as
  IPMI, which need to know the minimum and maximum valid sensor values in order
  to scale their reporting range accurately. As _ExternalSensor_ is not tied to
  one particular physical quantity, there is no suitable default value for
  minimum and maximum. Thus, unlike other sensor daemons where this parameter is
  optional, in _ExternalSensor_ it is mandatory.

- "MaxValue": Numeric. The maximum valid value for this sensor. It is treated
  similarly to "MinValue".

- "Timeout": Numeric. This parameter is unique to _ExternalSensor_. It is the
  timeout value, in seconds. If this amount of time elapses with no new updates
  received over D-Bus from the external source, this sensor will be deemed
  stale. The value of this sensor will be replaced with floating-point _NaN_, as
  described above. This field is optional. If not given, the timeout feature
  will be disabled for this sensor (so it will never be deemed stale).

- "Type": String. Must be exactly "ExternalSensor". This string is used by
  _ExternalSensor_ to obtain configuration information from _entity-manager_
  during initialization. This string is what differentiates JSON stanzas
  intended for _ExternalSensor_ versus JSON stanzas intended for other
  _dbus-sensors_ sensor daemons.

- "Thresholds": JSON dictionary. This field is optional. It is passed through to
  the main _Sensor_ class during initialization, similar to other sensor
  daemons. Other than that, it is not used by _ExternalSensor_.

- "PowerState": String. This field is optional. Similarly to "Thresholds", it is
  passed through to the main _Sensor_ class during initialization.

Here is an example. The sensor created by this stanza will form this object
path: /xyz/openbmc_project/sensors/temperature/HostDevTemp

```
        {
            "Name": "HostDevTemp",
            "Units": "DegreesC",
            "MinValue": -16.0,
            "MaxValue": 111.5,
            "Timeout": 4.0,
            "Type": "ExternalSensor"
        },
```
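
How "Units" and "Name" combine into the object path can be sketched as follows.
This is an illustration under stated assumptions rather than the code in
_dbus-sensors_: the `buildObjectPath` function is hypothetical, only the
"DegreesC" to "temperature" pairing comes from the example above, and the other
table entries are assumed for illustration and may not match the real lookup
table[^19]. Any escaping of the sensor name is omitted.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: translate "Units" through a lookup table and join
// the result with "Name" to form the D-Bus object path.
std::optional<std::string> buildObjectPath(const std::string& units,
                                           const std::string& name)
{
    // Illustrative subset only. "DegreesC" -> "temperature" follows the
    // example stanza; the remaining entries are assumptions.
    static const std::map<std::string, std::string> unitsToPath = {
        {"DegreesC", "temperature"},
        {"Volts", "voltage"}, // assumed entry
        {"Watts", "power"},   // assumed entry
    };

    auto found = unitsToPath.find(units);
    if (found == unitsToPath.end())
    {
        // Unknown units: the stanza cannot be mapped to an object path.
        return std::nullopt;
    }
    return "/xyz/openbmc_project/sensors/" + found->second + "/" + name;
}
```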

There can be multiple _ExternalSensor_ sensors in the configuration. There is no
set limit on the number of sensors, except what is supported by a service such
as IPMI.

## Implementation

As it stands now, _ExternalSensor_ is up and running[^20]. However, the timeout
feature was originally implemented at the IPMI layer. Upon further
investigation, it was found that IPMI was the wrong place for this feature, and
that it should be moved within _ExternalSensor_ itself[^21]. It was originally
thought that the timeout feature would be a useful enhancement available to all
IPMI sensors. However, the expected usage of almost all external sensor updates
is a one-shot adjustment (for example, somebody wishes to change a voltage
regulator setting, or fan speed setting). In this case, the timeout feature
would not only be unnecessary, it would get in the way and require additional
coding[^22] to compensate for the unexpected _NaN_ value. Only sensors intended
for use with _ExternalSensor_ are expected to receive continuous periodic
updates from an external source, so it makes sense to move this timeout feature
into _ExternalSensor_. This change also has the advantage of making
_ExternalSensor_ not dependent on IPMI as the only source of external updates.

A challenge of generalizing the timeout feature into _ExternalSensor_, however,
was that the existing _Sensor_ base class did not allow its existing D-Bus
setter hook to be customized. This feature was straightforward to add[^23]. One
limitation was that the existing _Sensor_ class, by design, dropped updates that
duplicated the existing sensor value. For use with _ExternalSensor_, we want to
recognize all updates received, even duplicates, as they are important to pet
the watchdog, to avoid inadvertently triggering the timeout feature. However, it
is still important to avoid needlessly sending the D-Bus _PropertiesChanged_
event for duplicate readings.
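
The duplicate-handling rule described above can be modeled in a few lines. This
is a sketch, not the _Sensor_ class: the `ExternalValue` name and its two
callbacks are hypothetical stand-ins for restarting the timeout timer and for
emitting _PropertiesChanged_.

```cpp
#include <cmath>
#include <functional>
#include <limits>
#include <utility>

// Hypothetical model of the customized setter: every write pets the
// watchdog, but the change notification fires only when the value differs.
class ExternalValue
{
  public:
    ExternalValue(std::function<void()> pet,
                  std::function<void(double)> notify) :
        petWatchdog(std::move(pet)), emitChanged(std::move(notify))
    {}

    void set(double newValue)
    {
        // Every update, duplicate or not, proves the source is still alive.
        petWatchdog();

        // NaN never compares equal, so two NaNs are matched explicitly.
        bool same = (newValue == value) ||
                    (std::isnan(newValue) && std::isnan(value));
        if (same)
        {
            return; // duplicate reading: skip the change notification
        }
        value = newValue;
        emitChanged(value);
    }

    double get() const
    {
        return value;
    }

  private:
    std::function<void()> petWatchdog;
    std::function<void(double)> emitChanged;
    double value = std::numeric_limits<double>::quiet_NaN();
};
```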

The timeout value was originally a compiled-in constant. If _ExternalSensor_ is
to succeed as a general-purpose tool, this must be configurable. It was
straightforward to add another configurable parameter[^24] to accept this
timeout value, as shown in "Configuration" above.

The hardest task of all, however, was getting it accepted upstream. If you are
reading this, then most likely, it was successful!

## Footnotes

[^1]: https://gerrit.openbmc.org/q/owner:krellan%2540google.com

[^2]: https://github.com/openbmc/dbus-sensors/blob/master/README.md

[^3]: https://www.kernel.org/doc/html/latest/hwmon/index.html

[^4]: https://github.com/openbmc/entity-manager/blob/master/README.md

[^5]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206

[^6]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/35476

[^7]: https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempMain.cpp

[^8]: https://think-async.com/Asio/

[^9]: https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp

[^10]:
    https://github.com/openbmc/docs/blob/master/architecture/ipmi-architecture.md

[^11]:
    https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.html

[^12]: https://www.dmtf.org/standards/redfish

[^13]:
    https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/overview/timers.html

[^14]: https://anniecherkaev.com/the-secret-life-of-nan

[^15]:
    https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Sensor/Value.interface.yaml

[^16]: https://cr.yp.to/proto/utctai.html

[^17]: https://github.com/openbmc/openbmc/issues/1892

[^18]:
    https://github.com/openbmc/entity-manager/blob/master/docs/my_first_sensors.md

[^19]: https://github.com/openbmc/dbus-sensors/blob/master/src/SensorPaths.cpp

[^20]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206

[^21]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41398

[^22]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/39294

[^23]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41394

[^24]: https://gerrit.openbmc.org/c/openbmc/entity-manager/+/41397

[^25]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/16177