1# ExternalSensor in dbus-sensors 2 3Author: Josh Lehan[^1] 4 5Other contributors: Ed Tanous, Peter Lundgren, Alex Qiu 6 7Created: March 19, 2021 8 9## Introduction 10 11In OpenBMC, the _dbus-sensors_[^2] package contains a suite of sensor daemons. 12Each daemon monitors a particular type of sensor. This document provides 13rationale and motivation for adding _ExternalSensor_, another sensor daemon, and 14gives some example usages of it. 15 16## Motivation 17 18There are 10 existing sensor daemons in _dbus-sensors_. Why add another sensor 19daemon? 20 21* Most of the existing sensor daemons are tied to one particular physical 22 quantity they are measuring, such as temperature, and are hardcoded as such. 23 An externally-updated sensor has no such limitation, and should be flexible 24 enough to measure any physical quantity currently supported by OpenBMC. 25 26* Essentially all of the existing sensor daemons obtain the sensor values they 27 publish to D-Bus by reading from local hardware (typically by reading from 28 virtual files provided by the _hwmon_[^3] subsystem of the Linux kernel). 29 None of the daemons are currently designed with the intention of accepting 30 values pushed in from an external source. Although there is some debugging 31 functionality to add this feature to other sensor daemons[^25], it is not 32 the primary purpose for which they were designed. 33 34* Even if the debugging functionality of an existing daemon were to be used, 35 the daemon would still need a valid configuration tied to recognized 36 hardware, as detected by _entity-manager_[^4], in order for the daemon to 37 properly initialize itself and participate in the OpenBMC software stack. 38 39* For the same reason it is desirable for existing sensor daemons to detect 40 and properly indicate failures of their underlying hardware, it is desirable 41 for _ExternalSensor_ to detect and properly indicate loss of timely sensor 42 updates from their external source. This is a new feature, and does not 43 cleanly fit into the architecture of any existing sensor daemon, thus a new 44 daemon is the correct choice for this behavior. 45 46For these reasons, _ExternalSensor_ has been added[^5], as the eleventh sensor 47daemon in _dbus-sensors_. 48 49## Design 50 51After some discussion, a proof-of-concept _HostSensor_[^6] was published. This 52was a stub, but it revealed the minimal implementation that would still be 53capable of fully initializing and participating in the OpenBMC software stack. 54_ExternalSensor_ was formed by using this example _HostSensor_, and also one of 55the simplest existing sensor daemons, _HwmonTempSensor_[^7], as references to 56build upon. 57 58As written, after validating parameters during initialization, there is 59essentially no work for _ExternalSensor_ to do. The main loop is mostly idle, 60remaining blocked in the Boost ASIO[^8] library, handling D-Bus requests as they 61come in. This utilizes the functionality in the underlying _Sensor_[^9] class, 62which already contains the D-Bus hooks necessary to receive values from the 63external source. 64 65An example external source is the IPMI service[^10], receiving values from the 66host via the IPMI "Set Sensor Reading" command[^11]. _ExternalSensor_ is 67intended to be source-agnostic, so it does not matter if this is IPMI or 68Redfish[^12] or something else in the future, as long as they are received 69similarly over D-Bus. 70 71### Timeout 72 73The timeout feature is the primary feature which distinguishes _ExternalSensor_ 74from other sensor daemons. Once an external source starts providing updates, the 75external source is expected to continue to provide timely updates. Each update 76will be properly published onto D-Bus, in the usual way done by all sensor 77daemons, as a floating-point value. 78 79A timer is used, the same Boost ASIO[^13] timer mechanism used by other sensor 80daemons to poll their hardware, but in this case, is used to manage how long it 81has been since the last known good external update. When the timer expires, the 82sensor value will be deemed stale, and will be replaced with floating-point 83quiet _NaN_[^14]. 84 85### NaN 86 87The advantage of floating-point _NaN_ is that it is a drop-in replacement for 88the valid floating-point value of the sensor. A subtle difference of the earlier 89OpenBMC sensor "Value" schema change, from integer to floating-point, is that 90the field is essentially now nullable. Instead of having to arbitrarily choose 91an arbitrary integer value to indicate "not valid", such as -1 or 9999 or 92whatever, floating-point explicitly has _NaN_ to indicate this. So, there is no 93possibility of confusion that this will be mistaken for a valid sensor value, as 94_NaN_ is literally _not a number_, and thus can not be misparsed as a valid 95sensor reading. It thus saves having to add a second field to reliably indicate 96validity, which would break the existing schema[^15]. 97 98An alternative to using _NaN_ for staleness indication would have been to use a 99timestamp, which would introduce the complication of having to parse and compare 100timestamps within OpenBMC, and all the subtle difficulties thereof[^16]. What's 101more, adding a second field might require a second D-Bus message to update, and 102D-Bus messages are computationally expensive[^17] and should be used sparingly. 103Periodic things like sensors, which send out regular updates, could easily lead 104to frequent D-Bus traffic and thus should be kept as minimal as practical. And 105finally, changing the Value schema would cause a large blast radius, both in 106design and in code, necessitating a large refactoring effort well beyond the 107scope of what is needed for _ExternalSensor_. 108 109### Configuration 110 111Configuring a sensor for use with _ExternalSensor_ should be done in the usual 112way[^18] that is done for use with other sensor daemons, namely, a JSON 113dictionary that is an element of the "Exposes" array within a JSON configuration 114file to be read by _entity-manager_. In that JSON dictionary, the valid names 115are listed below. All of these are mandatory parameters, unless mentioned as 116optional. For fields listed as "Numeric" below, this means that it can be either 117integer or valid floating-point. 118 119* "Name": String. The sensor name, which this sensor will be known as. A 120 mandatory component of the `entity-manager` configuration, and the resulting 121 D-Bus object path. 122 123* "Units": String. This parameter is unique to _ExternalSensor_. As 124 _ExternalSensor_ is not tied to any particular physical hardware, it can 125 measure any physical quantity supported by OpenBMC. This string will be 126 translated to another string via a lookup table[^19], and forms another 127 mandatory component of the D-Bus object path. 128 129* "MinValue": Numeric. The minimum valid value for this sensor. Although not 130 used by _ExternalSensor_ directly, it is a valuable hint for services such 131 as IPMI, which need to know the minimum and maximum valid sensor values in 132 order to scale their reporting range accurately. As _ExternalSensor_ is not 133 tied to one particular physical quantity, there is no suitable default value 134 for minimum and maximum. Thus, unlike other sensor daemons where this 135 parameter is optional, in _ExternalSensor_ it is mandatory. 136 137* "MaxValue": Numeric. The maximum valid value for this sensor. It is treated 138 similarly to "MinValue". 139 140* "Timeout": Numeric. This parameter is unique to _ExternalSensor_. It is the 141 timeout value, in seconds. If this amount of time elapses with no new 142 updates received over D-Bus from the external source, this sensor will be 143 deemed stale. The value of this sensor will be replaced with floating-point 144 _NaN_, as described above. This field is optional. If not given, the timeout 145 feature will be disabled for this sensor (so it will never be deemed stale). 146 147* "Type": String. Must be exactly "ExternalSensor". This string is used by 148 _ExternalSensor_ to obtain configuration information from _entity-manager_ 149 during initialization. This string is what differentiates JSON stanzas 150 intended for _ExternalSensor_ versus JSON stanzas intended for other 151 _dbus-sensors_ sensor daemons. 152 153* "Thresholds": JSON dictionary. This field is optional. It is passed through 154 to the main _Sensor_ class during initialization, similar to other sensor 155 daemons. Other than that, it is not used by _ExternalSensor_. 156 157* "PowerState": String. This field is optional. Similarly to "Thresholds", it 158 is passed through to the main _Sensor_ class during initialization. 159 160Here is an example. The sensor created by this stanza will form this object 161path: /xyz/openbmc_project/sensors/temperature/HostDevTemp 162 163``` 164 { 165 "Name": "HostDevTemp", 166 "Units": "DegreesC", 167 "MinValue": -16.0, 168 "MaxValue": 111.5, 169 "Timeout": 4.0, 170 "Type": "ExternalSensor" 171 }, 172``` 173 174There can be multiple _ExternalSensor_ sensors in the configuration. There is no 175set limit on the number of sensors, except what is supported by a service such 176as IPMI. 177 178## Implementation 179 180As it stands now, _ExternalSensor_ is up and running[^20]. However, the timeout 181feature was originally implemented at the IPMI layer. Upon further 182investigation, it was found that IPMI was the wrong place for this feature, and 183that it should be moved within _ExternalSensor_ itself[^21]. It was originally 184thought that the timeout feature would be a useful enhancement available to all 185IPMI sensors, however, expected usage of almost all external sensor updates is a 186one-shot adjustment (for example, somebody wishes to change a voltage regulator 187setting, or fan speed setting). In this case, the timeout feature would not only 188not be necessary, it would get in the way and require additional coding[^22] to 189compensate for the unexpected _NaN_ value. Only sensors intended for use with 190_ExternalSensor_ are expected to receive continuous periodic updates from an 191external source, so it makes sense to move this timeout feature into 192_ExternalSensor_. This change also has the advantage of making _ExternalSensor_ 193not dependent on IPMI as the only source of external updates. 194 195A challenge of generalizing the timeout feature into _ExternalSensor_, however, 196was that the existing _Sensor_ base class did not currently allow its existing 197D-Bus setter hook to be customized. This feature was straightforward to 198add[^23]. One limitation was that the existing _Sensor_ class, by design, 199dropped updates that duplicated the existing sensor value. For use with 200_ExternalSensor_, we want to recognize all updates received, even duplicates, as 201they are important to pet the watchdog, to avoid inadvertently triggering the 202timeout feature. However, it is still important to avoid needlessly sending the 203D-Bus _PropertiesChanged_ event for duplicate readings. 204 205The timeout value was originally a compiled-in constant. If _ExternalSensor_ is 206to succeed as a general-purpose tool, this must be configurable. It was 207straightforward to add another configurable parameter[^24] to accept this 208timeout value, as shown in "Parameters" above. 209 210The hardest task of all, however, was getting it accepted upstream. If you are 211reading this, then most likely, it was successful! 212 213## Footnotes 214 215[^1]: https://gerrit.openbmc.org/q/owner:krellan%2540google.com 216[^2]: https://github.com/openbmc/dbus-sensors/blob/master/README.md 217[^3]: https://www.kernel.org/doc/html/latest/hwmon/index.html 218[^4]: https://github.com/openbmc/entity-manager/blob/master/README.md 219[^5]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206 220[^6]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/35476 221[^7]: https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempMain.cpp 222[^8]: https://think-async.com/Asio/ 223[^9]: https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp 224[^10]: https://github.com/openbmc/docs/blob/master/architecture/ipmi-architecture.md 225[^11]: https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.html 226[^12]: https://www.dmtf.org/standards/redfish 227[^13]: https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/overview/timers.html 228[^14]: https://anniecherkaev.com/the-secret-life-of-nan 229[^15]: https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/xyz/openbmc_project/Sensor/Value.interface.yaml 230[^16]: https://cr.yp.to/proto/utctai.html 231[^17]: https://github.com/openbmc/openbmc/issues/1892 232[^18]: https://github.com/openbmc/entity-manager/blob/master/docs/my_first_sensors.md 233[^19]: https://github.com/openbmc/dbus-sensors/blob/master/src/SensorPaths.cpp 234[^20]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206 235[^21]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41398 236[^22]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/39294 237[^23]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41394 238[^24]: https://gerrit.openbmc.org/c/openbmc/entity-manager/+/41397 239[^25]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/16177 240