# ExternalSensor in dbus-sensors

Author: Josh Lehan[1]

Other contributors: Ed Tanous, Peter Lundgren, Alex Qiu

Created: March 19, 2021

## Introduction

In OpenBMC, the _dbus-sensors_[2] package contains a suite of sensor daemons.
Each daemon monitors a particular type of sensor. This document provides
rationale and motivation for adding _ExternalSensor_, another sensor daemon, and
gives some example usages of it.

## Motivation

There are 10 existing sensor daemons in _dbus-sensors_. Why add another sensor
daemon?

- Most of the existing sensor daemons are tied to one particular physical
  quantity they are measuring, such as temperature, and are hardcoded as such.
  An externally-updated sensor has no such limitation, and should be flexible
  enough to measure any physical quantity currently supported by OpenBMC.

- Essentially all of the existing sensor daemons obtain the sensor values they
  publish to D-Bus by reading from local hardware (typically by reading from
  virtual files provided by the _hwmon_[3] subsystem of the Linux kernel). None
  of the daemons are currently designed with the intention of accepting values
  pushed in from an external source. Although there is some debugging
  functionality to add this feature to other sensor daemons[25], it is not the
  primary purpose for which they were designed.

- Even if the debugging functionality of an existing daemon were to be used, the
  daemon would still need a valid configuration tied to recognized hardware, as
  detected by _entity-manager_[4], in order for the daemon to properly
  initialize itself and participate in the OpenBMC software stack.

- For the same reason it is desirable for existing sensor daemons to detect and
  properly indicate failures of their underlying hardware, it is desirable for
  _ExternalSensor_ to detect and properly indicate loss of timely sensor updates
  from their external source. This is a new feature, and does not cleanly fit
  into the architecture of any existing sensor daemon, thus a new daemon is the
  correct choice for this behavior.

For these reasons, _ExternalSensor_ has been added[5], as the eleventh sensor
daemon in _dbus-sensors_.

## Design

After some discussion, a proof-of-concept _HostSensor_[6] was published. This
was a stub, but it revealed the minimal implementation that would still be
capable of fully initializing and participating in the OpenBMC software stack.
_ExternalSensor_ was formed by using this example _HostSensor_, and also one of
the simplest existing sensor daemons, _HwmonTempSensor_[7], as references to
build upon.

As written, after validating parameters during initialization, there is
essentially no work for _ExternalSensor_ to do. The main loop is mostly idle,
remaining blocked in the Boost ASIO[8] library, handling D-Bus requests as they
come in. This utilizes the functionality in the underlying _Sensor_[9] class,
which already contains the D-Bus hooks necessary to receive values from the
external source.

An example external source is the IPMI service[10], receiving values from the
host via the IPMI "Set Sensor Reading" command[11]. _ExternalSensor_ is intended
to be source-agnostic, so it does not matter if this is IPMI or Redfish[12] or
something else in the future, as long as they are received similarly over D-Bus.

### Timeout

The timeout feature is the primary feature which distinguishes _ExternalSensor_
from other sensor daemons. Once an external source starts providing updates, the
external source is expected to continue to provide timely updates. Each update
will be properly published onto D-Bus, in the usual way done by all sensor
daemons, as a floating-point value.

The same Boost ASIO[13] timer mechanism that other sensor daemons use to poll
their hardware is used here, but in this case it tracks how long it has been
since the last known good external update. When the timer expires, the sensor
value will be deemed stale, and will be replaced with floating-point quiet
_NaN_[14].

### NaN

The advantage of floating-point _NaN_ is that it is a drop-in replacement for
the valid floating-point value of the sensor. A subtle consequence of the
earlier OpenBMC sensor "Value" schema change, from integer to floating-point, is
that the field is essentially now nullable. Instead of having to choose an
arbitrary integer value to indicate "not valid", such as -1 or 9999 or
whatever, floating-point explicitly has _NaN_ to indicate this. So, there is no
possibility of confusion that this will be mistaken for a valid sensor value, as
_NaN_ is literally _not a number_, and thus cannot be misparsed as a valid
sensor reading. It thus saves having to add a second field to reliably indicate
validity, which would break the existing schema[15].

An alternative to using _NaN_ for staleness indication would have been to use a
timestamp, which would introduce the complication of having to parse and compare
timestamps within OpenBMC, and all the subtle difficulties thereof[16]. What's
more, adding a second field might require a second D-Bus message to update, and
D-Bus messages are computationally expensive[17] and should be used sparingly.
Periodic things like sensors, which send out regular updates, could easily lead
to frequent D-Bus traffic and thus should be kept as minimal as practical. And
finally, changing the Value schema would cause a large blast radius, both in
design and in code, necessitating a large refactoring effort well beyond the
scope of what is needed for _ExternalSensor_.
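
The staleness behavior described in the Timeout and NaN sections can be
illustrated with a minimal Python sketch. This is not the C++ code in
_dbus-sensors_: the class name `ExternalValue` and its methods are hypothetical,
the initial _NaN_ value is an assumption, and the sketch checks the elapsed time
lazily on each read instead of arming a Boost ASIO timer. It only shows how a
single floating-point field can carry both the reading and its validity.

```python
import math
from typing import Optional


class ExternalValue:
    """Hypothetical model of one externally updated sensor value."""

    def __init__(self, timeout_s: Optional[float] = None) -> None:
        self.timeout_s = timeout_s  # None disables the timeout feature
        self.value = math.nan  # assumption: no valid reading before first update
        self.last_update: Optional[float] = None

    def update(self, value: float, now: float) -> None:
        """Record a reading pushed in by the external source."""
        self.value = value
        self.last_update = now

    def read(self, now: float) -> float:
        """Return the reading, or NaN once it is older than the timeout."""
        if self.timeout_s is None or self.last_update is None:
            return self.value
        if now - self.last_update > self.timeout_s:
            return math.nan  # stale: consumers see "not a number"
        return self.value


sensor = ExternalValue(timeout_s=4.0)
sensor.update(42.5, now=100.0)
fresh = sensor.read(now=103.0)  # within the 4 second timeout
stale = sensor.read(now=105.0)  # more than 4 seconds since the update
print(fresh, math.isnan(stale))  # prints: 42.5 True
```

Note that a consumer must test for staleness with `math.isnan()` (or the
equivalent in its own language), because _NaN_ never compares equal to anything,
including itself.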

### Configuration

Configuring a sensor for use with _ExternalSensor_ should be done in the usual
way[18] that is done for use with other sensor daemons, namely, a JSON
dictionary that is an element of the "Exposes" array within a JSON configuration
file to be read by _entity-manager_. In that JSON dictionary, the valid names
are listed below. All of these are mandatory parameters, unless mentioned as
optional. For fields listed as "Numeric" below, this means that it can be either
integer or valid floating-point.

- "Name": String. The sensor name, which this sensor will be known as. A
  mandatory component of the `entity-manager` configuration, and the resulting
  D-Bus object path.

- "Units": String. This parameter is unique to _ExternalSensor_. As
  _ExternalSensor_ is not tied to any particular physical hardware, it can
  measure any physical quantity supported by OpenBMC. This string will be
  translated to another string via a lookup table[19], and forms another
  mandatory component of the D-Bus object path.

- "MinValue": Numeric. The minimum valid value for this sensor. Although not
  used by _ExternalSensor_ directly, it is a valuable hint for services such as
  IPMI, which need to know the minimum and maximum valid sensor values in order
  to scale their reporting range accurately. As _ExternalSensor_ is not tied to
  one particular physical quantity, there is no suitable default value for
  minimum and maximum. Thus, unlike other sensor daemons where this parameter is
  optional, in _ExternalSensor_ it is mandatory.

- "MaxValue": Numeric. The maximum valid value for this sensor. It is treated
  similarly to "MinValue".

- "Timeout": Numeric. This parameter is unique to _ExternalSensor_. It is the
  timeout value, in seconds. If this amount of time elapses with no new updates
  received over D-Bus from the external source, this sensor will be deemed
  stale. The value of this sensor will be replaced with floating-point _NaN_, as
  described above. This field is optional. If not given, the timeout feature
  will be disabled for this sensor (so it will never be deemed stale).

- "Type": String. Must be exactly "ExternalSensor". This string is used by
  _ExternalSensor_ to obtain configuration information from _entity-manager_
  during initialization. This string is what differentiates JSON stanzas
  intended for _ExternalSensor_ versus JSON stanzas intended for other
  _dbus-sensors_ sensor daemons.

- "Thresholds": JSON dictionary. This field is optional. It is passed through to
  the main _Sensor_ class during initialization, similar to other sensor
  daemons. Other than that, it is not used by _ExternalSensor_.

- "PowerState": String. This field is optional. Similarly to "Thresholds", it is
  passed through to the main _Sensor_ class during initialization.

Here is an example. The sensor created by this stanza will form this object
path: /xyz/openbmc_project/sensors/temperature/HostDevTemp

```json
{
  "Name": "HostDevTemp",
  "Units": "DegreesC",
  "MinValue": -16.0,
  "MaxValue": 111.5,
  "Timeout": 4.0,
  "Type": "ExternalSensor"
}
```

There can be multiple _ExternalSensor_ sensors in the configuration. There is no
set limit on the number of sensors, except what is supported by a service such
as IPMI.

## Implementation

As it stands now, _ExternalSensor_ is up and running[20]. However, the timeout
feature was originally implemented at the IPMI layer. Upon further
investigation, it was found that IPMI was the wrong place for this feature, and
that it should be moved within _ExternalSensor_ itself[21]. It was originally
thought that the timeout feature would be a useful enhancement available to all
IPMI sensors. However, the expected usage of almost all external sensor updates
is a one-shot adjustment (for example, somebody wishes to change a voltage
regulator setting, or fan speed setting). In this case, the timeout feature
would not only be unnecessary, it would get in the way and require additional
coding[22] to compensate for the unexpected _NaN_ value. Only sensors intended
for use with _ExternalSensor_ are expected to receive continuous periodic
updates from an external source, so it makes sense to move this timeout feature
into _ExternalSensor_. This change also has the advantage of making
_ExternalSensor_ not dependent on IPMI as the only source of external updates.

A challenge of generalizing the timeout feature into _ExternalSensor_, however,
was that the existing _Sensor_ base class did not allow its existing D-Bus
setter hook to be customized. This feature was straightforward to add[23]. One
limitation was that the existing _Sensor_ class, by design, dropped updates that
duplicated the existing sensor value. For use with _ExternalSensor_, we want to
recognize all updates received, even duplicates, as they are important to pet
the watchdog, to avoid inadvertently triggering the timeout feature. However, it
is still important to avoid needlessly sending the D-Bus _PropertiesChanged_
event for duplicate readings.

The timeout value was originally a compiled-in constant. If _ExternalSensor_ is
to succeed as a general-purpose tool, this must be configurable. It was
straightforward to add another configurable parameter[24] to accept this timeout
value, as shown in "Configuration" above.

The hardest task of all, however, was getting it accepted upstream. If you are
reading this, then most likely, it was successful!
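
The handling of duplicate updates described above can be sketched in Python as
follows. This is an illustration under stated assumptions, not the actual
_Sensor_ setter hook: `WatchedSensor`, `set_from_external`, and the `on_change`
callback (standing in for the D-Bus _PropertiesChanged_ signal) are all
hypothetical names. The point is only that every write refreshes the watchdog
timestamp, while the change notification is sent only when the value differs.

```python
import math
from typing import Callable, List, Optional


class WatchedSensor:
    """Hypothetical setter hook: always pet the watchdog, signal only changes."""

    def __init__(self, on_change: Callable[[float], None]) -> None:
        self.value = math.nan
        self.last_update: Optional[float] = None
        self.on_change = on_change  # stand-in for emitting PropertiesChanged

    def set_from_external(self, value: float, now: float) -> None:
        # Every write is proof of life from the source, duplicates included.
        self.last_update = now
        # NaN never equals NaN, so treat two NaNs as "unchanged" explicitly.
        unchanged = value == self.value or (
            math.isnan(value) and math.isnan(self.value)
        )
        if unchanged:
            return  # watchdog was petted, but no signal is needed
        self.value = value
        self.on_change(value)


signals: List[float] = []
sensor = WatchedSensor(signals.append)
sensor.set_from_external(30.0, now=1.0)
sensor.set_from_external(30.0, now=2.0)  # duplicate: no signal, timer refreshed
sensor.set_from_external(31.0, now=3.0)
print(signals, sensor.last_update)  # prints: [30.0, 31.0] 3.0
```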

## Footnotes

[1]: https://gerrit.openbmc.org/q/owner:krellan%2540google.com
[2]: https://github.com/openbmc/dbus-sensors/blob/master/README.md
[3]: https://www.kernel.org/doc/html/latest/hwmon/index.html
[4]: https://github.com/openbmc/entity-manager/blob/master/README.md
[5]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206
[6]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/35476
[7]: https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempMain.cpp
[8]: https://think-async.com/Asio/
[9]: https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp
[10]:
  https://github.com/openbmc/docs/blob/master/architecture/ipmi-architecture.md
[11]:
  https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-intelligent-platform-mgt-interface-spec-2nd-gen-v2-0-spec-update.html
[12]: https://www.dmtf.org/standards/redfish
[13]:
  https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/overview/timers.html
[14]: https://anniecherkaev.com/the-secret-life-of-nan
[15]:
  https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Sensor/Value.interface.yaml
[16]: https://cr.yp.to/proto/utctai.html
[17]: https://github.com/openbmc/openbmc/issues/1892
[18]:
  https://github.com/openbmc/entity-manager/blob/master/docs/my_first_sensors.md
[19]: https://github.com/openbmc/dbus-sensors/blob/master/src/SensorPaths.cpp
[20]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/36206
[21]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41398
[22]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/39294
[23]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/41394
[24]: https://gerrit.openbmc.org/c/openbmc/entity-manager/+/41397
[25]: https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/16177