# How to Configure Phosphor-pid-control

A system needs two groups of configurations: zones and sensors.

They can come either from a dedicated config file or via D-Bus from e.g.
`entity-manager`.

## D-Bus Configuration

If the config file does not exist, the configuration is obtained from a set of
D-Bus interfaces. When using `entity-manager` to provide them, refer to the
`Pid`, `Pid.Zone` and `Stepwise`
[schemas](https://github.com/openbmc/entity-manager/tree/master/schemas). The
key names are not identical to JSON but similar enough to see the
correspondence.

## Compile Flag Configuration

### --strict-failsafe-pwm

This build flag sets the fans strictly at the failsafe percent when in failsafe
mode, even when the calculated PWM is higher than the failsafe PWM. Without this
flag, the PWM is calculated and set to the calculated PWM **or** the failsafe
PWM, whichever is higher.

## JSON Configuration

The default config file path `/usr/share/swampd/config.json` can be overridden
with the `--conf` command line option.

The JSON object should be a dictionary with two keys, `sensors` and `zones`.
`sensors` is a list of sensor dictionaries, whereas `zones` is a list of zones.

### Sensors

```
"sensors" : [
    {
        "name": "fan1",
        "type": "fan",
        "readPath": "/xyz/openbmc_project/sensors/fan_tach/fan1",
        "writePath": "/sys/devices/platform/ahb/ahb:apb/1e786000.pwm-tacho-controller/hwmon/**/pwm1",
        "min": 0,
        "max": 255,
        "ignoreDbusMinMax": true,
        "unavailableAsFailed": true
    },
    {
        "name": "fan2",
        "type": "fan",
        "readPath": "/xyz/openbmc_project/sensors/fan_tach/fan2",
        "writePath": "/sys/devices/platform/ahb/ahb:apb/1e786000.pwm-tacho-controller/hwmon/**/pwm2",
        "min": 0,
        "max": 255,
        "timeout": 4
    },
...
```

A sensor has a `name`, a `type`, a `readPath`, a `writePath`, a `min` (minimum)
value, a `max` (maximum) value, a `timeout`, an `ignoreDbusMinMax` and an
`unavailableAsFailed` value.

The `name` is used to reference the sensor in the zone portion of the
configuration.

The `type` is the kind of sensor. This influences how its value is treated.
Supported values are: `fan`, `temp`, and `margin`.

The `readPath` is the path that tells the daemon how to read the value from this
sensor. It is optional, allowing for write-only sensors. If the value is absent
or `None`, the sensor is treated as write-only.

If the `readPath` value contains `/xyz/openbmc_project/extsensors/`, it is
treated as a sensor hosted by the daemon itself whose value is provided
externally. The daemon will own the sensor and publish it to D-Bus. This is
currently only supported for the `temp` and `margin` sensor types.

If the `readPath` value contains `/xyz/openbmc_project/` (this is checked after
external), it is treated as a passive D-Bus sensor. A passive D-Bus sensor is
one that listens for property updates to receive its value instead of actively
reading the `Value` property.

If the `readPath` value contains `/sys/`, it is treated as a directly read sysfs
path. There are two supported paths:

- `/sys/class/hwmon/hwmon0/pwm1`
- `/sys/devices/platform/ahb/1e786000.pwm-tacho-controller/hwmon/**/pwm1`

The `writePath` is the path used to set the value for the sensor. This is only
valid for a sensor of type `fan`. The path is optional; it can be empty or
`None`. When set, it supports only two options.

If the `writePath` value contains `/sys/`, it is treated as a directly written
sysfs path. There are two supported paths:

- `/sys/class/hwmon/hwmon0/pwm1`
- `/sys/devices/platform/ahb/1e786000.pwm-tacho-controller/hwmon/**/pwm1`

If the `writePath` value contains
`/xyz/openbmc_project/sensors/fan_tach/fan{N}`, it sets up a sensor object that
writes over D-Bus to the `xyz.openbmc_project.Control.FanPwm` interface. The
`writePath` should be the full object path.

```
busctl introspect xyz.openbmc_project.Hwmon-1644477290.Hwmon1 /xyz/openbmc_project/sensors/fan_tach/fan1 --no-pager
NAME                                TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable interface -         -                                        -
.Introspect                         method    -         s                                        -
org.freedesktop.DBus.Peer           interface -         -                                        -
.GetMachineId                       method    -         s                                        -
.Ping                               method    -         -                                        -
org.freedesktop.DBus.Properties     interface -         -                                        -
.Get                                method    ss        v                                        -
.GetAll                             method    s         a{sv}                                    -
.Set                                method    ssv       -                                        -
.PropertiesChanged                  signal    sa{sv}as  -                                        -
xyz.openbmc_project.Control.FanPwm  interface -         -                                        -
.Target                             property  t         255                                      emits-change writable
xyz.openbmc_project.Sensor.Value    interface -         -                                        -
.MaxValue                           property  x         0                                        emits-change writable
.MinValue                           property  x         0                                        emits-change writable
.Scale                              property  x         0                                        emits-change writable
.Unit                               property  s         "xyz.openbmc_project.Sensor.Value.Uni... emits-change writable
.Value                              property  x         2823                                     emits-change writable
```

The `min` and `max` values are optional. When `max` is non-zero, the daemon
expects to write a percentage value converted to a value between the minimum and
maximum.

The `timeout` value is optional and controls the sensor failure behavior. For a
fan sensor the default value is 2 seconds, otherwise it is 0. When a sensor's
timeout is 0, it is not checked against a read timeout failure case. If a sensor
fails to be read within the timeout period, the zone goes into failsafe, because
without all of its inputs it cannot decide what to do.
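
The sensor rules above can be sketched in a few lines of Python. This is an illustration only, not the daemon's implementation: the function names, the returned labels, the choice to raise on an unrecognized path, the 0 to 100 percent convention, the linear scaling, and the pass-through when `max` is zero are all assumptions made for the example.

```python
from typing import Optional

EXTERNAL_MARKER = "/xyz/openbmc_project/extsensors/"
DBUS_MARKER = "/xyz/openbmc_project/"
SYSFS_MARKER = "/sys/"


def classify_read_path(read_path: Optional[str]) -> str:
    """Return a (hypothetical) label for how a sensor value would be read."""
    if not read_path or read_path == "None":
        return "write-only"
    # Order matters: the external check comes before the generic D-Bus check.
    if EXTERNAL_MARKER in read_path:
        return "external"
    if DBUS_MARKER in read_path:
        return "passive-dbus"
    if SYSFS_MARKER in read_path:
        return "sysfs"
    # Assumption: anything else is rejected.
    raise ValueError(f"unsupported readPath: {read_path}")


def default_timeout(sensor_type: str) -> int:
    """Default read timeout in seconds: 2 for fans, 0 (unchecked) otherwise."""
    return 2 if sensor_type == "fan" else 0


def percent_to_raw(percent: float, minimum: float, maximum: float) -> float:
    """Scale a percentage (assumed 0..100) linearly into [minimum, maximum]."""
    if maximum == 0:
        # Assumption: with no maximum configured the value is passed through.
        return percent
    return minimum + (percent / 100.0) * (maximum - minimum)
```

With `min` 0 and `max` 255 as in the example configuration, a request of 100 percent maps to the raw PWM value 255.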

The `ignoreDbusMinMax` value is optional and defaults to false. Passive D-Bus
sensors check for a `MinValue` and `MaxValue` and scale the incoming values via
these. Setting this property to true ignores `MinValue` and `MaxValue` from
D-Bus and therefore skips passive value scaling.

The `unavailableAsFailed` value is optional and defaults to true. However, some
thermal sensors should not be treated as failed when they are unavailable. For
example, when a system is powered off, its CPU/DIMM temperature sensors are
unavailable; in that state these sensors should not be treated as failed and
should not trigger failsafe. This is important for systems whose fans are always
on. For these specific sensors, set this property to false.

### Zones

```
"zones" : [
    {
        "id": 1,
        "minThermalOutput": 3000.0,
        "failsafePercent": 75.0,
        "pids": [],
...
```

Each zone has its own fields, and a list of controllers.

| field              | type              | meaning                                                                                                                          |
| ------------------ | ----------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| `id`               | `int64_t`         | This is a unique identifier for the zone.                                                                                        |
| `minThermalOutput` | `double`          | This is the minimum value that should be considered from the thermal outputs. Commonly used as the minimum fan RPM.              |
| `failsafePercent`  | `double`          | If there is a fan PID, it will use this value if the zone goes into failsafe as the output value written to the fan's sensors.   |
| `pids`             | `list of strings` | Fan and thermal controllers used by the zone.                                                                                    |

The `id` field here is used in the D-Bus path to talk to the
`xyz.openbmc_project.Control.Mode` interface.

**_TODO:_** Examine how the fan controller always treating its output as a
percentage works for future cases.
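
How `failsafePercent` interacts with the `--strict-failsafe-pwm` build flag (see the Compile Flag Configuration section) can be sketched as below. The function name and the `strict` parameter are hypothetical; in the daemon the flag is a build-time option, not a runtime argument.

```python
def failsafe_output(calculated_percent: float,
                    failsafe_percent: float,
                    strict: bool = False) -> float:
    """Pick the fan output while the zone is in failsafe mode.

    strict=True models a build with --strict-failsafe-pwm: the failsafe
    percent is used even if the calculated value is higher.
    """
    if strict:
        return failsafe_percent
    # Without the flag, the higher of the two values wins.
    return max(calculated_percent, failsafe_percent)
```

With `failsafePercent` set to 75.0, a calculated output of 90 stays at 90 in a default build but is held at 75 in a strict build.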

A zone collects all the setpoints and ceilings from the thermal controllers
attached to it, selects the maximum setpoint, and clamps it by the minimum
ceiling and `minThermalOutput`; the result is used to control the fans.

### Controllers

There are `fan`, `temp`, `margin` (PID), and `stepwise` (discrete steps)
controllers.

The `fan` PID is meant to drive fans or other cooling devices. It expects to get
the setpoint value from the owning zone and then drives the fans to that value.

A `temp` PID is meant to drive the setpoint given an absolute temperature value
(a higher value indicates a warmer temperature).

A `margin` PID is meant to drive the setpoint given a margin value (a lower
value indicates a warmer temperature; in other words, it is the remaining safety
margin expressed in degrees Celsius).

The setpoint output from the thermal controllers is called `RPMSetpoint()`.
However, it doesn't need to be an RPM value.

**_TODO:_** Rename this method and others to not necessarily say RPM.

Some PID configurations have fields in common, but these may be interpreted
differently.

When using D-Bus, each configuration can have a list of strings called
`Profiles`. In this case the controller will be loaded only if at least one of
them is returned as `Current` from an object implementing the
`xyz.openbmc_project.Control.ThermalMode` interface (which can be anywhere on
D-Bus). `swampd` will automatically reload the full configuration whenever
`Current` changes.

The D-Bus `Name` attribute is used for indexing in certain cases, so it should
be unique across all defined configurations.
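
The zone behavior described in the Zones section (maximum setpoint, clamped by the minimum ceiling and by `minThermalOutput`) can be sketched as follows. The function name is hypothetical, and two details are assumptions not stated above: the `minThermalOutput` floor is applied after the ceiling, and a zone with no setpoints falls back to `minThermalOutput`.

```python
from typing import Sequence


def zone_setpoint(setpoints: Sequence[float],
                  ceilings: Sequence[float],
                  min_thermal_output: float) -> float:
    """Combine controller outputs into one zone setpoint."""
    # Start from the largest setpoint requested by any thermal controller.
    value = max(setpoints) if setpoints else min_thermal_output
    # The lowest ceiling is an upper bound on that request.
    if ceilings:
        value = min(value, min(ceilings))
    # Assumption: the floor is applied last, so it wins over a low ceiling.
    return max(value, min_thermal_output)
```

For example, setpoints of 3500 and 4200 with a single ceiling of 4000 and a `minThermalOutput` of 3000 give 4000 in this sketch.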

#### PID Field

If the PID `type` is not `stepwise`, then the PID field is defined as follows:

| field                | type     | meaning                                                                  |
| -------------------- | -------- | ------------------------------------------------------------------------ |
| `samplePeriod`       | `double` | How frequently the value is sampled. 0.1 for fans, 1.0 for temperatures. |
| `proportionalCoeff`  | `double` | The proportional coefficient.                                            |
| `integralCoeff`      | `double` | The integral coefficient.                                                |
| `feedFwdOffsetCoeff` | `double` | The feed forward offset coefficient.                                     |
| `feedFwdGainCoeff`   | `double` | The feed forward gain coefficient.                                       |
| `integralLimit_min`  | `double` | The integral minimum clamp value.                                        |
| `integralLimit_max`  | `double` | The integral maximum clamp value.                                        |
| `outLim_min`         | `double` | The output minimum clamp value.                                          |
| `outLim_max`         | `double` | The output maximum clamp value.                                          |
| `slewNeg`            | `double` | Negative slew value to dampen output.                                    |
| `slewPos`            | `double` | Positive slew value to accelerate output.                                |

The units for the coefficients depend on the configuration of the PIDs.

If the PID is a `margin` controller whose `setpoint` is in degrees Celsius and
whose output is in RPM, `proportionalCoeff` is the P value in RPM/C and
`integralCoeff` is in RPM/(C sec).

If the PID is a fan controller whose output is PWM, `proportionalCoeff` is in
%/RPM and `integralCoeff` is in %/(RPM sec).

**_NOTE:_** The sample periods are specified in the configuration because they
are used in the PID computations; however, they are not truly configurable,
because they are also the update periods for the fan and thermal sensors.

#### type == "fan"

```
"name": "fan1-5",
"type": "fan",
"inputs": ["fan1", "fan5"],
"setpoint": 90.0,
"pid": {
...
}
```

The type `fan` builds a `FanController` PID.

| field      | type              | meaning                                                                        |
| ---------- | ----------------- | ------------------------------------------------------------------------------ |
| `name`     | `string`          | The name of the PID. This is just for humans and logging.                      |
| `type`     | `string`          | `fan`                                                                          |
| `inputs`   | `list of strings` | The names of the sensor(s) that are used as input and output for the PID loop. |
| `setpoint` | `double`          | Presently UNUSED                                                               |
| `pid`      | `dictionary`      | A PID dictionary detailed above.                                               |

#### type == "margin"

```
"name": "fleetingpid0",
"type": "margin",
"inputs": ["fleeting0"],
"setpoint": 10,
"pid": {
...
}
```

The type `margin` builds a `ThermalController` PID.

| field      | type              | meaning                                                                      |
| ---------- | ----------------- | ---------------------------------------------------------------------------- |
| `name`     | `string`          | The name of the PID. This is just for humans and logging.                    |
| `type`     | `string`          | `margin`                                                                     |
| `inputs`   | `list of strings` | The names of the sensor(s) that are used as input for the PID loop.          |
| `setpoint` | `double`          | The setpoint value for the thermal PID. The setpoint for the margin sensors. |
| `pid`      | `dictionary`      | A PID dictionary detailed above.                                             |

Each input is normally a temperature difference between some hardware threshold
and the current state. For example, a CPU sensor can report that it is 20
degrees below the point at which it starts thermal throttling. So the lower the
margin, the higher the corresponding absolute temperature.

Out of all the `inputs`, the minimal value is selected and used as the input for
the PID loop.

A `margin` PID loop outputs a setpoint for the zone. It does this by adding its
value to the zone's list of setpoints. The value used by the fan PIDs (in this
cascade configuration) is the maximum of that list.
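
As a small illustration of the input selection described above, the sketch below picks the tightest margin from a set of readings. The function name and the sensor names `cpu0` and `dimm0` are hypothetical; only `fleeting0` appears in the example configuration.

```python
from typing import Dict, Tuple


def tightest_margin(readings: Dict[str, float]) -> Tuple[str, float]:
    """Return the sensor with the smallest margin and that margin.

    The smallest margin belongs to the sensor closest to its hardware
    threshold, so it is the one that feeds the margin PID loop.
    """
    name = min(readings, key=readings.get)
    return name, readings[name]


# Margins in degrees Celsius below each sensor's threshold (made-up values).
driving_sensor, pid_input = tightest_margin(
    {"cpu0": 20.0, "dimm0": 12.5, "fleeting0": 31.0}
)
```

Here `dimm0` drives the loop, since 12.5 degrees is the smallest remaining margin.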

#### type == "temp"

Exactly the same as `margin`, except that all the inputs are expected to be
absolute temperatures, so the maximal value is used to feed the PID loop.

#### type == "stepwise"

```
"name": "temp1",
"type": "stepwise",
"inputs": ["temp1"],
"setpoint": 30.0,
"pid": {
    "samplePeriod": 0.1,
    "positiveHysteresis": 1.0,
    "negativeHysteresis": 1.0,
    "isCeiling": false,
    "reading": {
        "0": 45,
        "1": 46,
        "2": 47
    },
    "output": {
        "0": 5000,
        "1": 2400,
        "2": 2600
    }
}
```

The type `stepwise` builds a `StepwiseController`.

| field    | type              | meaning                                                                          |
| -------- | ----------------- | -------------------------------------------------------------------------------- |
| `name`   | `string`          | The name of the controller. This is just for humans and logging.                 |
| `type`   | `string`          | `stepwise`                                                                       |
| `inputs` | `list of strings` | The names of the sensor(s) that are used as input and output for the controller. |
| `pid`    | `dictionary`      | A controller settings dictionary detailed below.                                 |

The `pid` dictionary (confusingly named) is defined as follows:

| field                | type         | meaning                                                                                              |
| -------------------- | ------------ | ---------------------------------------------------------------------------------------------------- |
| `samplePeriod`       | `double`     | Presently UNUSED.                                                                                    |
| `reading`            | `dictionary` | Enumerated list of input values, indexed from 0, must be monotonically increasing, maximum 20 items. |
| `output`             | `dictionary` | Enumerated list of output values, indexed from 0, must match the number of `reading` items.          |
| `positiveHysteresis` | `double`     | How much the input value must rise to allow the switch to the next step.                             |
| `negativeHysteresis` | `double`     | How much the input value must drop to allow the switch to the previous step.                         |
| `isCeiling`          | `bool`       | Whether this controller provides a setpoint or a ceiling for the zone.                               |
| `setpoint`           | `double`     | Presently UNUSED.                                                                                    |

**_NOTE:_** In Entity Manager, `reading` and `output` are normal arrays and are
not embedded in a dictionary.

On each measurement cycle, the maximum value out of all the `inputs` is
selected. It is then compared to the list of `reading` values to find the
largest one that is still lower than or equal to the input (the very first item
is used even if it is larger than the input). The corresponding `output` value
is selected if hysteresis allows the switch (the current input value is compared
with the input present at the moment of the previous switch). The result is
added to the list of setpoints or ceilings for the zone, depending on the
`isCeiling` setting.
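
The stepwise lookup and hysteresis rule above can be sketched as follows. This is a minimal model under stated assumptions, not the controller's source: the class and function names are hypothetical, `reading` and `output` are plain lists (as in Entity Manager), the hysteresis comparison is assumed to be strict, and the table values are made up for the example.

```python
from typing import Optional, Sequence


def stepwise_lookup(reading: Sequence[float],
                    output: Sequence[float],
                    value: float) -> float:
    """Output for the largest reading that is <= value (first item otherwise)."""
    result = output[0]  # used even when value is below reading[0]
    for threshold, out in zip(reading[1:], output[1:]):
        if threshold > value:
            break
        result = out
    return result


class StepwiseSketch:
    def __init__(self, reading, output, positive_hysteresis, negative_hysteresis):
        if len(reading) != len(output):
            raise ValueError("reading and output must have the same length")
        self.reading = list(reading)
        self.output = list(output)
        self.positive_hysteresis = positive_hysteresis
        self.negative_hysteresis = negative_hysteresis
        self.last_input: Optional[float] = None  # input at the previous switch
        self.last_output: Optional[float] = None

    def process(self, inputs: Sequence[float]) -> float:
        value = max(inputs)  # the hottest input drives the controller
        if self.last_input is None:
            allowed = True  # first cycle: nothing to compare against
        else:
            # Assumption: the change must strictly exceed the hysteresis.
            rose = value - self.last_input > self.positive_hysteresis
            fell = self.last_input - value > self.negative_hysteresis
            allowed = rose or fell
        if allowed:
            self.last_output = stepwise_lookup(self.reading, self.output, value)
            self.last_input = value
        return self.last_output


ctrl = StepwiseSketch([45, 50, 55], [2000, 3000, 4000], 1.0, 1.0)
steps = [ctrl.process([t]) for t in (44.0, 50.5, 49.8, 49.0)]
```

In this run the input of 49.8 does not change the output, because it is only 0.7 below the input recorded at the previous switch (50.5); the drop to 49.0 exceeds the negative hysteresis and moves the controller back to the first step.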