===================================
Generic Thermal Sysfs driver How To
===================================

Written by Sujith Thomas <sujith.thomas@intel.com>, Zhang Rui <rui.zhang@intel.com>

Updated: 2 January 2008

Copyright (c) 2008 Intel Corporation


0. Introduction
===============

The generic thermal sysfs provides a set of interfaces for thermal zone
devices (sensors) and thermal cooling devices (fan, processor...) to register
with the thermal management solution and to be a part of it.

This how-to focuses on enabling new thermal zone and cooling devices to
participate in thermal management.
This solution is platform independent and any type of thermal zone devices
and cooling devices should be able to make use of the infrastructure.

The main task of the thermal sysfs driver is to expose thermal zone attributes
as well as cooling device attributes to the user space.
An intelligent thermal management application can make decisions based on
inputs from thermal zone attributes (the current temperature and trip point
temperature) and throttle appropriate devices.

- `[0-*]`	denotes any positive number starting from 0
- `[1-*]`	denotes any positive number starting from 1

1. thermal sysfs driver interface functions
===========================================

1.1 thermal zone device interface
---------------------------------

    ::

        struct thermal_zone_device
        *thermal_zone_device_register(char *type,
                                      int trips, int mask, void *devdata,
                                      struct thermal_zone_device_ops *ops,
                                      const struct thermal_zone_params *tzp,
                                      int passive_delay, int polling_delay)

    This interface function adds a new thermal zone device (sensor) to
    /sys/class/thermal folder as `thermal_zone[0-*]`. It tries to bind all the
    thermal cooling devices registered at the same time.

    type:
        the thermal zone type.
    trips:
        the total number of trip points this thermal zone supports.
    mask:
        Bit string: If 'n'th bit is set, then trip point 'n' is writable.
    devdata:
        device private data
    ops:
        thermal zone device call-backs.

        .bind:
            bind the thermal zone device with a thermal cooling device.
        .unbind:
            unbind the thermal zone device from a thermal cooling device.
        .get_temp:
            get the current temperature of the thermal zone.
        .set_trips:
            set the trip points window. Whenever the current temperature
            is updated, the trip points immediately below and above the
            current temperature are found.
        .get_mode:
            get the current mode (enabled/disabled) of the thermal zone.

            - "enabled" means the kernel thermal management is
              enabled.
            - "disabled" will prevent kernel thermal driver action
              upon trip points so that user applications can take
              charge of thermal management.
        .set_mode:
            set the mode (enabled/disabled) of the thermal zone.
        .get_trip_type:
            get the type of a certain trip point.
        .get_trip_temp:
            get the temperature above which a certain trip point
            will be fired.
        .set_emul_temp:
            set the emulation temperature which helps in debugging
            different threshold temperature points.
    tzp:
        thermal zone platform parameters.
    passive_delay:
        number of milliseconds to wait between polls when
        performing passive cooling.
    polling_delay:
        number of milliseconds to wait between polls when checking
        whether trip points have been crossed (0 for interrupt driven systems).

    ::

        void thermal_zone_device_unregister(struct thermal_zone_device *tz)

    This interface function removes the thermal zone device.
    It deletes the corresponding entry from /sys/class/thermal folder and
    unbinds all the thermal cooling devices it uses.
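
As a rough illustration of the window that the .set_trips description above
refers to, the following userspace sketch finds the trip temperatures
immediately below and above a given reading. The names `struct trip_window`
and `find_trip_window()` are hypothetical and not part of the thermal
framework, and hysteresis is ignored for simplicity.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper type: the window around the current temperature. */
struct trip_window {
    int low;   /* highest trip temperature below the reading, or INT_MIN */
    int high;  /* lowest trip temperature above the reading, or INT_MAX */
};

/*
 * Scan all trip temperatures (millidegree Celsius) and keep the closest
 * one on each side of `temp`. Trips equal to `temp` fall on neither side.
 */
static struct trip_window find_trip_window(const int *trip_temps, size_t n,
                                           int temp)
{
    struct trip_window w = { INT_MIN, INT_MAX };
    size_t i;

    for (i = 0; i < n; i++) {
        int t = trip_temps[i];

        if (t < temp && t > w.low)
            w.low = t;
        if (t > temp && t < w.high)
            w.high = t;
    }
    return w;
}
```

With trip temperatures of 100000, 80000, 70000 and 60000 and a reading of
75000, this sketch yields the window 70000 to 80000.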

    ::

        struct thermal_zone_device
        *thermal_zone_of_sensor_register(struct device *dev, int sensor_id,
                                         void *data,
                                         const struct thermal_zone_of_device_ops *ops)

    This interface adds a new sensor to a DT thermal zone.
    This function will search the list of thermal zones described in
    device tree and look for the zone that refers to the sensor device
    pointed to by dev->of_node as temperature provider. For the zone
    pointing to the sensor node, the sensor will be added to the DT
    thermal zone device.

    The parameters for this interface are:

    dev:
        Device node of sensor containing valid node pointer in
        dev->of_node.
    sensor_id:
        a sensor identifier, in case the sensor IP has more
        than one sensor
    data:
        a private pointer (owned by the caller) that will be
        passed back, when a temperature reading is needed.
    ops:
        `struct thermal_zone_of_device_ops *`.

        ==============  =======================================
        get_temp        a pointer to a function that reads the
                        sensor temperature. This is a mandatory
                        callback provided by the sensor driver.
        set_trips       a pointer to a function that sets a
                        temperature window. When this window is
                        left the driver must inform the thermal
                        core via thermal_zone_device_update.
        get_trend       a pointer to a function that reads the
                        sensor temperature trend.
        set_emul_temp   a pointer to a function that sets
                        sensor emulated temperature.
        ==============  =======================================

    The thermal zone temperature is provided by the get_temp() function
    pointer of thermal_zone_of_device_ops. When called, it will
    have the private pointer @data back.

    It returns an error pointer on failure, otherwise a valid thermal zone
    device handle. The caller should check the returned handle with IS_ERR()
    to find out whether registration succeeded.
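
The error-pointer check mentioned above can be modelled in userspace as
follows. This is only a sketch of the convention: the `model_*` names are
hypothetical stand-ins, the value 4095 is assumed to mirror the kernel's
MAX_ERRNO, and -ENODEV is an arbitrary example error, not a statement about
what the real function returns.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's MAX_ERRNO. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the top MODEL_MAX_ERRNO addresses. */
static int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical zone handle used only by this sketch. */
struct model_zone {
    int id;
};

/*
 * Hypothetical registration routine: returns the handle on success or
 * an error pointer when no sensor is available.
 */
static struct model_zone *model_register(struct model_zone *zone,
                                         int have_sensor)
{
    if (!have_sensor)
        return model_err_ptr(-ENODEV);
    return zone;
}
```

A caller of such a routine would test the result with `model_is_err()`
before dereferencing it and use `model_ptr_err()` to obtain the error code.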

    ::

        void thermal_zone_of_sensor_unregister(struct device *dev,
                                               struct thermal_zone_device *tzd)

    This interface unregisters a sensor from a DT thermal zone which was
    successfully added by interface thermal_zone_of_sensor_register().
    This function removes the sensor callbacks and private data from the
    thermal zone device registered with thermal_zone_of_sensor_register()
    interface. It will also silence the zone by removing the .get_temp() and
    get_trend() thermal zone device callbacks.

    ::

        struct thermal_zone_device
        *devm_thermal_zone_of_sensor_register(struct device *dev,
                                              int sensor_id,
                                              void *data,
                                              const struct thermal_zone_of_device_ops *ops)

    This interface is the resource-managed version of
    thermal_zone_of_sensor_register().

    All details of thermal_zone_of_sensor_register() described
    above apply here.

    The benefit of using this interface to register a sensor is that it
    is not required to explicitly call thermal_zone_of_sensor_unregister()
    in the error path or during driver unbinding, as this is done by the
    driver resource manager.

    ::

        void devm_thermal_zone_of_sensor_unregister(struct device *dev,
                                                    struct thermal_zone_device *tzd)

    This interface is the resource-managed version of
    thermal_zone_of_sensor_unregister().
    All details of thermal_zone_of_sensor_unregister() described
    above apply here.
    Normally this function will not need to be called and the resource
    management code will ensure that the resource is freed.

    ::

        int thermal_zone_get_slope(struct thermal_zone_device *tz)

    This interface is used to read the slope attribute value
    for the thermal zone device, which might be useful for platform
    drivers for temperature calculations.

    ::

        int thermal_zone_get_offset(struct thermal_zone_device *tz)

    This interface is used to read the offset attribute value
    for the thermal zone device, which might be useful for platform
    drivers for temperature calculations.

1.2 thermal cooling device interface
------------------------------------


    ::

        struct thermal_cooling_device
        *thermal_cooling_device_register(char *name,
                        void *devdata, struct thermal_cooling_device_ops *)

    This interface function adds a new thermal cooling device (fan/processor/...)
    to /sys/class/thermal/ folder as `cooling_device[0-*]`. It tries to bind itself
    to all the thermal zone devices registered at the same time.

    name:
        the cooling device name.
    devdata:
        device private data.
    ops:
        thermal cooling devices call-backs.

        .get_max_state:
            get the Maximum throttle state of the cooling device.
        .get_cur_state:
            get the Currently requested throttle state of the
            cooling device.
        .set_cur_state:
            set the Current throttle state of the cooling device.

    ::

        void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)

    This interface function removes the thermal cooling device.
    It deletes the corresponding entry from /sys/class/thermal folder and
    unbinds itself from all the thermal zone devices using it.

1.3 interface for binding a thermal zone device with a thermal cooling device
-----------------------------------------------------------------------------

    ::

        int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
                int trip, struct thermal_cooling_device *cdev,
                unsigned long upper, unsigned long lower, unsigned int weight);

    This interface function binds a thermal cooling device to a particular trip
    point of a thermal zone device.

    This function is usually called in the thermal zone device .bind callback.

    tz:
        the thermal zone device
    cdev:
        thermal cooling device
    trip:
        indicates which trip point in this thermal zone the cooling device
        is associated with.
    upper:
        the Maximum cooling state for this trip point.
        THERMAL_NO_LIMIT means no upper limit,
        and the cooling device can be in max_state.
    lower:
        the Minimum cooling state that can be used for this trip point.
        THERMAL_NO_LIMIT means no lower limit,
        and the cooling device can be in cooling state 0.
    weight:
        the influence of this cooling device in this thermal
        zone. See 1.4 below for more information.

    ::

        int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
                                int trip, struct thermal_cooling_device *cdev);

    This interface function unbinds a thermal cooling device from a particular
    trip point of a thermal zone device. This function is usually called in
    the thermal zone device .unbind callback.

    tz:
        the thermal zone device
    cdev:
        thermal cooling device
    trip:
        indicates which trip point in this thermal zone the cooling device
        is associated with.

1.4 Thermal Zone Parameters
---------------------------

    ::

        struct thermal_bind_params

    This structure defines the following parameters that are used to bind
    a zone with a cooling device for a particular trip point.

    .cdev:
        The cooling device pointer
    .weight:
        The 'influence' of a particular cooling device on this
        zone. This is relative to the rest of the cooling
        devices. For example, if all cooling devices have a
        weight of 1, then they all contribute the same. You can
        use percentages if you want, but it's not mandatory. A
        weight of 0 means that this cooling device doesn't
        contribute to the cooling of this zone unless all cooling
        devices have a weight of 0. If all weights are 0, then
        they all contribute the same.
    .trip_mask:
        This is a bit mask that gives the binding relation between
        this thermal zone and cdev, for a particular trip point.
        If nth bit is set, then the cdev and thermal zone are bound
        for trip point n.
    .binding_limits:
        This is an array of cooling state limits. Must have
        exactly 2 * thermal_zone.number_of_trip_points entries. It is an
        array consisting of tuples <lower-state upper-state> of
        state limits. Each trip will be associated with one state
        limit tuple when binding. A NULL pointer means
        <THERMAL_NO_LIMIT THERMAL_NO_LIMIT> on all trips.
        These limits are used when binding a cdev to a trip point.
    .match:
        This call back returns success(0) if the 'tz and cdev' need to
        be bound, as per platform data.

    ::

        struct thermal_zone_params

    This structure defines the platform level parameters for a thermal zone.
    This data, for each thermal zone, should come from the platform layer.
    This is an optional feature where some platforms can choose not to
    provide this data.

    .governor_name:
        Name of the thermal governor used for this zone
    .no_hwmon:
        a boolean to indicate if the thermal to hwmon sysfs interface
        is required. When no_hwmon == false, a hwmon sysfs interface
        will be created. When no_hwmon == true, nothing will be done.
        In case the thermal_zone_params is NULL, the hwmon interface
        will be created (for backward compatibility).
    .num_tbps:
        Number of thermal_bind_params entries for this zone
    .tbp:
        thermal_bind_params entries

2. sysfs attributes structure
=============================

==      ================
RO      read only value
WO      write only value
RW      read/write value
==      ================

Thermal sysfs attributes will be represented under /sys/class/thermal.
Hwmon sysfs I/F extension is also available under /sys/class/hwmon
if hwmon is compiled in or built as a module.
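
As a minimal userspace sketch of consuming these attributes, the code below
reads the `temp` attribute of a thermal zone, which this document specifies
in millidegree Celsius. The helper names `parse_millicelsius()` and
`read_zone_temp()` are hypothetical, and the zone index is whatever the
caller chooses; the sketch assumes the attribute holds one integer followed
by a newline.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a `temp` attribute into *out.
 * Returns 0 on success, -1 if the text is not a plain integer.
 */
static int parse_millicelsius(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;
    /* Allow trailing newline or spaces, reject anything else. */
    while (*end == '\n' || *end == ' ')
        end++;
    if (*end != '\0')
        return -1;
    *out = value;
    return 0;
}

/*
 * Read /sys/class/thermal/thermal_zone<zone>/temp into *out.
 * Returns 0 on success, -1 if the file is missing or malformed.
 */
static int read_zone_temp(int zone, long *out)
{
    char path[64];
    char buf[32];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/class/thermal/thermal_zone%d/temp", zone);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_millicelsius(buf, out);
}
```

A caller would divide the returned value by 1000 to obtain degrees Celsius.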

Thermal zone device sys I/F, created once it's registered::

  /sys/class/thermal/thermal_zone[0-*]:
    |---type:                   Type of the thermal zone
    |---temp:                   Current temperature
    |---mode:                   Working mode of the thermal zone
    |---policy:                 Thermal governor used for this zone
    |---available_policies:     Available thermal governors for this zone
    |---trip_point_[0-*]_temp:  Trip point temperature
    |---trip_point_[0-*]_type:  Trip point type
    |---trip_point_[0-*]_hyst:  Hysteresis value for this trip point
    |---emul_temp:              Emulated temperature set node
    |---sustainable_power:      Sustainable dissipatable power
    |---k_po:                   Proportional term during temperature overshoot
    |---k_pu:                   Proportional term during temperature undershoot
    |---k_i:                    PID's integral term in the power allocator gov
    |---k_d:                    PID's derivative term in the power allocator
    |---integral_cutoff:        Offset above which errors are accumulated
    |---slope:                  Slope constant applied as linear extrapolation
    |---offset:                 Offset constant applied as linear extrapolation

Thermal cooling device sys I/F, created once it's registered::

  /sys/class/thermal/cooling_device[0-*]:
    |---type:                   Type of the cooling device(processor/fan/...)
    |---max_state:              Maximum cooling state of the cooling device
    |---cur_state:              Current cooling state of the cooling device
    |---stats:                  Directory containing cooling device's statistics
    |---stats/reset:            Writing any value resets the statistics
    |---stats/time_in_state_ms: Time (msec) spent in various cooling states
    |---stats/total_trans:      Total number of times cooling state is changed
    |---stats/trans_table:      Cooling state transition table


The following dynamic attributes are created/removed together. They represent
the relationship between a thermal zone and its associated cooling device.
They are created/removed for each successful execution of
thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device.

::

  /sys/class/thermal/thermal_zone[0-*]:
    |---cdev[0-*]:              [0-*]th cooling device in current thermal zone
    |---cdev[0-*]_trip_point:   Trip point that cdev[0-*] is associated with
    |---cdev[0-*]_weight:       Influence of the cooling device in
                                this thermal zone

Besides the thermal zone device sysfs I/F and cooling device sysfs I/F,
the generic thermal driver also creates a hwmon sysfs I/F for each _type_
of thermal zone device. E.g. the generic thermal driver registers one hwmon
class device and builds the associated hwmon sysfs I/F for all the registered
ACPI thermal zones.

::

  /sys/class/hwmon/hwmon[0-*]:
    |---name:                   The type of the thermal zone devices
    |---temp[1-*]_input:        The current temperature of thermal zone [1-*]
    |---temp[1-*]_crit:         The critical trip point of thermal zone [1-*]

Please read Documentation/hwmon/sysfs-interface.rst for additional information.

Thermal zone attributes
-----------------------

type
    Strings which represent the thermal zone type.
    This is given by thermal zone driver as part of registration.
    E.g: "acpitz" indicates it's an ACPI thermal device.
    In order to keep it consistent with hwmon sys attribute, this should
    be a short, lowercase string, not containing spaces nor dashes.

    RO, Required

temp
    Current temperature as reported by thermal zone (sensor).

    Unit: millidegree Celsius

    RO, Required

mode
    One of the predefined values in [enabled, disabled].
    This file gives information about the algorithm that is currently
    managing the thermal zone. It can be either default kernel based
    algorithm or user space application.

    enabled
        enable Kernel Thermal management.
    disabled
        Prevent kernel thermal zone driver actions upon
        trip points so that user application can take full
        charge of the thermal management.

    RW, Optional

policy
    One of the various thermal governors used for a particular zone.

    RW, Required

available_policies
    Available thermal governors which can be used for a particular zone.

    RO, Required

`trip_point_[0-*]_temp`
    The temperature above which trip point will be fired.

    Unit: millidegree Celsius

    RO, Optional

`trip_point_[0-*]_type`
    Strings which indicate the type of the trip point.

    E.g. it can be one of critical, hot, passive, `active[0-*]` for ACPI
    thermal zone.

    RO, Optional

`trip_point_[0-*]_hyst`
    The hysteresis value for a trip point, represented as an integer.

    Unit: Celsius

    RW, Optional

`cdev[0-*]`
    Sysfs link to the thermal cooling device node that holds the sys I/F
    for cooling device throttling control.

    RO, Optional

`cdev[0-*]_trip_point`
    The trip point in this thermal zone which `cdev[0-*]` is associated
    with; -1 means the cooling device is not associated with any trip
    point.

    RO, Optional

`cdev[0-*]_weight`
    The influence of `cdev[0-*]` in this thermal zone. This value
    is relative to the rest of cooling devices in the thermal
    zone. For example, if a cooling device has a weight double
    that of another, it's twice as effective in cooling the
    thermal zone.

    RW, Optional

emul_temp
    Interface to set the emulated temperature method in thermal zone
    (sensor). After setting this temperature, the thermal zone may pass
    this temperature to platform emulation function if registered or
    cache it locally. This is useful in debugging different temperature
    thresholds and their associated cooling actions.
    This is a write-only node
    and writing 0 on this node should disable emulation.

    Unit: millidegree Celsius

    WO, Optional

    WARNING:
        Be careful while enabling this option on production systems,
        because userland can easily disable the thermal policy by simply
        flooding this sysfs node with low temperature values.

sustainable_power
    An estimate of the sustained power that can be dissipated by
    the thermal zone. Used by the power allocator governor. For
    more information see Documentation/driver-api/thermal/power_allocator.rst

    Unit: milliwatts

    RW, Optional

k_po
    The proportional term of the power allocator governor's PID
    controller during temperature overshoot. Temperature overshoot
    is when the current temperature is above the "desired
    temperature" trip point. For more information see
    Documentation/driver-api/thermal/power_allocator.rst

    RW, Optional

k_pu
    The proportional term of the power allocator governor's PID
    controller during temperature undershoot. Temperature undershoot
    is when the current temperature is below the "desired
    temperature" trip point. For more information see
    Documentation/driver-api/thermal/power_allocator.rst

    RW, Optional

k_i
    The integral term of the power allocator governor's PID
    controller. This term allows the PID controller to compensate
    for long term drift. For more information see
    Documentation/driver-api/thermal/power_allocator.rst

    RW, Optional

k_d
    The derivative term of the power allocator governor's PID
    controller. For more information see
    Documentation/driver-api/thermal/power_allocator.rst

    RW, Optional

integral_cutoff
    Temperature offset from the desired temperature trip point
    above which the integral term of the power allocator
    governor's PID controller starts accumulating errors.
    For example, if integral_cutoff is 0, then the integral term only
    accumulates error when temperature is above the desired
    temperature trip point. For more information see
    Documentation/driver-api/thermal/power_allocator.rst

    Unit: millidegree Celsius

    RW, Optional

slope
    The slope constant used in a linear extrapolation model
    to determine a hotspot temperature based off the sensor's
    raw readings. It is up to the device driver to determine
    the usage of these values.

    RW, Optional

offset
    The offset constant used in a linear extrapolation model
    to determine a hotspot temperature based off the sensor's
    raw readings. It is up to the device driver to determine
    the usage of these values.

    RW, Optional

Cooling device attributes
-------------------------

type
    String which represents the type of device, e.g:

    - for generic ACPI: should be "Fan", "Processor" or "LCD"
    - for memory controller device on intel_menlow platform:
      should be "Memory controller".

    RO, Required

max_state
    The maximum permissible cooling state of this cooling device.

    RO, Required

cur_state
    The current cooling state of this cooling device.
    The value can be any integer between 0 and max_state:

    - cur_state == 0 means no cooling
    - cur_state == max_state means the maximum cooling.

    RW, Required

stats/reset
    Writing any value resets the cooling device's statistics.

    WO, Required

stats/time_in_state_ms:
    The amount of time spent by the cooling device in various cooling
    states. The output will have a "<state> <time>" pair in each line, which
    means this cooling device spent <time> msec of time at <state>.
    Output will have one line for each of the supported states.

    RO, Required


stats/total_trans:
    A single positive value showing the total number of times the state of a
    cooling device is changed.

    RO, Required

stats/trans_table:
    This gives fine grained information about all the cooling state
    transitions. The cat output here is a two dimensional matrix, where an
    entry <i,j> (row i, column j) represents the number of transitions from
    State_i to State_j. If the transition table is bigger than PAGE_SIZE,
    reading this will return an -EFBIG error.

    RO, Required

3. A simple implementation
==========================

An ACPI thermal zone may support multiple trip points like critical, hot,
passive, active. If an ACPI thermal zone supports critical, passive,
active[0] and active[1] at the same time, it may register itself as a
thermal_zone_device (thermal_zone1) with 4 trip points in all.
It has one processor and one fan, which are both registered as
thermal_cooling_device. Both are considered to have the same
effectiveness in cooling the thermal zone.

If the processor is listed in _PSL method, and the fan is listed in _AL0
method, the sys I/F structure will be built like this::

 /sys/class/thermal:
  |thermal_zone1:
    |---type:                   acpitz
    |---temp:                   37000
    |---mode:                   enabled
    |---policy:                 step_wise
    |---available_policies:     step_wise fair_share
    |---trip_point_0_temp:      100000
    |---trip_point_0_type:      critical
    |---trip_point_1_temp:      80000
    |---trip_point_1_type:      passive
    |---trip_point_2_temp:      70000
    |---trip_point_2_type:      active0
    |---trip_point_3_temp:      60000
    |---trip_point_3_type:      active1
    |---cdev0:                  --->/sys/class/thermal/cooling_device0
    |---cdev0_trip_point:       1       /* cdev0 can be used for passive */
    |---cdev0_weight:           1024
    |---cdev1:                  --->/sys/class/thermal/cooling_device3
    |---cdev1_trip_point:       2       /* cdev1 can be used for active[0]*/
    |---cdev1_weight:           1024

  |cooling_device0:
    |---type:                   Processor
    |---max_state:              8
    |---cur_state:              0

  |cooling_device3:
    |---type:                   Fan
    |---max_state:              2
    |---cur_state:              0

 /sys/class/hwmon:
  |hwmon0:
    |---name:                   acpitz
    |---temp1_input:            37000
    |---temp1_crit:             100000

4. Export Symbol APIs
=====================

4.1. get_tz_trend
-----------------

This function returns the trend of a thermal zone, i.e. the rate of change
of temperature of the thermal zone. Ideally, the thermal sensor drivers
are supposed to implement the callback. If they don't, the thermal
framework calculates the trend by comparing the previous and the current
temperature values.

4.2. get_thermal_instance
-------------------------

This function returns the thermal_instance corresponding to a given
{thermal_zone, cooling_device, trip_point} combination. Returns NULL
if such an instance does not exist.

4.3. thermal_cdev_update
------------------------

This function serves as an arbitrator to set the state of a cooling
device. It sets the cooling device to the deepest cooling state if
possible.

5. thermal_emergency_poweroff
=============================

On a critical trip temperature crossing event, the thermal framework
shuts down the system by calling hw_protection_shutdown().
hw_protection_shutdown() first attempts to perform an orderly shutdown
but accepts a delay after which it proceeds with a forced power-off
or, as a last resort, an emergency_restart.

The delay should be carefully profiled so as to give adequate time for
orderly poweroff.

If the delay is set to 0, emergency poweroff will not be supported. So a
carefully profiled non-zero positive value is a must for emergency
poweroff to be triggered.