=======================
Intel Powerclamp Driver
=======================

By:
  - Arjan van de Ven <arjan@linux.intel.com>
  - Jacob Pan <jacob.jun.pan@linux.intel.com>

.. Contents:

    (*) Introduction
        - Goals and Objectives

    (*) Theory of Operation
        - Idle Injection
        - Calibration

    (*) Performance Analysis
        - Effectiveness and Limitations
        - Power vs Performance
        - Scalability
        - Calibration
        - Comparison with Alternative Techniques

    (*) Usage and Interfaces
        - Generic Thermal Layer (sysfs)
        - Kernel APIs (TBD)

    (*) Module Parameters

INTRODUCTION
============

Consider the situation where a system’s power consumption must be
reduced at runtime, due to power budget, thermal constraint, or noise
level, and where active cooling is not preferred.
Software-managed passive power reduction must be performed to prevent the
hardware actions that are designed for catastrophic scenarios.

Currently, P-states, T-states (clock modulation), and CPU offlining
are used for CPU throttling.

On Intel CPUs, C-states provide effective power reduction, but so far
they’re only used opportunistically, based on workload. With the
development of the intel_powerclamp driver, the method of synchronizing
idle injection across all online CPU threads was introduced. The goal
is to achieve forced and controllable C-state residency.

Tests and analysis have been carried out in the areas of power,
performance, scalability, and user experience. In many cases, a clear
advantage is shown over taking the CPU offline or modulating the CPU
clock.


THEORY OF OPERATION
===================

Idle Injection
--------------

On modern Intel processors (Nehalem or later), package level C-state
residency is available in MSRs, thus also available to the kernel.

These MSRs are::

  #define MSR_PKG_C2_RESIDENCY      0x60D
  #define MSR_PKG_C3_RESIDENCY      0x3F8
  #define MSR_PKG_C6_RESIDENCY      0x3F9
  #define MSR_PKG_C7_RESIDENCY      0x3FA

If the kernel can also inject idle time into the system, then a
closed-loop control system can be established that manages package
level C-state. The intel_powerclamp driver is conceived as such a
control system, where the target set point is a user-selected idle
ratio (based on power reduction), and the error is the difference
between the actual package level C-state residency ratio and the target
idle ratio.

Injection is controlled by high priority kernel threads, spawned for
each online CPU.

These kernel threads, with SCHED_FIFO class, are created to perform
clamping actions of controlled duty ratio and duration. Each per-CPU
thread synchronizes its idle time and duration, based on the rounding
of jiffies, so that accumulated errors, and the jitter they would
cause, are avoided. Threads are also bound to the CPU such that they
cannot be migrated, unless the CPU is taken offline. In this case,
threads belonging to the offlined CPUs will be terminated immediately.
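
As a rough illustration of the closed-loop idea described in this
section, the sketch below derives a package C-state residency ratio from
two snapshots of the residency counters and computes the error against a
target idle ratio. It assumes the residency counters (summed over the
package C-state MSRs) tick in the same units as the TSC; the function
names and sample numbers are hypothetical and are not taken from the
driver.

```python
def residency_ratio(msr_prev, msr_now, tsc_prev, tsc_now):
    """Percent of the sampling window spent in package C-states.

    msr_* are (assumed) sums of the package C-state residency counters,
    tsc_* are timestamp counter snapshots taken at the same moments.
    """
    tsc_delta = tsc_now - tsc_prev
    if tsc_delta <= 0:
        return 0  # no elapsed time, nothing to report
    return 100 * (msr_now - msr_prev) // tsc_delta


def control_error(target_ratio, actual_ratio):
    """Error term: actual residency ratio minus the target idle ratio."""
    return actual_ratio - target_ratio


# Hypothetical window: 2000 of 10000 ticks were spent in package C-states.
actual = residency_ratio(1000, 3000, 0, 10000)
error = control_error(25, actual)
```

A negative error here means the package idled less than requested, so a
controller built on this would inject more idle time in the next window.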

Running as SCHED_FIFO at a relatively high priority also allows this
scheme to work for both preemptible and non-preemptible kernels.
Alignment of idle time around jiffies ensures scalability for HZ
values. This effect can be better visualized using a Perf timechart.
The diagram below shows the behavior of kernel thread
kidle_inject/cpu. During idle injection, it runs monitor/mwait idle
for a given "duration", then relinquishes the CPU to other tasks,
until the next time interval.

The NOHZ schedule tick is disabled during idle time, but interrupts
are not masked. Tests show that the extra wakeups from the scheduler
tick have a dramatic impact on the effectiveness of the powerclamp
driver on large scale systems (Westmere system with 80 processors).

::

  CPU0
                    ____________          ____________
  kidle_inject/0   |   sleep    |  mwait |  sleep     |
          _________|            |________|            |_______
                                 duration
  CPU1
                    ____________          ____________
  kidle_inject/1   |   sleep    |  mwait |  sleep     |
          _________|            |________|            |_______
                                ^
                                |
                                |
                                roundup(jiffies, interval)

Only one CPU is allowed to collect statistics and update global
control parameters. This CPU is referred to as the controlling CPU in
this document. The controlling CPU is elected at runtime, with a
policy that favors BSP, taking into account the possibility of a CPU
hot-plug.

In terms of dynamics of the idle control system, package level idle
time is considered largely as a non-causal system where its behavior
cannot be based on the past or current input. Therefore, the
intel_powerclamp driver attempts to enforce the desired idle time
instantly as given input (target idle ratio). After injection,
powerclamp monitors the actual idle for a given time window and adjusts
the next injection accordingly to avoid over/under correction.
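
The jiffies alignment marked as ``roundup(jiffies, interval)`` in the
diagram can be sketched as follows. This is a minimal illustration of
the rounding idea only: the helper names and the sample values are
hypothetical, and the real per-CPU threads do considerably more work.

```python
def roundup(value, multiple):
    """Round value up to the next multiple (unchanged if already aligned)."""
    return ((value + multiple - 1) // multiple) * multiple


def next_injection(jiffies, interval):
    """Jiffy at which a per-CPU thread would start its next idle period."""
    return roundup(jiffies, interval)


# Two CPUs sample jiffies at slightly different times within one interval,
# yet both compute the same start point, so their idle periods line up.
cpu0_start = next_injection(1003, 6)
cpu1_start = next_injection(1006, 6)
```

Because every thread rounds to the same boundary instead of sleeping for
a relative amount of time, small per-CPU delays do not accumulate from
one interval to the next.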

When used in a causal control system, such as a temperature control,
it is up to the user of this driver to implement algorithms where
past samples and outputs are included in the feedback. For example, a
PID-based thermal controller can use the powerclamp driver to
maintain a desired target temperature, based on integral and
derivative gains of the past samples.


Calibration
-----------
During scalability testing, it is observed that synchronized actions
among CPUs become challenging as the number of cores grows. This is
also true for the ability of a system to enter package level C-states.

To make sure the intel_powerclamp driver scales well, online
calibration is implemented. The goals for doing such a calibration
are:

a) determine the effective range of idle injection ratio
b) determine the amount of compensation needed at each target ratio

Compensation to each target ratio consists of two parts:

        a) steady state error compensation

           This is to offset the error occurring when the system can
           enter idle without extra wakeups (such as external interrupts).

        b) dynamic error compensation

           When an excessive amount of wakeups occurs during idle, an
           additional idle ratio can be added to quiet interrupts, by
           slowing down CPU activities.

A debugfs file is provided for the user to examine compensation
progress and results, such as on a Westmere system::

  [jacob@nex01 ~]$ cat /sys/kernel/debug/intel_powerclamp/powerclamp_calib
  controlling cpu: 0
  pct confidence steady dynamic (compensation)
  0       0       0       0
  1       1       0       0
  2       1       1       0
  3       3       1       0
  4       3       1       0
  5       3       1       0
  6       3       1       0
  7       3       1       0
  8       3       1       0
  ...
  30      3       2       0
  31      3       2       0
  32      3       1       0
  33      3       2       0
  34      3       1       0
  35      3       2       0
  36      3       1       0
  37      3       2       0
  38      3       1       0
  39      3       2       0
  40      3       3       0
  41      3       1       0
  42      3       2       0
  43      3       1       0
  44      3       1       0
  45      3       2       0
  46      3       3       0
  47      3       0       0
  48      3       2       0
  49      3       3       0

Calibration occurs during runtime. No offline method is available.
Steady state compensation is used only when the confidence levels of all
adjacent ratios have reached a satisfactory level. A confidence level
is accumulated based on clean data collected at runtime. Data
collected during a period without extra interrupts is considered
clean.

To compensate for an excessive number of wakeups during idle, additional
idle time is injected when such a condition is detected. Currently,
we have a simple algorithm to double the injection ratio. A possible
enhancement might be to throttle the offending IRQ, such as delaying
EOI for level triggered interrupts. But it is a challenge to be
non-intrusive to the scheduler or the IRQ core code.
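
The two compensation parts described above can be sketched as follows.
This is an illustration under stated assumptions, not the driver's code:
the confidence threshold of 3 is inferred from the highest value in the
sample output, "adjacent ratios" is read as the target ratio and its
immediate neighbours, and averaging their steady compensation is a
hypothetical choice.

```python
CONFIDENCE_OK = 3  # assumed threshold, inferred from the sample output


def steady_compensation(calib, ratio):
    """calib maps ratio -> (confidence, steady_comp), as in the debugfs table.

    Compensation is applied only when the ratio and its immediate
    neighbours (assumed meaning of "adjacent") all have enough confidence.
    """
    neighbours = [r for r in (ratio - 1, ratio, ratio + 1) if r in calib]
    if not neighbours:
        return 0
    if any(calib[r][0] < CONFIDENCE_OK for r in neighbours):
        return 0
    # Hypothetical smoothing: average the neighbours' steady compensation.
    return sum(calib[r][1] for r in neighbours) // len(neighbours)


def injection_ratio(target, calib, excessive_wakeups):
    """Ratio to inject: target plus steady compensation, doubled on wakeups."""
    ratio = target + steady_compensation(calib, target)
    if excessive_wakeups:
        ratio *= 2  # the "simple algorithm to double the injection ratio"
    return ratio


# Rows taken from the sample table: pct -> (confidence, steady).
calib = {1: (1, 0), 2: (1, 1), 3: (3, 1), 4: (3, 1), 5: (3, 1), 6: (3, 1)}
```

With these rows, a target of 5 gets one extra percent of steady
compensation, while a target of 2 gets none because its neighbours have
not yet accumulated enough clean samples.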


CPU Online/Offline
------------------
Per-CPU kernel threads are started/stopped upon receiving
notifications of CPU hotplug activities. The intel_powerclamp driver
keeps track of clamping kernel threads, even after they are migrated
to other CPUs, after a CPU offline event.


Performance Analysis
====================
This section describes the general performance data collected on
multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).

Effectiveness and Limitations
-----------------------------
The maximum idle injection ratio is capped at 50 percent. As mentioned
earlier, since interrupts are allowed during forced idle time,
excessive interrupts could result in less effectiveness. The extreme
case would be doing a ping -f to generate flooded network interrupts
without much CPU acknowledgement. In this case, little can be done from
the idle injection threads. In most normal cases, such as copying a
large file with scp, applications can be throttled by the powerclamp
driver, since slowing down the CPU also slows down network protocol
processing, which in turn reduces interrupts.

When the controlling CPU changes control parameters at runtime, it
may take an additional period for the rest of the CPUs to catch up
with the changes. During this time, idle injection is out of sync,
and the system is thus not able to enter package C-states at the
expected ratio. But this effect is minor, in that in most cases the
target ratio is updated much less frequently than the idle injection
frequency.

Scalability
-----------
Tests also show a minor, but measurable, difference between the 4P/8P
Ivy Bridge system and the 80P Westmere server under 50% idle ratio.
More compensation is needed on Westmere for the same amount of
target idle ratio. The compensation also increases as the idle ratio
gets larger. These observations are the reason the calibration code is
needed.

On the IVB 8P system, compared to an offline CPU, powerclamp can
achieve up to 40% better performance per watt (measured by a spin
counter summed over per-CPU counting threads spawned for all running
CPUs).

Usage and Interfaces
====================
The powerclamp driver is registered to the generic thermal layer as a
cooling device.
Currently, it’s not bound to any thermal zones::

  jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
  cur_state:0
  max_state:50
  type:intel_powerclamp

cur_state allows the user to set the desired idle percentage. Writing 0
to cur_state will stop idle injection. Writing a value between 1 and
max_state will start the idle injection. Reading cur_state returns the
actual, current idle percentage. This may differ from the value set by
the user, because the current idle percentage depends on workload and
includes natural idle. When idle injection is disabled, reading
cur_state returns -1 instead of 0, to avoid confusing the 100% busy
state with the disabled state.

Example usage:

- To inject 25% idle time::

    $ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state"

If the system is not busy and has more than 25% idle time already,
then the powerclamp driver will not start idle injection. Using top
will not show idle injection kernel threads.

If the system is busy (spin test below) and has less than 25% natural
idle time, powerclamp kernel threads will do idle injection.
Forced idle time is accounted as normal idle, because it takes the same
code path as the idle task.

In this example, 24.1% idle is shown. This helps the system admin or
user determine the cause of a slowdown when the powerclamp driver is in
action::


  Tasks: 197 total,   1 running, 196 sleeping,   0 stopped,   0 zombie
  Cpu(s): 71.2%us,  4.7%sy,  0.0%ni, 24.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
  Mem:   3943228k total,  1689632k used,  2253596k free,    74960k buffers
  Swap:  4087804k total,        0k used,  4087804k free,   945336k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   3352 jacob     20   0  262m  644  428 S  286  0.0   0:17.16 spin
   3341 root     -51   0     0    0    0 D   25  0.0   0:01.62 kidle_inject/0
   3344 root     -51   0     0    0    0 D   25  0.0   0:01.60 kidle_inject/3
   3342 root     -51   0     0    0    0 D   25  0.0   0:01.61 kidle_inject/1
   3343 root     -51   0     0    0    0 D   25  0.0   0:01.60 kidle_inject/2
   2935 jacob     20   0  696m 125m  35m S    5  3.3   0:31.11 firefox
   1546 root      20   0  158m  20m 6640 S    3  0.5   0:26.97 Xorg
   2100 jacob     20   0 1223m  88m  30m S    3  2.3   0:23.68 compiz

Tests have shown that by using the powerclamp driver as a cooling
device, a PID-based userspace thermal controller can manage to
control CPU temperature effectively, when no other thermal influence
is added. For example, an UltraBook user can compile the kernel while
keeping the temperature below a certain level (below most active trip
points).

Module Parameters
=================

``cpumask`` (RW)
        A bit mask of CPUs to inject idle. The format of the bitmask is the
        same as used in other subsystems, such as /proc/irq/\*/smp_affinity.
        The mask is made of comma separated 32 bit groups. Each CPU is one
        bit. For example, for a 256 CPU system the full mask is:
        ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff

        The rightmost mask is for CPU 0-31.

``max_idle`` (RW)
        Maximum ratio of injected idle time to total CPU time, in percent,
        in the range from 1 to 100. Even if the cooling device max_state is
        always 100 (100%), this parameter allows adding a maximum idle
        percent limit. The default is 50, to match the current
        implementation of the powerclamp driver. Values above 75 are not
        allowed if the cpumask includes every CPU present in the system.
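
To make the ``cpumask`` format concrete, the sketch below builds the
comma separated 32 bit groups for a chosen set of CPUs. The helper name
is hypothetical, and the zero padding of each group to eight hex digits
is a presentation choice made here to match the example mask above, not
a documented requirement.

```python
def cpumask_string(cpus, nr_cpus):
    """Format a set of CPU numbers as comma separated 32 bit hex groups.

    The rightmost group covers CPUs 0-31, the next one CPUs 32-63, etc.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < nr_cpus:
            raise ValueError(f"CPU {cpu} is outside 0..{nr_cpus - 1}")
        mask |= 1 << cpu  # each CPU is one bit

    groups = (nr_cpus + 31) // 32
    words = [(mask >> (32 * i)) & 0xFFFFFFFF for i in range(groups)]
    # Highest group first, so the rightmost group is CPUs 0-31.
    return ",".join(f"{word:08x}" for word in reversed(words))


full_mask = cpumask_string(range(256), 256)
two_cpus = cpumask_string([0, 33], 64)
```

Here ``full_mask`` reproduces the 256 CPU example, and ``two_cpus``
selects CPU 0 (bit 0 of the rightmost group) and CPU 33 (bit 1 of the
next group).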