=======================
Intel Powerclamp Driver
=======================

By:
  - Arjan van de Ven <arjan@linux.intel.com>
  - Jacob Pan <jacob.jun.pan@linux.intel.com>

.. Contents:

	(*) Introduction
	    - Goals and Objectives

	(*) Theory of Operation
	    - Idle Injection
	    - Calibration

	(*) Performance Analysis
	    - Effectiveness and Limitations
	    - Power vs Performance
	    - Scalability
	    - Calibration
	    - Comparison with Alternative Techniques

	(*) Usage and Interfaces
	    - Generic Thermal Layer (sysfs)
	    - Kernel APIs (TBD)

	(*) Module Parameters

INTRODUCTION
============

Consider the situation where a system’s power consumption must be
reduced at runtime, due to power budget, thermal constraint, or noise
level, and where active cooling is not preferred. Software managed
passive power reduction must be performed to prevent the hardware
actions that are designed for catastrophic scenarios.

Currently, P-states, T-states (clock modulation), and CPU offlining
are used for CPU throttling.

On Intel CPUs, C-states provide effective power reduction, but so far
they’re only used opportunistically, based on workload. The
intel_powerclamp driver introduces a method of synchronizing idle
injection across all online CPU threads. The goal is to achieve forced
and controllable C-state residency.

Tests and analysis have been carried out in the areas of power,
performance, scalability, and user experience. In many cases, a clear
advantage is shown over taking the CPU offline or modulating the CPU
clock.


THEORY OF OPERATION
===================

Idle Injection
--------------

On modern Intel processors (Nehalem or later), package level C-state
residency is available in MSRs, thus also available to the kernel.

These MSRs are::

      #define MSR_PKG_C2_RESIDENCY      0x60D
      #define MSR_PKG_C3_RESIDENCY      0x3F8
      #define MSR_PKG_C6_RESIDENCY      0x3F9
      #define MSR_PKG_C7_RESIDENCY      0x3FA

If the kernel can also inject idle time into the system, then a
closed-loop control system can be established that manages package
level C-state. The intel_powerclamp driver is conceived as such a
control system, where the target set point is a user-selected idle
ratio (based on power reduction), and the error is the difference
between the actual package level C-state residency ratio and the target idle
ratio.

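The feedback quantity described above can be sketched in a few lines of
Python. This is an illustrative model, not the driver's code. It assumes
that the residency counters advance at the same rate as a reference
counter sampled at the same instants, so that the residency ratio is one
delta divided by the other. The function names and sample values are
hypothetical.

```python
def residency_percent(residency_before, residency_after, ref_before, ref_after):
    """Return package C-state residency over a sampling window, in percent.

    The residency arguments are the summed package C-state residency
    counters at the start and end of the window. The ref arguments are a
    reference counter (assumed to tick at the same rate) sampled at the
    same two instants.
    """
    elapsed = ref_after - ref_before
    if elapsed <= 0:
        raise ValueError("sampling window must have positive length")
    return 100.0 * (residency_after - residency_before) / elapsed


def control_error(actual_percent, target_percent):
    """Error term: actual residency ratio minus the target idle ratio."""
    return actual_percent - target_percent


# Hypothetical window: 1,000,000 reference ticks, 200,000 of them spent
# in package C-states, with a target idle ratio of 25 percent.
actual = residency_percent(5_000_000, 5_200_000, 40_000_000, 41_000_000)
error = control_error(actual, target_percent=25)
print(f"actual={actual:.1f}% error={error:+.1f}")
```

A negative error means the package spent less time idle than requested,
so the next injection period would need more idle time.
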
Injection is controlled by high priority kernel threads, spawned for
each online CPU.

These kernel threads, with SCHED_FIFO class, are created to perform
clamping actions of controlled duty ratio and duration. Each per-CPU
thread synchronizes its idle time and duration, based on the rounding
of jiffies, so that errors do not accumulate and cause jitter.
Threads are also bound to their CPU so that they cannot be
migrated, unless the CPU is taken offline. In this case, threads
belonging to the offlined CPUs will be terminated immediately.

Running as SCHED_FIFO at a relatively high priority also allows this
scheme to work for both preemptible and non-preemptible kernels.
Alignment of idle time around jiffies ensures scalability for HZ
values. This effect can be better visualized using a Perf timechart.
The following diagram shows the behavior of kernel thread
kidle_inject/cpu. During idle injection, it runs monitor/mwait idle
for a given "duration", then relinquishes the CPU to other tasks,
until the next time interval.

The NOHZ schedule tick is disabled during idle time, but interrupts
are not masked. Tests show that the extra wakeups from scheduler tick
have a dramatic impact on the effectiveness of the powerclamp driver
on large scale systems (Westmere system with 80 processors).

::

  CPU0
                    ____________          ____________
  kidle_inject/0   |   sleep    |  mwait |  sleep     |
          _________|            |________|            |_______
                                 duration
  CPU1
                    ____________          ____________
  kidle_inject/1   |   sleep    |  mwait |  sleep     |
          _________|            |________|            |_______
                                ^
                                |
                                |
                                roundup(jiffies, interval)

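The alignment shown in the diagram can be illustrated with a small
Python model. It is only a sketch of the ``roundup(jiffies, interval)``
idea: the interval value and the per-CPU wakeup times below are made up,
and ``next_injection`` is a hypothetical helper, not a kernel function.

```python
def roundup(value, multiple):
    """Round value up to the next multiple, using integer arithmetic."""
    return ((value + multiple - 1) // multiple) * multiple


def next_injection(jiffies, interval):
    """Return (target_jiffies, sleep_jiffies) for one per-CPU thread.

    Every thread rounds its own notion of "now" up to the same interval
    boundary, so threads that wake at slightly different times still
    start idling together.
    """
    target = roundup(jiffies, interval)
    return target, target - jiffies


interval = 24
for cpu, now in enumerate([1001, 1003, 1007]):
    target, sleep = next_injection(now, interval)
    print(f"cpu{cpu}: now={now} sleeps {sleep} -> idles at {target}")
```

All three hypothetical CPUs compute the same target of 1008, which is
why small scheduling differences do not accumulate into drift between
threads.
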
Only one CPU is allowed to collect statistics and update global
control parameters. This CPU is referred to as the controlling CPU in
this document. The controlling CPU is elected at runtime, with a
policy that favors the BSP (bootstrap processor), taking into account
the possibility of a CPU hot-plug.

In terms of dynamics of the idle control system, package level idle
time is considered largely as a non-causal system where its behavior
cannot be based on the past or current input. Therefore, the
intel_powerclamp driver attempts to enforce the desired idle time
instantly as given input (target idle ratio). After injection,
powerclamp monitors the actual idle for a given time window and adjusts
the next injection accordingly to avoid over/under correction.

When used in a causal control system, such as a temperature control,
it is up to the user of this driver to implement algorithms where
past samples and outputs are included in the feedback. For example, a
PID-based thermal controller can use the powerclamp driver to
maintain a desired target temperature, based on integral and
derivative gains of the past samples.

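As a rough illustration of such a user-side algorithm, the sketch below
implements a textbook discrete PID loop in Python. It is not part of the
driver. The gains, the 80 degree set point, and the output limit of 50
are arbitrary assumptions chosen for the example, and the loop has no
anti-windup handling.

```python
class PidController:
    """Textbook discrete PID loop; gains and limits are illustrative."""

    def __init__(self, kp, ki, kd, out_min, out_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured, dt):
        """Return the idle percentage to request for this sample."""
        # Positive error means the CPU is hotter than desired,
        # so more idle injection is requested.
        error = measured - setpoint
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the range accepted by the cooling device.
        return int(min(self.out_max, max(self.out_min, round(raw))))


pid = PidController(kp=2.0, ki=0.5, kd=0.0, out_min=0, out_max=50)
for temp_c in (78.0, 82.0, 95.0):
    print(temp_c, "->", pid.update(setpoint=80.0, measured=temp_c, dt=1.0))
```

The value returned by ``update`` is what a userspace controller would
write as the target idle percentage. A production controller would also
need to handle sensor noise and integral windup.
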
Calibration
-----------
During scalability testing, it is observed that synchronized actions
among CPUs become challenging as the number of cores grows. This is
also true for the ability of a system to enter package level C-states.

To make sure the intel_powerclamp driver scales well, online
calibration is implemented. The goals for doing such a calibration
are:

a) determine the effective range of idle injection ratio
b) determine the amount of compensation needed at each target ratio

Compensation to each target ratio consists of two parts:

	a) steady state error compensation

	   This is to offset the error occurring when the system can
	   enter idle without extra wakeups (such as external interrupts).

	b) dynamic error compensation

	   When an excessive amount of wakeups occurs during idle, an
	   additional idle ratio can be added to quiet interrupts, by
	   slowing down CPU activities.

A debugfs file is provided for the user to examine compensation
progress and results, such as on a Westmere system::

  [jacob@nex01 ~]$ cat /sys/kernel/debug/intel_powerclamp/powerclamp_calib
  controlling cpu: 0
  pct confidence steady dynamic (compensation)
  0       0       0       0
  1       1       0       0
  2       1       1       0
  3       3       1       0
  4       3       1       0
  5       3       1       0
  6       3       1       0
  7       3       1       0
  8       3       1       0
  ...
  30      3       2       0
  31      3       2       0
  32      3       1       0
  33      3       2       0
  34      3       1       0
  35      3       2       0
  36      3       1       0
  37      3       2       0
  38      3       1       0
  39      3       2       0
  40      3       3       0
  41      3       1       0
  42      3       2       0
  43      3       1       0
  44      3       1       0
  45      3       2       0
  46      3       3       0
  47      3       0       0
  48      3       2       0
  49      3       3       0

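For scripted inspection, output in the layout shown above can be parsed
with a short Python helper. The sketch assumes the file keeps exactly
this layout (a ``controlling cpu:`` line, a header line, then four
integer columns), which is inferred from the sample rather than from a
documented format. ``parse_calibration`` is a hypothetical name.

```python
def parse_calibration(text):
    """Parse powerclamp_calib style output into (controlling_cpu, rows).

    rows maps each target ratio (pct) to a dict with the keys
    confidence, steady and dynamic. Lines that are not four integers,
    such as the header and the "..." elision, are skipped.
    """
    controlling_cpu = None
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("controlling cpu:"):
            controlling_cpu = int(line.split(":", 1)[1])
            continue
        fields = line.split()
        if len(fields) == 4 and all(f.isdigit() for f in fields):
            pct, confidence, steady, dynamic = map(int, fields)
            rows[pct] = {
                "confidence": confidence,
                "steady": steady,
                "dynamic": dynamic,
            }
    return controlling_cpu, rows


sample = """\
controlling cpu: 0
pct confidence steady dynamic (compensation)
0       0       0       0
1       1       0       0
2       1       1       0
3       3       1       0
"""
cpu, rows = parse_calibration(sample)
print(cpu, rows[3])
```

On a real system the text would come from reading the debugfs file,
which normally requires root privileges.
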
Calibration occurs during runtime. No offline method is available.
Steady state compensation is used only when confidence levels of all
adjacent ratios have reached a satisfactory level. A confidence level
is accumulated based on clean data collected at runtime. Data
collected during a period without extra interrupts is considered
clean.

To compensate for an excessive number of wakeups during idle, additional
idle time is injected when such a condition is detected. Currently,
a simple algorithm doubles the injection ratio. A possible
enhancement might be to throttle the offending IRQ, such as delaying
EOI for level triggered interrupts. But it is a challenge to be
non-intrusive to the scheduler or the IRQ core code.

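The two compensation rules can be modelled in Python as follows. This is
a sketch under stated assumptions, not the driver's algorithm: averaging
the steady values of the adjacent ratios, the confidence threshold of 3
(the highest value in the sample table), and the cap at ``max_ratio``
are all choices made for the example.

```python
def compensation(target, steady_comp, confidence, max_ratio=50,
                 confidence_ok=3, excessive_wakeups=False):
    """Return the extra idle percentage to add to a target ratio.

    steady_comp and confidence are lists indexed by target ratio, like
    the steady and confidence columns of the calibration table.
    """
    comp = 0
    neighbors = [r for r in (target - 1, target, target + 1)
                 if 0 <= r < len(confidence)]
    # Steady state compensation applies only once every adjacent ratio
    # has accumulated enough clean samples.
    if all(confidence[r] >= confidence_ok for r in neighbors):
        comp = sum(steady_comp[r] for r in neighbors) // len(neighbors)
    # Dynamic compensation: the simple policy of doubling the ratio,
    # that is, adding the target itself as compensation.
    if excessive_wakeups:
        comp = target
    # Never let target plus compensation exceed the cap (assumption).
    return max(0, min(comp, max_ratio - target))


confidence = [0, 1, 1, 3, 3, 3, 3]
steady = [0, 0, 1, 1, 1, 1, 1]
print(compensation(4, steady, confidence))  # ratios 3, 4, 5 are confident
print(compensation(2, steady, confidence))  # ratio 1 is not confident yet
print(compensation(20, [1] * 51, [3] * 51, excessive_wakeups=True))
```

With these inputs the calls yield 1, 0 and 20: steady compensation is
applied only in the first case, and the third call doubles a 20 percent
target to 40 percent in response to excessive wakeups.
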
CPU Online/Offline
------------------
Per-CPU kernel threads are started/stopped upon receiving
notifications of CPU hotplug activities. The intel_powerclamp driver
keeps track of clamping kernel threads, even after they are migrated
to other CPUs, after a CPU offline event.


Performance Analysis
====================
This section describes the general performance data collected on
multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).

Effectiveness and Limitations
-----------------------------
The maximum idle injection ratio is capped at 50 percent. As
mentioned earlier, since interrupts are allowed during
forced idle time, excessive interrupts could result in less
effectiveness. The extreme case would be doing a ping -f to generate
flooded network interrupts without much CPU acknowledgement. In this
case, little can be done from the idle injection threads. In most
normal cases, such as copying a large file with scp, applications can
be throttled by the powerclamp driver, since slowing down the CPU also
slows down network protocol processing, which in turn reduces
interrupts.

When control parameters are changed at runtime by the controlling CPU,
it may take an additional period for the rest of the CPUs to catch up
with the changes. During this time, idle injection is out of sync,
so the package cannot enter C-states at the expected ratio. But
this effect is minor, because in most cases the target ratio is
updated much less frequently than the idle injection frequency.

Scalability
-----------
Tests also show a minor, but measurable, difference between the 4P/8P
Ivy Bridge system and the 80P Westmere server under 50% idle ratio.
More compensation is needed on Westmere for the same target idle
ratio. The compensation also increases as the idle ratio gets larger.
These observations are the reason the calibration code is needed.

On the IVB 8P system, compared to an offline CPU, powerclamp can
achieve up to 40% better performance per watt (measured by a spin
counter summed over per-CPU counting threads spawned for all running
CPUs).

Usage and Interfaces
====================
The powerclamp driver is registered to the generic thermal layer as a
cooling device. Currently, it’s not bound to any thermal zones::

  jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
  cur_state:0
  max_state:50
  type:intel_powerclamp

cur_state allows the user to set the desired idle percentage. Writing 0
to cur_state will stop idle injection. Writing a value between 1 and
max_state will start the idle injection. Reading cur_state returns the
actual and current idle percentage. This may not be the same value
set by the user, because the current idle percentage depends on
workload and includes natural idle. When idle injection is disabled,
reading cur_state returns -1 instead of 0, to avoid confusing the
100% busy state with the disabled state.

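Because the cooling device number differs between systems (14 and 80 in
the examples in this document), a script may prefer to locate the device
by its ``type`` file. The Python sketch below shows one way to do that
using only the sysfs files described above. The helper names are
hypothetical, and writing ``cur_state`` on a real system requires root
privileges.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def find_powerclamp(root=THERMAL_ROOT):
    """Return the cooling device directory of type intel_powerclamp, or None."""
    for dev in sorted(root.glob("cooling_device*")):
        type_file = dev / "type"
        if type_file.is_file() and type_file.read_text().strip() == "intel_powerclamp":
            return dev
    return None


def set_idle_percent(dev, percent):
    """Write the desired idle percentage; 0 stops idle injection."""
    max_state = int((dev / "max_state").read_text())
    if not 0 <= percent <= max_state:
        raise ValueError(f"percent must be between 0 and {max_state}")
    (dev / "cur_state").write_text(f"{percent}\n")


def get_idle_percent(dev):
    """Read the current idle percentage; -1 means injection is disabled."""
    return int((dev / "cur_state").read_text())


if __name__ == "__main__":
    dev = find_powerclamp()
    if dev is None:
        print("intel_powerclamp cooling device not found")
    else:
        print(dev.name, "current idle:", get_idle_percent(dev))
```

The ``root`` parameter exists so that the helpers can be exercised
against a scratch directory instead of the live sysfs tree.
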
Example usage:

- To inject 25% idle time::

	$ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state"

If the system is not busy and has more than 25% idle time already,
then the powerclamp driver will not start idle injection. In that
case, top will not show the idle injection kernel threads.

If the system is busy (spin test below) and has less than 25% natural
idle time, powerclamp kernel threads will do idle injection. Forced
idle time is accounted as normal idle because it takes the same code
path as the idle task.

In this example, 24.1% idle is shown. This helps the system admin or
user determine the cause of slowdown, when the powerclamp driver is in action::

  Tasks: 197 total,   1 running, 196 sleeping,   0 stopped,   0 zombie
  Cpu(s): 71.2%us,  4.7%sy,  0.0%ni, 24.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
  Mem:   3943228k total,  1689632k used,  2253596k free,    74960k buffers
  Swap:  4087804k total,        0k used,  4087804k free,   945336k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   3352 jacob     20   0  262m  644  428 S  286  0.0   0:17.16 spin
   3341 root     -51   0     0    0    0 D   25  0.0   0:01.62 kidle_inject/0
   3344 root     -51   0     0    0    0 D   25  0.0   0:01.60 kidle_inject/3
   3342 root     -51   0     0    0    0 D   25  0.0   0:01.61 kidle_inject/1
   3343 root     -51   0     0    0    0 D   25  0.0   0:01.60 kidle_inject/2
   2935 jacob     20   0  696m 125m  35m S    5  3.3   0:31.11 firefox
   1546 root      20   0  158m  20m 6640 S    3  0.5   0:26.97 Xorg
   2100 jacob     20   0 1223m  88m  30m S    3  2.3   0:23.68 compiz

Tests have shown that by using the powerclamp driver as a cooling
device, a PID based userspace thermal controller can manage to
control CPU temperature effectively, when no other thermal influence
is added. For example, an Ultrabook user can compile the kernel under
a certain temperature (below most active trip points).

Module Parameters
=================

``cpumask`` (RW)
	A bit mask of CPUs to inject idle. The format of the bitmask is the same
	as used in other subsystems, such as /proc/irq/\*/smp_affinity. The mask
	is a comma separated list of 32 bit groups. Each CPU is one bit. For
	example, for a 256 CPU system the full mask is:
	ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff

	The rightmost group is for CPUs 0-31.

``max_idle`` (RW)
	Maximum ratio of injected idle time to total CPU time, in percent, in
	the range 1 to 100. Although the cooling device max_state is always 100
	(100%), this parameter allows a lower maximum idle percentage to be set.
	The default is 50, to match the current implementation of the powerclamp
	driver. Values above 75 are not allowed if the cpumask includes every
	CPU present in the system.
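
The mask layout and the ``max_idle`` limits described above can be
checked with a small Python sketch. Both helpers are hypothetical and
simply restate the documented rules. They do not read or write the
module parameters themselves.

```python
def format_cpumask(cpus, nr_cpus):
    """Format CPU numbers as comma separated 32 bit hex groups.

    The leftmost group holds the highest numbered CPUs and the
    rightmost group holds CPUs 0-31.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < nr_cpus:
            raise ValueError(f"CPU {cpu} is outside 0..{nr_cpus - 1}")
        mask |= 1 << cpu
    groups = (nr_cpus + 31) // 32
    return ",".join(f"{(mask >> (32 * i)) & 0xFFFFFFFF:08x}"
                    for i in reversed(range(groups)))


def max_idle_allowed(max_idle, cpus, present_cpus):
    """Apply the documented limits: 1 to 100, at most 75 for all present CPUs."""
    if not 1 <= max_idle <= 100:
        return False
    if set(cpus) == set(present_cpus) and max_idle > 75:
        return False
    return True


print(format_cpumask(range(256), 256))
print(format_cpumask([0, 1, 33], 64))
print(max_idle_allowed(80, range(4), range(4)))
print(max_idle_allowed(80, [0, 1], range(4)))
```

The first call reproduces the full 256 CPU mask shown above. The last
two calls show that a limit of 80 is rejected when every present CPU is
selected but accepted for a subset.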