Lines Matching +full:cpu +full:- +full:idle +full:- +full:states

// SPDX-License-Identifier: GPL-2.0
 * Timer events oriented CPU idle governor
 *
 * Copyright (C) 2018 - 2021 Intel Corporation
 * Util-awareness mechanism:

 * DOC: teo-description
 * The idea of this governor is based on the observation that on many systems
 * timer events are two or more orders of magnitude more frequent than any
 * other interrupts, so they are likely to be the most significant cause of CPU
 * wakeups from idle states. Moreover, information about what happened in the
 * (relatively recent) past can be used to estimate whether or not the deepest
 * idle state with target residency within the (known) time till the closest
 * timer event (referred to as the sleep length) is likely to be suitable for
 * the upcoming CPU idle period and, if not, then which of the shallower idle
 * states to choose instead of it.
 * Of course, non-timer wakeup sources are more important in some use cases,
 * which can be covered by taking a few most recent idle time intervals of the
 * CPU into account. However, even in that context it is not necessary to
 * consider idle duration values greater than the sleep length, because the
 * closest timer will ultimately wake up the CPU anyway unless it is woken up
 * earlier.
 * Thus this governor estimates whether or not the prospective idle duration of
 * a CPU is likely to be significantly shorter than the sleep length and selects
 * an idle state for it accordingly.
 * The computations carried out by this governor are based on using bins whose
 * boundaries are aligned with the target residency parameter values of the CPU
 * idle states provided by the %CPUIdle driver in the ascending order. That is,
 * the first bin spans from 0 up to, but not including, the target residency of
 * the second idle state (idle state 1), the second bin spans from the target
 * residency of idle state 1 up to, but not including, the target residency of
 * idle state 2, the third bin spans from the target residency of idle state 2
 * up to, but not including, the target residency of idle state 3 and so on.
 * The last bin spans from the target residency of the deepest idle state
 * supplied by the driver to infinity.
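The bin layout described above can be illustrated with a small standalone sketch. The state count and target residency values below are invented example numbers, not values from any real cpuidle driver:

```c
#include <assert.h>

#define EXAMPLE_STATE_COUNT 4

/* Hypothetical target residencies in nanoseconds, in ascending order. */
static const long long example_residency_ns[EXAMPLE_STATE_COUNT] = {
	0, 2000, 100000, 1000000
};

/*
 * Return the index of the bin that duration_ns falls into: bin i spans
 * from the target residency of state i up to, but not including, the
 * target residency of state i + 1; the last bin extends to infinity.
 */
static int example_duration_bin(long long duration_ns)
{
	int i;

	for (i = EXAMPLE_STATE_COUNT - 1; i > 0; i--) {
		if (duration_ns >= example_residency_ns[i])
			return i;
	}
	return 0;
}
```

A duration of 1500 ns falls into bin 0 here because it is below the 2000 ns residency of example state 1, while any duration of 1 ms or more lands in the last, unbounded bin.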
 * Two metrics called "hits" and "intercepts" are associated with each bin.
 * They are updated every time before selecting an idle state for the given CPU
 * in accordance with what happened last time.
 *
 * The "hits" metric reflects the relative frequency of situations in which the
 * sleep length and the idle duration measured after CPU wakeup fall into the
 * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
 * length). In turn, the "intercepts" metric reflects the relative frequency of
 * situations in which the measured idle duration is so much shorter than the
 * sleep length that the bin it falls into corresponds to an idle state
 * shallower than the one whose bin is fallen into by the sleep length (these
 * situations are referred to as "intercepts" below).
 *
 * In addition to the metrics described above, the governor counts recent
 * intercepts (that is, intercepts that have occurred during the last
 * %NR_RECENT invocations of it for the given CPU) for each bin.
 * In order to select an idle state for a CPU, the governor takes the following
 * steps (modulo the possible latency constraint that must be taken into
 * account too):
 *
 * 1. Find the deepest CPU idle state whose target residency does not exceed
 *    the current sleep length (the candidate idle state) and compute 3 sums as
 *    follows:
 *
 *    - The sum of the "hits" and "intercepts" metrics for the candidate state
 *      and all of the deeper idle states (it represents the cases in which the
 *      CPU was idle long enough to avoid being intercepted if the sleep length
 *      had been equal to the current one).
 *
 *    - The sum of the "intercepts" metrics for all of the idle states shallower
 *      than the candidate one (it represents the cases in which the CPU was not
 *      idle long enough to avoid being intercepted if the sleep length had been
 *      equal to the current one).
 *
 *    - The sum of the numbers of recent intercepts for all of the idle states
 *      shallower than the candidate one.
 *
 * 2. If the second sum is greater than the first one or the third sum is
 *    greater than %NR_RECENT / 2, the CPU is likely to wake up early, so look
 *    for an alternative idle state to select.
 *
 *    - Traverse the idle states shallower than the candidate one in the
 *      descending order.
 *
 *    - For each of them compute the sum of the "intercepts" metrics and the sum
 *      of the numbers of recent intercepts over all of the idle states between
 *      it and the candidate one (including the former and excluding the
 *      latter).
 *
 *    - If each of these sums that needs to be taken into account (because the
 *      check related to it has indicated that the CPU is likely to wake up
 *      early) is greater than a half of the corresponding sum computed in step
 *      1 (which means that the target residency of the state in question had
 *      not exceeded the idle duration in over a half of the relevant cases),
 *      select the given idle state instead of the candidate one.
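As a rough standalone illustration of the three sums computed in step 1 above, the sketch below uses a simplified stand-in for the per-bin metrics; the struct layout and state count are assumptions for the example, not the governor's actual data structures:

```c
#include <assert.h>

#define EXAMPLE_STATES 3

/* Simplified stand-in for the per-bin metrics described above. */
struct example_bin {
	unsigned int hits;
	unsigned int intercepts;
	unsigned int recent;
};

/*
 * Compute the three step-1 sums for a candidate state index:
 * - "hits" + "intercepts" for the candidate and all deeper states,
 * - "intercepts" for the states shallower than the candidate,
 * - recent intercepts for the states shallower than the candidate.
 */
static void example_step1_sums(const struct example_bin *bins, int candidate,
			       unsigned int *hit_int_sum,
			       unsigned int *intercept_sum,
			       unsigned int *recent_sum)
{
	int i;

	*hit_int_sum = 0;
	*intercept_sum = 0;
	*recent_sum = 0;

	for (i = 0; i < EXAMPLE_STATES; i++) {
		if (i >= candidate) {
			*hit_int_sum += bins[i].hits + bins[i].intercepts;
		} else {
			*intercept_sum += bins[i].intercepts;
			*recent_sum += bins[i].recent;
		}
	}
}
```

Step 2 then compares the second sum against the first and the third against half of the recent-event window to decide whether a shallower state should be considered.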
 * Util-awareness mechanism:
 *
 * The idea behind the util-awareness extension is that there are two distinct
 * scenarios for the CPU which should result in two different approaches to idle
 * state selection - utilized and not utilized.
 *
 * In this case, 'utilized' means that the average runqueue util of the CPU is
 * above a certain threshold.
 *
 * When the CPU is utilized while going into idle, more likely than not it will
 * be woken up to do more work soon and so a shallower idle state should be
 * selected to minimise latency and maximise performance. When the CPU is not
 * being utilized, the usual metrics-based approach to selecting the deepest
 * available idle state should be preferred to take advantage of the power
 * savings.
 *
 * The threshold is computed per-CPU as a percentage of the CPU's capacity
 * by bit shifting the value of the capacity.
 *
 * Before selecting the next idle state, the governor compares the current CPU
 * util to the precomputed util threshold. If it's below, the most
 * energy-efficient idle state will be selected using the TEO metrics
 * mechanism. If it's above, the closest shallower idle state will be selected
 * instead.
 * The number of bits to shift the CPU's capacity by in order to determine
 * the utilized threshold.
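The threshold computation itself is a single right shift of the capacity value. A standalone sketch follows; the shift value of 6 (roughly 1.56% of capacity) matches the constant described here, but the function name is a hypothetical stand-in:

```c
#include <assert.h>

/* Shift applied to the CPU capacity to derive the utilization threshold. */
#define EXAMPLE_UTIL_THRESHOLD_SHIFT 6

/*
 * The utilization threshold is the capacity divided by 2^shift,
 * i.e. capacity >> 6 is about 1.56% of the capacity.
 */
static unsigned long example_util_threshold(unsigned long capacity)
{
	return capacity >> EXAMPLE_UTIL_THRESHOLD_SHIFT;
}
```

For a CPU with the common maximum capacity of 1024, this yields a threshold of 16.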
 * Number of the most recent idle duration values to take into consideration for
 * the detection of recent early wakeup patterns.
 * struct teo_bin - Metrics used by the TEO cpuidle governor.

 * struct teo_cpu - CPU data used by the TEO cpuidle governor.
 * @time_span_ns: Time between idle state selection and post-wakeup update.
 * @state_bins: Idle state data bins for this CPU.
 * @util_threshold: Threshold above which the CPU is considered utilized
 * teo_cpu_is_utilized - Check if the CPU's util is above the threshold
 * @cpu: Target CPU
 * @cpu_data: Governor CPU data for the target CPU

static bool teo_cpu_is_utilized(int cpu, struct teo_cpu *cpu_data)
{
	return sched_cpu_util(cpu) > cpu_data->util_threshold;
}

static bool teo_cpu_is_utilized(int cpu, struct teo_cpu *cpu_data)
 * teo_update - Update CPU metrics after wakeup.
 * @dev: Target CPU.

	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);

	if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
 * enough to the closest timer event expected at the idle state

		u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns;
 * (saved) time till the next timer event and the measured idle

		measured_ns = dev->last_residency_ns;
 * executed by the CPU is not likely to be worst-case every

		measured_ns -= lat_ns / 2;

	cpu_data->total = 0;

 * find the bins that the sleep length and the measured idle duration

	for (i = 0; i < drv->state_count; i++) {
		struct teo_bin *bin = &cpu_data->state_bins[i];

		bin->hits -= bin->hits >> DECAY_SHIFT;
		bin->intercepts -= bin->intercepts >> DECAY_SHIFT;

		cpu_data->total += bin->hits + bin->intercepts;

		target_residency_ns = drv->states[i].target_residency_ns;

		if (target_residency_ns <= cpu_data->sleep_length_ns) {

	i = cpu_data->next_recent_idx++;
	if (cpu_data->next_recent_idx >= NR_RECENT)
		cpu_data->next_recent_idx = 0;

	if (cpu_data->recent_idx[i] >= 0)
		cpu_data->state_bins[cpu_data->recent_idx[i]].recent--;

 * to stop the tick. This effectively adds an extra hits-only bin
 * beyond the last state-related one.

	cpu_data->tick_hits -= cpu_data->tick_hits >> DECAY_SHIFT;

	cpu_data->total += cpu_data->tick_hits;

	if (TICK_NSEC <= cpu_data->sleep_length_ns) {
		idx_timer = drv->state_count;

		cpu_data->tick_hits += PULSE;

	/*
	 * If the measured idle duration falls into the same bin as the sleep
	 * length, this is a "hit", so update the "hits" metric for that bin.
	 * Otherwise, update the "intercepts" metric for the bin fallen into by
	 * the measured idle duration.
	 */
	cpu_data->state_bins[idx_timer].hits += PULSE;
	cpu_data->recent_idx[i] = -1;

	cpu_data->state_bins[idx_duration].intercepts += PULSE;
	cpu_data->state_bins[idx_duration].recent++;
	cpu_data->recent_idx[i] = idx_duration;

	cpu_data->total += PULSE;
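The `-= value >> DECAY_SHIFT` lines above implement an exponential moving average: every update drops a 2^-DECAY_SHIFT fraction of each metric before a PULSE is credited to the winning bin. A standalone sketch follows; the shift and pulse values are illustrative stand-ins, not necessarily the governor's own constants:

```c
#include <assert.h>

/* Illustrative stand-ins for the governor's decay constants. */
#define EXAMPLE_DECAY_SHIFT 3
#define EXAMPLE_PULSE 1024

/*
 * Apply one decay step: drop 1/2^EXAMPLE_DECAY_SHIFT of the metric,
 * mirroring "bin->hits -= bin->hits >> DECAY_SHIFT" above.
 */
static unsigned int example_decay(unsigned int metric)
{
	return metric - (metric >> EXAMPLE_DECAY_SHIFT);
}

/*
 * A bin that "wins" on every update converges to the fixed point where
 * the decayed amount equals the added pulse: PULSE << DECAY_SHIFT.
 */
static unsigned int example_steady_state(void)
{
	unsigned int hits = 0;
	int i;

	for (i = 0; i < 200; i++)
		hits = example_decay(hits) + EXAMPLE_PULSE;

	return hits;
}
```

With these example constants the steady-state value is 1024 << 3 = 8192, at which point the decay removes exactly one pulse per update.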
	       drv->states[i].target_residency_ns >= TICK_NSEC;
 * teo_find_shallower_state - Find shallower idle state matching given duration.
 * @dev: Target CPU.
 * @state_idx: Index of the capping idle state.
 * @duration_ns: Idle duration value to match.
 * @no_poll: Don't consider polling states.

	for (i = state_idx - 1; i >= 0; i--) {
		if (dev->states_usage[i].disable ||
		    (no_poll && drv->states[i].flags & CPUIDLE_FLAG_POLLING))

		if (drv->states[i].target_residency_ns <= duration_ns)
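The downward search pattern shown in the fragment above can be sketched in a self-contained form. The struct and state table here are hypothetical stand-ins for the driver and device tables, assuming the loop keeps stepping shallower until an enabled, non-polling (when requested) state with a fitting target residency is found:

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_STATES 4

/* Hypothetical per-state data standing in for the driver/device tables. */
struct example_state {
	long long target_residency_ns;
	bool disabled;
	bool polling;
};

/*
 * Walk downwards from the capping state, skipping disabled states (and
 * polling states when no_poll is set), and return the deepest usable
 * state whose target residency does not exceed the given duration;
 * otherwise fall back to the shallowest usable state reached.
 */
static int example_find_shallower(const struct example_state *states,
				  int state_idx, long long duration_ns,
				  bool no_poll)
{
	int i;

	for (i = state_idx - 1; i >= 0; i--) {
		if (states[i].disabled || (no_poll && states[i].polling))
			continue;

		state_idx = i;
		if (states[i].target_residency_ns <= duration_ns)
			break;
	}
	return state_idx;
}
```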
 * teo_select - Selects the next idle state to enter.
 * @dev: Target CPU.

	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);

	int idx0 = 0, idx = -1;

	if (dev->last_state_idx >= 0) {

		dev->last_state_idx = -1;

	cpu_data->time_span_ns = local_clock();

	cpu_data->sleep_length_ns = KTIME_MAX;

	if (drv->state_count < 2) {

	if (!dev->states_usage[0].disable)

	cpu_utilized = teo_cpu_is_utilized(dev->cpu, cpu_data);

	/*
	 * If the CPU is being utilized over the threshold and there are only 2
	 * states to choose from, the metrics need not be considered, so choose
	 * the shallowest non-polling state and exit.
	 */
	if (drv->state_count < 3 && cpu_utilized) {
 * which case care needs to be taken to leave the CPU in a deep

		if ((!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
		     teo_state_ok(0, drv)) || dev->states_usage[1].disable) {

		duration_ns = drv->states[1].target_residency_ns;

	for (i = 1; i < drv->state_count; i++) {
		struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
		struct cpuidle_state *s = &drv->states[i];

		/*
		 * Update the sums of idle state metrics for all of the states
		 * shallower than the current one.
		 */
		intercept_sum += prev_bin->intercepts;
		hit_sum += prev_bin->hits;
		recent_sum += prev_bin->recent;

		if (dev->states_usage[i].disable)

		if (s->exit_latency_ns <= latency_req)

		idx = 0; /* No states enabled, must use 0. */

 * Only one idle state is enabled, so use it, but do not

		duration_ns = drv->states[idx].target_residency_ns;

		cpu_data->state_bins[drv->state_count-1].intercepts;

	/*
	 * If the sum of the intercepts metric for all of the idle states
	 * shallower than the current candidate one (idx) is greater than the
	 * sum of the intercepts and hits metrics for the candidate state and
	 * all of the deeper states, or the sum of the numbers of recent
	 * intercepts over all of the states shallower than the candidate one
	 * is greater than a half of the number of recent events taken into
	 * account, a shallower idle state is likely to be a better choice.
	 */
	alt_intercepts = 2 * idx_intercept_sum > cpu_data->total - idx_hit_sum;

		/*
		 * Look for the deepest idle state whose target residency had
		 * not exceeded the idle duration in over a half of the relevant
		 * cases in the past.
		 */
		for (i = idx - 1; i >= 0; i--) {
			struct teo_bin *bin = &cpu_data->state_bins[i];

			intercept_sum += bin->intercepts;
			recent_sum += bin->recent;

			    !dev->states_usage[i].disable)

			if (dev->states_usage[i].disable)

 * idle state shallower than the current candidate one.

	/*
	 * If the CPU is being utilized over the threshold, choose a shallower
	 * non-polling state to improve latency, unless the scheduler tick has
	 * been stopped already.
	 */

 * because an immediate non-timer wakeup is expected in that case.

	if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
	    drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)

	cpu_data->sleep_length_ns = duration_ns;

	if (drv->states[idx].target_residency_ns > duration_ns) {

	if (drv->states[idx].target_residency_ns < TICK_NSEC &&
	    tick_intercept_sum > cpu_data->total / 2 + cpu_data->total / 8)

 * one or the expected idle duration is shorter than the tick period

	if ((!(drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
	     drv->states[idx].target_residency_ns > delta_tick)
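The two "wake up early" conditions used by the selection path above can be expressed as a small predicate. The sketch below mirrors the `alt_intercepts` comparison from the fragment and adds the recent-intercepts check described in step 2 of the documentation; the NR_RECENT value is an illustrative stand-in:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the number of recent events kept per CPU. */
#define EXAMPLE_NR_RECENT 9

/*
 * A shallower state is worth considering when the intercepts below the
 * candidate outweigh the hits + intercepts of the candidate and deeper
 * states, or when more than half of the recent events were intercepts
 * below the candidate.
 */
static bool example_wake_up_early(unsigned int idx_intercept_sum,
				  unsigned int total,
				  unsigned int idx_hit_sum,
				  unsigned int idx_recent_sum)
{
	bool alt_intercepts = 2 * idx_intercept_sum > total - idx_hit_sum;
	bool alt_recent = idx_recent_sum > EXAMPLE_NR_RECENT / 2;

	return alt_intercepts || alt_recent;
}
```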
 * teo_reflect - Note that governor data for the CPU need to be updated.
 * @dev: Target CPU.

	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);

	dev->last_state_idx = state;
 * nets, assume that the CPU might have been idle for the entire sleep

	if (dev->poll_time_limit ||
	    (tick_nohz_idle_got_tick() && cpu_data->sleep_length_ns > TICK_NSEC)) {
		dev->poll_time_limit = false;
		cpu_data->time_span_ns = cpu_data->sleep_length_ns;
	} else {
		cpu_data->time_span_ns = local_clock() - cpu_data->time_span_ns;
	}
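The `time_span_ns` bookkeeping above follows a record-then-subtract pattern: the selection path stashes the raw clock value and the reflect path converts it into an elapsed span. A user-space sketch with a hypothetical monotonic-clock stand-in for `local_clock()`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical controllable clock standing in for local_clock(). */
static uint64_t example_now_ns;

static uint64_t example_clock(void)
{
	return example_now_ns;
}

/* At selection time, stash the raw clock value in time_span_ns. */
static uint64_t example_select_stamp(void)
{
	return example_clock();
}

/* At reflect time, turn the stamp into the elapsed idle span. */
static uint64_t example_reflect_span(uint64_t stamp)
{
	return example_clock() - stamp;
}
```

Storing the raw stamp first lets the safety-net path in the fragment overwrite the span with the sleep length instead, without ever needing a second clock read.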
 * teo_enable_device - Initialize the governor's data for the target CPU.
 * @dev: Target CPU.

	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
	unsigned long max_capacity = arch_scale_cpu_capacity(dev->cpu);

	cpu_data->util_threshold = max_capacity >> UTIL_THRESHOLD_SHIFT;

	for (i = 0; i < NR_RECENT; i++)
		cpu_data->recent_idx[i] = -1;