78aabcb3 | 22-Aug-2023 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Avoid unnecessary variable assignments
Notice that it is not necessary to assign tick_intercept_sum in every iteration of the first loop over idle states in teo_select(), because the i
cpuidle: teo: Avoid unnecessary variable assignments
Notice that it is not necessary to assign tick_intercept_sum in every iteration of the first loop over idle states in teo_select(), because the intercept_sum value does not change after the assignment in a given iteration of the loop, so its value after the last iteration of the loop can be used for computing the tick_intercept_sum value directly.
Modify the code accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
5484e31b | 10-Aug-2023 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases
Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in menu_select() so it first
cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases
Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in menu_select() so it first uses the statistics to determine the expected idle duration. If that value is higher than RESIDENCY_THRESHOLD_NS, tick_nohz_get_sleep_length() will be called to obtain the time till the closest timer and refine the idle duration prediction if necessary.
This causes the governor to always take the full overhead of get_typical_interval() with the assumption that the cost will be amortized by skipping the tick_nohz_get_sleep_length() call in the cases when the predicted idle duration is relatively very small.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Doug Smythies <dsmythies@telus.net>
show more ...
|
26623420 | 03-Aug-2023 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Gather statistics regarding whether or not to stop the tick
Currently, if the target residency of the deepest idle state is less than the tick period length, which is quite likely for
cpuidle: teo: Gather statistics regarding whether or not to stop the tick
Currently, if the target residency of the deepest idle state is less than the tick period length, which is quite likely for HZ=100, and the deepest idle state is about to be selected by the TEO idle governor, the decision on whether or not to stop the scheduler tick is based entirely on the time till the closest timer. This is often insufficient, because timers may not be in heavy use and there may be a plenty of other CPU wakeup events between the deepest idle state's target residency and the closest tick.
Allow the governor to count those events by making the deepest idle state's bin effectively end at TICK_NSEC and introducing an additional "bin" for collecting "hit" events (ie. the ones in which the measured idle duration falls into the same bin as the time till the closest timer) with idle duration values past TICK_NSEC.
This way the "intercepts" metric for the deepest idle state's bin becomes nonzero in general, and so it can influence the decision on whether or not to stop the tick possibly increasing the governor's accuracy in that respect.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
show more ...
|
6da8f9ba | 03-Aug-2023 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases
Make teo_select() avoid calling tick_nohz_get_sleep_length() if the candidate idle state to return is state 0 or if state 0 is a po
cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases
Make teo_select() avoid calling tick_nohz_get_sleep_length() if the candidate idle state to return is state 0 or if state 0 is a polling one and the target residency of the current candidate one is below a certain threshold, in which cases it may be assumed that the CPU will be woken up immediately by a non-timer wakeup source and the timers are not likely to matter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
show more ...
|
21d28cd2 | 03-Aug-2023 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront
Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in teo_select() so it first uses
cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront
Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in teo_select() so it first uses the statistics to pick up a candidate idle state and applies the utilization heuristic to it and only then calls tick_nohz_get_sleep_length() to obtain the sleep length value and refine the selection if necessary.
This change by itself does not cause tick_nohz_get_sleep_length() to be called less often, but it prepares the code for subsequent changes that will do so.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
show more ...
|
9a41e16f | 31-Jul-2023 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Drop utilized from struct teo_cpu
Because the utilized field in struct teo_cpu is only used locally in teo_select(), replace it with a local variable in that function.
No intentional
cpuidle: teo: Drop utilized from struct teo_cpu
Because the utilized field in struct teo_cpu is only used locally in teo_select(), replace it with a local variable in that function.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
show more ...
|
04bae4e2 | 31-Jul-2023 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Avoid stopping the tick unnecessarily when bailing out
When teo_select() is going to return early in some special cases, make it avoid stopping the tick if the idle state to be returne
cpuidle: teo: Avoid stopping the tick unnecessarily when bailing out
When teo_select() is going to return early in some special cases, make it avoid stopping the tick if the idle state to be returned is shallow.
In particular, never stop the tick if state 0 is to be returned.
Link: https://lore.kernel.org/linux-pm/CAJZ5v0jJxHj65r2HXBTd3wfbZtsg=_StzwO1kA5STDnaPe_dWA@mail.gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
show more ...
|
9ce0f7c4 | 05-Jan-2023 |
Kajetan Puchalski <kajetan.puchalski@arm.com> |
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have power efficient shallow idle states. Selecting deeper idle states on a device while a l
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have power efficient shallow idle states. Selecting deeper idle states on a device while a latency-sensitive workload is running can adversely impact performance due to increased latency. Additionally, if the CPU wakes up from a deeper sleep before its target residency as is often the case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling information into account. They also tend to overestimate the idle duration quite often, which causes them to select excessively deep idle states, thus leading to increased wakeup latency and lower performance with no power saving. For 'menu' while web browsing on Android for instance, those types of wakeups ('too deep') account for over 24% of all wakeups.
At the same time, on some platforms idle state 0 can be power efficient enough to warrant wanting to prefer it over idle state 1. This is because the power usage of the two states can be so close that sufficient amounts of too deep state 1 sleeps can completely offset the state 1 power saving to the point where it would've been more power efficient to just use state 0 instead. This is, of course, for systems where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too shallow') only save less power than they otherwise could have. Too deep sleeps, on the other hand, harm performance and nullify the potential power saving from using state 1 in the first place. While taking this into account, it is clear that on balance it is preferable for an idle governor to have more too shallow sleeps instead of more too deep sleeps on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util signal of a CPU's runqueue in order to determine to what extent the CPU is being utilized. This util value is then compared to a threshold defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5% in the current implementation). If the util is above the threshold, the index of the idle state selected by TEO metrics will be reduced by 1, thus selecting a shallower state. If the util is below the threshold, the governor defaults to the TEO metrics mechanism to try to select the deepest available idle state based on the closest timer event and its own correctness.
The main goal of this is to reduce latency and increase performance for some workloads. Under some workloads it will result in an increase in power usage (Geekbench 5) while for other workloads it will also result in a decrease in power usage compared to TEO (PCMark Web, Jankbench, Speedometer).
It can provide drastically decreased latency and performance benefits in certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware | | ------------------------------------- | -------------- | ----------------- | ----------------- | | gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) | | gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) | | gmean too deep % | 14.99% | 9.65% | 4.02% | | gmean too shallow % | 2.5% | 5.96% | 14.59% | | gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware | | ------------------------------------- | -------------- | ----------------- | ----------------- | | gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) | | gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) | | gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) | | gmean too deep % | 26.00% | 11.00% | 2.54% | | gmean too shallow % | 4.74% | 11.89% | 21.93% | | gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) | | gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com> [ rjw: Comment edits and white space adjustments ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
4adae7dd | 30-Jul-2021 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Rename two local variables in teo_select()
Rename two local variables in teo_select() so that their names better reflect their purpose.
No functional impact.
Signed-off-by: Rafael J.
cpuidle: teo: Rename two local variables in teo_select()
Rename two local variables in teo_select() so that their names better reflect their purpose.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
795e0e38 | 15-Jun-2021 |
Wan Jiabing <wanjiabing@vivo.com> |
cpuidle: teo: remove unneeded semicolon in teo_select()
Fix following coccicheck warning: drivers/cpuidle/governors/teo.c:315:10-11: Unneeded semicolon
Signed-off-by: Wan Jiabing <wanjiabing@vivo.c
cpuidle: teo: remove unneeded semicolon in teo_select()
Fix following coccicheck warning: drivers/cpuidle/governors/teo.c:315:10-11: Unneeded semicolon
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
154ae8bb | 02-Jun-2021 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Use kerneldoc documentation in admin-guide
There are two descriptions of the TEO (Timer Events Oriented) cpuidle governor in the kernel source tree, one in the C file containing its co
cpuidle: teo: Use kerneldoc documentation in admin-guide
There are two descriptions of the TEO (Timer Events Oriented) cpuidle governor in the kernel source tree, one in the C file containing its code and one in cpuidle.rst which is part of admin-guide.
Instead of trying to keep them both in sync and in order to reduce text duplication, include the governor description from the C file directly into cpuidle.rst.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
77577558 | 02-Jun-2021 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Rework most recent idle duration values treatment
The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle stat
cpuidle: teo: Rework most recent idle duration values treatment
The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle state selection in case the previous long-term trends have not been followed recently and a new trend appears to be forming. That is done by computing the average of the most recent idle duration values falling below the time till the next timer event ("sleep length"), provided that they are the majority of the most recent idle duration values taken into account, and using it as the new expected idle duration value.
However, idle state selection based on that value may not be optimal, because the average does not really indicate which of the idle states with target residencies less than or equal to it is likely to be the best fit.
Thus, instead of computing the average, make the governor carry out computations based on the distribution of the most recent idle duration values among the bins corresponding to different idle states. Namely, if the majority of the most recent idle duration values taken into consideration are less than the current sleep length (which means that the CPU is likely to wake up early), find the idle state closest to the "candidate" one "matching" the sleep length whose target residency is less than or equal to the majority of the most recent idle duration values that have fallen below the current sleep length (which means that it is likely to be "shallow enough" this time).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
c410a9a1 | 02-Jun-2021 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection.
Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not.
In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU).
Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
b18e0de1 | 02-Jun-2021 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Cosmetic modification of teo_select()
Initialize local variables in teo_select() where they are declared.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@int
cpuidle: teo: Cosmetic modification of teo_select()
Initialize local variables in teo_select() where they are declared.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
060e3535 | 29-Mar-2021 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: menu: Take negative "sleep length" values into account
Make the menu governor check the tick_nohz_get_next_hrtimer() return value so as to avoid dealing with negative "sleep length" values
cpuidle: menu: Take negative "sleep length" values into account
Make the menu governor check the tick_nohz_get_next_hrtimer() return value so as to avoid dealing with negative "sleep length" values and make it use that value directly when the tick is stopped.
While at it, rename local variable delta_next in menu_select() to delta_tick which better reflects its purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
030adec9 | 29-Mar-2021 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
cpuidle: teo: Take negative "sleep length" values into account
Modify the TEO governor to take possible negative return values of tick_nohz_get_next_hrtimer() into account by changing the data type
cpuidle: teo: Take negative "sleep length" values into account
Modify the TEO governor to take possible negative return values of tick_nohz_get_next_hrtimer() into account by changing the data type of some variables used by it to s64 which allows it to carry out computations without potentially problematic data type conversions into u64.
Also change the computations in teo_select() so that the negative values themselves are handled in a natural way to avoid adding extra negative value checks to that function.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|