165065fd7SValentin Schneider=========================
265065fd7SValentin SchneiderCapacity Aware Scheduling
365065fd7SValentin Schneider=========================
465065fd7SValentin Schneider
565065fd7SValentin Schneider1. CPU Capacity
665065fd7SValentin Schneider===============
765065fd7SValentin Schneider
865065fd7SValentin Schneider1.1 Introduction
965065fd7SValentin Schneider----------------
1065065fd7SValentin Schneider
1165065fd7SValentin SchneiderConventional, homogeneous SMP platforms are composed of purely identical
1265065fd7SValentin SchneiderCPUs. Heterogeneous platforms on the other hand are composed of CPUs with
1365065fd7SValentin Schneiderdifferent performance characteristics - on such platforms, not all CPUs can be
1465065fd7SValentin Schneiderconsidered equal.
1565065fd7SValentin Schneider
1665065fd7SValentin SchneiderCPU capacity is a measure of the performance a CPU can reach, normalized against
1765065fd7SValentin Schneiderthe most performant CPU in the system. Heterogeneous systems are also called
1865065fd7SValentin Schneiderasymmetric CPU capacity systems, as they contain CPUs of different capacities.
1965065fd7SValentin Schneider
2065065fd7SValentin SchneiderDisparity in maximum attainable performance (IOW in maximum CPU capacity) stems
2165065fd7SValentin Schneiderfrom two factors:
2265065fd7SValentin Schneider
2365065fd7SValentin Schneider- not all CPUs may have the same microarchitecture (µarch).
2465065fd7SValentin Schneider- with Dynamic Voltage and Frequency Scaling (DVFS), not all CPUs may be
2565065fd7SValentin Schneider  physically able to attain the higher Operating Performance Points (OPP).
2665065fd7SValentin Schneider
2765065fd7SValentin SchneiderArm big.LITTLE systems are an example of both. The big CPUs are more
2865065fd7SValentin Schneiderperformance-oriented than the LITTLE ones (more pipeline stages, bigger caches,
2965065fd7SValentin Schneidersmarter predictors, etc), and can usually reach higher OPPs than the LITTLE ones
3065065fd7SValentin Schneidercan.
3165065fd7SValentin Schneider
3265065fd7SValentin SchneiderCPU performance is usually expressed in Millions of Instructions Per Second
3365065fd7SValentin Schneider(MIPS), which can also be expressed as a given amount of instructions attainable
3465065fd7SValentin Schneiderper Hz, leading to::
3565065fd7SValentin Schneider
3665065fd7SValentin Schneider  capacity(cpu) = work_per_hz(cpu) * max_freq(cpu)
3765065fd7SValentin Schneider
3865065fd7SValentin Schneider1.2 Scheduler terms
3965065fd7SValentin Schneider-------------------
4065065fd7SValentin Schneider
4165065fd7SValentin SchneiderTwo different capacity values are used within the scheduler. A CPU's
4265065fd7SValentin Schneider``capacity_orig`` is its maximum attainable capacity, i.e. its maximum
4365065fd7SValentin Schneiderattainable performance level. A CPU's ``capacity`` is its ``capacity_orig`` to
4465065fd7SValentin Schneiderwhich some loss of available performance (e.g. time spent handling IRQs) is
4565065fd7SValentin Schneidersubtracted.
4665065fd7SValentin Schneider
4765065fd7SValentin SchneiderNote that a CPU's ``capacity`` is solely intended to be used by the CFS class,
4865065fd7SValentin Schneiderwhile ``capacity_orig`` is class-agnostic. The rest of this document will use
4965065fd7SValentin Schneiderthe term ``capacity`` interchangeably with ``capacity_orig`` for the sake of
5065065fd7SValentin Schneiderbrevity.
5165065fd7SValentin Schneider
5265065fd7SValentin Schneider1.3 Platform examples
5365065fd7SValentin Schneider---------------------
5465065fd7SValentin Schneider
5565065fd7SValentin Schneider1.3.1 Identical OPPs
5665065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~
5765065fd7SValentin Schneider
5865065fd7SValentin SchneiderConsider an hypothetical dual-core asymmetric CPU capacity system where
5965065fd7SValentin Schneider
6065065fd7SValentin Schneider- work_per_hz(CPU0) = W
6165065fd7SValentin Schneider- work_per_hz(CPU1) = W/2
6265065fd7SValentin Schneider- all CPUs are running at the same fixed frequency
6365065fd7SValentin Schneider
6465065fd7SValentin SchneiderBy the above definition of capacity:
6565065fd7SValentin Schneider
6665065fd7SValentin Schneider- capacity(CPU0) = C
6765065fd7SValentin Schneider- capacity(CPU1) = C/2
6865065fd7SValentin Schneider
6965065fd7SValentin SchneiderTo draw the parallel with Arm big.LITTLE, CPU0 would be a big while CPU1 would
7065065fd7SValentin Schneiderbe a LITTLE.
7165065fd7SValentin Schneider
7265065fd7SValentin SchneiderWith a workload that periodically does a fixed amount of work, you will get an
7365065fd7SValentin Schneiderexecution trace like so::
7465065fd7SValentin Schneider
7565065fd7SValentin Schneider CPU0 work ^
7665065fd7SValentin Schneider           |     ____                ____                ____
7765065fd7SValentin Schneider           |    |    |              |    |              |    |
7865065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
7965065fd7SValentin Schneider
8065065fd7SValentin Schneider CPU1 work ^
8165065fd7SValentin Schneider           |     _________           _________           ____
8265065fd7SValentin Schneider           |    |         |         |         |         |
8365065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
8465065fd7SValentin Schneider
8565065fd7SValentin SchneiderCPU0 has the highest capacity in the system (C), and completes a fixed amount of
8665065fd7SValentin Schneiderwork W in T units of time. On the other hand, CPU1 has half the capacity of
8765065fd7SValentin SchneiderCPU0, and thus only completes W/2 in T.
8865065fd7SValentin Schneider
8965065fd7SValentin Schneider1.3.2 Different max OPPs
9065065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~~~
9165065fd7SValentin Schneider
9265065fd7SValentin SchneiderUsually, CPUs of different capacity values also have different maximum
9365065fd7SValentin SchneiderOPPs. Consider the same CPUs as above (i.e. same work_per_hz()) with:
9465065fd7SValentin Schneider
9565065fd7SValentin Schneider- max_freq(CPU0) = F
9665065fd7SValentin Schneider- max_freq(CPU1) = 2/3 * F
9765065fd7SValentin Schneider
9865065fd7SValentin SchneiderThis yields:
9965065fd7SValentin Schneider
10065065fd7SValentin Schneider- capacity(CPU0) = C
10165065fd7SValentin Schneider- capacity(CPU1) = C/3
10265065fd7SValentin Schneider
10365065fd7SValentin SchneiderExecuting the same workload as described in 1.3.1, which each CPU running at its
10465065fd7SValentin Schneidermaximum frequency results in::
10565065fd7SValentin Schneider
10665065fd7SValentin Schneider CPU0 work ^
10765065fd7SValentin Schneider           |     ____                ____                ____
10865065fd7SValentin Schneider           |    |    |              |    |              |    |
10965065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
11065065fd7SValentin Schneider
11165065fd7SValentin Schneider                            workload on CPU1
11265065fd7SValentin Schneider CPU1 work ^
11365065fd7SValentin Schneider           |     ______________      ______________      ____
11465065fd7SValentin Schneider           |    |              |    |              |    |
11565065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
11665065fd7SValentin Schneider
11765065fd7SValentin Schneider1.4 Representation caveat
11865065fd7SValentin Schneider-------------------------
11965065fd7SValentin Schneider
12065065fd7SValentin SchneiderIt should be noted that having a *single* value to represent differences in CPU
12165065fd7SValentin Schneiderperformance is somewhat of a contentious point. The relative performance
12265065fd7SValentin Schneiderdifference between two different µarchs could be X% on integer operations, Y% on
12365065fd7SValentin Schneiderfloating point operations, Z% on branches, and so on. Still, results using this
12465065fd7SValentin Schneidersimple approach have been satisfactory for now.
12565065fd7SValentin Schneider
12665065fd7SValentin Schneider2. Task utilization
12765065fd7SValentin Schneider===================
12865065fd7SValentin Schneider
12965065fd7SValentin Schneider2.1 Introduction
13065065fd7SValentin Schneider----------------
13165065fd7SValentin Schneider
13265065fd7SValentin SchneiderCapacity aware scheduling requires an expression of a task's requirements with
13365065fd7SValentin Schneiderregards to CPU capacity. Each scheduler class can express this differently, and
13465065fd7SValentin Schneiderwhile task utilization is specific to CFS, it is convenient to describe it here
13565065fd7SValentin Schneiderin order to introduce more generic concepts.
13665065fd7SValentin Schneider
13765065fd7SValentin SchneiderTask utilization is a percentage meant to represent the throughput requirements
13865065fd7SValentin Schneiderof a task. A simple approximation of it is the task's duty cycle, i.e.::
13965065fd7SValentin Schneider
14065065fd7SValentin Schneider  task_util(p) = duty_cycle(p)
14165065fd7SValentin Schneider
14265065fd7SValentin SchneiderOn an SMP system with fixed frequencies, 100% utilization suggests the task is a
14365065fd7SValentin Schneiderbusy loop. Conversely, 10% utilization hints it is a small periodic task that
14465065fd7SValentin Schneiderspends more time sleeping than executing. Variable CPU frequencies and
14565065fd7SValentin Schneiderasymmetric CPU capacities complexify this somewhat; the following sections will
14665065fd7SValentin Schneiderexpand on these.
14765065fd7SValentin Schneider
14865065fd7SValentin Schneider2.2 Frequency invariance
14965065fd7SValentin Schneider------------------------
15065065fd7SValentin Schneider
15165065fd7SValentin SchneiderOne issue that needs to be taken into account is that a workload's duty cycle is
15265065fd7SValentin Schneiderdirectly impacted by the current OPP the CPU is running at. Consider running a
15365065fd7SValentin Schneiderperiodic workload at a given frequency F::
15465065fd7SValentin Schneider
15565065fd7SValentin Schneider  CPU work ^
15665065fd7SValentin Schneider           |     ____                ____                ____
15765065fd7SValentin Schneider           |    |    |              |    |              |    |
15865065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
15965065fd7SValentin Schneider
16065065fd7SValentin SchneiderThis yields duty_cycle(p) == 25%.
16165065fd7SValentin Schneider
16265065fd7SValentin SchneiderNow, consider running the *same* workload at frequency F/2::
16365065fd7SValentin Schneider
16465065fd7SValentin Schneider  CPU work ^
16565065fd7SValentin Schneider           |     _________           _________           ____
16665065fd7SValentin Schneider           |    |         |         |         |         |
16765065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
16865065fd7SValentin Schneider
16965065fd7SValentin SchneiderThis yields duty_cycle(p) == 50%, despite the task having the exact same
17065065fd7SValentin Schneiderbehaviour (i.e. executing the same amount of work) in both executions.
17165065fd7SValentin Schneider
17265065fd7SValentin SchneiderThe task utilization signal can be made frequency invariant using the following
17365065fd7SValentin Schneiderformula::
17465065fd7SValentin Schneider
17565065fd7SValentin Schneider  task_util_freq_inv(p) = duty_cycle(p) * (curr_frequency(cpu) / max_frequency(cpu))
17665065fd7SValentin Schneider
17765065fd7SValentin SchneiderApplying this formula to the two examples above yields a frequency invariant
17865065fd7SValentin Schneidertask utilization of 25%.
17965065fd7SValentin Schneider
18065065fd7SValentin Schneider2.3 CPU invariance
18165065fd7SValentin Schneider------------------
18265065fd7SValentin Schneider
18365065fd7SValentin SchneiderCPU capacity has a similar effect on task utilization in that running an
18465065fd7SValentin Schneideridentical workload on CPUs of different capacity values will yield different
18565065fd7SValentin Schneiderduty cycles.
18665065fd7SValentin Schneider
18765065fd7SValentin SchneiderConsider the system described in 1.3.2., i.e.::
18865065fd7SValentin Schneider
18965065fd7SValentin Schneider- capacity(CPU0) = C
19065065fd7SValentin Schneider- capacity(CPU1) = C/3
19165065fd7SValentin Schneider
19265065fd7SValentin SchneiderExecuting a given periodic workload on each CPU at their maximum frequency would
19365065fd7SValentin Schneiderresult in::
19465065fd7SValentin Schneider
19565065fd7SValentin Schneider CPU0 work ^
19665065fd7SValentin Schneider           |     ____                ____                ____
19765065fd7SValentin Schneider           |    |    |              |    |              |    |
19865065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
19965065fd7SValentin Schneider
20065065fd7SValentin Schneider CPU1 work ^
20165065fd7SValentin Schneider           |     ______________      ______________      ____
20265065fd7SValentin Schneider           |    |              |    |              |    |
20365065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
20465065fd7SValentin Schneider
20565065fd7SValentin SchneiderIOW,
20665065fd7SValentin Schneider
20765065fd7SValentin Schneider- duty_cycle(p) == 25% if p runs on CPU0 at its maximum frequency
20865065fd7SValentin Schneider- duty_cycle(p) == 75% if p runs on CPU1 at its maximum frequency
20965065fd7SValentin Schneider
21065065fd7SValentin SchneiderThe task utilization signal can be made CPU invariant using the following
21165065fd7SValentin Schneiderformula::
21265065fd7SValentin Schneider
21365065fd7SValentin Schneider  task_util_cpu_inv(p) = duty_cycle(p) * (capacity(cpu) / max_capacity)
21465065fd7SValentin Schneider
21565065fd7SValentin Schneiderwith ``max_capacity`` being the highest CPU capacity value in the
21665065fd7SValentin Schneidersystem. Applying this formula to the above example above yields a CPU
21765065fd7SValentin Schneiderinvariant task utilization of 25%.
21865065fd7SValentin Schneider
21965065fd7SValentin Schneider2.4 Invariant task utilization
22065065fd7SValentin Schneider------------------------------
22165065fd7SValentin Schneider
22265065fd7SValentin SchneiderBoth frequency and CPU invariance need to be applied to task utilization in
22365065fd7SValentin Schneiderorder to obtain a truly invariant signal. The pseudo-formula for a task
22465065fd7SValentin Schneiderutilization that is both CPU and frequency invariant is thus, for a given
22565065fd7SValentin Schneidertask p::
22665065fd7SValentin Schneider
22765065fd7SValentin Schneider                                     curr_frequency(cpu)   capacity(cpu)
22865065fd7SValentin Schneider  task_util_inv(p) = duty_cycle(p) * ------------------- * -------------
22965065fd7SValentin Schneider                                     max_frequency(cpu)    max_capacity
23065065fd7SValentin Schneider
23165065fd7SValentin SchneiderIn other words, invariant task utilization describes the behaviour of a task as
23265065fd7SValentin Schneiderif it were running on the highest-capacity CPU in the system, running at its
23365065fd7SValentin Schneidermaximum frequency.
23465065fd7SValentin Schneider
23565065fd7SValentin SchneiderAny mention of task utilization in the following sections will imply its
23665065fd7SValentin Schneiderinvariant form.
23765065fd7SValentin Schneider
23865065fd7SValentin Schneider2.5 Utilization estimation
23965065fd7SValentin Schneider--------------------------
24065065fd7SValentin Schneider
24165065fd7SValentin SchneiderWithout a crystal ball, task behaviour (and thus task utilization) cannot
24265065fd7SValentin Schneideraccurately be predicted the moment a task first becomes runnable. The CFS class
24365065fd7SValentin Schneidermaintains a handful of CPU and task signals based on the Per-Entity Load
24465065fd7SValentin SchneiderTracking (PELT) mechanism, one of those yielding an *average* utilization (as
24565065fd7SValentin Schneideropposed to instantaneous).
24665065fd7SValentin Schneider
24765065fd7SValentin SchneiderThis means that while the capacity aware scheduling criteria will be written
24865065fd7SValentin Schneiderconsidering a "true" task utilization (using a crystal ball), the implementation
24965065fd7SValentin Schneiderwill only ever be able to use an estimator thereof.
25065065fd7SValentin Schneider
25165065fd7SValentin Schneider3. Capacity aware scheduling requirements
25265065fd7SValentin Schneider=========================================
25365065fd7SValentin Schneider
25465065fd7SValentin Schneider3.1 CPU capacity
25565065fd7SValentin Schneider----------------
25665065fd7SValentin Schneider
25765065fd7SValentin SchneiderLinux cannot currently figure out CPU capacity on its own, this information thus
25865065fd7SValentin Schneiderneeds to be handed to it. Architectures must define arch_scale_cpu_capacity()
25965065fd7SValentin Schneiderfor that purpose.
26065065fd7SValentin Schneider
261*5d89176aSSong ShuaiThe arm, arm64, and RISC-V architectures directly map this to the arch_topology driver
26265065fd7SValentin SchneiderCPU scaling data, which is derived from the capacity-dmips-mhz CPU binding; see
2637d207831SConor DooleyDocumentation/devicetree/bindings/cpu/cpu-capacity.txt.
26465065fd7SValentin Schneider
26565065fd7SValentin Schneider3.2 Frequency invariance
26665065fd7SValentin Schneider------------------------
26765065fd7SValentin Schneider
26865065fd7SValentin SchneiderAs stated in 2.2, capacity-aware scheduling requires a frequency-invariant task
26965065fd7SValentin Schneiderutilization. Architectures must define arch_scale_freq_capacity(cpu) for that
27065065fd7SValentin Schneiderpurpose.
27165065fd7SValentin Schneider
27265065fd7SValentin SchneiderImplementing this function requires figuring out at which frequency each CPU
27365065fd7SValentin Schneiderhave been running at. One way to implement this is to leverage hardware counters
27465065fd7SValentin Schneiderwhose increment rate scale with a CPU's current frequency (APERF/MPERF on x86,
27565065fd7SValentin SchneiderAMU on arm64). Another is to directly hook into cpufreq frequency transitions,
27665065fd7SValentin Schneiderwhen the kernel is aware of the switched-to frequency (also employed by
27765065fd7SValentin Schneiderarm/arm64).
27865065fd7SValentin Schneider
27965065fd7SValentin Schneider4. Scheduler topology
28065065fd7SValentin Schneider=====================
28165065fd7SValentin Schneider
28265065fd7SValentin SchneiderDuring the construction of the sched domains, the scheduler will figure out
28365065fd7SValentin Schneiderwhether the system exhibits asymmetric CPU capacities. Should that be the
28465065fd7SValentin Schneidercase:
28565065fd7SValentin Schneider
28665065fd7SValentin Schneider- The sched_asym_cpucapacity static key will be enabled.
287adf3c31eSBeata Michalska- The SD_ASYM_CPUCAPACITY_FULL flag will be set at the lowest sched_domain
288adf3c31eSBeata Michalska  level that spans all unique CPU capacity values.
289adf3c31eSBeata Michalska- The SD_ASYM_CPUCAPACITY flag will be set for any sched_domain that spans
290adf3c31eSBeata Michalska  CPUs with any range of asymmetry.
29165065fd7SValentin Schneider
29265065fd7SValentin SchneiderThe sched_asym_cpucapacity static key is intended to guard sections of code that
29365065fd7SValentin Schneidercater to asymmetric CPU capacity systems. Do note however that said key is
29465065fd7SValentin Schneider*system-wide*. Imagine the following setup using cpusets::
29565065fd7SValentin Schneider
29665065fd7SValentin Schneider  capacity    C/2          C
29765065fd7SValentin Schneider            ________    ________
29865065fd7SValentin Schneider           /        \  /        \
29965065fd7SValentin Schneider  CPUs     0  1  2  3  4  5  6  7
30065065fd7SValentin Schneider           \__/  \______________/
30165065fd7SValentin Schneider  cpusets   cs0         cs1
30265065fd7SValentin Schneider
30365065fd7SValentin SchneiderWhich could be created via:
30465065fd7SValentin Schneider
30565065fd7SValentin Schneider.. code-block:: sh
30665065fd7SValentin Schneider
30765065fd7SValentin Schneider  mkdir /sys/fs/cgroup/cpuset/cs0
30865065fd7SValentin Schneider  echo 0-1 > /sys/fs/cgroup/cpuset/cs0/cpuset.cpus
30965065fd7SValentin Schneider  echo 0 > /sys/fs/cgroup/cpuset/cs0/cpuset.mems
31065065fd7SValentin Schneider
31165065fd7SValentin Schneider  mkdir /sys/fs/cgroup/cpuset/cs1
31265065fd7SValentin Schneider  echo 2-7 > /sys/fs/cgroup/cpuset/cs1/cpuset.cpus
31365065fd7SValentin Schneider  echo 0 > /sys/fs/cgroup/cpuset/cs1/cpuset.mems
31465065fd7SValentin Schneider
31565065fd7SValentin Schneider  echo 0 > /sys/fs/cgroup/cpuset/cpuset.sched_load_balance
31665065fd7SValentin Schneider
31765065fd7SValentin SchneiderSince there *is* CPU capacity asymmetry in the system, the
31865065fd7SValentin Schneidersched_asym_cpucapacity static key will be enabled. However, the sched_domain
31965065fd7SValentin Schneiderhierarchy of CPUs 0-1 spans a single capacity value: SD_ASYM_CPUCAPACITY isn't
32065065fd7SValentin Schneiderset in that hierarchy, it describes an SMP island and should be treated as such.
32165065fd7SValentin Schneider
32265065fd7SValentin SchneiderTherefore, the 'canonical' pattern for protecting codepaths that cater to
32365065fd7SValentin Schneiderasymmetric CPU capacities is to:
32465065fd7SValentin Schneider
32565065fd7SValentin Schneider- Check the sched_asym_cpucapacity static key
32665065fd7SValentin Schneider- If it is enabled, then also check for the presence of SD_ASYM_CPUCAPACITY in
32765065fd7SValentin Schneider  the sched_domain hierarchy (if relevant, i.e. the codepath targets a specific
32865065fd7SValentin Schneider  CPU or group thereof)
32965065fd7SValentin Schneider
33065065fd7SValentin Schneider5. Capacity aware scheduling implementation
33165065fd7SValentin Schneider===========================================
33265065fd7SValentin Schneider
33365065fd7SValentin Schneider5.1 CFS
33465065fd7SValentin Schneider-------
33565065fd7SValentin Schneider
33665065fd7SValentin Schneider5.1.1 Capacity fitness
33765065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~
33865065fd7SValentin Schneider
33965065fd7SValentin SchneiderThe main capacity scheduling criterion of CFS is::
34065065fd7SValentin Schneider
34165065fd7SValentin Schneider  task_util(p) < capacity(task_cpu(p))
34265065fd7SValentin Schneider
34365065fd7SValentin SchneiderThis is commonly called the capacity fitness criterion, i.e. CFS must ensure a
34465065fd7SValentin Schneidertask "fits" on its CPU. If it is violated, the task will need to achieve more
34565065fd7SValentin Schneiderwork than what its CPU can provide: it will be CPU-bound.
34665065fd7SValentin Schneider
34765065fd7SValentin SchneiderFurthermore, uclamp lets userspace specify a minimum and a maximum utilization
34865065fd7SValentin Schneidervalue for a task, either via sched_setattr() or via the cgroup interface (see
34965065fd7SValentin SchneiderDocumentation/admin-guide/cgroup-v2.rst). As its name imply, this can be used to
35065065fd7SValentin Schneiderclamp task_util() in the previous criterion.
35165065fd7SValentin Schneider
35265065fd7SValentin Schneider5.1.2 Wakeup CPU selection
35365065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~~~~~
35465065fd7SValentin Schneider
35565065fd7SValentin SchneiderCFS task wakeup CPU selection follows the capacity fitness criterion described
35665065fd7SValentin Schneiderabove. On top of that, uclamp is used to clamp the task utilization values,
35765065fd7SValentin Schneiderwhich lets userspace have more leverage over the CPU selection of CFS
35865065fd7SValentin Schneidertasks. IOW, CFS wakeup CPU selection searches for a CPU that satisfies::
35965065fd7SValentin Schneider
36065065fd7SValentin Schneider  clamp(task_util(p), task_uclamp_min(p), task_uclamp_max(p)) < capacity(cpu)
36165065fd7SValentin Schneider
36265065fd7SValentin SchneiderBy using uclamp, userspace can e.g. allow a busy loop (100% utilization) to run
36365065fd7SValentin Schneideron any CPU by giving it a low uclamp.max value. Conversely, it can force a small
36465065fd7SValentin Schneiderperiodic task (e.g. 10% utilization) to run on the highest-performance CPUs by
36565065fd7SValentin Schneidergiving it a high uclamp.min value.
36665065fd7SValentin Schneider
36765065fd7SValentin Schneider.. note::
36865065fd7SValentin Schneider
36965065fd7SValentin Schneider  Wakeup CPU selection in CFS can be eclipsed by Energy Aware Scheduling
370e4e29e78SMauro Carvalho Chehab  (EAS), which is described in Documentation/scheduler/sched-energy.rst.
37165065fd7SValentin Schneider
37265065fd7SValentin Schneider5.1.3 Load balancing
37365065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~
37465065fd7SValentin Schneider
37565065fd7SValentin SchneiderA pathological case in the wakeup CPU selection occurs when a task rarely
37665065fd7SValentin Schneidersleeps, if at all - it thus rarely wakes up, if at all. Consider::
37765065fd7SValentin Schneider
37865065fd7SValentin Schneider  w == wakeup event
37965065fd7SValentin Schneider
38065065fd7SValentin Schneider  capacity(CPU0) = C
38165065fd7SValentin Schneider  capacity(CPU1) = C / 3
38265065fd7SValentin Schneider
38365065fd7SValentin Schneider                           workload on CPU0
38465065fd7SValentin Schneider  CPU work ^
38565065fd7SValentin Schneider           |     _________           _________           ____
38665065fd7SValentin Schneider           |    |         |         |         |         |
38765065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+-> time
38865065fd7SValentin Schneider                w                   w                   w
38965065fd7SValentin Schneider
39065065fd7SValentin Schneider                           workload on CPU1
39165065fd7SValentin Schneider  CPU work ^
39265065fd7SValentin Schneider           |     ____________________________________________
39365065fd7SValentin Schneider           |    |
39465065fd7SValentin Schneider           +----+----+----+----+----+----+----+----+----+----+->
39565065fd7SValentin Schneider                w
39665065fd7SValentin Schneider
39765065fd7SValentin SchneiderThis workload should run on CPU0, but if the task either:
39865065fd7SValentin Schneider
39965065fd7SValentin Schneider- was improperly scheduled from the start (inaccurate initial
40065065fd7SValentin Schneider  utilization estimation)
40165065fd7SValentin Schneider- was properly scheduled from the start, but suddenly needs more
40265065fd7SValentin Schneider  processing power
40365065fd7SValentin Schneider
40465065fd7SValentin Schneiderthen it might become CPU-bound, IOW ``task_util(p) > capacity(task_cpu(p))``;
40565065fd7SValentin Schneiderthe CPU capacity scheduling criterion is violated, and there may not be any more
40665065fd7SValentin Schneiderwakeup event to fix this up via wakeup CPU selection.
40765065fd7SValentin Schneider
40865065fd7SValentin SchneiderTasks that are in this situation are dubbed "misfit" tasks, and the mechanism
40965065fd7SValentin Schneiderput in place to handle this shares the same name. Misfit task migration
41065065fd7SValentin Schneiderleverages the CFS load balancer, more specifically the active load balance part
41165065fd7SValentin Schneider(which caters to migrating currently running tasks). When load balance happens,
41265065fd7SValentin Schneidera misfit active load balance will be triggered if a misfit task can be migrated
41365065fd7SValentin Schneiderto a CPU with more capacity than its current one.
41465065fd7SValentin Schneider
41565065fd7SValentin Schneider5.2 RT
41665065fd7SValentin Schneider------
41765065fd7SValentin Schneider
41865065fd7SValentin Schneider5.2.1 Wakeup CPU selection
41965065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~~~~~
42065065fd7SValentin Schneider
42165065fd7SValentin SchneiderRT task wakeup CPU selection searches for a CPU that satisfies::
42265065fd7SValentin Schneider
42365065fd7SValentin Schneider  task_uclamp_min(p) <= capacity(task_cpu(cpu))
42465065fd7SValentin Schneider
42565065fd7SValentin Schneiderwhile still following the usual priority constraints. If none of the candidate
42665065fd7SValentin SchneiderCPUs can satisfy this capacity criterion, then strict priority based scheduling
42765065fd7SValentin Schneideris followed and CPU capacities are ignored.
42865065fd7SValentin Schneider
42965065fd7SValentin Schneider5.3 DL
43065065fd7SValentin Schneider------
43165065fd7SValentin Schneider
43265065fd7SValentin Schneider5.3.1 Wakeup CPU selection
43365065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~~~~~
43465065fd7SValentin Schneider
43565065fd7SValentin SchneiderDL task wakeup CPU selection searches for a CPU that satisfies::
43665065fd7SValentin Schneider
43765065fd7SValentin Schneider  task_bandwidth(p) < capacity(task_cpu(p))
43865065fd7SValentin Schneider
43965065fd7SValentin Schneiderwhile still respecting the usual bandwidth and deadline constraints. If
44065065fd7SValentin Schneidernone of the candidate CPUs can satisfy this capacity criterion, then the
44165065fd7SValentin Schneidertask will remain on its current CPU.
442