165065fd7SValentin Schneider========================= 265065fd7SValentin SchneiderCapacity Aware Scheduling 365065fd7SValentin Schneider========================= 465065fd7SValentin Schneider 565065fd7SValentin Schneider1. CPU Capacity 665065fd7SValentin Schneider=============== 765065fd7SValentin Schneider 865065fd7SValentin Schneider1.1 Introduction 965065fd7SValentin Schneider---------------- 1065065fd7SValentin Schneider 1165065fd7SValentin SchneiderConventional, homogeneous SMP platforms are composed of purely identical 1265065fd7SValentin SchneiderCPUs. Heterogeneous platforms on the other hand are composed of CPUs with 1365065fd7SValentin Schneiderdifferent performance characteristics - on such platforms, not all CPUs can be 1465065fd7SValentin Schneiderconsidered equal. 1565065fd7SValentin Schneider 1665065fd7SValentin SchneiderCPU capacity is a measure of the performance a CPU can reach, normalized against 1765065fd7SValentin Schneiderthe most performant CPU in the system. Heterogeneous systems are also called 1865065fd7SValentin Schneiderasymmetric CPU capacity systems, as they contain CPUs of different capacities. 1965065fd7SValentin Schneider 2065065fd7SValentin SchneiderDisparity in maximum attainable performance (IOW in maximum CPU capacity) stems 2165065fd7SValentin Schneiderfrom two factors: 2265065fd7SValentin Schneider 2365065fd7SValentin Schneider- not all CPUs may have the same microarchitecture (µarch). 2465065fd7SValentin Schneider- with Dynamic Voltage and Frequency Scaling (DVFS), not all CPUs may be 2565065fd7SValentin Schneider physically able to attain the higher Operating Performance Points (OPP). 2665065fd7SValentin Schneider 2765065fd7SValentin SchneiderArm big.LITTLE systems are an example of both. The big CPUs are more 2865065fd7SValentin Schneiderperformance-oriented than the LITTLE ones (more pipeline stages, bigger caches, 2965065fd7SValentin Schneidersmarter predictors, etc), and can usually reach higher OPPs than the LITTLE ones 3065065fd7SValentin Schneidercan. 3165065fd7SValentin Schneider 3265065fd7SValentin SchneiderCPU performance is usually expressed in Millions of Instructions Per Second 3365065fd7SValentin Schneider(MIPS), which can also be expressed as a given amount of instructions attainable 3465065fd7SValentin Schneiderper Hz, leading to:: 3565065fd7SValentin Schneider 3665065fd7SValentin Schneider capacity(cpu) = work_per_hz(cpu) * max_freq(cpu) 3765065fd7SValentin Schneider 3865065fd7SValentin Schneider1.2 Scheduler terms 3965065fd7SValentin Schneider------------------- 4065065fd7SValentin Schneider 4165065fd7SValentin SchneiderTwo different capacity values are used within the scheduler. A CPU's 4265065fd7SValentin Schneider``capacity_orig`` is its maximum attainable capacity, i.e. its maximum 4365065fd7SValentin Schneiderattainable performance level. A CPU's ``capacity`` is its ``capacity_orig`` to 4465065fd7SValentin Schneiderwhich some loss of available performance (e.g. time spent handling IRQs) is 4565065fd7SValentin Schneidersubtracted. 4665065fd7SValentin Schneider 4765065fd7SValentin SchneiderNote that a CPU's ``capacity`` is solely intended to be used by the CFS class, 4865065fd7SValentin Schneiderwhile ``capacity_orig`` is class-agnostic. The rest of this document will use 4965065fd7SValentin Schneiderthe term ``capacity`` interchangeably with ``capacity_orig`` for the sake of 5065065fd7SValentin Schneiderbrevity. 5165065fd7SValentin Schneider 5265065fd7SValentin Schneider1.3 Platform examples 5365065fd7SValentin Schneider--------------------- 5465065fd7SValentin Schneider 5565065fd7SValentin Schneider1.3.1 Identical OPPs 5665065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~ 5765065fd7SValentin Schneider 5865065fd7SValentin SchneiderConsider an hypothetical dual-core asymmetric CPU capacity system where 5965065fd7SValentin Schneider 6065065fd7SValentin Schneider- work_per_hz(CPU0) = W 6165065fd7SValentin Schneider- work_per_hz(CPU1) = W/2 6265065fd7SValentin Schneider- all CPUs are running at the same fixed frequency 6365065fd7SValentin Schneider 6465065fd7SValentin SchneiderBy the above definition of capacity: 6565065fd7SValentin Schneider 6665065fd7SValentin Schneider- capacity(CPU0) = C 6765065fd7SValentin Schneider- capacity(CPU1) = C/2 6865065fd7SValentin Schneider 6965065fd7SValentin SchneiderTo draw the parallel with Arm big.LITTLE, CPU0 would be a big while CPU1 would 7065065fd7SValentin Schneiderbe a LITTLE. 7165065fd7SValentin Schneider 7265065fd7SValentin SchneiderWith a workload that periodically does a fixed amount of work, you will get an 7365065fd7SValentin Schneiderexecution trace like so:: 7465065fd7SValentin Schneider 7565065fd7SValentin Schneider CPU0 work ^ 7665065fd7SValentin Schneider | ____ ____ ____ 7765065fd7SValentin Schneider | | | | | | | 7865065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 7965065fd7SValentin Schneider 8065065fd7SValentin Schneider CPU1 work ^ 8165065fd7SValentin Schneider | _________ _________ ____ 8265065fd7SValentin Schneider | | | | | | 8365065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 8465065fd7SValentin Schneider 8565065fd7SValentin SchneiderCPU0 has the highest capacity in the system (C), and completes a fixed amount of 8665065fd7SValentin Schneiderwork W in T units of time. On the other hand, CPU1 has half the capacity of 8765065fd7SValentin SchneiderCPU0, and thus only completes W/2 in T. 8865065fd7SValentin Schneider 8965065fd7SValentin Schneider1.3.2 Different max OPPs 9065065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~~~ 9165065fd7SValentin Schneider 9265065fd7SValentin SchneiderUsually, CPUs of different capacity values also have different maximum 9365065fd7SValentin SchneiderOPPs. Consider the same CPUs as above (i.e. same work_per_hz()) with: 9465065fd7SValentin Schneider 9565065fd7SValentin Schneider- max_freq(CPU0) = F 9665065fd7SValentin Schneider- max_freq(CPU1) = 2/3 * F 9765065fd7SValentin Schneider 9865065fd7SValentin SchneiderThis yields: 9965065fd7SValentin Schneider 10065065fd7SValentin Schneider- capacity(CPU0) = C 10165065fd7SValentin Schneider- capacity(CPU1) = C/3 10265065fd7SValentin Schneider 10365065fd7SValentin SchneiderExecuting the same workload as described in 1.3.1, which each CPU running at its 10465065fd7SValentin Schneidermaximum frequency results in:: 10565065fd7SValentin Schneider 10665065fd7SValentin Schneider CPU0 work ^ 10765065fd7SValentin Schneider | ____ ____ ____ 10865065fd7SValentin Schneider | | | | | | | 10965065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 11065065fd7SValentin Schneider 11165065fd7SValentin Schneider workload on CPU1 11265065fd7SValentin Schneider CPU1 work ^ 11365065fd7SValentin Schneider | ______________ ______________ ____ 11465065fd7SValentin Schneider | | | | | | 11565065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 11665065fd7SValentin Schneider 11765065fd7SValentin Schneider1.4 Representation caveat 11865065fd7SValentin Schneider------------------------- 11965065fd7SValentin Schneider 12065065fd7SValentin SchneiderIt should be noted that having a *single* value to represent differences in CPU 12165065fd7SValentin Schneiderperformance is somewhat of a contentious point. The relative performance 12265065fd7SValentin Schneiderdifference between two different µarchs could be X% on integer operations, Y% on 12365065fd7SValentin Schneiderfloating point operations, Z% on branches, and so on. Still, results using this 12465065fd7SValentin Schneidersimple approach have been satisfactory for now. 12565065fd7SValentin Schneider 12665065fd7SValentin Schneider2. Task utilization 12765065fd7SValentin Schneider=================== 12865065fd7SValentin Schneider 12965065fd7SValentin Schneider2.1 Introduction 13065065fd7SValentin Schneider---------------- 13165065fd7SValentin Schneider 13265065fd7SValentin SchneiderCapacity aware scheduling requires an expression of a task's requirements with 13365065fd7SValentin Schneiderregards to CPU capacity. Each scheduler class can express this differently, and 13465065fd7SValentin Schneiderwhile task utilization is specific to CFS, it is convenient to describe it here 13565065fd7SValentin Schneiderin order to introduce more generic concepts. 13665065fd7SValentin Schneider 13765065fd7SValentin SchneiderTask utilization is a percentage meant to represent the throughput requirements 13865065fd7SValentin Schneiderof a task. A simple approximation of it is the task's duty cycle, i.e.:: 13965065fd7SValentin Schneider 14065065fd7SValentin Schneider task_util(p) = duty_cycle(p) 14165065fd7SValentin Schneider 14265065fd7SValentin SchneiderOn an SMP system with fixed frequencies, 100% utilization suggests the task is a 14365065fd7SValentin Schneiderbusy loop. Conversely, 10% utilization hints it is a small periodic task that 14465065fd7SValentin Schneiderspends more time sleeping than executing. Variable CPU frequencies and 14565065fd7SValentin Schneiderasymmetric CPU capacities complexify this somewhat; the following sections will 14665065fd7SValentin Schneiderexpand on these. 14765065fd7SValentin Schneider 14865065fd7SValentin Schneider2.2 Frequency invariance 14965065fd7SValentin Schneider------------------------ 15065065fd7SValentin Schneider 15165065fd7SValentin SchneiderOne issue that needs to be taken into account is that a workload's duty cycle is 15265065fd7SValentin Schneiderdirectly impacted by the current OPP the CPU is running at. Consider running a 15365065fd7SValentin Schneiderperiodic workload at a given frequency F:: 15465065fd7SValentin Schneider 15565065fd7SValentin Schneider CPU work ^ 15665065fd7SValentin Schneider | ____ ____ ____ 15765065fd7SValentin Schneider | | | | | | | 15865065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 15965065fd7SValentin Schneider 16065065fd7SValentin SchneiderThis yields duty_cycle(p) == 25%. 16165065fd7SValentin Schneider 16265065fd7SValentin SchneiderNow, consider running the *same* workload at frequency F/2:: 16365065fd7SValentin Schneider 16465065fd7SValentin Schneider CPU work ^ 16565065fd7SValentin Schneider | _________ _________ ____ 16665065fd7SValentin Schneider | | | | | | 16765065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 16865065fd7SValentin Schneider 16965065fd7SValentin SchneiderThis yields duty_cycle(p) == 50%, despite the task having the exact same 17065065fd7SValentin Schneiderbehaviour (i.e. executing the same amount of work) in both executions. 17165065fd7SValentin Schneider 17265065fd7SValentin SchneiderThe task utilization signal can be made frequency invariant using the following 17365065fd7SValentin Schneiderformula:: 17465065fd7SValentin Schneider 17565065fd7SValentin Schneider task_util_freq_inv(p) = duty_cycle(p) * (curr_frequency(cpu) / max_frequency(cpu)) 17665065fd7SValentin Schneider 17765065fd7SValentin SchneiderApplying this formula to the two examples above yields a frequency invariant 17865065fd7SValentin Schneidertask utilization of 25%. 17965065fd7SValentin Schneider 18065065fd7SValentin Schneider2.3 CPU invariance 18165065fd7SValentin Schneider------------------ 18265065fd7SValentin Schneider 18365065fd7SValentin SchneiderCPU capacity has a similar effect on task utilization in that running an 18465065fd7SValentin Schneideridentical workload on CPUs of different capacity values will yield different 18565065fd7SValentin Schneiderduty cycles. 18665065fd7SValentin Schneider 18765065fd7SValentin SchneiderConsider the system described in 1.3.2., i.e.:: 18865065fd7SValentin Schneider 18965065fd7SValentin Schneider- capacity(CPU0) = C 19065065fd7SValentin Schneider- capacity(CPU1) = C/3 19165065fd7SValentin Schneider 19265065fd7SValentin SchneiderExecuting a given periodic workload on each CPU at their maximum frequency would 19365065fd7SValentin Schneiderresult in:: 19465065fd7SValentin Schneider 19565065fd7SValentin Schneider CPU0 work ^ 19665065fd7SValentin Schneider | ____ ____ ____ 19765065fd7SValentin Schneider | | | | | | | 19865065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 19965065fd7SValentin Schneider 20065065fd7SValentin Schneider CPU1 work ^ 20165065fd7SValentin Schneider | ______________ ______________ ____ 20265065fd7SValentin Schneider | | | | | | 20365065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 20465065fd7SValentin Schneider 20565065fd7SValentin SchneiderIOW, 20665065fd7SValentin Schneider 20765065fd7SValentin Schneider- duty_cycle(p) == 25% if p runs on CPU0 at its maximum frequency 20865065fd7SValentin Schneider- duty_cycle(p) == 75% if p runs on CPU1 at its maximum frequency 20965065fd7SValentin Schneider 21065065fd7SValentin SchneiderThe task utilization signal can be made CPU invariant using the following 21165065fd7SValentin Schneiderformula:: 21265065fd7SValentin Schneider 21365065fd7SValentin Schneider task_util_cpu_inv(p) = duty_cycle(p) * (capacity(cpu) / max_capacity) 21465065fd7SValentin Schneider 21565065fd7SValentin Schneiderwith ``max_capacity`` being the highest CPU capacity value in the 21665065fd7SValentin Schneidersystem. Applying this formula to the above example above yields a CPU 21765065fd7SValentin Schneiderinvariant task utilization of 25%. 21865065fd7SValentin Schneider 21965065fd7SValentin Schneider2.4 Invariant task utilization 22065065fd7SValentin Schneider------------------------------ 22165065fd7SValentin Schneider 22265065fd7SValentin SchneiderBoth frequency and CPU invariance need to be applied to task utilization in 22365065fd7SValentin Schneiderorder to obtain a truly invariant signal. The pseudo-formula for a task 22465065fd7SValentin Schneiderutilization that is both CPU and frequency invariant is thus, for a given 22565065fd7SValentin Schneidertask p:: 22665065fd7SValentin Schneider 22765065fd7SValentin Schneider curr_frequency(cpu) capacity(cpu) 22865065fd7SValentin Schneider task_util_inv(p) = duty_cycle(p) * ------------------- * ------------- 22965065fd7SValentin Schneider max_frequency(cpu) max_capacity 23065065fd7SValentin Schneider 23165065fd7SValentin SchneiderIn other words, invariant task utilization describes the behaviour of a task as 23265065fd7SValentin Schneiderif it were running on the highest-capacity CPU in the system, running at its 23365065fd7SValentin Schneidermaximum frequency. 23465065fd7SValentin Schneider 23565065fd7SValentin SchneiderAny mention of task utilization in the following sections will imply its 23665065fd7SValentin Schneiderinvariant form. 23765065fd7SValentin Schneider 23865065fd7SValentin Schneider2.5 Utilization estimation 23965065fd7SValentin Schneider-------------------------- 24065065fd7SValentin Schneider 24165065fd7SValentin SchneiderWithout a crystal ball, task behaviour (and thus task utilization) cannot 24265065fd7SValentin Schneideraccurately be predicted the moment a task first becomes runnable. The CFS class 24365065fd7SValentin Schneidermaintains a handful of CPU and task signals based on the Per-Entity Load 24465065fd7SValentin SchneiderTracking (PELT) mechanism, one of those yielding an *average* utilization (as 24565065fd7SValentin Schneideropposed to instantaneous). 24665065fd7SValentin Schneider 24765065fd7SValentin SchneiderThis means that while the capacity aware scheduling criteria will be written 24865065fd7SValentin Schneiderconsidering a "true" task utilization (using a crystal ball), the implementation 24965065fd7SValentin Schneiderwill only ever be able to use an estimator thereof. 25065065fd7SValentin Schneider 25165065fd7SValentin Schneider3. Capacity aware scheduling requirements 25265065fd7SValentin Schneider========================================= 25365065fd7SValentin Schneider 25465065fd7SValentin Schneider3.1 CPU capacity 25565065fd7SValentin Schneider---------------- 25665065fd7SValentin Schneider 25765065fd7SValentin SchneiderLinux cannot currently figure out CPU capacity on its own, this information thus 25865065fd7SValentin Schneiderneeds to be handed to it. Architectures must define arch_scale_cpu_capacity() 25965065fd7SValentin Schneiderfor that purpose. 26065065fd7SValentin Schneider 26165065fd7SValentin SchneiderThe arm and arm64 architectures directly map this to the arch_topology driver 26265065fd7SValentin SchneiderCPU scaling data, which is derived from the capacity-dmips-mhz CPU binding; see 26365065fd7SValentin SchneiderDocumentation/devicetree/bindings/arm/cpu-capacity.txt. 26465065fd7SValentin Schneider 26565065fd7SValentin Schneider3.2 Frequency invariance 26665065fd7SValentin Schneider------------------------ 26765065fd7SValentin Schneider 26865065fd7SValentin SchneiderAs stated in 2.2, capacity-aware scheduling requires a frequency-invariant task 26965065fd7SValentin Schneiderutilization. Architectures must define arch_scale_freq_capacity(cpu) for that 27065065fd7SValentin Schneiderpurpose. 27165065fd7SValentin Schneider 27265065fd7SValentin SchneiderImplementing this function requires figuring out at which frequency each CPU 27365065fd7SValentin Schneiderhave been running at. One way to implement this is to leverage hardware counters 27465065fd7SValentin Schneiderwhose increment rate scale with a CPU's current frequency (APERF/MPERF on x86, 27565065fd7SValentin SchneiderAMU on arm64). Another is to directly hook into cpufreq frequency transitions, 27665065fd7SValentin Schneiderwhen the kernel is aware of the switched-to frequency (also employed by 27765065fd7SValentin Schneiderarm/arm64). 27865065fd7SValentin Schneider 27965065fd7SValentin Schneider4. Scheduler topology 28065065fd7SValentin Schneider===================== 28165065fd7SValentin Schneider 28265065fd7SValentin SchneiderDuring the construction of the sched domains, the scheduler will figure out 28365065fd7SValentin Schneiderwhether the system exhibits asymmetric CPU capacities. Should that be the 28465065fd7SValentin Schneidercase: 28565065fd7SValentin Schneider 28665065fd7SValentin Schneider- The sched_asym_cpucapacity static key will be enabled. 287*adf3c31eSBeata Michalska- The SD_ASYM_CPUCAPACITY_FULL flag will be set at the lowest sched_domain 288*adf3c31eSBeata Michalska level that spans all unique CPU capacity values. 289*adf3c31eSBeata Michalska- The SD_ASYM_CPUCAPACITY flag will be set for any sched_domain that spans 290*adf3c31eSBeata Michalska CPUs with any range of asymmetry. 29165065fd7SValentin Schneider 29265065fd7SValentin SchneiderThe sched_asym_cpucapacity static key is intended to guard sections of code that 29365065fd7SValentin Schneidercater to asymmetric CPU capacity systems. Do note however that said key is 29465065fd7SValentin Schneider*system-wide*. Imagine the following setup using cpusets:: 29565065fd7SValentin Schneider 29665065fd7SValentin Schneider capacity C/2 C 29765065fd7SValentin Schneider ________ ________ 29865065fd7SValentin Schneider / \ / \ 29965065fd7SValentin Schneider CPUs 0 1 2 3 4 5 6 7 30065065fd7SValentin Schneider \__/ \______________/ 30165065fd7SValentin Schneider cpusets cs0 cs1 30265065fd7SValentin Schneider 30365065fd7SValentin SchneiderWhich could be created via: 30465065fd7SValentin Schneider 30565065fd7SValentin Schneider.. code-block:: sh 30665065fd7SValentin Schneider 30765065fd7SValentin Schneider mkdir /sys/fs/cgroup/cpuset/cs0 30865065fd7SValentin Schneider echo 0-1 > /sys/fs/cgroup/cpuset/cs0/cpuset.cpus 30965065fd7SValentin Schneider echo 0 > /sys/fs/cgroup/cpuset/cs0/cpuset.mems 31065065fd7SValentin Schneider 31165065fd7SValentin Schneider mkdir /sys/fs/cgroup/cpuset/cs1 31265065fd7SValentin Schneider echo 2-7 > /sys/fs/cgroup/cpuset/cs1/cpuset.cpus 31365065fd7SValentin Schneider echo 0 > /sys/fs/cgroup/cpuset/cs1/cpuset.mems 31465065fd7SValentin Schneider 31565065fd7SValentin Schneider echo 0 > /sys/fs/cgroup/cpuset/cpuset.sched_load_balance 31665065fd7SValentin Schneider 31765065fd7SValentin SchneiderSince there *is* CPU capacity asymmetry in the system, the 31865065fd7SValentin Schneidersched_asym_cpucapacity static key will be enabled. However, the sched_domain 31965065fd7SValentin Schneiderhierarchy of CPUs 0-1 spans a single capacity value: SD_ASYM_CPUCAPACITY isn't 32065065fd7SValentin Schneiderset in that hierarchy, it describes an SMP island and should be treated as such. 32165065fd7SValentin Schneider 32265065fd7SValentin SchneiderTherefore, the 'canonical' pattern for protecting codepaths that cater to 32365065fd7SValentin Schneiderasymmetric CPU capacities is to: 32465065fd7SValentin Schneider 32565065fd7SValentin Schneider- Check the sched_asym_cpucapacity static key 32665065fd7SValentin Schneider- If it is enabled, then also check for the presence of SD_ASYM_CPUCAPACITY in 32765065fd7SValentin Schneider the sched_domain hierarchy (if relevant, i.e. the codepath targets a specific 32865065fd7SValentin Schneider CPU or group thereof) 32965065fd7SValentin Schneider 33065065fd7SValentin Schneider5. Capacity aware scheduling implementation 33165065fd7SValentin Schneider=========================================== 33265065fd7SValentin Schneider 33365065fd7SValentin Schneider5.1 CFS 33465065fd7SValentin Schneider------- 33565065fd7SValentin Schneider 33665065fd7SValentin Schneider5.1.1 Capacity fitness 33765065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~ 33865065fd7SValentin Schneider 33965065fd7SValentin SchneiderThe main capacity scheduling criterion of CFS is:: 34065065fd7SValentin Schneider 34165065fd7SValentin Schneider task_util(p) < capacity(task_cpu(p)) 34265065fd7SValentin Schneider 34365065fd7SValentin SchneiderThis is commonly called the capacity fitness criterion, i.e. CFS must ensure a 34465065fd7SValentin Schneidertask "fits" on its CPU. If it is violated, the task will need to achieve more 34565065fd7SValentin Schneiderwork than what its CPU can provide: it will be CPU-bound. 34665065fd7SValentin Schneider 34765065fd7SValentin SchneiderFurthermore, uclamp lets userspace specify a minimum and a maximum utilization 34865065fd7SValentin Schneidervalue for a task, either via sched_setattr() or via the cgroup interface (see 34965065fd7SValentin SchneiderDocumentation/admin-guide/cgroup-v2.rst). As its name imply, this can be used to 35065065fd7SValentin Schneiderclamp task_util() in the previous criterion. 35165065fd7SValentin Schneider 35265065fd7SValentin Schneider5.1.2 Wakeup CPU selection 35365065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~~~~~ 35465065fd7SValentin Schneider 35565065fd7SValentin SchneiderCFS task wakeup CPU selection follows the capacity fitness criterion described 35665065fd7SValentin Schneiderabove. On top of that, uclamp is used to clamp the task utilization values, 35765065fd7SValentin Schneiderwhich lets userspace have more leverage over the CPU selection of CFS 35865065fd7SValentin Schneidertasks. IOW, CFS wakeup CPU selection searches for a CPU that satisfies:: 35965065fd7SValentin Schneider 36065065fd7SValentin Schneider clamp(task_util(p), task_uclamp_min(p), task_uclamp_max(p)) < capacity(cpu) 36165065fd7SValentin Schneider 36265065fd7SValentin SchneiderBy using uclamp, userspace can e.g. allow a busy loop (100% utilization) to run 36365065fd7SValentin Schneideron any CPU by giving it a low uclamp.max value. Conversely, it can force a small 36465065fd7SValentin Schneiderperiodic task (e.g. 10% utilization) to run on the highest-performance CPUs by 36565065fd7SValentin Schneidergiving it a high uclamp.min value. 36665065fd7SValentin Schneider 36765065fd7SValentin Schneider.. note:: 36865065fd7SValentin Schneider 36965065fd7SValentin Schneider Wakeup CPU selection in CFS can be eclipsed by Energy Aware Scheduling 370e4e29e78SMauro Carvalho Chehab (EAS), which is described in Documentation/scheduler/sched-energy.rst. 37165065fd7SValentin Schneider 37265065fd7SValentin Schneider5.1.3 Load balancing 37365065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~ 37465065fd7SValentin Schneider 37565065fd7SValentin SchneiderA pathological case in the wakeup CPU selection occurs when a task rarely 37665065fd7SValentin Schneidersleeps, if at all - it thus rarely wakes up, if at all. Consider:: 37765065fd7SValentin Schneider 37865065fd7SValentin Schneider w == wakeup event 37965065fd7SValentin Schneider 38065065fd7SValentin Schneider capacity(CPU0) = C 38165065fd7SValentin Schneider capacity(CPU1) = C / 3 38265065fd7SValentin Schneider 38365065fd7SValentin Schneider workload on CPU0 38465065fd7SValentin Schneider CPU work ^ 38565065fd7SValentin Schneider | _________ _________ ____ 38665065fd7SValentin Schneider | | | | | | 38765065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> time 38865065fd7SValentin Schneider w w w 38965065fd7SValentin Schneider 39065065fd7SValentin Schneider workload on CPU1 39165065fd7SValentin Schneider CPU work ^ 39265065fd7SValentin Schneider | ____________________________________________ 39365065fd7SValentin Schneider | | 39465065fd7SValentin Schneider +----+----+----+----+----+----+----+----+----+----+-> 39565065fd7SValentin Schneider w 39665065fd7SValentin Schneider 39765065fd7SValentin SchneiderThis workload should run on CPU0, but if the task either: 39865065fd7SValentin Schneider 39965065fd7SValentin Schneider- was improperly scheduled from the start (inaccurate initial 40065065fd7SValentin Schneider utilization estimation) 40165065fd7SValentin Schneider- was properly scheduled from the start, but suddenly needs more 40265065fd7SValentin Schneider processing power 40365065fd7SValentin Schneider 40465065fd7SValentin Schneiderthen it might become CPU-bound, IOW ``task_util(p) > capacity(task_cpu(p))``; 40565065fd7SValentin Schneiderthe CPU capacity scheduling criterion is violated, and there may not be any more 40665065fd7SValentin Schneiderwakeup event to fix this up via wakeup CPU selection. 40765065fd7SValentin Schneider 40865065fd7SValentin SchneiderTasks that are in this situation are dubbed "misfit" tasks, and the mechanism 40965065fd7SValentin Schneiderput in place to handle this shares the same name. Misfit task migration 41065065fd7SValentin Schneiderleverages the CFS load balancer, more specifically the active load balance part 41165065fd7SValentin Schneider(which caters to migrating currently running tasks). When load balance happens, 41265065fd7SValentin Schneidera misfit active load balance will be triggered if a misfit task can be migrated 41365065fd7SValentin Schneiderto a CPU with more capacity than its current one. 41465065fd7SValentin Schneider 41565065fd7SValentin Schneider5.2 RT 41665065fd7SValentin Schneider------ 41765065fd7SValentin Schneider 41865065fd7SValentin Schneider5.2.1 Wakeup CPU selection 41965065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~~~~~ 42065065fd7SValentin Schneider 42165065fd7SValentin SchneiderRT task wakeup CPU selection searches for a CPU that satisfies:: 42265065fd7SValentin Schneider 42365065fd7SValentin Schneider task_uclamp_min(p) <= capacity(task_cpu(cpu)) 42465065fd7SValentin Schneider 42565065fd7SValentin Schneiderwhile still following the usual priority constraints. If none of the candidate 42665065fd7SValentin SchneiderCPUs can satisfy this capacity criterion, then strict priority based scheduling 42765065fd7SValentin Schneideris followed and CPU capacities are ignored. 42865065fd7SValentin Schneider 42965065fd7SValentin Schneider5.3 DL 43065065fd7SValentin Schneider------ 43165065fd7SValentin Schneider 43265065fd7SValentin Schneider5.3.1 Wakeup CPU selection 43365065fd7SValentin Schneider~~~~~~~~~~~~~~~~~~~~~~~~~~ 43465065fd7SValentin Schneider 43565065fd7SValentin SchneiderDL task wakeup CPU selection searches for a CPU that satisfies:: 43665065fd7SValentin Schneider 43765065fd7SValentin Schneider task_bandwidth(p) < capacity(task_cpu(p)) 43865065fd7SValentin Schneider 43965065fd7SValentin Schneiderwhile still respecting the usual bandwidth and deadline constraints. If 44065065fd7SValentin Schneidernone of the candidate CPUs can satisfy this capacity criterion, then the 44165065fd7SValentin Schneidertask will remain on its current CPU. 442