.. SPDX-License-Identifier: GPL-2.0
.. include:: ../disclaimer-zh_CN.rst

:Original: Documentation/scheduler/sched-capacity.rst

:Translator:

  唐艺舟 Tang Yizhou <tangyeechou@gmail.com>

:Proofreader:

  时奎亮 Alex Shi <alexs@kernel.org>

=========================
Capacity Aware Scheduling
=========================

1. CPU Capacity
===============

1.1 Introduction
----------------

Conventional, homogeneous SMP platforms are composed of identical CPUs.
Heterogeneous platforms, on the other hand, are composed of CPUs with different
performance characteristics; on such platforms, CPUs cannot be considered equal.

CPU capacity is a measure of the performance a CPU can reach, normalized against
the most performant CPU in the system. Heterogeneous systems are also called
asymmetric CPU capacity systems, as they contain CPUs of different capacities.

Disparity in maximum attainable performance (in other words, in maximum CPU
capacity) stems from two factors:

- Not all CPUs have the same microarchitecture.
- With Dynamic Voltage and Frequency Scaling (DVFS), not all CPUs can reach
  equally high Operating Performance Points (OPP; translator's note: an OPP is
  a "frequency-voltage" pair).

Arm big.LITTLE systems are an example of both. Compared with the LITTLE CPUs,
the big CPUs are performance-oriented (more pipeline stages, bigger caches,
smarter branch predictors, etc.) and can usually reach higher OPPs.

CPU performance is usually expressed in Millions of Instructions Per Second
(MIPS), which can also be expressed as the number of instructions executed per
Hz, leading to::

  capacity(cpu) = work_per_hz(cpu) * max_freq(cpu)

1.2 Scheduler terms
-------------------

Two different capacity values are used within the scheduler. A CPU's
``capacity_orig`` is its maximum attainable capacity, i.e. its maximum
attainable performance level.
A CPU's ``capacity`` is its ``capacity_orig`` minus some loss of available
performance (e.g. time spent handling IRQs).

Note that a CPU's ``capacity`` is solely intended to be used by the CFS class,
while ``capacity_orig`` is class-agnostic. For brevity, the rest of this
document uses the terms ``capacity`` and ``capacity_orig`` interchangeably.

1.3 Platform examples
---------------------

1.3.1 Identical OPPs
~~~~~~~~~~~~~~~~~~~~

Consider a hypothetical dual-core asymmetric CPU capacity system where

- work_per_hz(CPU0) = W
- work_per_hz(CPU1) = W/2
- all CPUs run at the same fixed frequency

By the above definition of capacity:

- capacity(CPU0) = C
- capacity(CPU1) = C/2

To draw the parallel with Arm big.LITTLE, CPU0 would be a big CPU while CPU1
would be a LITTLE one.

With a workload that periodically does a fixed amount of work, you will get an
execution trace like so::

   CPU0 work ^
             |     ____                ____                ____
             |    |    |              |    |              |    |
             +----+----+----+----+----+----+----+----+----+----+-> time

   CPU1 work ^
             |     _________           _________           ____
             |    |         |         |         |         |
             +----+----+----+----+----+----+----+----+----+----+-> time

CPU0 has the highest capacity in the system (C), and completes a fixed amount of
work W in T units of time. CPU1, on the other hand, has half the capacity of
CPU0, and thus completes only W/2 in T units of time.

1.3.2 Different max OPPs
~~~~~~~~~~~~~~~~~~~~~~~~

Usually, CPUs of different capacity values also have different maximum OPPs.
Consider the same CPUs as in the previous subsection (i.e. same work_per_hz())
with:

- max_freq(CPU0) = F
- max_freq(CPU1) = 2/3 * F

This yields:

- capacity(CPU0) = C
- capacity(CPU1) = C/3

Executing the workload described in 1.3.1, with each CPU running at its maximum
frequency, results in::

   CPU0 work ^
             |     ____                ____                ____
             |    |    |              |    |              |    |
             +----+----+----+----+----+----+----+----+----+----+-> time

                              workload on CPU1
   CPU1 work ^
             |     ______________      ______________      ____
             |    |              |    |              |    |
             +----+----+----+----+----+----+----+----+----+----+-> time

1.4 Representation caveat
-------------------------

It should be noted that using a single value to represent differences in CPU
performance is somewhat contentious. The relative performance difference between
two different microarchitectures would more accurately be described as X% on
integer operations, Y% on floating point operations, Z% on branches, and so on.
Still, results using this simple approach have been satisfactory so far.
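
The capacity values used in the examples of section 1.3 follow directly from the
definition in 1.1. The sketch below reproduces them under stated assumptions:
``normalized_capacities()`` is a hypothetical helper, the values chosen for W
and F are made up, and the largest capacity is scaled to 1024 to mirror the
kernel's ``SCHED_CAPACITY_SCALE``. It is an illustration, not kernel code.

.. code-block:: python

  from fractions import Fraction

  SCALE = 1024  # mirrors the kernel's SCHED_CAPACITY_SCALE


  def normalized_capacities(cpus, scale=SCALE):
      """Hypothetical helper: cpus maps a CPU name to (work_per_hz, max_freq).

      Computes capacity(cpu) = work_per_hz(cpu) * max_freq(cpu) and
      normalizes it so that the most performant CPU gets `scale`.
      """
      raw = {cpu: Fraction(w) * Fraction(f) for cpu, (w, f) in cpus.items()}
      top = max(raw.values())
      # Truncate to an integer, as capacities are integer values.
      return {cpu: int(scale * value / top) for cpu, value in raw.items()}


  W, F = 2, 3000  # made-up work per Hz and maximum frequency

  # 1.3.1: same frequency, CPU1 does half the work per Hz.
  same_opp = normalized_capacities(
      {"CPU0": (W, F), "CPU1": (Fraction(W, 2), F)}
  )

  # 1.3.2: CPU1 additionally tops out at 2/3 of CPU0's maximum frequency.
  diff_opp = normalized_capacities(
      {"CPU0": (W, F), "CPU1": (Fraction(W, 2), Fraction(2 * F, 3))}
  )

With these inputs CPU1 ends up with half of CPU0's capacity in the first case
and a third of it (truncated to an integer) in the second.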

2. Task utilization
===================

2.1 Introduction
----------------

Capacity aware scheduling requires an expression of a task's requirements with
regard to CPU capacity. Each scheduler class can express this differently.
Task utilization is specific to CFS, but describing it here helps introduce
more generic concepts.

Task utilization is a percentage that represents the throughput requirements of
a task. A simple approximation of it is the task's duty cycle, i.e.::

  task_util(p) = duty_cycle(p)

On an SMP system with fixed frequencies, 100% utilization means the task is a
busy loop. Conversely, 10% utilization hints that it is a small periodic task
that spends more time sleeping than executing.

2.2 Frequency invariance
------------------------

One issue that needs to be taken into account is that a workload's duty cycle is
directly impacted by the OPP the CPU is currently running at. Consider running a
periodic workload at a given frequency F::

    CPU work ^
             |     ____                ____                ____
             |    |    |              |    |              |    |
             +----+----+----+----+----+----+----+----+----+----+-> time

This yields duty_cycle(p) == 25%.

Now, consider running the *same* workload at frequency F/2::

    CPU work ^
             |     _________           _________           ____
             |    |         |         |         |         |
             +----+----+----+----+----+----+----+----+----+----+-> time

This yields duty_cycle(p) == 50%, even though the task behaves exactly the same
(i.e. executes the same amount of work) in both executions.

The task utilization signal can be made frequency invariant using the following
formula (translator's note: the term "signal" is borrowed from signals and
systems)::

  task_util_freq_inv(p) = duty_cycle(p) * (curr_frequency(cpu) / max_frequency(cpu))

Applying this formula to the two examples above yields a frequency invariant
task utilization of 25% in both cases.

2.3 CPU invariance
------------------

CPU capacity has a similar effect on task utilization: running an identical
workload on CPUs of different capacity values yields different duty cycles.

Consider the system described in 1.3.2, i.e.:

- capacity(CPU0) = C
- capacity(CPU1) = C/3

Executing a given periodic workload on each CPU at its maximum frequency results
in::

   CPU0 work ^
             |     ____                ____                ____
             |    |    |              |    |              |    |
             +----+----+----+----+----+----+----+----+----+----+-> time

   CPU1 work ^
             |     ______________      ______________      ____
             |    |              |    |              |    |
             +----+----+----+----+----+----+----+----+----+----+-> time

In other words,

- duty_cycle(p) == 25% if task p runs on CPU0 at its maximum frequency.
- duty_cycle(p) == 75% if task p runs on CPU1 at its maximum frequency.

The task utilization signal can be made CPU invariant using the following
formula::

  task_util_cpu_inv(p) = duty_cycle(p) * (capacity(cpu) / max_capacity)

where ``max_capacity`` is the highest CPU capacity value in the system. Applying
this formula to the example above yields a CPU invariant task utilization of 25%
in both cases.

2.4 Invariant task utilization
------------------------------

Both frequency and CPU invariance need to be applied to task utilization in
order to obtain a truly invariant signal. The pseudo-formula for a task
utilization that is both CPU and frequency invariant is thus, for a given
task p::

                                     curr_frequency(cpu)   capacity(cpu)
  task_util_inv(p) = duty_cycle(p) * ------------------- * -------------
                                     max_frequency(cpu)    max_capacity

In other words, invariant task utilization describes the behaviour of a task as
if it were running on the highest-capacity CPU in the system at its maximum
frequency.
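
The invariance formulas of sections 2.2 to 2.4 can be checked numerically
against the example traces. The following is a minimal sketch, not the
kernel's PELT implementation: ``task_util_inv()`` simply transcribes the
pseudo-formula above, and the numeric values for C and F are arbitrary
assumptions.

.. code-block:: python

  from fractions import Fraction as Fr


  def task_util_inv(duty_cycle, curr_freq, max_freq, capacity, max_capacity):
      """Transcription of the pseudo-formula in 2.4 (illustrative only)."""
      freq_scale = Fr(curr_freq) / Fr(max_freq)    # frequency invariance
      cpu_scale = Fr(capacity) / Fr(max_capacity)  # CPU invariance
      return Fr(duty_cycle) * freq_scale * cpu_scale


  C, F = 1024, 3000  # arbitrary values for max capacity and CPU0's max frequency

  # 2.2: the same workload on the biggest CPU at F (25% busy), then F/2 (50% busy).
  at_f = task_util_inv(Fr(1, 4), F, F, C, C)
  at_half_f = task_util_inv(Fr(1, 2), F // 2, F, C, C)

  # 2.3: the same workload on CPU1 (capacity C/3) at its own maximum frequency,
  # where it is busy 75% of the time.
  cpu1_max_freq = Fr(2 * F, 3)
  on_cpu1 = task_util_inv(Fr(3, 4), cpu1_max_freq, cpu1_max_freq, Fr(C, 3), C)

All three observations collapse to the same invariant utilization of 1/4, which
is the property the scheduler relies on when comparing tasks across CPUs and
frequencies.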

Any mention of task utilization in the following sections implies its invariant
form.

2.5 Utilization estimation
--------------------------

Without a crystal ball, task behaviour (and thus task utilization) cannot be
accurately predicted at the moment a task first becomes runnable. The CFS class
maintains a handful of CPU and task signals based on the Per-Entity Load
Tracking (PELT) mechanism, one of which yields an *average* utilization (as
opposed to an instantaneous one).

This means that while the capacity aware scheduling criteria are written with a
"true" task utilization in mind (using a crystal ball), the implementation can
only ever use an estimate of it.

3. Capacity aware scheduling requirements
=========================================

3.1 CPU capacity
----------------

Linux cannot currently figure out CPU capacity on its own, so this information
needs to be handed to it. Architectures must define arch_scale_cpu_capacity()
for that purpose.

The arm, arm64, and RISC-V architectures directly map this to the arch_topology
driver's CPU scaling data (translator's note: see the per-CPU variable cpu_scale
in arch_topology.h), which is derived from the capacity-dmips-mhz CPU binding;
see Documentation/devicetree/bindings/cpu/cpu-capacity.txt.

3.2 Frequency invariance
------------------------

As stated in 2.2, capacity aware scheduling requires a frequency invariant task
utilization. Architectures must define arch_scale_freq_capacity(cpu) for that
purpose.

Implementing this function requires figuring out the frequency each CPU is
currently running at. One way to do so is to leverage hardware counters whose
increment rate scales with the CPU's current frequency (APERF/MPERF on x86, AMU
on arm64). Another is to hook directly into cpufreq frequency transitions, when
the kernel is aware of the frequency being switched to (this is also implemented
by arm/arm64).

4. Scheduler topology
=====================

During the construction of the sched domains, the scheduler figures out whether
the system exhibits asymmetric CPU capacities. If it does:

- The sched_asym_cpucapacity static key will be enabled.
- The SD_ASYM_CPUCAPACITY_FULL flag will be set at the lowest sched_domain
  level that spans all unique CPU capacity values.
- The SD_ASYM_CPUCAPACITY flag will be set for any sched_domain that spans CPUs
  of asymmetric capacity.

The sched_asym_cpucapacity static key is intended to guard sections of code that
cater to asymmetric CPU capacity systems. Do note, however, that this key is
*system-wide*. Imagine the following setup using cpusets::

  capacity    C/2          C
            ________    ________
           /        \  /        \
  CPUs     0  1  2  3  4  5  6  7
           \__/  \______________/
  cpusets   cs0         cs1

Which could be created via:

.. code-block:: sh

  mkdir /sys/fs/cgroup/cpuset/cs0
  echo 0-1 > /sys/fs/cgroup/cpuset/cs0/cpuset.cpus
  echo 0 > /sys/fs/cgroup/cpuset/cs0/cpuset.mems

  mkdir /sys/fs/cgroup/cpuset/cs1
  echo 2-7 > /sys/fs/cgroup/cpuset/cs1/cpuset.cpus
  echo 0 > /sys/fs/cgroup/cpuset/cs1/cpuset.mems

  echo 0 > /sys/fs/cgroup/cpuset/cpuset.sched_load_balance

Since there *is* CPU capacity asymmetry in the system, the
sched_asym_cpucapacity static key will be enabled. However, the sched_domain
hierarchy of CPUs 0-1 spans a single capacity value: SD_ASYM_CPUCAPACITY is not
set in that hierarchy. It describes an SMP island and should be treated as such.

Therefore, the "canonical" pattern for protecting codepaths that cater to
asymmetric CPU capacities is to:

- Check the sched_asym_cpucapacity static key.
- If it is enabled, then also check for the presence of the SD_ASYM_CPUCAPACITY
  flag in the sched_domain hierarchy.
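
The two-step check above can be modelled in a few lines. This is a toy model
under stated assumptions, not scheduler code: ``SchedDomain``,
``build_domain()`` and ``use_asym_path()`` are hypothetical names, the flag's
bit value is illustrative, and the capacities (512 and 1024) stand in for C/2
and C from the cpuset example.

.. code-block:: python

  from dataclasses import dataclass

  SD_ASYM_CPUCAPACITY = 1 << 0  # illustrative bit value, not the kernel's


  @dataclass
  class SchedDomain:
      """Toy stand-in for a sched_domain: the CPUs it spans plus its flags."""
      cpus: frozenset
      flags: int = 0


  def build_domain(cpus, capacity_of):
      """Set SD_ASYM_CPUCAPACITY only if the span mixes capacity values."""
      distinct = {capacity_of[cpu] for cpu in cpus}
      flags = SD_ASYM_CPUCAPACITY if len(distinct) > 1 else 0
      return SchedDomain(frozenset(cpus), flags)


  def use_asym_path(static_key_enabled, sd):
      """Canonical guard: system-wide key first, then the per-domain flag."""
      if not static_key_enabled:
          return False
      return bool(sd.flags & SD_ASYM_CPUCAPACITY)


  # CPUs 0-3 have capacity C/2 (512), CPUs 4-7 have capacity C (1024).
  capacity_of = {cpu: 512 if cpu < 4 else 1024 for cpu in range(8)}

  # The key is system-wide: it only reflects that asymmetry exists somewhere.
  sched_asym_cpucapacity = len(set(capacity_of.values())) > 1

  cs0 = build_domain(range(0, 2), capacity_of)  # CPUs 0-1, a single capacity
  cs1 = build_domain(range(2, 8), capacity_of)  # CPUs 2-7, mixed capacities

In this model the key is enabled for the whole system, yet
``use_asym_path(sched_asym_cpucapacity, cs0)`` is false because cs0 is an SMP
island, while the same check on cs1 is true.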

5. Capacity aware scheduling implementation
===========================================

5.1 CFS
-------

5.1.1 Capacity fitness
~~~~~~~~~~~~~~~~~~~~~~

The main capacity scheduling criterion of CFS is::

  task_util(p) < capacity(task_cpu(p))

This is commonly called the capacity fitness criterion, i.e. CFS must ensure a
task "fits" on its CPU. If the criterion is violated, the task needs more
capacity than its CPU can provide and will occupy that CPU for longer: it is
CPU-bound.

Furthermore, uclamp lets userspace specify a minimum and a maximum utilization
value for a task, either via sched_setattr() or via the cgroup interface (see
Documentation/admin-guide/cgroup-v2.rst). As its name implies, uclamp can be
used to clamp task_util() in the previous criterion.

5.1.2 Wakeup CPU selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

CFS task wakeup CPU selection follows the capacity fitness criterion described
above. On top of that, uclamp is used to clamp the task utilization values,
which gives userspace more leverage over the CPU selection of CFS tasks. In
other words, CFS wakeup CPU selection searches for a CPU that satisfies::

  clamp(task_util(p), task_uclamp_min(p), task_uclamp_max(p)) < capacity(cpu)

By using uclamp, userspace can for example allow a busy loop (100% utilization)
to run on any CPU by giving it a low uclamp.max value. Conversely, it can force
a small periodic task (e.g. 10% utilization) to run on the highest-performance
CPUs by giving it a high uclamp.min value.

.. note::

  Wakeup CPU selection in CFS can be overridden by Energy Aware Scheduling
  (EAS), which is described in Documentation/scheduler/sched-energy.rst.

5.1.3 Load balancing
~~~~~~~~~~~~~~~~~~~~

A pathological case for wakeup CPU selection occurs when a task rarely sleeps,
if at all, and therefore rarely wakes up. Consider::

  w == wakeup event

  capacity(CPU0) = C
  capacity(CPU1) = C / 3

                           workload on CPU0
  CPU work ^
           |     _________           _________           ____
           |    |         |         |         |         |
           +----+----+----+----+----+----+----+----+----+----+-> time
                w                   w                   w

                           workload on CPU1
  CPU work ^
           |     ____________________________________________
           |    |
           +----+----+----+----+----+----+----+----+----+----+->
                w

This workload should run on CPU0, but if the task either:

- was improperly scheduled from the start (inaccurate initial utilization
  estimation)
- was properly scheduled from the start, but suddenly needs more processing
  power

then it might become CPU-bound, i.e. ``task_util(p) > capacity(task_cpu(p))``;
the CPU capacity scheduling criterion is violated, and there may not be any
further wakeup event to fix the wrong CPU selection.

Tasks in this situation are dubbed "misfit" tasks, and the mechanism put in
place to handle them shares the same name. Misfit task migration leverages the
CFS load balancer, more specifically the active load balance part (which caters
to migrating currently running tasks). When load balance happens, a misfit
active load balance is triggered if a misfit task can be migrated to a CPU with
more capacity than its current one.

5.2 Real-time scheduling
------------------------

5.2.1 Wakeup CPU selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

RT task wakeup CPU selection searches for a CPU that satisfies::

  task_uclamp_min(p) <= capacity(cpu)

while still following the usual priority constraints. If no CPU can satisfy this
capacity criterion, then strict priority based scheduling is used and CPU
capacities are ignored.

5.3 Deadline scheduling
-----------------------

5.3.1 Wakeup CPU selection
~~~~~~~~~~~~~~~~~~~~~~~~~~

DL task wakeup CPU selection searches for a CPU that satisfies::

  task_bandwidth(p) < capacity(task_cpu(p))

while still respecting the usual bandwidth and deadline constraints. If no CPU
can satisfy this capacity criterion, then the task remains on its current CPU.
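
The wakeup criteria of sections 5.1.2, 5.2.1 and 5.3.1 can be compared side by
side with a small sketch. Everything here is an assumption made for
illustration: the helper names are hypothetical, utilization, bandwidth and
capacity are all expressed on a 0-1024 scale, and the real selection paths
apply further constraints (priorities, deadlines, EAS) that are not modelled.

.. code-block:: python

  def clamp(value, lo, hi):
      """Restrict value to the [lo, hi] range."""
      return max(lo, min(value, hi))


  def cfs_fits(task_util, uclamp_min, uclamp_max, capacity):
      """5.1.2: the clamped utilization must be below the CPU's capacity."""
      return clamp(task_util, uclamp_min, uclamp_max) < capacity


  def rt_fits(uclamp_min, capacity):
      """5.2.1: only the requested minimum has to fit."""
      return uclamp_min <= capacity


  def dl_fits(bandwidth, capacity):
      """5.3.1: the task's bandwidth must be below the CPU's capacity."""
      return bandwidth < capacity


  def candidates(fits, capacities):
      """Return, in CPU order, the CPUs whose capacity satisfies fits()."""
      return [cpu for cpu, cap in sorted(capacities.items()) if fits(cap)]


  capacities = {0: 341, 1: 1024}  # a small CPU (about C/3) and a big CPU (C)

  # A busy loop (100% utilization) capped by a low uclamp.max fits anywhere.
  busy_loop = candidates(lambda cap: cfs_fits(1024, 0, 200, cap), capacities)

  # A 10% task boosted by a high uclamp.min only fits the big CPU.
  small_task = candidates(lambda cap: cfs_fits(102, 800, 1024, cap), capacities)

  # An RT task requesting at least half of the maximum capacity.
  rt_task = candidates(lambda cap: rt_fits(512, cap), capacities)

  # A DL task whose bandwidth equals the largest capacity fits nowhere.
  dl_task = candidates(lambda cap: dl_fits(1024, cap), capacities)

An empty candidate list corresponds to the fallbacks described above: strict
priority based scheduling for RT tasks, and staying on the current CPU for DL
tasks.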