Revision tags: v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
941e1c6d |
| 24-Apr-2024 |
Cheng Yu <serein.chengyu@huawei.com> |
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max # echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max 1000000 100000 # cat /sys/fs/cgroup/test/cpu.max.burst 1000000
Next we set cpu.max again and check cpu.max and cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max # cat /sys/fs/cgroup/test/cpu.max 2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst 1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned by tg_get_cfs_burst() is microseconds, while in cpu_max_write(), the burst unit used for calculation should be nanoseconds, which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller") Reported-by: Qixin Liao <liaoqixin@huawei.com> Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
941e1c6d |
| 24-Apr-2024 |
Cheng Yu <serein.chengyu@huawei.com> |
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max # echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max 1000000 100000 # cat /sys/fs/cgroup/test/cpu.max.burst 1000000
Next we set cpu.max again and check cpu.max and cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max # cat /sys/fs/cgroup/test/cpu.max 2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst 1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned by tg_get_cfs_burst() is microseconds, while in cpu_max_write(), the burst unit used for calculation should be nanoseconds, which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller") Reported-by: Qixin Liao <liaoqixin@huawei.com> Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
941e1c6d |
| 24-Apr-2024 |
Cheng Yu <serein.chengyu@huawei.com> |
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max # echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max 1000000 100000 # cat /sys/fs/cgroup/test/cpu.max.burst 1000000
Next we set cpu.max again and check cpu.max and cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max # cat /sys/fs/cgroup/test/cpu.max 2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst 1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned by tg_get_cfs_burst() is microseconds, while in cpu_max_write(), the burst unit used for calculation should be nanoseconds, which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller") Reported-by: Qixin Liao <liaoqixin@huawei.com> Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
941e1c6d |
| 24-Apr-2024 |
Cheng Yu <serein.chengyu@huawei.com> |
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max # echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max 1000000 100000 # cat /sys/fs/cgroup/test/cpu.max.burst 1000000
Next we set cpu.max again and check cpu.max and cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max # cat /sys/fs/cgroup/test/cpu.max 2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst 1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned by tg_get_cfs_burst() is microseconds, while in cpu_max_write(), the burst unit used for calculation should be nanoseconds, which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller") Reported-by: Qixin Liao <liaoqixin@huawei.com> Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
941e1c6d |
| 24-Apr-2024 |
Cheng Yu <serein.chengyu@huawei.com> |
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max # echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max 1000000 100000 # cat /sys/fs/cgroup/test/cpu.max.burst 1000000
Next we set cpu.max again and check cpu.max and cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max # cat /sys/fs/cgroup/test/cpu.max 2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst 1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned by tg_get_cfs_burst() is microseconds, while in cpu_max_write(), the burst unit used for calculation should be nanoseconds, which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller") Reported-by: Qixin Liao <liaoqixin@huawei.com> Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
941e1c6d |
| 24-Apr-2024 |
Cheng Yu <serein.chengyu@huawei.com> |
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max # echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max 1000000 100000 # cat /sys/fs/cgroup/test/cpu.max.burst 1000000
Next we set cpu.max again and check cpu.max and cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max # cat /sys/fs/cgroup/test/cpu.max 2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst 1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned by tg_get_cfs_burst() is microseconds, while in cpu_max_write(), the burst unit used for calculation should be nanoseconds, which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller") Reported-by: Qixin Liao <liaoqixin@huawei.com> Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29 |
|
#
941e1c6d |
| 24-Apr-2024 |
Cheng Yu <serein.chengyu@huawei.com> |
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a
sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
[ Upstream commit 49217ea147df7647cb89161b805c797487783fc0 ]
In the cgroup v2 CPU subsystem, assuming we have a cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max # echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max 1000000 100000 # cat /sys/fs/cgroup/test/cpu.max.burst 1000000
Next we set cpu.max again and check cpu.max and cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max # cat /sys/fs/cgroup/test/cpu.max 2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst 1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned by tg_get_cfs_burst() is microseconds, while in cpu_max_write(), the burst unit used for calculation should be nanoseconds, which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller") Reported-by: Qixin Liao <liaoqixin@huawei.com> Signed-off-by: Cheng Yu <serein.chengyu@huawei.com> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
c4c2f7e6 |
| 09-Jun-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify tg_set_cfs_bandwidth()
[ Upstream commit 6fb45460615358157a6d3c990e74f9c1395247e2 ]
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <pet
sched: Simplify tg_set_cfs_bandwidth()
[ Upstream commit 6fb45460615358157a6d3c990e74f9c1395247e2 ]
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Stable-dep-of: 1aa09b9379a7 ("powercap: intel_rapl: Fix locking in TPMI RAPL") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
c4c2f7e6 |
| 09-Jun-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify tg_set_cfs_bandwidth()
[ Upstream commit 6fb45460615358157a6d3c990e74f9c1395247e2 ]
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <pet
sched: Simplify tg_set_cfs_bandwidth()
[ Upstream commit 6fb45460615358157a6d3c990e74f9c1395247e2 ]
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Stable-dep-of: 1aa09b9379a7 ("powercap: intel_rapl: Fix locking in TPMI RAPL") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
bf5dd542 |
| 12-Oct-2023 |
Hao Jia <jiahao.os@bytedance.com> |
sched/core: Fix RQCF_ACT_SKIP leak
commit 5ebde09d91707a4a9bec1e3d213e3c12ffde348f upstream.
Igor Raits and Bagas Sanjaya report a RQCF_ACT_SKIP leak warning.
This warning may be triggered in the
sched/core: Fix RQCF_ACT_SKIP leak
commit 5ebde09d91707a4a9bec1e3d213e3c12ffde348f upstream.
Igor Raits and Bagas Sanjaya report a RQCF_ACT_SKIP leak warning.
This warning may be triggered in the following situations:
CPU0 CPU1
__schedule() *rq->clock_update_flags <<= 1;* unregister_fair_sched_group() pick_next_task_fair+0x4a/0x410 destroy_cfs_bandwidth() newidle_balance+0x115/0x3e0 for_each_possible_cpu(i) *i=0* rq_unpin_lock(this_rq, rf) __cfsb_csd_unthrottle() raw_spin_rq_unlock(this_rq) rq_lock(*CPU0_rq*, &rf) rq_clock_start_loop_update() rq->clock_update_flags & RQCF_ACT_SKIP <-- raw_spin_rq_lock(this_rq)
The purpose of RQCF_ACT_SKIP is to skip the update rq clock, but the update is very early in __schedule(), but we clear RQCF_*_SKIP very late, causing it to span that gap above and triggering this warning.
In __schedule() we can clear the RQCF_*_SKIP flag immediately after update_rq_clock() to avoid this RQCF_ACT_SKIP leak warning. And set rq->clock_update_flags to RQCF_UPDATED to avoid rq->clock_update_flags < RQCF_ACT_SKIP warning that may be triggered later.
Fixes: ebb83d84e49b ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()") Closes: https://lore.kernel.org/all/20230913082424.73252-1-jiahao.os@bytedance.com Reported-by: Igor Raits <igor.raits@gmail.com> Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/a5dd536d-041a-2ce9-f4b7-64d8d85c86dc@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
d03b4817 |
| 10-Oct-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Fix stop_one_cpu_nowait() vs hotplug
[ Upstream commit f0498d2a54e7966ce23cd7c7ff42c64fa0059b07 ]
Kuyo reported sporadic failures on a sched_setaffinity() vs CPU hotplug stress-test -- notab
sched: Fix stop_one_cpu_nowait() vs hotplug
[ Upstream commit f0498d2a54e7966ce23cd7c7ff42c64fa0059b07 ]
Kuyo reported sporadic failures on a sched_setaffinity() vs CPU hotplug stress-test -- notably affine_move_task() remains stuck in wait_for_completion(), leading to a hung-task detector warning.
Specifically, it was reported that stop_one_cpu_nowait(.fn = migration_cpu_stop) returns false -- this stopper is responsible for the matching complete().
The race scenario is:
CPU0 CPU1
// doing _cpu_down()
__set_cpus_allowed_ptr() task_rq_lock(); takedown_cpu() stop_machine_cpuslocked(take_cpu_down..)
<PREEMPT: cpu_stopper_thread() MULTI_STOP_PREPARE ... __set_cpus_allowed_ptr_locked() affine_move_task() task_rq_unlock();
<PREEMPT: cpu_stopper_thread()\> ack_state() MULTI_STOP_RUN take_cpu_down() __cpu_disable(); stop_machine_park(); stopper->enabled = false; /> /> stop_one_cpu_nowait(.fn = migration_cpu_stop); if (stopper->enabled) // false!!!
That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the stopper thread gets a chance to preempt and allows the cpu-down for the target CPU to complete.
OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to issue a wakeup, it must not be ran under the scheduler locks.
Solve this apparent contradiction by keeping preemption disabled over the unlock + queue_stopper combination:
preempt_disable(); task_rq_unlock(...); if (!stop_pending) stop_one_cpu_nowait(...) preempt_enable();
This respects the lock ordering contraints while still avoiding the above race. That is, if we find the CPU is online under rq-lock, the targeted stop_one_cpu_nowait() must succeed.
Apply this pattern to all similar stop_one_cpu_nowait() invocations.
Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()") Reported-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: "Kuyo Chang (張建文)" <Kuyo.Chang@mediatek.com> Link: https://lkml.kernel.org/r/20231010200442.GA16515@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
cff9b233 |
| 15-Sep-2023 |
Liam R. Howlett <Liam.Howlett@oracle.com> |
kernel/sched: Modify initial boot task idle setup
Initial booting is setting the task flag to idle (PF_IDLE) by the call path sched_init() -> init_idle(). Having the task idle and calling call_rcu(
kernel/sched: Modify initial boot task idle setup
Initial booting is setting the task flag to idle (PF_IDLE) by the call path sched_init() -> init_idle(). Having the task idle and calling call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be set. Subsequent calls to any cond_resched() will enable IRQs, potentially earlier than the IRQ setup has completed. Recent changes have caused just this scenario and IRQs have been enabled early.
This causes a warning later in start_kernel() as interrupts are enabled before they are fully set up.
Fix this issue by setting the PF_IDLE flag later in the boot sequence.
Although the boot task was marked as idle since (at least) d80e4fda576d, I am not sure that it is wrong to do so. The forced context-switch on idle task was introduced in the tiny_rcu update, so I'm going to claim this fixes 5f6130fa52ee.
Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
show more ...
|
#
7170509c |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify sched_core_cpu_{starting,deactivate}()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schne
sched: Simplify sched_core_cpu_{starting,deactivate}()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.371787909@infradead.org
show more ...
|
#
b4e1fa1e |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify try_steal_cookie()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redha
sched: Simplify try_steal_cookie()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.304154828@infradead.org
show more ...
|
#
6dafc713 |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify sched_tick_remote()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redh
sched: Simplify sched_tick_remote()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.236247952@infradead.org
show more ...
|
#
4bdada79 |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify sched_exec()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com>
sched: Simplify sched_exec()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.168490417@infradead.org
show more ...
|
#
857d315f |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify ttwu()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link:
sched: Simplify ttwu()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.101069260@infradead.org
show more ...
|
#
4eb054f9 |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify wake_up_if_idle()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat
sched: Simplify wake_up_if_idle()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211812.032678917@infradead.org
show more ...
|
#
5bb76f1d |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify: migrate_swap_stop()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@red
sched: Simplify: migrate_swap_stop()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.964370836@infradead.org
show more ...
|
#
0f92cdf3 |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify sysctl_sched_uclamp_handler()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vsc
sched: Simplify sysctl_sched_uclamp_handler()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.896559109@infradead.org
show more ...
|
#
7537b90c |
| 01-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched: Simplify get_nohz_timer_target()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joel Fernandes (Google) <joel
sched: Simplify get_nohz_timer_target()
Use guards to reduce gotos and simplify control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230801211811.828443100@infradead.org
show more ...
|
#
88c56cfe |
| 12-Jul-2023 |
Phil Auld <pauld@redhat.com> |
sched/fair: Block nohz tick_stop when cfs bandwidth in use
CFS bandwidth limits and NOHZ full don't play well together. Tasks can easily run well past their quotas before a remote tick does account
sched/fair: Block nohz tick_stop when cfs bandwidth in use
CFS bandwidth limits and NOHZ full don't play well together. Tasks can easily run well past their quotas before a remote tick does accounting. This leads to long, multi-period stalls before such tasks can run again. Currently, when presented with these conflicting requirements the scheduler is favoring nohz_full and letting the tick be stopped. However, nohz tick stopping is already best-effort, there are a number of conditions that can prevent it, whereas cfs runtime bandwidth is expected to be enforced.
Make the scheduler favor bandwidth over stopping the tick by setting TICK_DEP_BIT_SCHED when the only running task is a cfs task with runtime limit enabled. We use cfs_b->hierarchical_quota to determine if the task requires the tick.
Add check in pick_next_task_fair() as well since that is where we have a handle on the task that is actually going to be running.
Add check in sched_can_stop_tick() to cover some edge cases such as nr_running going from 2->1 and the 1 remains the running task.
Reviewed-By: Ben Segall <bsegall@google.com> Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230712133357.381137-3-pauld@redhat.com
show more ...
|
#
c98c1827 |
| 14-Jul-2023 |
Phil Auld <pauld@redhat.com> |
sched, cgroup: Restore meaning to hierarchical_quota
In cgroupv2 cfs_b->hierarchical_quota is set to -1 for all task groups due to the previous fix simply taking the min. It should reflect a limit
sched, cgroup: Restore meaning to hierarchical_quota
In cgroupv2 cfs_b->hierarchical_quota is set to -1 for all task groups due to the previous fix simply taking the min. It should reflect a limit imposed at that level or by an ancestor. Even though cgroupv2 does not require child quota to be less than or equal to that of its ancestors the task group will still be constrained by such a quota so this should be shown here. Cgroupv1 continues to set this correctly.
In both cases, add initialization when a new task group is created based on the current parent's value (or RUNTIME_INF in the case of root_task_group). Otherwise, the field is wrong until a quota is changed after creation and __cfs_schedulable() is called.
Fixes: c53593e5cb69 ("sched, cgroup: Don't reject lower cpu.max on ancestors") Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230714125746.812891-1-pauld@redhat.com
show more ...
|
Revision tags: v6.1.33, v6.1.32 |
|
#
e4ec3318 |
| 31-May-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched/debug: Rename sysctl_sched_min_granularity to sysctl_sched_base_slice
EEVDF uses this tunable as the base request/slice -- make sure the name reflects this.
Signed-off-by: Peter Zijlstra (Int
sched/debug: Rename sysctl_sched_min_granularity to sysctl_sched_base_slice
EEVDF uses this tunable as the base request/slice -- make sure the name reflects this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124604.205287511@infradead.org
show more ...
|
#
147f3efa |
| 31-May-2023 |
Peter Zijlstra <peterz@infradead.org> |
sched/fair: Implement an EEVDF-like scheduling policy
Where CFS is currently a WFQ based scheduler with only a single knob, the weight. The addition of a second, latency oriented parameter, makes so
sched/fair: Implement an EEVDF-like scheduling policy
Where CFS is currently a WFQ based scheduler with only a single knob, the weight. The addition of a second, latency oriented parameter, makes something like WF2Q or EEVDF based a much better fit.
Specifically, EEVDF does EDF like scheduling in the left half of the tree -- those entities that are owed service. Except because this is a virtual time scheduler, the deadlines are in virtual time as well, which is what allows over-subscription.
EEVDF has two parameters:
- weight, or time-slope: which is mapped to nice just as before
- request size, or slice length: which is used to compute the virtual deadline as: vd_i = ve_i + r_i/w_i
Basically, by setting a smaller slice, the deadline will be earlier and the task will be more eligible and ran earlier.
Tick driven preemption is driven by request/slice completion; while wakeup preemption is driven by the deadline.
Because the tree is now effectively an interval tree, and the selection is no longer 'leftmost', over-scheduling is less of a problem.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230531124603.931005524@infradead.org
show more ...
|