History log of /openbmc/linux/kernel/time/tick-sched.c (Results 76 – 100 of 668)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12
# 80d20d35 31-Jul-2018 Anna-Maria Gleixner <anna-maria@linutronix.de>

nohz: Fix local_timer_softirq_pending()

local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.

This is wrong because TIMER_SOFTIRQ

nohz: Fix local_timer_softirq_pending()

local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.

This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.

Use BIT(TIMER_SOFTIRQ) instead.

Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Cc: bigeasy@linutronix.de
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de

show more ...


Revision tags: v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1, v4.17
# a3ed0e43 25-Apr-2018 Thomas Gleixner <tglx@linutronix.de>

Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME

Revert commits

92af4dcb4e1c ("tracing: Unify the "boot" and "mono" tracing clocks")
127bfa5f4342 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behav

Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME

Revert commits

92af4dcb4e1c ("tracing: Unify the "boot" and "mono" tracing clocks")
127bfa5f4342 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
7250a4047aa6 ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
d6c7270e913d ("timekeeping: Remove boot time specific code")
f2d6fdbfd238 ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
d6ed449afdb3 ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
72199320d49d ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")

As stated in the pull request for the unification of CLOCK_MONOTONIC and
CLOCK_BOOTTIME, it was clear that we might have to revert the change.

As reported by several folks systemd and other applications rely on the
documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
changes. After resume daemons time out and other timeout related issues are
observed. Rafael compiled this list:

* systemd kills daemons on resume, after >WatchdogSec seconds
of suspending (Genki Sky). [Verified that that's because systemd uses
CLOCK_MONOTONIC and expects it to not include the suspend time.]

* systemd-journald misbehaves after resume:
systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
corrupted or uncleanly shut down, renaming and replacing.
(Mike Galbraith).

* NetworkManager reports "networking disabled" and networking is broken
after resume 50% of the time (Pavel). [May be because of systemd.]

* MATE desktop dims the display and starts the screensaver right after
system resume (Pavel).

* Full system hang during resume (me). [May be due to systemd or NM or both.]

That happens on debian and open suse systems.

It's sad, that these problems were neither catched in -next nor by those
folks who expressed interest in this change.

Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Reported-by: Genki Sky <sky@genki.is>,
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>

show more ...


# 1f71addd 24-Apr-2018 Thomas Gleixner <tglx@linutronix.de>

tick/sched: Do not mess with an enqueued hrtimer

Kaike reported that in tests rdma hrtimers occasionaly stopped working. He
did great debugging, which provided enough context to decode the problem.

tick/sched: Do not mess with an enqueued hrtimer

Kaike reported that in tests rdma hrtimers occasionaly stopped working. He
did great debugging, which provided enough context to decode the problem.

CPU 3 CPU 2

idle
start sched_timer expires = 712171000000
queue->next = sched_timer
start rdmavt timer. expires = 712172915662
lock(baseof(CPU3))
tick_nohz_stop_tick()
tick = 716767000000 timerqueue_add(tmr)

hrtimer_set_expires(sched_timer, tick);
sched_timer->expires = 716767000000 <---- FAIL
if (tmr->expires < queue->next->expires)
hrtimer_start(sched_timer) queue->next = tmr;
lock(baseof(CPU3))
unlock(baseof(CPU3))
timerqueue_remove()
timerqueue_add()

ts->sched_timer is queued and queue->next is pointing to it, but then
ts->sched_timer.expires is modified.

This not only corrupts the ordering of the timerqueue RB tree, it also
makes CPU2 see the new expiry time of timerqueue->next->expires when
checking whether timerqueue->next needs to be updated. So CPU2 sees that
the rdma timer is earlier than timerqueue->next and sets the rdma timer as
new next.

Depending on whether it had also seen the new time at RB tree enqueue, it
might have queued the rdma timer at the wrong place and then after removing
the sched_timer the RB tree is completely hosed.

The problem was introduced with a commit which tried to solve inconsistency
between the hrtimer in the tick_sched data and the underlying hardware
clockevent. It split out hrtimer_set_expires() to store the new tick time
in both the NOHZ and the NOHZ + HIGHRES case, but missed the fact that in
the NOHZ + HIGHRES case the hrtimer might still be queued.

Use hrtimer_start(timer, tick...) for the NOHZ + HIGHRES case which sets
timer->expires after canceling the timer and move the hrtimer_set_expires()
invocation into the NOHZ only code path which is not affected as it merily
uses the hrtimer as next event storage so code pathes can be shared with
the NOHZ + HIGHRES case.

Fixes: d4af6d933ccf ("nohz: Fix spurious warning when hrtimer and clockevent get out of sync")
Reported-by: "Wan Kaike" <kaike.wan@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Cc: "Marciniszyn Mike" <mike.marciniszyn@intel.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: linux-rdma@vger.kernel.org
Cc: "Dalessandro Dennis" <dennis.dalessandro@intel.com>
Cc: "Fleck John" <john.fleck@intel.com>
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Weiny Ira" <ira.weiny@intel.com>
Cc: "linux-rdma@vger.kernel.org"
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804241637390.1679@nanos.tec.linutronix.de
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1804242119210.1597@nanos.tec.linutronix.de

show more ...


# bbe9a70a 09-Apr-2018 Arnd Bergmann <arnd@arndb.de>

tick-sched: avoid a maybe-uninitialized warning

The use of bitfields seems to confuse gcc, leading to a false-positive
warning in all compiler versions:

kernel/time/tick-sched.c: In function 'tick_

tick-sched: avoid a maybe-uninitialized warning

The use of bitfields seems to confuse gcc, leading to a false-positive
warning in all compiler versions:

kernel/time/tick-sched.c: In function 'tick_nohz_idle_exit':
kernel/time/tick-sched.c:538:2: error: 'now' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This introduces a temporary variable to track the flags so gcc
doesn't have to evaluate twice, eliminating the code path that
leads to the warning.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85301
Fixes: 1cae544d42d2 ("nohz: Gather tick_sched booleans under a common flag field")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# ff7de620 06-Apr-2018 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

nohz: Avoid duplication of code related to got_idle_tick

Move the code setting ts->got_idle_tick into tick_sched_do_timer() to
avoid code duplication.

No intentional changes in functionality.

Sugg

nohz: Avoid duplication of code related to got_idle_tick

Move the code setting ts->got_idle_tick into tick_sched_do_timer() to
avoid code duplication.

No intentional changes in functionality.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

show more ...


# 2bc629a6 05-Apr-2018 Frederic Weisbecker <frederic@kernel.org>

nohz: Gather tick_sched booleans under a common flag field

Optimize the space and leave plenty of room for further flags.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[ rjw: Do not use

nohz: Gather tick_sched booleans under a common flag field

Optimize the space and leave plenty of room for further flags.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[ rjw: Do not use __this_cpu_read() to access tick_stopped and add
got_idle_tick to avoid overloading inidle ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


# 296bb1e5 05-Apr-2018 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: menu: Refine idle state selection for running tick

If the tick isn't stopped, the target residency of the state selected
by the menu governor may be greater than the actual time to the next

cpuidle: menu: Refine idle state selection for running tick

If the tick isn't stopped, the target residency of the state selected
by the menu governor may be greater than the actual time to the next
tick and that means lost energy.

To avoid that, make tick_nohz_get_sleep_length() return the current
time to the next event (before stopping the tick) in addition to the
estimated one via an extra pointer argument and make menu_select()
use that value to refine the state selection when necessary.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

show more ...


# 554c8aa8 03-Apr-2018 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

sched: idle: Select idle state before stopping the tick

In order to address the issue with short idle duration predictions
by the idle governor after the scheduler tick has been stopped,
reorder the

sched: idle: Select idle state before stopping the tick

In order to address the issue with short idle duration predictions
by the idle governor after the scheduler tick has been stopped,
reorder the code in cpuidle_idle_call() so that the governor idle
state selection runs before tick_nohz_idle_go_idle() and use the
"nohz" hint returned by cpuidle_select() to decide whether or not
to stop the tick.

This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_stop_tick(). Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.

Namely, tick_nohz_get_sleep_length() can be made call
tick_nohz_next_event(), introduced earlier, to get the time to the
next non-highres timer event. If that happens, tick_nohz_next_event()
need not be called by __tick_nohz_idle_stop_tick() again.

If it turns out that the scheduler tick cannot be stopped going
forward or the next timer event is too close for the tick to be
stopped, tick_nohz_get_sleep_length() can simply return the time to
the next event currently programmed into the corresponding clock
event device.

In addition to knowing the return value of tick_nohz_next_event(),
however, tick_nohz_get_sleep_length() needs to know the time to the
next highres timer event, but with the scheduler tick timer excluded,
which can be computed with the help of hrtimer_get_next_event().

That minimum of that number and the tick_nohz_next_event() return
value is the total time to the next timer event with the assumption
that the tick will be stopped. It can be returned to the idle
governor which can use it for predicting idle duration (under the
assumption that the tick will be stopped) and deciding whether or
not it makes sense to stop the tick before putting the CPU into the
selected idle state.

With the above, the sleep_length field in struct tick_sched is not
necessary any more, so drop it.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227
Reported-by: Doug Smythies <dsmythies@telus.net>
Reported-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

show more ...


# 23a8d888 05-Apr-2018 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

time: tick-sched: Split tick_nohz_stop_sched_tick()

In order to address the issue with short idle duration predictions
by the idle governor after the scheduler tick has been stopped, split
tick_nohz

time: tick-sched: Split tick_nohz_stop_sched_tick()

In order to address the issue with short idle duration predictions
by the idle governor after the scheduler tick has been stopped, split
tick_nohz_stop_sched_tick() into two separate routines, one computing
the time to the next timer event and the other simply stopping the
tick when the time to the next timer event is known.

Prepare these two routines to be called separately, as one of them
will be called by the idle governor in the cpuidle_select() code
path after subsequent changes.

Update the former callers of tick_nohz_stop_sched_tick() to use
the new routines, tick_nohz_next_event() and tick_nohz_stop_tick(),
instead of it and move the updates of the sleep_length field in
struct tick_sched into __tick_nohz_idle_stop_tick() as it doesn't
need to be updated anywhere else.

There should be no intentional visible changes in functionality
resulting from this change.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

show more ...


Revision tags: v4.16
# 45f1ff59 22-Mar-2018 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpuidle: Return nohz hint from cpuidle_select()

Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the ti

cpuidle: Return nohz hint from cpuidle_select()

Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.

Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
(1) the idle exit latency is constrained at 0, or
(2) the selected state is a polling one, or
(3) the expected idle period duration is within the tick period
range.

In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values. To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs. Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.

Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

show more ...


# 2aaf709a 15-Mar-2018 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

sched: idle: Do not stop the tick upfront in the idle loop

Push the decision whether or not to stop the tick somewhat deeper
into the idle loop.

Stopping the tick upfront leads to unpleasant outcom

sched: idle: Do not stop the tick upfront in the idle loop

Push the decision whether or not to stop the tick somewhat deeper
into the idle loop.

Stopping the tick upfront leads to unpleasant outcomes in case the
idle governor doesn't agree with the nohz code on the duration of the
upcoming idle period. Specifically, if the tick has been stopped and
the idle governor predicts short idle, the situation is bad regardless
of whether or not the prediction is accurate. If it is accurate, the
tick has been stopped unnecessarily which means excessive overhead.
If it is not accurate, the CPU is likely to spend too much time in
the (shallow, because short idle has been predicted) idle state
selected by the governor [1].

As the first step towards addressing this problem, change the code
to make the tick stopping decision inside of the loop in do_idle().
In particular, do not stop the tick in the cpu_idle_poll() code path.
Also don't do that in tick_nohz_irq_exit() which doesn't really have
enough information on whether or not to stop the tick.

Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1]
Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

show more ...


# 0e776768 05-Apr-2018 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

time: tick-sched: Reorganize idle tick management code

Prepare the scheduler tick code for reworking the idle loop to
avoid stopping the tick in some cases.

The idea is to split the nohz idle entry

time: tick-sched: Reorganize idle tick management code

Prepare the scheduler tick code for reworking the idle loop to
avoid stopping the tick in some cases.

The idea is to split the nohz idle entry call to decouple the idle
time stats accounting and preparatory work from the actual tick stop
code, in order to later be able to delay the tick stop once we reach
more power-knowledgeable callers.

Move away the tick_nohz_start_idle() invocation from
__tick_nohz_idle_enter(), rename the latter to
__tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick()
as a wrapper around it for calling it from the outside.

Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead
of calling the entire __tick_nohz_idle_enter(), add another wrapper
disabling and enabling interrupts around tick_nohz_idle_stop_tick()
and make the current callers of tick_nohz_idle_enter() call it too
to retain their current functionality.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

show more ...


# d6ed449a 01-Mar-2018 Thomas Gleixner <tglx@linutronix.de>

timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock

The MONOTONIC clock is not fast forwarded by the time spent in suspend on
resume. This is only done for the BOOTTIME clock. The r

timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock

The MONOTONIC clock is not fast forwarded by the time spent in suspend on
resume. This is only done for the BOOTTIME clock. The reason why the
MONOTONIC clock is not forwarded is historical: the original Linux
implementation was using jiffies as a base for the MONOTONIC clock and
jiffies have never been advanced after resume.

At some point when timekeeping was unified in the core code, the
MONONOTIC clock was advanced after resume which also advanced jiffies causing
interesting side effects. As a consequence the the MONOTONIC clock forwarding
was disabled again and the BOOTTIME clock was introduced, which allows to read
time since boot.

Back then it was not possible to completely distangle the MONOTONIC clock and
jiffies because there were still interfaces which exposed the MONOTONIC clock
behaviour based on the timer wheel and therefore jiffies.

As of today none of the MONOTONIC clock facilities depends on jiffies
anymore so the forwarding can be done seperately. This is achieved by
forwarding the variables which are used for the jiffies update after resume
before the tick is restarted,

In timekeeping resume, the change is rather simple. Instead of updating the
offset between the MONOTONIC clock and the REALTIME/BOOTTIME clocks, advance the
time keeper base for the MONOTONIC and the MONOTONIC_RAW clocks by the time
spent in suspend.

The MONOTONIC clock is now the same as the BOOTTIME clock and the offset between
the REALTIME and the MONOTONIC clocks is the same as before suspend.

There might be side effects in applications, which rely on the
(unfortunately) well documented behaviour of the MONOTONIC clock, but the
downsides of the existing behaviour are probably worse.

There is one obvious issue. Up to now it was possible to retrieve the time
spent in suspend by observing the delta between the MONOTONIC clock and the
BOOTTIME clock. This is not longer available, but the previously introduced
mechanism to read the active non-suspended monotonic time can mitigate that
in a detectable fashion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20180301165150.062975504@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


Revision tags: v4.15
# 00357f5e 21-Dec-2017 Peter Zijlstra <peterz@infradead.org>

sched/nohz: Clean up nohz enter/exit

The primary observation is that nohz enter/exit is always from the
current CPU, therefore NOHZ_TICK_STOPPED does not in fact need to be
an atomic.

Secondary is

sched/nohz: Clean up nohz enter/exit

The primary observation is that nohz enter/exit is always from the
current CPU, therefore NOHZ_TICK_STOPPED does not in fact need to be
an atomic.

Secondary is that we appear to have 2 nearly identical hooks in the
nohz enter code, set_cpu_sd_state_idle() and
nohz_balance_enter_idle(). Fold the whole set_cpu_sd_state thing into
nohz_balance_{enter,exit}_idle.

Removes an atomic op from both enter and exit paths.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# dcdedb24 20-Feb-2018 Frederic Weisbecker <frederic@kernel.org>

sched/nohz: Remove the 1 Hz tick code

Now that the 1Hz tick is offloaded to workqueues, we can safely remove
the residual code that used to handle it locally.

Signed-off-by: Frederic Weisbecker <fr

sched/nohz: Remove the 1 Hz tick code

Now that the 1Hz tick is offloaded to workqueues, we can safely remove
the residual code that used to handle it locally.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-7-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# 22ab8bc0 20-Feb-2018 Frederic Weisbecker <frederic@kernel.org>

nohz: Allow to check if remote CPU tick is stopped

This check is racy but provides a good heuristic to determine whether
a CPU may need a remote tick or not.

Signed-off-by: Frederic Weisbecker <fre

nohz: Allow to check if remote CPU tick is stopped

This check is racy but provides a good heuristic to determine whether
a CPU may need a remote tick or not.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-4-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# a3642983 20-Feb-2018 Frederic Weisbecker <frederic@kernel.org>

nohz: Convert tick_nohz_tick_stopped() to bool

It makes this function more self-explanatory about what it does and how
to use it.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fr

nohz: Convert tick_nohz_tick_stopped() to bool

It makes this function more self-explanatory about what it does and how
to use it.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# a7c8655b 30-Nov-2017 Paul E. McKenney <paulmck@linux.vnet.ibm.com>

sched/isolation: Eliminate NO_HZ_FULL_ALL

Commit 6f1982fedd59 ("sched/isolation: Handle the nohz_full= parameter")
broke CONFIG_NO_HZ_FULL_ALL=y kernels. This breakage is due to the code
under CONF

sched/isolation: Eliminate NO_HZ_FULL_ALL

Commit 6f1982fedd59 ("sched/isolation: Handle the nohz_full= parameter")
broke CONFIG_NO_HZ_FULL_ALL=y kernels. This breakage is due to the code
under CONFIG_NO_HZ_FULL_ALL failing to invoke the shiny new housekeeping
functions. This means that rcutorture scenario TREE04 now emits RCU CPU
stall warnings due to the RCU grace-period kthreads not being awakened
at a time of their choosing, or perhaps even not at all:

[ 27.731422] rcu_bh kthread starved for 21001 jiffies! g18446744073709551369 c18446744073709551368 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=3
[ 27.731423] rcu_bh I14936 9 2 0x80080000
[ 27.731435] Call Trace:
[ 27.731440] __schedule+0x31a/0x6d0
[ 27.731442] schedule+0x31/0x80
[ 27.731446] schedule_timeout+0x15a/0x320
[ 27.731453] ? call_timer_fn+0x130/0x130
[ 27.731457] rcu_gp_kthread+0x66c/0xea0
[ 27.731458] ? rcu_gp_kthread+0x66c/0xea0

Because no one has complained about CONFIG_NO_HZ_FULL_ALL=y being broken,
I hypothesize that no one is in fact using it, other than rcutorture.
This commit therefore eliminates CONFIG_NO_HZ_FULL_ALL and updates
rcutorture's config files to instead use the nohz_full= kernel parameter
to put the desired CPUs into nohz_full mode.

Fixes: 6f1982fedd59 ("sched/isolation: Handle the nohz_full= parameter")

Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>

show more ...


# ae67bada 14-Jan-2018 Thomas Gleixner <tglx@linutronix.de>

hrtimer: Optimize the hrtimer code by using static keys for migration_enable/nohz_active

The hrtimer_cpu_base::migration_enable and ::nohz_active fields
were originally introduced to avoid accessing

hrtimer: Optimize the hrtimer code by using static keys for migration_enable/nohz_active

The hrtimer_cpu_base::migration_enable and ::nohz_active fields
were originally introduced to avoid accessing global variables
for these decisions.

Still that results in a (cache hot) load and conditional branch,
which can be avoided by using static keys.

Implement it with static keys and optimize for the most critical
case of high performance networking which tends to disable the
timer migration functionality.

No change in functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos
Link: https://lkml.kernel.org/r/20171221104205.7269-2-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# 5d62c183 22-Dec-2017 Thomas Gleixner <tglx@linutronix.de>

nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()

The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:

if ((i

nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()

The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:

if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))

If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.

Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....

Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.

Reported-by: Sebastian Siewior <bigeasy@linutronix.d>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos

show more ...


# 466a2b42 20-Dec-2017 Joel Fernandes <joelaf@google.com>

cpufreq: schedutil: Use idle_calls counter of the remote CPU

Since the recent remote cpufreq callback work, its possible that a cpufreq
update is triggered from a remote CPU. For single policies how

cpufreq: schedutil: Use idle_calls counter of the remote CPU

Since the recent remote cpufreq callback work, its possible that a cpufreq
update is triggered from a remote CPU. For single policies however, the current
code uses the local CPU when trying to determine if the remote sg_cpu entered
idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
idle_calls counter of the remote CPU.

Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

show more ...


Revision tags: v4.13.16, v4.14
# ebf3adba 06-Nov-2017 Frederic Weisbecker <frederic@kernel.org>

timers/nohz: Use lockdep to assert IRQs are disabled/enabled

Use lockdep to check that IRQs are enabled or disabled as expected. This
way the sanity check only shows overhead when concurrency correc

timers/nohz: Use lockdep to assert IRQs are disabled/enabled

Use lockdep to check that IRQs are enabled or disabled as expected. This
way the sanity check only shows overhead when concurrency correctness
debug code is enabled.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David S . Miller <davem@davemloft.net>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1509980490-4285-5-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# 6f1982fe 26-Oct-2017 Frederic Weisbecker <frederic@kernel.org>

sched/isolation: Handle the nohz_full= parameter

We want to centralize the isolation management, done by the housekeeping
subsystem. Therefore we need to handle the nohz_full= parameter from
there.

sched/isolation: Handle the nohz_full= parameter

We want to centralize the isolation management, done by the housekeeping
subsystem. Therefore we need to handle the nohz_full= parameter from
there.

Since nohz_full= so far has involved unbound timers, watchdog, RCU
and tilegx NAPI isolation, we keep that default behaviour.

nohz_full= will be deprecated in the future. We want to control
the isolation features from the isolcpus= parameter.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1509072159-31808-10-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# 78634061 26-Oct-2017 Frederic Weisbecker <frederic@kernel.org>

sched/isolation: Move housekeeping related code to its own file

The housekeeping code is currently tied to the NOHZ code. As we are
planning to make housekeeping independent from it, start with movi

sched/isolation: Move housekeeping related code to its own file

The housekeeping code is currently tied to the NOHZ code. As we are
planning to make housekeeping independent from it, start with moving
the relevant code to its own file.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


Revision tags: v4.13.5, v4.13
# 62cb1188 29-Aug-2017 Peter Zijlstra <peterz@infradead.org>

sched/idle: Move quiet_vmstate() into the NOHZ code

quiet_vmstat() is an expensive function that only makes sense when we
go into NOHZ.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

sched/idle: Move quiet_vmstate() into the NOHZ code

quiet_vmstat() is an expensive function that only makes sense when we
go into NOHZ.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: aubrey.li@linux.intel.com
Cc: cl@linux.com
Cc: fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


12345678910>>...27