#
59449359 |
| 27-May-2015 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Remove idle task special case
In the early days of nohz full, idle dynticks and full dynticks weren't well integrated, and we couldn't make full dynticks calls on idle without risking messing up the tick idle statistics. This is why we prevented such calls from happening.
Nowadays full dynticks and idle dynticks are better integrated and interact without known issues.
So let's remove that special case.
Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
683be13a |
| 26-May-2015 |
Thomas Gleixner <tglx@linutronix.de> |
timer: Minimize nohz off overhead
If nohz is disabled on the kernel command line the [hr]timer code still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty pointless exercise. Cache nohz_active in [hr]timer per cpu bases and avoid the overhead.
Before:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu
After:
  48.73%  hog       [.] main
  15.36%  [kernel]  [k] _raw_spin_lock_irqsave
   9.77%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.61%  [kernel]  [k] lock_timer_base.isra.38
   6.42%  [kernel]  [k] mod_timer
   3.90%  [kernel]  [k] detach_if_pending
   3.76%  [kernel]  [k] del_timer
   2.41%  [kernel]  [k] internal_add_timer
   1.39%  [kernel]  [k] __internal_add_timer
   0.76%  [kernel]  [k] timerfn
We probably should have a cached value for nohz full in the per cpu bases as well to avoid the cpumask check. The base cache line is hot already, the cpumask not necessarily.
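The caching idea can be sketched in plain userspace C. This is only a model under stated assumptions: `struct timer_base_model`, `model_wake_up_nohz_cpu()` and `model_enqueue_timer()` are hypothetical names invented for illustration and are not the kernel's structures or functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU timer base; not the kernel's struct. */
struct timer_base_model {
    bool nohz_active;      /* cached copy of the global nohz state */
    unsigned long wakeups; /* counts how often the wake-up path ran */
};

/* Stand-in for the comparatively costly wake-up call. */
static void model_wake_up_nohz_cpu(struct timer_base_model *base)
{
    base->wakeups++;
}

/*
 * Enqueue path: consult the flag cached in the (already hot) base first
 * and skip the wake-up call entirely when nohz is not active.
 */
static void model_enqueue_timer(struct timer_base_model *base)
{
    if (!base->nohz_active)
        return;
    model_wake_up_nohz_cpu(base);
}
```

With `nohz_active` false, `model_enqueue_timer()` returns before reaching the wake-up stand-in, which mirrors `wake_up_nohz_cpu` dropping out of the "After" profile.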
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
bc7a34b8 |
| 26-May-2015 |
Thomas Gleixner <tglx@linutronix.de> |
timer: Reduce timer migration overhead if disabled
Eric reported that the timer_migration sysctl is not really nice performance wise as it needs to check at every timer insertion whether the feature is enabled or not. Further the check does not live in the timer code, so we have an extra function call which checks an extra cache line to figure out that it is disabled.
We can do better and store that information in the per cpu (hr)timer bases. I considered using a static key, but that's a nightmare to update from the nohz code and the timer base cache line is hot anyway when we select a timer base.
The old logic enabled the timer migration unconditionally if CONFIG_NO_HZ was set even if nohz was disabled on the kernel command line.
With this modification, we start off with migration disabled. The user visible sysctl is still set to enabled. If the kernel switches to NOHZ, migration is enabled unless the user disabled it via the sysctl prior to the switch. If nohz=off is on the kernel command line, migration stays disabled no matter what.
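The resulting decision can be written down as a small truth function. This is a sketch of the rules as described in the message, not the kernel code; `struct migration_inputs` and `model_migration_enabled()` are hypothetical names, and treating the sysctl as a single current value is a simplifying assumption.

```c
#include <stdbool.h>

/* Hypothetical model of the inputs described in the commit message. */
struct migration_inputs {
    bool nohz_off_cmdline; /* nohz=off given on the kernel command line */
    bool switched_to_nohz; /* the kernel has switched to NOHZ mode */
    bool sysctl_enabled;   /* user-visible timer_migration sysctl value */
};

/* Effective migration state cached in the per-CPU bases (model only). */
static bool model_migration_enabled(const struct migration_inputs *in)
{
    if (in->nohz_off_cmdline)
        return false;          /* stays disabled no matter what */
    if (!in->switched_to_nohz)
        return false;          /* we start off with migration disabled */
    return in->sysctl_enabled; /* enabled unless the user turned it off */
}
```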
Before:
  47.76%  hog       [.] main
  14.84%  [kernel]  [k] _raw_spin_lock_irqsave
   9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.71%  [kernel]  [k] mod_timer
   6.24%  [kernel]  [k] lock_timer_base.isra.38
   3.76%  [kernel]  [k] detach_if_pending
   3.71%  [kernel]  [k] del_timer
   2.50%  [kernel]  [k] internal_add_timer
   1.51%  [kernel]  [k] get_nohz_timer_target
   1.28%  [kernel]  [k] __internal_add_timer
   0.78%  [kernel]  [k] timerfn
   0.48%  [kernel]  [k] wake_up_nohz_cpu
After:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
Revision tags: v4.1-rc5, v4.1-rc4, v4.1-rc3 |
|
#
6b442bc8 |
| 07-May-2015 |
Thomas Gleixner <tglx@linutronix.de> |
nohz: Fix !HIGH_RES_TIMERS hang
Simon Horman reported this crash on a system with high-res timers disabled but nohz enabled:
> ------------[ cut here ]------------
> kernel BUG at kernel/irq_work.c:135!
BUG_ON(!irqs_disabled());
So something enabled interrupts in the periodic tick handling machinery, and that code path indeed has a local_irq_disable()/enable pair in tick_nohz_switch_to_nohz() which causes havoc. Fix it.
This patch also fixes a +nohz -hrtimers hang reported by Ingo Molnar.
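Why an unconditional disable/enable pair is harmful inside a path that is entered with interrupts already off can be shown with a tiny userspace model. Everything here is hypothetical (a boolean stands in for the CPU interrupt flag), and the sketch does not claim to reproduce the actual fix, which the message does not spell out.

```c
#include <stdbool.h>

static bool model_irqs_enabled = true; /* models the CPU interrupt flag */

static void model_irq_disable(void) { model_irqs_enabled = false; }
static void model_irq_enable(void)  { model_irqs_enabled = true; }

/* Unconditional pair: always leaves interrupts enabled on return. */
static void model_switch_with_pair(void)
{
    model_irq_disable();
    /* ... mode switch work would go here ... */
    model_irq_enable();
}

/* No pair: relies on the caller's interrupt state and preserves it. */
static void model_switch_without_pair(void)
{
    /* ... mode switch work would go here ... */
}
```

A caller that disabled interrupts and then runs `model_switch_with_pair()` gets them back enabled, which is the condition a check such as `BUG_ON(!irqs_disabled())` trips over.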
Reported-by: Simon Horman <horms@verge.net.au> Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Simon Horman <horms@verge.net.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505071425520.4225@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v4.1-rc2, v4.1-rc1 |
|
#
c1ad348b |
| 14-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
tick: Nohz: Rework next timer evaluation
The evaluation of the next timer in the nohz code is based on jiffies while all the tick internals are nanosecond based. We also have to convert hrtimer nanoseconds to jiffies in the !highres case. That's just wrong and introduces interesting corner cases.
Turn it around and convert the next timer wheel timer expiry and the rcu event to clock monotonic and base all calculations on nanoseconds. That identifies the case where no timer is pending clearly with an absolute expiry value of KTIME_MAX.
Makes the code more readable and gets rid of the jiffies magic in the nohz code.
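A rough sketch of the nanosecond-based evaluation, written as standalone C. The tick rate `MODEL_HZ`, the helper names and the `MODEL_KTIME_MAX` sentinel are assumptions made for illustration; only the idea (convert the wheel expiry to an absolute monotonic time and use a maximum value for "nothing pending") comes from the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000LL
#define MODEL_HZ           250       /* assumed tick rate */
#define MODEL_KTIME_MAX    INT64_MAX /* sentinel: no timer pending */

/*
 * Convert a timer wheel expiry given in jiffies into an absolute
 * monotonic time in nanoseconds. base_jiffies and base_ns describe
 * the same instant in both units.
 */
static int64_t model_wheel_expiry_ns(bool has_timer, uint64_t expires_jiffies,
                                     uint64_t base_jiffies, int64_t base_ns)
{
    if (!has_timer)
        return MODEL_KTIME_MAX;
    if (expires_jiffies <= base_jiffies)
        return base_ns; /* already due */
    return base_ns + (int64_t)(expires_jiffies - base_jiffies) *
                     (MODEL_NSEC_PER_SEC / MODEL_HZ);
}

/* Pick the earlier of two absolute expiries; the sentinel always loses. */
static int64_t model_next_event_ns(int64_t a, int64_t b)
{
    return a < b ? a : b;
}
```

Because every input is an absolute nanosecond value, combining the wheel expiry with another event source is a plain minimum, and the "no timer" case needs no special jiffies arithmetic.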
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
157d29e1 |
| 14-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
tick: Sched: Restructure code
Get rid of one indentation level. Preparatory patch for a major rework. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203502.101563235@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
0ff53d09 |
| 14-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
tick: sched: Force tick interrupt and get rid of softirq magic
We already got rid of the hrtimer reprogramming loops and hoops as hrtimer now enforces an interrupt if the enqueued time is in the past.
Do the same for the nohz non highres mode. That gets rid of the need to raise the softirq which only serves the purpose of getting the machine out of the inner idle loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203502.023464878@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
afc08b15 |
| 14-Apr-2015 |
Thomas Gleixner <tglx@linutronix.de> |
tick: sched: Remove hrtimer_active() checks
hrtimer_start() enforces a timer interrupt if the timer is already expired. Get rid of the checks and the forward loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Link: http://lkml.kernel.org/r/20150414203501.943658239@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
Revision tags: v4.0, v4.0-rc7, v4.0-rc6 |
|
#
c1797baf |
| 25-Mar-2015 |
Thomas Gleixner <tglx@linutronix.de> |
tick: Move core only declarations and functions to core
There is no point in exposing everything to the world. People just believe such functions can be abused for whatever purposes. Sigh.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebased on top of 4.0-rc5 ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/28017337.VbCUc39Gme@vostro.rjw.lan [ Merged to latest timers/core ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1 |
|
#
ffda22c1 |
| 13-Feb-2015 |
Tejun Heo <tj@kernel.org> |
time: use %*pb[l] to print bitmaps including cpumasks and nodemasks
printk and friends can now format bitmaps using '%*pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask.
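In kernel code the pattern reads roughly like `pr_info("%*pbl\n", cpumask_pr_args(mask))`. To give a feel for the range-list form that the `l` variant produces, here is a userspace sketch that formats a 64-bit mask the same way; `model_bitmap_list()` is a hypothetical helper written for this illustration and is not how printk implements the specifier.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write the set bits of mask as a range list such as "0-3,5" into buf.
 * Returns the number of characters written, excluding the trailing NUL.
 * If buf is too small, the list is cut off at the last complete entry.
 */
static size_t model_bitmap_list(uint64_t mask, char *buf, size_t len)
{
    size_t used = 0;
    int bit = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    while (bit < 64) {
        int start, end, n;

        if (!(mask & (UINT64_C(1) << bit))) {
            bit++;
            continue;
        }
        /* Found the start of a run of set bits; find where it ends. */
        start = bit;
        while (bit < 64 && (mask & (UINT64_C(1) << bit)))
            bit++;
        end = bit - 1;

        if (start == end)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, end);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* drop the partial entry */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

For a mask with bits 0 to 3 and bit 5 set, the helper fills the buffer with `0-3,5`.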
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
Revision tags: v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1 |
|
#
a5fd9733 |
| 18-Dec-2014 |
Thomas Gleixner <tglx@linutronix.de> |
tick/powerclamp: Remove tick_nohz_idle abuse
commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module use" was merged via the thermal tree without an explicit ack from the relevant maintainers.
The exports are abused by the intel powerclamp driver which implements a fake idle state from a sched FIFO task. This causes all kinds of wreckage in the NOHZ core code which rightfully assumes that tick_nohz_idle_enter/exit() are only called from the idle task itself.
Recent changes in the NOHZ core lead to a failure of the powerclamp driver and now people try to hack completely broken and backwards workarounds into the NOHZ core code. This is completely unacceptable and just papers over the real problem. There are way more subtle issues lurking around the corner.
The real solution is to fix the powerclamp driver by rewriting it with a sane concept, but that's beyond the scope of this.
So the only solution for now is to remove the calls into the core NOHZ code from the powerclamp trainwreck along with the exports.
Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Pan Jacob jun <jacob.jun.pan@intel.com> Cc: LKP <lkp@01.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
Revision tags: v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3, v3.18-rc2 |
|
#
aa6da514 |
| 21-Oct-2014 |
Paul E. McKenney <paulmck@linux.vnet.ibm.com> |
rcu: Remove "cpu" argument to rcu_needs_cpu()
The "cpu" argument to rcu_needs_cpu() is always the current CPU, so drop it. This in turn allows the "cpu" argument to rcu_cpu_has_callbacks() to be removed, which allows the uses of "cpu" in both functions to be replaced with a this_cpu_ptr(). Again, the anticipated cross-CPU uses of these functions have been replaced by NO_HZ_FULL.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
|
#
56e4dea8 |
| 27-Oct-2014 |
Christoph Lameter <cl@linux.com> |
percpu: Convert remaining __get_cpu_var uses in 3.18-rcX
During the 3.18 merge period additional __get_cpu_var uses were added. The patch converts these to this_cpu_ptr().
Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>
|
Revision tags: v3.18-rc1, v3.17 |
|
#
fe0f4976 |
| 30-Sep-2014 |
Martin Schwidefsky <schwidefsky@de.ibm.com> |
s390/nohz: use a per-cpu flag for arch_needs_cpu
Move the nohz_delay bit from the s390_idle data structure to the per-cpu flags. Clear the nohz delay flag in __cpu_disable and remove the cpu hotplug notifier that used to do this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
Revision tags: v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2 |
|
#
9b01f5bf |
| 17-Aug-2014 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: nohz full depends on irq work self IPI support
The nohz full functionality depends on IRQ work to trigger its own interrupts. As it's used to restart the tick, we can't rely on the tick fallback for irq work callbacks, i.e. we can't use the tick to restart the tick itself.
Let's reject the full dynticks initialization if that arch support isn't available.
As a side effect, this makes sure that nohz kick is never called from the tick. That otherwise would result in illegal hrtimer self-cancellation and lockup.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
4327b15f |
| 17-Aug-2014 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Consolidate nohz full init code
Support for CONFIG_NO_HZ_FULL_ALL=y and support for the nohz_full= kernel parameter each have their own way of doing the same thing: allocating the full dynticks cpumasks, filling them and initializing some state variables.
Let's consolidate all of that in one place.
While at it, convert some regular printk messages to warnings when fundamental allocations fail.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
Revision tags: v3.17-rc1 |
|
#
40bea039 |
| 13-Aug-2014 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Restore NMI safe local irq work for local nohz kick
The local nohz kick is currently used by perf, which needs it to be NMI-safe. However, a recent commit (7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9) changed its implementation to fire the local kick using the remote kick API. That was convenient for making the code more generic, but the remote kick isn't NMI-safe.
As a result:
WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
  0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
  0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
  0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
Call Trace:
  <NMI>
  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
  [<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
  [<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
  [<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
  [<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
  [<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
  [<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
  [<ffffffff9a187934>] perf_event_overflow+0x14/0x20
  [<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
  [<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
  [<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
  [<ffffffff9a007b72>] nmi_handle+0xd2/0x390
  [<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
  [<ffffffff9a0d131b>] ? lock_release+0xab/0x330
  [<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
  [<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
  [<ffffffff9a008268>] do_nmi+0xb8/0x100
Let's fix this by restoring the use of local irq work for the nohz local kick.
Reported-by: Catalin Iacob <iacobcatalin@gmail.com> Reported-and-tested-by: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
4a32fea9 |
| 17-Aug-2014 |
Christoph Lameter <cl@linux.com> |
scheduler: Replace __get_cpu_var with this_cpu_ptr
Convert all uses of __get_cpu_var for address calculation to use this_cpu_ptr instead.
[Uses of __get_cpu_var with cpumask_var_t are no longer handled by this patch]
Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
22127e93 |
| 17-Aug-2014 |
Christoph Lameter <cl@linux.com> |
time: Replace __get_cpu_var uses
Convert uses of __get_cpu_var for creating an address from a percpu offset to this_cpu_ptr.
The two cases where get_cpu_var is used to actually access a percpu variable are changed to use this_cpu_read/raw_cpu_read.
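The distinction between the address form and the value form can be modelled in userspace with an array indexed by a "current CPU" variable. This is only an analogy: the `model_` helpers below are hypothetical and do not reproduce how the kernel's per-CPU accessors are implemented.

```c
#define MODEL_NR_CPUS 4

static int model_current_cpu;             /* stand-in for the running CPU id */
static long model_counter[MODEL_NR_CPUS]; /* stand-in for a per-CPU variable */

/* Address form: yields a pointer to this CPU's instance. */
static long *model_this_cpu_ptr(long *percpu_array)
{
    return &percpu_array[model_current_cpu];
}

/* Value form: reads this CPU's instance directly, no pointer needed. */
static long model_this_cpu_read(const long *percpu_array)
{
    return percpu_array[model_current_cpu];
}
```

Code that only needs the value uses the read form, while code that passes the instance on to another function uses the pointer form.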
Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
Revision tags: v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1 |
|
#
2a16fc93 |
| 12-Jun-2014 |
Viresh Kumar <viresh.kumar@linaro.org> |
nohz: Avoid tick's double reprogramming in highres mode
In highres mode, the tick reschedules itself unconditionally to the next jiffies.
However, while this clock reprogramming is relevant when the tick is in periodic mode, it's not that interesting when we run in dynticks mode, because irq exit is likely going to overwrite the next tick to some randomly deferred future.
So let's just get rid of this tick self rescheduling in dynticks mode. This way we can avoid some clockevents double writes in favourable scenarios, like when we stop the tick completely in idle while no other hrtimer is pending.
Suggested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
#
b5e995e6 |
| 12-Jun-2014 |
Viresh Kumar <viresh.kumar@linaro.org> |
nohz: Fix spurious periodic tick behaviour in low-res dynticks mode
When we reach the end of the tick handler, we unconditionally reschedule the next tick to the next jiffy. Then on irq exit, the nohz code overrides that setting if needed and defers the next tick as far away in the future as possible.
Now in the best dynticks case, when we actually don't need any tick in the future (i.e. expires == KTIME_MAX), low-res and high-res behave differently. What we want in this case is to cancel the next tick programmed by the previous one. That's what we do in high-res mode. OTOH we lack a low-res mode equivalent of hrtimer_cancel(), so we simply don't do anything in this case and the next tick remains scheduled to jiffies + 1.
As a result, in low-res mode, when the dynticks code determines that no tick is needed in the future, we can recursively get a spurious tick every jiffy, because the next tick is always reprogrammed from the tick handler and is never cancelled. This can go on indefinitely until some subsystem actually needs a precise tick in the future, and only then do we eventually overwrite the previous tick handler setting to defer the next tick.
We are fixing this by introducing the ONESHOT_STOPPED mode, which will let us pause a clockevent when no further interrupt is needed. Meanwhile we can't expect all drivers to support this new mode.
So let's reduce most of the symptoms by skipping the nohz-blind tick rescheduling from the tick handler when the CPU is in dynticks mode. That tick rescheduling wrongly assumed periodicity, and the low-res dynticks code can't cancel such a decision. This breaks the recursive (and thus the worst) part of the problem. In the worst case now, we'll get only one extra tick due to an uncancelled tick scheduled before we entered dynticks mode.
This also removes a needless clockevent write on idle ticks. Since those clock writes are usually considered to be slow, it's a general win.
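The recursion and its fix can be illustrated with a toy simulation. It assumes a low-res one-shot event that, once armed, fires on the next jiffy, and a CPU that needs no tick at all; `struct model_tick` and `model_run()` are hypothetical names used only for this sketch.

```c
#include <stdbool.h>

struct model_tick {
    bool tick_stopped; /* CPU is in dynticks mode */
    bool event_armed;  /* a one-shot event is programmed for the next jiffy */
};

/*
 * Simulate the given number of jiffies and return how many tick
 * interrupts fired. With skip_when_stopped false, the handler always
 * re-arms itself (old behaviour); with it true, the handler leaves the
 * event alone while the tick is stopped (behaviour after the change).
 */
static unsigned int model_run(struct model_tick *t, unsigned int jiffies,
                              bool skip_when_stopped)
{
    unsigned int fired = 0;

    for (unsigned int j = 0; j < jiffies; j++) {
        if (!t->event_armed)
            continue;
        t->event_armed = false; /* the one-shot event fires */
        fired++;
        /* End of the tick handler: reprogram unless we skip in dynticks. */
        if (!(skip_when_stopped && t->tick_stopped))
            t->event_armed = true;
    }
    return fired;
}
```

Starting from a stopped tick with one event still armed, the old behaviour fires on every simulated jiffy, while the new behaviour fires exactly once, matching the "only one extra tick" worst case described above.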
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
Revision tags: v3.15 |
|
#
c0f489d2 |
| 04-Jun-2014 |
Paul E. McKenney <paulmck@linux.vnet.ibm.com> |
rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
Binding the grace-period kthreads to the timekeeping CPU resulted in significant performance decreases for some workloads. For more detail, see:
https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
https://lkml.org/lkml/2014/6/4/218 for CPU statistics
It turns out that it is necessary to bind the grace-period kthreads to the timekeeping CPU only when all CPUs but CPU 0 are nohz_full CPUs on the one hand, or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other. In other cases, it suffices to bind the grace-period kthreads to the set of non-nohz_full CPUs.
This commit therefore creates a tick_nohz_not_full_mask that is the complement of tick_nohz_full_mask, and then binds the grace-period kthread to the set of CPUs indicated by this new mask, which covers the CONFIG_NO_HZ_FULL_SYSIDLE=n case. The CONFIG_NO_HZ_FULL_SYSIDLE=y case still binds the grace-period kthreads to the timekeeping CPU. This commit also includes the tick_nohz_full_enabled() check suggested by Frederic Weisbecker.
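The mask construction amounts to a bitwise complement. A minimal sketch using a 64-bit word instead of a cpumask follows; restricting the result to the set of possible CPUs is an assumption of this sketch, and `model_not_full_mask()` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Bit n set means CPU n is in the set. The result contains every
 * possible CPU that is not a nohz_full CPU.
 */
static uint64_t model_not_full_mask(uint64_t possible, uint64_t nohz_full)
{
    return possible & ~nohz_full;
}
```

With eight possible CPUs and CPUs 1 to 7 marked nohz_full, only CPU 0 remains in the result, which is the case where binding to the non-nohz_full set and binding to the timekeeping CPU coincide.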
Reported-by: Jet Chen <jet.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Created housekeeping_affine() and housekeeping_mask per fweisbec feedback. ]
|
#
3d36aebc |
| 04-Jun-2014 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Support nohz full remote kick
Remotely kicking a full nohz CPU in order to make it re-evaluate its next tick is currently implemented using the scheduler IPI.
However this bloats a scheduler fast path with an off-topic feature. The scheduler tick was abused here for its cool "callable anywhere/anytime" properties.
But now that the irq work subsystem can queue remote callbacks, it's a perfect fit to safely queue IPIs when interrupts are disabled without worrying about concurrent callers.
So let's implement remote kick on top of irq work. This is going to be used when a new event requires the next tick to be recalculated: more than 1 task competing on the CPU, timer armed, ...
Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
Revision tags: v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2 |
|
#
27630532 |
| 15-Apr-2014 |
Viresh Kumar <viresh.kumar@linaro.org> |
tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz()
Since commit d689fe222 (NOHZ: Check for nohz active instead of nohz enabled) the tick_nohz_switch_to_nohz() function returns early because it checks for the tick_nohz_active flag. That flag can't be set at this point, because the function itself sets it.
Undo the change in tick_nohz_switch_to_nohz().
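The chicken-and-egg problem is easy to see in a reduced model. The struct and both functions below are hypothetical simplifications: `enabled` stands for the requested state and `active` for the state that the switch function itself establishes.

```c
#include <stdbool.h>

struct model_nohz {
    bool enabled; /* nohz requested (for example, not disabled at boot) */
    bool active;  /* set once the switch to nohz has happened */
};

/* Broken variant: bails out on a flag that only this function sets. */
static void model_switch_checking_active(struct model_nohz *s)
{
    if (!s->active)
        return;
    s->active = true;
}

/* Restored variant: gate on the requested state instead. */
static void model_switch_checking_enabled(struct model_nohz *s)
{
    if (!s->enabled)
        return;
    s->active = true;
}
```

With `enabled` true and `active` false, the first variant can never make progress, while the second one performs the switch.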
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: Arvind.Chauhan@arm.com Cc: linaro-networking@linaro.org Cc: <stable@vger.kernel.org> # 3.13+ Link: http://lkml.kernel.org/r/40939c05f2d65d781b92b20302b02243d0654224.1397537987.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
03e6bdc5 |
| 15-Apr-2014 |
Viresh Kumar <viresh.kumar@linaro.org> |
tick-sched: Don't call update_wall_time() when delta is lesser than tick_period
In tick_do_update_jiffies64() we are processing ticks only if delta is greater than tick_period. This is what we are supposed to do here and it broke a bit with this patch:
commit 47a1b796 (tick/timekeeping: Call update_wall_time outside the jiffies lock)
With the above patch, we might end up calling update_wall_time() even if delta is found to be smaller than tick_period. Fix this by returning when the delta is less than tick period.
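A simplified model of the early return, in standalone C. The state struct and `model_update_jiffies()` are hypothetical; locking and the real timekeeping calls are left out, and a counter stands in for the wall time update.

```c
#include <stdint.h>

struct model_jiffies_state {
    int64_t last_update_ns;         /* time of the last processed tick */
    uint64_t jiffies;               /* tick counter */
    unsigned int wall_time_updates; /* how often wall time was updated */
};

static void model_update_jiffies(struct model_jiffies_state *s,
                                 int64_t now_ns, int64_t tick_period_ns)
{
    int64_t delta = now_ns - s->last_update_ns;
    int64_t ticks;

    /* Less than one full period elapsed: touch nothing, not even wall time. */
    if (delta < tick_period_ns)
        return;

    ticks = delta / tick_period_ns;
    s->jiffies += (uint64_t)ticks;
    s->last_update_ns += ticks * tick_period_ns;
    s->wall_time_updates++; /* stand-in for the wall time update */
}
```

The point of the sketch is the placement of the check: the wall time update sits after the early return, so it only runs when at least one tick was actually accounted.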
[ tglx: Made it a 3 liner and massaged changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: Arvind.Chauhan@arm.com Cc: linaro-networking@linaro.org Cc: John Stultz <john.stultz@linaro.org> Cc: <stable@vger.kernel.org> # v3.14+ Link: http://lkml.kernel.org/r/80afb18a494b0bd9710975bcc4de134ae323c74f.1397537987.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|