Revision tags: v3.9-rc8 |
|
#
cb41a290 |
| 20-Apr-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Add basic tracing
It's not obvious to find out why the full dynticks subsystem doesn't always stop the tick: whether this is due to kthreads, posix timers, perf events, etc...
These new tracepoints are here to help the user diagnose the failures and test this feature.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
99e5ada9 |
| 20-Apr-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Re-evaluate the tick for the new task after a context switch
When a task is scheduled in, it may have some properties of its own that could make the CPU reconsider the need for the tick: posix cpu timers, perf events, ...
So notify the full dynticks subsystem when a task gets scheduled in and re-check the tick dependency at this stage. This is done through a self IPI to avoid messing up with any current lock scenario.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
5811d996 |
| 20-Apr-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Prepare to stop the tick on irq exit
Interrupt exit is a natural place to stop the tick: it happens after all events that occurred before and during the irq and that are liable to update the dependency on the tick. It also makes sure that any check on tick dependency is well ordered against dynticks kick IPIs.
Bring in the infrastructure that performs the tick dependency checks on irq exit and shut it down if these checks show that we can do it safely.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
9014c45d |
| 20-Apr-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Implement full dynticks kick
Implement the full dynticks kick that is performed from IPIs sent by various subsystems (scheduler, posix timers, ...) when they want to notify about a new event that may reconsider the dependency on the tick.
Most of the time, such an event ends up restarting the tick.
(Part of the design with subsystems providing *_can_stop_tick() helpers suggested by Peter Zijlstra a while ago).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
ff442c51 |
| 20-Apr-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Re-evaluate the tick from the scheduler IPI
The scheduler IPI is used by the scheduler to kick full dynticks CPUs asynchronously when more than one task is running or when a new timer list timer is enqueued. This way the destination CPU can decide to restart the tick to handle this new situation.
Now let's call that kick in the scheduler IPI.
(Reusing the scheduler IPI rather than implementing a new IPI was suggested by Peter Zijlstra a while ago)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
Revision tags: v3.9-rc7, v3.9-rc6, v3.9-rc5 |
|
#
f98823ac |
| 26-Mar-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: New option to default all CPUs in full dynticks range
Provide a new kernel config that defaults all CPUs to be part of the full dynticks range, except the boot one for timekeeping.
This default setting is overridden by the nohz_full= boot option if passed by the user.
This is helpful for those who don't need a fine-grained range of full dynticks CPUs and also for automated testing.
Suggested-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
d1e43fa5 |
| 26-Mar-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Ensure full dynticks CPUs are RCU nocbs
We need full dynticks CPUs to also be RCU nocb so that we don't have to keep the tick to handle RCU callbacks.
Make sure the range passed to the nohz_full= boot parameter is a subset of rcu_nocbs=.
The CPUs that fail to meet this requirement will be excluded from the nohz_full range. This is checked early in boot time, before any CPU has the opportunity to stop its tick.
Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
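The subset rule described in this commit message can be pictured with plain bitmasks. The following is only a minimal user-space sketch, not the kernel's cpumask code: the helper names `restrict_nohz_full` and `excluded_from_nohz_full` are hypothetical, and a 64-bit word stands in for a real cpumask (bit N set means CPU N is in the range).

```c
#include <stdint.h>

/*
 * Hypothetical helper: keep only the requested full dynticks CPUs that
 * are also in the RCU nocb set. A uint64_t stands in for a cpumask here.
 */
static uint64_t restrict_nohz_full(uint64_t nohz_full_req, uint64_t rcu_nocbs)
{
    return nohz_full_req & rcu_nocbs;
}

/*
 * Hypothetical helper: CPUs that were requested as full dynticks but are
 * dropped because they are not RCU nocb.
 */
static uint64_t excluded_from_nohz_full(uint64_t nohz_full_req, uint64_t rcu_nocbs)
{
    return nohz_full_req & ~rcu_nocbs;
}
```

For instance, with a requested range of CPUs 1-3 (mask 0x0E) and an RCU nocb set of CPUs 1-2 (mask 0x06), the first helper yields 0x06 and the second reports CPU 3 (mask 0x08) as excluded.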
|
#
0453b435 |
| 26-Mar-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Force boot CPU outside full dynticks range
The timekeeping job must be able to run early on boot because there may be some pre-SMP (and thus pre-initcalls) components that rely on it. The IO-APIC is one such user, as it tests the timer health by watching jiffies progression.
Given that it happens before we know the initial online set, we can't rely on it to select a timekeeper. We need one before SMP time otherwise we simply crash on boot.
To fix this and keep things simple for now, force the boot CPU outside of the full dynticks range in any case and do this early on kernel parameter parsing time.
We might want a trickier solution later, especially for aSMP architectures that need to assign housekeeping tasks to arbitrary low power CPUs.
But it's still first pass KISS time for now.
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
76c24fb0 |
| 17-Apr-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: New APIs to re-evaluate the tick on full dynticks CPUs
Provide two new helpers in order to notify the full dynticks CPUs about some internal system changes against which they may reconsider the state of their tick. Some practical examples include: posix cpu timers, perf tick and sched clock tick.
For now the notifying handler, implemented through IPIs, is a stub that will be implemented when we get the tick stop/restart infrastructure in.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
c5bfece2 |
| 12-Apr-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Switch from "extended nohz" to "full nohz" based naming
"Extended nohz" was used as a naming base for the full dynticks API and Kconfig symbols. It reflects the fact the system tries to stop the tick in more places than just idle.
But that "extended" name is a bit opaque and vague. Renaming it to "full" makes it clearer what the system tries to do under this config: try to shut down the tick anytime it can. The various constraints that prevent that from happening shouldn't be considered as fundamental properties of this feature but rather technical issues that may be solved in the future.
Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
1034fc2f |
| 26-Mar-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Print final full dynticks CPUs range on boot
Given that we apply a few restrictions on the full dynticks CPUs range (keep an online timekeeper outside the range, then in the future have the range be an RCU nocb CPUs subset), let's print the final resulting range of full dynticks CPUs to the user so that they know what's really going to run.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
Revision tags: v3.9-rc4, v3.9-rc3, v3.9-rc2, v3.9-rc1, v3.8, v3.8-rc7, v3.8-rc6, v3.8-rc5, v3.8-rc4, v3.8-rc3, v3.8-rc2, v3.8-rc1, v3.7, v3.7-rc8, v3.7-rc7, v3.7-rc6, v3.7-rc5, v3.7-rc4, v3.7-rc3, v3.7-rc2, v3.7-rc1, v3.6, v3.6-rc7, v3.6-rc6, v3.6-rc5, v3.6-rc4, v3.6-rc3, v3.6-rc2, v3.6-rc1, v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2, v3.5-rc1, v3.4, v3.4-rc7, v3.4-rc6, v3.4-rc5, v3.4-rc4, v3.4-rc3, v3.4-rc2, v3.4-rc1, v3.3, v3.3-rc7, v3.3-rc6, v3.3-rc5, v3.3-rc4, v3.3-rc3, v3.3-rc2, v3.3-rc1, v3.2, v3.2-rc7, v3.2-rc6, v3.2-rc5, v3.2-rc4, v3.2-rc3, v3.2-rc2, v3.2-rc1, v3.1, v3.1-rc10, v3.1-rc9, v3.1-rc8, v3.1-rc7, v3.1-rc6, v3.1-rc5, v3.1-rc4, v3.1-rc3, v3.1-rc2 |
|
#
3451d024 |
| 10-Aug-2011 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON
We are planning to convert the dynticks Kconfig options layout into a choice menu. The user must be able to easily pick any of the following implementations: constant periodic tick, idle dynticks, full dynticks.
As this implies a mutual exclusion, the two dynticks implementations need to converge on the selection of a common Kconfig option in order to ease the sharing of a common infrastructure.
It would thus seem pretty natural to reuse CONFIG_NO_HZ to that end. It already implements all the idle dynticks code and the full dynticks depends on all that code for now. So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.
On the other hand we want to stay backward compatible: if CONFIG_NO_HZ is set in an older config file, we want to enable CONFIG_NO_HZ_IDLE by default.
But we can't afford both at the same time or we run into a circular dependency:
1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select CONFIG_NO_HZ
2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE
We might be able to support that from Kconfig/Kbuild but it may not be wise to introduce such a confusing behaviour.
So to solve this, create a new CONFIG_NO_HZ_COMMON option which gathers the common code between idle and full dynticks (that common code for now is simply the idle dynticks code) and select it from their referring Kconfig.
Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ to it for backward compatibility.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
cfea7d7e |
| 08-Feb-2013 |
Rado Vrbovsky <rvrbovsk@redhat.com> |
tick: Change log level of NOHZ: local_softirq_pending message
The "NOHZ: local_softirq_pending" message is a largely informational message. This makes extra work for customers that have a policy of investigating all kernel log messages logged at <= KERN_ERR log level. This patch sets the message to a different log level.
[ tglx: Use pr_warn() ]
Signed-off-by: Rado Vrbovsky <rvrbovsk@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/2037057938.893524.1360345050772.JavaMail.root@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
a382bf93 |
| 18-Dec-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Assign timekeeping duty to a CPU outside the full dynticks range
This way the full nohz CPUs can safely run with the tick stopped with a guarantee that somebody else is taking care of the jiffies and GTOD progression.
Once the duty is attributed to a CPU, it won't change. Also that CPU can't enter into dyntick idle mode or be hot unplugged.
This may later be improved from a power consumption POV. At least we should be able to share the duty amongst all CPUs outside the full dynticks range. Then the duty could even be shared with full dynticks CPUs when those can't stop their tick for any reason.
But let's start with that very simple approach first.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> [fix have_nohz_full_mask offcase] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
#
a831881b |
| 18-Dec-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Basic full dynticks interface
For extreme usecases such as Real Time or HPC, having the ability to shutdown the tick when a single task runs on a CPU is a desired feature:
* Reducing the amount of interrupts improves throughput for CPU-bound tasks. The CPU is less distracted from its real job, from an execution time and from the cache point of views.
* This also improves latency response as we have fewer critical sections.
Start by introducing a very simple interface to define full dynticks CPUs: use a boot time option defined cpumask through the "nohz_extended=" kernel parameter. CPUs that are part of this range will have their tick shut down whenever possible: provided they run a single task and they don't do kernel activity that requires the periodic tick. These details will be later documented in Documentation/*
An online CPU must be kept outside this range to handle the timekeeping.
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
af7bdbaf |
| 21-Feb-2013 |
Thomas Gleixner <tglx@linutronix.de> |
Revert "nohz: Make tick_nohz_irq_exit() irq safe"
This reverts commit 351429b2e62b6545bb10c756686393f29ba268a1. The extra local_irq_save() is no longer needed as the call site now always calls with interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org>
|
#
e5ab012c |
| 20-Feb-2013 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Make tick_nohz_irq_exit() irq safe
As it stands, irq_exit() may or may not be called with irqs disabled, depending on __ARCH_IRQ_EXIT_IRQS_DISABLED that the arch can define.
It makes tick_nohz_irq_exit() unsafe. For example, two interrupts can race in tick_nohz_stop_sched_tick(): the innermost one computes the expiring time on top of the timer list, then it's interrupted right before reprogramming the clock. The new interrupt enqueues a new timer list timer, reprograms the clock to take it into account and exits. The CPU resumes the innermost interrupt and performs the clock reprogramming without considering the new timer list timer.
This regression has been introduced by: 280f06774afedf849f0b34248ed6aff57d0f6908 ("nohz: Separate out irq exit and idle loop dyntick logic")
Let's fix it right now with the appropriate protections.
A saner long term solution will be to remove __ARCH_IRQ_EXIT_IRQS_DISABLED and mandate that irq_exit() is called with interrupts disabled.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: <stable@vger.kernel.org> #v3.2+ Link: http://lkml.kernel.org/r/1361373336-11337-1-git-send-email-fweisbec@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
3f4724ea |
| 16-Jul-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
cputime: Allow dynamic switch between tick/virtual based cputime accounting
Allow to dynamically switch between tick and virtual based cputime accounting. This way we can provide a kind of "on-demand" virtual based cputime accounting. In this mode, the kernel relies on the context tracking subsystem to dynamically probe on kernel boundaries.
This is in preparation for being able to stop the timer tick in more places than just the idle state. Doing so will depend on CONFIG_VIRT_CPU_ACCOUNTING_GEN which makes it possible to account the cputime without the tick by hooking on kernel/user boundaries.
Depending on whether the tick is stopped or not, we can switch between tick and vtime based accounting anytime in order to minimize the overhead associated with user hooks.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
4dbd2771 |
| 04-Jan-2013 |
Jacob Pan <jacob.jun.pan@linux.intel.com> |
tick: export nohz tick idle symbols for module use
Allow drivers such as intel_powerclamp to use these APIs for turning ticks on/off during idle.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
|
#
74876a98 |
| 12-Oct-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
printk: Wake up klogd using irq_work
klogd is woken up asynchronously from the tick in order to do it safely.
However, if printk is called when the tick is stopped, the reader won't be woken up until the next interrupt, which might not fire for a while. As a result, the user may miss some messages.
To fix this, let's implement the printk tick using a lazy irq work. This subsystem takes care of the timer tick state and can fix up accordingly.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
00b42959 |
| 07-Nov-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
irq_work: Don't stop the tick with pending works
Don't stop the tick if we have pending irq works on the queue, otherwise if the arch can't raise self-IPIs, we may not find an opportunity to execute the pending works for a while.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
33a5f626 |
| 11-Oct-2012 |
Frederic Weisbecker <fweisbec@gmail.com> |
nohz: Add API to check tick state
We need some quick way to check if the CPU has stopped its tick. This will be useful to implement the printk tick using the irq work subsystem.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
69a37bea |
| 26-Oct-2012 |
Youquan Song <youquan.song@intel.com> |
cpuidle: Quickly notice prediction failure for repeat mode
Predicting the future is difficult, and when the cpuidle governor's prediction fails, the governor may choose a shallower C-state than it should. Quickly noticing such a failure therefore becomes important for power saving.
The cpuidle menu governor has a method to predict a repeat pattern: if the last 8 C-state residencies are continuous and the same or very close, it predicts that the next C-state residency will last about the same time.
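The repeat-pattern idea described above can be illustrated with a small user-space sketch. It is not the menu governor's actual code: the function name `detect_repeat`, the `tolerance_us` parameter and the "every sample within a fixed tolerance of the average" criterion are assumptions chosen for illustration, while the window of 8 samples comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTERVALS 8 /* number of recent idle residencies examined, per the text */

/*
 * Hypothetical check: report a repeat pattern when all INTERVALS samples
 * lie within tolerance_us of their average. On success, the average is
 * stored in *predicted_us as the predicted next residency.
 */
static bool detect_repeat(const uint32_t samples_us[INTERVALS],
                          uint32_t tolerance_us, uint32_t *predicted_us)
{
    uint64_t sum = 0;

    for (int i = 0; i < INTERVALS; i++)
        sum += samples_us[i];

    uint32_t avg = (uint32_t)(sum / INTERVALS);

    for (int i = 0; i < INTERVALS; i++) {
        uint32_t diff = samples_us[i] > avg ? samples_us[i] - avg
                                            : avg - samples_us[i];
        if (diff > tolerance_us)
            return false; /* one outlier is enough to reject the pattern */
    }

    *predicted_us = avg;
    return true;
}
```

With eight samples close to 20 us and a tolerance of 5 us, the sketch predicts about 20 us for the next residency; a single one-second sample in the window makes it reject the pattern.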
There is a real case with the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or earlier. turbostat reads 10 registers one by one on Sandybridge, so it generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor then predicts repeat mode and expects another IPI to wake the idle CPU soon, so it keeps the idle CPU in C1 even though the CPU is totally idle. However, after reading the 10 registers, turbostat sleeps for 5 seconds by default, so the idle CPU stays in C1 for a long time, although it is idle, until a break event occurs. On an idle Sandybridge system, running "./turbostat -v" shows deep C-state residency dangling between 70% and 99%. With the patched kernel, deep C-state residency stays above 99.98%.
In the patch, a timer is added when the menu governor detects repeat mode and chooses a shallow C-state. The timer is set to a timeout value greater than the predicted time, and we conclude that the repeat mode prediction failed if the timer fires. When repeat mode happens as expected, the timer does not fire: the CPU is woken up from the C-state and actively cancels the timer. When repeat mode does not happen, the timer expires, and the menu governor quickly notices that the repeat mode prediction failed and then re-evaluates the possibility of deeper C-states.
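The guard-timer logic in the paragraph above reduces to a comparison between the actual idle duration and a timeout that is larger than the prediction. The sketch below models only that decision in user space; `guard_timeout_us`, `classify_idle` and the "twice the prediction plus one" timeout are hypothetical choices, since the text only says the timeout is greater than the predicted time.

```c
#include <stdint.h>

typedef enum {
    WOKE_BEFORE_TIMER, /* wakeup arrived first; the guard timer is cancelled */
    PREDICTION_FAILED  /* guard timer fired first; deeper C-states get re-evaluated */
} idle_outcome;

/*
 * Hypothetical timeout policy: any value strictly greater than the
 * predicted residency would satisfy the description in the text.
 */
static uint64_t guard_timeout_us(uint32_t predicted_us)
{
    return (uint64_t)predicted_us * 2 + 1;
}

/* Decide what happened during one shallow idle period. */
static idle_outcome classify_idle(uint32_t predicted_us, uint64_t actual_idle_us)
{
    uint64_t timeout_us = guard_timeout_us(predicted_us);

    if (actual_idle_us < timeout_us)
        return WOKE_BEFORE_TIMER;

    return PREDICTION_FAILED;
}
```

With a 20 us prediction, a wakeup after 25 us counts as the expected repeat, whereas a one-second idle period lets the guard timer fire and marks the prediction as failed.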
Below is another case that clearly shows the benefit of the patch:
/* Link with -lpthread. */
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>
#include <pthread.h>

#define MAX_THREADS 1024

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"	--help	-h  Print this help\n"
		"	--thread	-n  Thread number\n"
		"	--loop	-l  Loop times in shallow Cstate\n"
		"	--delay	-t  Sleep time (uS)in shallow Cstate\n");
}

/* Sleep briefly (loop - 1) times, then sleep for 1 second, and repeat. */
void *simple_loop(void *arg)
{
	int idle_num = 1;

	(void)arg;
	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	(void)sig;
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, thread_num = 8;
	pthread_t pt[MAX_THREADS];

	/* -h takes no argument, so it has no trailing colon. */
	static char optstr[] = "n:l:t:h";

	while ((c = getopt(argc, argv, optstr)) != EOF)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	/* Keep thread_num within pt[] and avoid a modulo by zero. */
	if (thread_num < 1 || thread_num > MAX_THREADS || loop < 1) {
		usage();
		exit(1);
	}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask(SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for (i = 0; i < thread_num; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}
Get powertop V2 from git://github.com/fenrus75/powertop and build it. Then build the above test application and run it. The test platform can be Intel Sandybridge or another recent platform.
# ./idle_predict -l 10 &
# ./powertop
We will find that deep C-state residency fluctuates between 40% and 100% and that much time is spent in the C1 state. This is because the menu governor wrongly predicts that repeat mode continues, so it chooses the shallow C1 state even though it has the chance to sleep for 1 second in a deep C-state.
With the patched kernel, deep C-state residency stays above 99.6%.
Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
d6ad4187 |
| 28-Feb-2012 |
John Stultz <john.stultz@linaro.org> |
time: Kill xtime_lock, replacing it with jiffies_lock
Now that timekeeping is protected by its own locks, rename the xtime_lock to jiffies_lock to better describe what it protects.
CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
|
#
b8f61116 |
| 24-Oct-2012 |
Chuansheng Liu <chuansheng.liu@intel.com> |
tick: Correct the comments for tick_sched_timer()
In the comment above tick_sched_timer(), the statement "timer->base->cpu_base->lock held" is not right.
In __run_hrtimer(), cpu_base->lock has already been unlocked before timer->function() is called.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Cc: fei.li@intel.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1351098455.15558.1421.camel@cliu38-desktop-build Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
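The locking behaviour that the corrected comment describes can be shown with a toy userspace model. This is not kernel code: `struct toy_base`, `struct toy_timer`, `toy_run_timer`, and `toy_tick_callback` are hypothetical stand-ins, a boolean flag stands in for the spinlock, and retaking the lock after the callback is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-CPU base; 'locked' models cpu_base->lock. */
struct toy_base {
    bool locked;
};

struct toy_timer {
    struct toy_base *base;
    void (*function)(struct toy_timer *timer);
    bool saw_lock_held;   /* what the callback observed */
};

/* Callback: record whether the base lock is held while it runs. */
void toy_tick_callback(struct toy_timer *timer)
{
    timer->saw_lock_held = timer->base->locked;
}

/*
 * Model of the dispatch step: entered with the lock held, the lock
 * is dropped before the callback runs and (in this sketch) retaken
 * afterwards.
 */
void toy_run_timer(struct toy_timer *timer)
{
    assert(timer->base->locked);

    timer->base->locked = false;   /* unlock before calling the callback */
    timer->function(timer);
    timer->base->locked = true;    /* relock for the caller */
}
```

Running `toy_run_timer()` on a timer whose callback is `toy_tick_callback` leaves `saw_lock_held` false, which is why a comment claiming the lock is held inside the callback would be misleading.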
|