Revision tags: v4.2-rc2, v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2, v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1, v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3, v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1 |
|
#
51d7d520 |
| 07-Aug-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Add smp_mb() to arch_spin_is_locked()
The kernel defines the function spin_is_locked(), which can be used to check if a spinlock is currently locked.
Using spin_is_locked() on a lock you don't hold is obviously racy. That is, even though you may observe that the lock is unlocked, it may become locked at any time.
There is (at least) one exception to that, which is if two locks are used as a pair, and the holder of each checks the status of the other before doing any update.
Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic value:
The first CPU does:
spin_lock(*A)
if spin_is_locked(*B)
	# nothing
else
	smp_mb()
	LOAD r = *COUNTER
	r++
	STORE *COUNTER = r
spin_unlock(*A)
And the second CPU does:
spin_lock(*B)
if spin_is_locked(*A)
	# nothing
else
	smp_mb()
	LOAD r = *COUNTER
	r++
	STORE *COUNTER = r
spin_unlock(*B)
Although this is a strange locking construct, it should work.
It seems to be understood, but not documented, that spin_is_locked() is not a memory barrier, so in the examples above and below the caller inserts its own memory barrier before acting on the result of spin_is_locked().
For now we assume spin_is_locked() is implemented as below, and we break it out in our examples:
bool spin_is_locked(*LOCK) {
	LOAD l = *LOCK
	return l.locked
}
Our intuition is that there should be no problem even if the two code sequences run simultaneously such as:
CPU 0                           CPU 1
==================================================
spin_lock(*A)                   spin_lock(*B)
LOAD b = *B                     LOAD a = *A
if b.locked # true              if a.locked # true
# nothing                       # nothing
spin_unlock(*A)                 spin_unlock(*B)
If one CPU gets the lock before the other then it will do the update and the other CPU will back off:
CPU 0                           CPU 1
==================================================
spin_lock(*A)
LOAD b = *B                     spin_lock(*B)
if b.locked # false             LOAD a = *A
else                            if a.locked # true
  smp_mb()                      # nothing
  LOAD r1 = *COUNTER            spin_unlock(*B)
  r1++
  STORE *COUNTER = r1
spin_unlock(*A)
However in reality spin_lock() itself is not indivisible. On powerpc we implement it as a load-and-reserve and store-conditional.
Ignoring the retry logic for the lost reservation case, it boils down to:
spin_lock(*LOCK) {
	LOAD l = *LOCK
	l.locked = true
	STORE *LOCK = l
	ACQUIRE_BARRIER
}
The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as defined in memory-barriers.txt:
This acts as a one-way permeable barrier. It guarantees that all memory operations after the ACQUIRE operation will appear to happen after the ACQUIRE operation with respect to the other components of the system.
On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is also known as "lightweight sync", or "sync 1".
As described in Power ISA v2.07 section B.2.1.1, in this scenario the lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to act as the barrier, preventing any loads or stores in the locked region from occurring prior to the load of *LOCK.
Whether this behaviour is in accordance with the definition of ACQUIRE semantics in memory-barriers.txt is open to discussion; we may switch to a different barrier in future.
What this means in practice is that the following can occur:
CPU 0                           CPU 1
==================================================
LOAD a = *A                     LOAD b = *B
a.locked = true                 b.locked = true
LOAD b = *B                     LOAD a = *A
STORE *A = a                    STORE *B = b
if b.locked # false             if a.locked # false
else                            else
  smp_mb()                      smp_mb()
  LOAD r1 = *COUNTER            LOAD r2 = *COUNTER
  r1++                          r2++
  STORE *COUNTER = r1           STORE *COUNTER = r2   # Lost update
spin_unlock(*A)                 spin_unlock(*B)
That is, the load of *B can occur prior to the store that makes *A visibly locked. And similarly for CPU 1. The result is both CPUs hold their lock and believe the other lock is unlocked.
The easiest fix for this is to add a full memory barrier to the start of spin_is_locked(), so adding to our previous definition would give us:
bool spin_is_locked(*LOCK) {
	smp_mb()
	LOAD l = *LOCK
	return l.locked
}
The new barrier orders the store to the lock we are locking vs the load of the other lock:
CPU 0                           CPU 1
==================================================
LOAD a = *A                     LOAD b = *B
a.locked = true                 b.locked = true
STORE *A = a                    STORE *B = b
smp_mb()                        smp_mb()
LOAD b = *B                     LOAD a = *A
if b.locked # true              if a.locked # true
# nothing                       # nothing
spin_unlock(*A)                 spin_unlock(*B)
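As a rough sketch (not the verbatim kernel change), the resulting powerpc arch_spin_is_locked() is shaped like this, with the full barrier ahead of the lock-word read; arch_spin_value_unlocked() is the by-value helper introduced by the lockref commits further down this log:

	static inline int arch_spin_is_locked(arch_spinlock_t *lock)
	{
		smp_mb();	/* order our own lock-word store vs. this load */
		return !arch_spin_value_unlocked(*lock);
	}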
Although the above example is theoretical, there is similar code in sem_lock() in ipc/sem.c. This commit, together with the next commit, appears to fix crashes we are seeing in that code, where we believe this race happens in practice.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
Revision tags: v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8, v3.14-rc7, v3.14-rc6, v3.14-rc5, v3.14-rc4, v3.14-rc3, v3.14-rc2, v3.14-rc1, v3.13 |
|
#
7179ba52 |
| 15-Jan-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()
At a glance these are just the inverse of each other. The one subtlety is that arch_spin_value_unlocked() takes the lock by value, rather than as a pointer, which is important for the lockref code.
On the other hand arch_spin_is_locked() doesn't really care, so implement it in terms of arch_spin_value_unlocked().
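A minimal sketch of the pairing described (assuming the lock word is the single integer field slock):

	static inline bool arch_spin_value_unlocked(arch_spinlock_t lock)
	{
		return lock.slock == 0;		/* lock taken by value */
	}

	static inline int arch_spin_is_locked(arch_spinlock_t *lock)
	{
		return !arch_spin_value_unlocked(*lock);
	}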
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
3405d230 |
| 15-Jan-2014 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Add support for the optimised lockref implementation
This commit adds the architecture support required to enable the optimised implementation of lockrefs.
That's as simple as defining arch_spin_value_unlocked() and selecting the Kconfig option.
We also define cmpxchg64_relaxed(), because the lockref code does not need the cmpxchg to have barrier semantics.
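A sketch of what that can look like (the exact wrapper in the tree may differ); the Kconfig side is simply selecting ARCH_USE_CMPXCHG_LOCKREF:

	/* lockref's cmpxchg loop needs no barrier semantics */
	#define cmpxchg64_relaxed(ptr, o, n)	cmpxchg64_local((ptr), (o), (n))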
Using Linus' test case[1] on one system I see a 4x improvement for the basic enablement, and a further 1.3x for cmpxchg64_relaxed(), for a total of 5.3x vs the baseline.
On another system I see more like 2x improvement.
[1]: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
Revision tags: v3.13-rc8, v3.13-rc7, v3.13-rc6, v3.13-rc5, v3.13-rc4 |
|
#
919fc6e3 |
| 11-Dec-2013 |
Paul E. McKenney <paulmck@linux.vnet.ibm.com> |
powerpc: Full barrier for smp_mb__after_unlock_lock()
The powerpc lock acquisition sequence is as follows:
lwarx; cmpwi; bne; stwcx.; lwsync;
Lock release is as follows:
lwsync; stw;
If CPU 0 does a store (say, x=1) then a lock release, and CPU 1 does a lock acquisition then a load (say, r1=y), then there is no guarantee of a full memory barrier between the store to 'x' and the load from 'y'. To see this, suppose that CPUs 0 and 1 are hardware threads in the same core that share a store buffer, and that CPU 2 is in some other core, and that CPU 2 does the following:
y = 1; sync; r2 = x;
If 'x' and 'y' are both initially zero, then the lock acquisition and release sequences above can result in r1 and r2 both being equal to zero, which could not happen if unlock+lock was a full barrier.
This commit therefore makes powerpc's smp_mb__after_unlock_lock() be a full barrier.
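On powerpc the override then boils down to a full barrier (sketch of the definition added here):

	#define smp_mb__after_unlock_lock()	smp_mb()	/* Full ordering for lock. */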
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-8-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v3.13-rc3, v3.13-rc2, v3.13-rc1, v3.12, v3.12-rc7, v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1, v3.11, v3.11-rc7, v3.11-rc6, v3.11-rc5 |
|
#
54bb7f4b |
| 06-Aug-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Make rwlocks endian safe
Our ppc64 spinlocks and rwlocks use a trick where a lock token and the paca index are placed in the lock with a single store. Since we are using two u16s they need adjusting for little endian.
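An illustrative sketch only (hypothetical type, not the kernel code) of the underlying issue: two adjacent u16s written through a single 32-bit store swap halves between big and little endian, so the token/index layout has to be chosen per endianness:

	union lock_word {
		u32 whole;		/* written into the lock with one store */
		struct {
			u16 first;	/* lower address: high half of 'whole' on BE, low half on LE */
			u16 second;
		};
	};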
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
f13c13a0 |
| 06-Aug-2013 |
Anton Blanchard <anton@samba.org> |
powerpc: Stop using non-architected shared_proc field in lppaca
Although the shared_proc field in the lppaca works today, it is not architected. A shared processor partition will always have a non-zero yield_count, so use that instead. Create a wrapper so users don't have to know about the details.
In order for older kernels to continue to work on KVM we need to set the shared_proc bit. While here, remove the ugly bitfield.
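A minimal sketch of the wrapper described (the helper name and exact test are assumptions based on the text above):

	/* a shared processor partition always has a non-zero yield count */
	static inline bool lppaca_shared_proc(struct lppaca *l)
	{
		return l->yield_count != 0;
	}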
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
Revision tags: v3.11-rc4, v3.11-rc3, v3.11-rc2, v3.11-rc1, v3.10, v3.10-rc7, v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2, v3.10-rc1, v3.9, v3.9-rc8, v3.9-rc7, v3.9-rc6, v3.9-rc5, v3.9-rc4, v3.9-rc3, v3.9-rc2, v3.9-rc1, v3.8, v3.8-rc7, v3.8-rc6, v3.8-rc5 |
|
#
94c95cfb |
| 24-Jan-2013 |
Li Zhong <zhong@linux.vnet.ibm.com> |
powerpc: Avoid debug_smp_processor_id() check in SHARED_PROCESSOR
Use local_paca directly in the SHARED_PROCESSOR macro, as all processors have the same value for the shared_proc field, so we don't need to care about races here.
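A sketch of what the macro looks like after the change (exact definition assumed):

	/* shared_proc is identical on every CPU, so local_paca is fine here */
	#define SHARED_PROCESSOR	(local_paca->lppaca_ptr->shared_proc)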
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
Revision tags: v3.8-rc4, v3.8-rc3, v3.8-rc2, v3.8-rc1, v3.7, v3.7-rc8, v3.7-rc7, v3.7-rc6, v3.7-rc5, v3.7-rc4, v3.7-rc3, v3.7-rc2, v3.7-rc1, v3.6, v3.6-rc7, v3.6-rc6, v3.6-rc5, v3.6-rc4, v3.6-rc3, v3.6-rc2, v3.6-rc1, v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2, v3.5-rc1, v3.4, v3.4-rc7, v3.4-rc6, v3.4-rc5, v3.4-rc4, v3.4-rc3, v3.4-rc2, v3.4-rc1, v3.3 |
|
#
1b041885 |
| 15-Mar-2012 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
Revision tags: v3.3-rc7, v3.3-rc6, v3.3-rc5, v3.3-rc4, v3.3-rc3, v3.3-rc2, v3.3-rc1, v3.2, v3.2-rc7, v3.2-rc6, v3.2-rc5, v3.2-rc4, v3.2-rc3, v3.2-rc2, v3.2-rc1, v3.1, v3.1-rc10, v3.1-rc9, v3.1-rc8, v3.1-rc7, v3.1-rc6, v3.1-rc5, v3.1-rc4, v3.1-rc3, v3.1-rc2, v3.1-rc1, v3.0, v3.0-rc7, v3.0-rc6, v3.0-rc5, v3.0-rc4, v3.0-rc3, v3.0-rc2, v3.0-rc1, v2.6.39, v2.6.39-rc7, v2.6.39-rc6, v2.6.39-rc5, v2.6.39-rc4, v2.6.39-rc3, v2.6.39-rc2, v2.6.39-rc1, v2.6.38, v2.6.38-rc8, v2.6.38-rc7, v2.6.38-rc6, v2.6.38-rc5, v2.6.38-rc4, v2.6.38-rc3, v2.6.38-rc2, v2.6.38-rc1, v2.6.37, v2.6.37-rc8, v2.6.37-rc7, v2.6.37-rc6, v2.6.37-rc5, v2.6.37-rc4, v2.6.37-rc3, v2.6.37-rc2, v2.6.37-rc1, v2.6.36, v2.6.36-rc8, v2.6.36-rc7, v2.6.36-rc6, v2.6.36-rc5, v2.6.36-rc4, v2.6.36-rc3, v2.6.36-rc2, v2.6.36-rc1, v2.6.35, v2.6.35-rc6, v2.6.35-rc5, v2.6.35-rc4, v2.6.35-rc3, v2.6.35-rc2, v2.6.35-rc1, v2.6.34, v2.6.34-rc7, v2.6.34-rc6, v2.6.34-rc5, v2.6.34-rc4, v2.6.34-rc3, v2.6.34-rc2, v2.6.34-rc1, v2.6.33, v2.6.33-rc8 |
|
#
f10e2e5b |
| 09-Feb-2010 |
Anton Blanchard <anton@samba.org> |
powerpc: Rename LWSYNC_ON_SMP to PPC_RELEASE_BARRIER, ISYNC_ON_SMP to PPC_ACQUIRE_BARRIER
For performance reasons we are about to change ISYNC_ON_SMP to sometimes be lwsync. Now that the macro name doesn't make sense, change it and LWSYNC_ON_SMP to better explain what the barriers are doing.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
#
4e14a4d1 |
| 09-Feb-2010 |
Anton Blanchard <anton@samba.org> |
powerpc: Use lwarx hint in spinlocks
Recent versions of the PowerPC architecture added a hint bit to the larx instructions to differentiate between an atomic operation and a lock operation:
> 0 Other programs might attempt to modify the word in storage addressed by EA
>   even if the subsequent Store Conditional succeeds.
>
> 1 Other programs will not attempt to modify the word in storage addressed by
>   EA until the program that has acquired the lock performs a subsequent store
>   releasing the lock.
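As a hedged sketch (the helper name is made up, and the kernel encodes the instruction through a macro rather than writing the fourth operand directly), a lock acquisition using the extended lwarx form with EH=1 looks roughly like:

	static inline unsigned long trylock_with_hint(unsigned int *lock, unsigned int token)
	{
		unsigned long tmp;

		__asm__ __volatile__(
	"1:	lwarx	%0,0,%2,1\n"	/* EH=1: this larx is a lock acquisition */
	"	cmpwi	0,%0,0\n"
	"	bne-	2f\n"		/* already held, give up */
	"	stwcx.	%1,0,%2\n"
	"	bne-	1b\n"		/* lost the reservation, retry */
	"2:"
		: "=&r" (tmp)
		: "r" (token), "r" (lock)
		: "cr0", "memory");

		return tmp;		/* 0 means the lock was free and is now ours */
	}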
To avoid a binutils dependency this patch creates macros for the extended lwarx format and uses them in the spinlock code. To test this change I used a simple test case that acquires and releases a global pthread mutex:
pthread_mutex_lock(&mutex); pthread_mutex_unlock(&mutex);
On a 32 core POWER6, running 32 test threads we spend almost all our time in the futex spinlock code:
94.37%  perf  [kernel]  [k] ._raw_spin_lock
        |
        |--99.95%-- ._raw_spin_lock
        |          |
        |          |--63.29%-- .futex_wake
        |          |
        |          |--36.64%-- .futex_wait_setup
Which is a good test for this patch. The results (in lock/unlock operations per second) are:
before: 1538203 ops/sec
after:  2189219 ops/sec
An improvement of 42%
A 32 core POWER7 improves even more:
before: 1279529 ops/sec
after:  2282076 ops/sec
An improvement of 78%
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
Revision tags: v2.6.33-rc7, v2.6.33-rc6, v2.6.33-rc5, v2.6.33-rc4, v2.6.33-rc3, v2.6.33-rc2, v2.6.33-rc1 |
|
#
e5931943 |
| 03-Dec-2009 |
Thomas Gleixner <tglx@linutronix.de> |
locking: Convert raw_rwlock functions to arch_rwlock
Name space cleanup for rwlock functions. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
|
#
fb3a6bbc |
| 03-Dec-2009 |
Thomas Gleixner <tglx@linutronix.de> |
locking: Convert raw_rwlock to arch_rwlock
Not strictly necessary for -rt as -rt does not have non-sleeping rwlocks, but it's odd to not have a consistent naming convention.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
|
Revision tags: v2.6.32 |
|
#
0199c4e6 |
| 02-Dec-2009 |
Thomas Gleixner <tglx@linutronix.de> |
locking: Convert __raw_spin* functions to arch_spin*
Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
|
#
445c8951 |
| 02-Dec-2009 |
Thomas Gleixner <tglx@linutronix.de> |
locking: Convert raw_spinlock to arch_spinlock
The raw_spin* namespace was taken by lockdep for the architecture specific implementations. raw_spin_* would be the ideal name space for the spinlocks which are not converted to sleeping locks in preempt-rt.
Linus suggested converting the raw_ locks to arch_ locks and cleaning up the name space instead of using an artificial name like core_spin, atomic_spin or whatever.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
|
Revision tags: v2.6.32-rc8, v2.6.32-rc7, v2.6.32-rc6, v2.6.32-rc5, v2.6.32-rc4, v2.6.32-rc3, v2.6.32-rc1, v2.6.32-rc2, v2.6.31, v2.6.31-rc9 |
|
#
8307a980 |
| 31-Aug-2009 |
Heiko Carstens <heiko.carstens@de.ibm.com> |
locking, powerpc: Rename __spin_try_lock() and friends
Needed to avoid namespace conflicts when the common code function bodies of _spin_try_lock() etc. are moved to a header file where the function name would be __spin_try_lock().
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Horst Hartmann <horsth@linux.vnet.ibm.com> Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <20090831124415.918799705@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
Revision tags: v2.6.31-rc8, v2.6.31-rc7, v2.6.31-rc6, v2.6.31-rc5, v2.6.31-rc4, v2.6.31-rc3, v2.6.31-rc2, v2.6.31-rc1, v2.6.30, v2.6.30-rc8, v2.6.30-rc7, v2.6.30-rc6, v2.6.30-rc5, v2.6.30-rc4, v2.6.30-rc3, v2.6.30-rc2, v2.6.30-rc1 |
|
#
f5f7eac4 |
| 02-Apr-2009 |
Robin Holt <holt@sgi.com> |
Allow rwlocks to re-enable interrupts
Pass the original flags to rwlock arch-code, so that it can re-enable interrupts if implemented for that architecture.
Initially, make __raw_read_lock_flags and __raw_write_lock_flags stubs which just do the same thing as non-flags variants.
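A sketch of the initial stubs described (assuming the generic fallback simply ignores the flags):

	/* default: behave exactly like the non-flags variants */
	#define __raw_read_lock_flags(lock, flags)	__raw_read_lock(lock)
	#define __raw_write_lock_flags(lock, flags)	__raw_write_lock(lock)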
Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
Revision tags: v2.6.29, v2.6.29-rc8, v2.6.29-rc7, v2.6.29-rc6, v2.6.29-rc5, v2.6.29-rc4, v2.6.29-rc3, v2.6.29-rc2, v2.6.29-rc1, v2.6.28, v2.6.28-rc9, v2.6.28-rc8, v2.6.28-rc7, v2.6.28-rc6, v2.6.28-rc5, v2.6.28-rc4 |
|
#
efc3624c |
| 05-Nov-2008 |
Paul Mackerras <paulus@samba.org> |
powerpc: Tell gcc when we clobber the carry in inline asm
We have several instances of inline assembly code that use the addic or addic. instructions, but don't include XER in the list of clobbers. The addic and addic. instructions affect the carry bit, which is in the XER register.
This adds "xer" to the list of clobbers for those inline asm statements that use addic or addic. and didn't already have it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
Revision tags: v2.6.28-rc3, v2.6.28-rc2, v2.6.28-rc1, v2.6.27, v2.6.27-rc9, v2.6.27-rc8, v2.6.27-rc7, v2.6.27-rc6, v2.6.27-rc5, v2.6.27-rc4, v2.6.27-rc3, v2.6.27-rc2 |
|
#
b8b572e1 |
| 01-Aug-2008 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: Move include files to arch/powerpc/include/asm
from include/asm-powerpc. This is the result of a
mkdir arch/powerpc/include/asm
git mv include/asm-powerpc/* arch/powerpc/include/asm
Followed by a few documentation/comment fixups and a couple of places where <asm-powerpc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
|
Revision tags: v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54 |
|
#
20c0e826 |
| 24-Jul-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/pseries: Implement paravirt qspinlocks for SPLPAR
This implements the generic paravirt qspinlocks using H_PROD and H_CONFER to kick and wait.
This uses an un-directed yield to any CPU rather than the directed yield to a pre-empted lock holder that paravirtualised simple spinlocks use, which requires no kick hcall. This is something that could be investigated and improved in future.
Performance results can be found in the commit which added queued spinlocks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724131423.1362108-5-npiggin@gmail.com
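A rough sketch of the shape of the paravirt hooks (helper names assumed from the PAPR paravirt patch further down this log; not the exact kernel code): waiting yields to any CPU via H_CONFER, kicking prods the target CPU via H_PROD:

	static __always_inline void pv_wait(u8 *ptr, u8 val)
	{
		if (*ptr != val)
			return;
		yield_to_any();		/* un-directed H_CONFER yield */
	}

	static __always_inline void pv_kick(int cpu)
	{
		prod_cpu(cpu);		/* H_PROD the waiter's CPU */
	}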
|
#
aa65ff6b |
| 24-Jul-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Implement queued spinlocks and rwlocks
These have shown significantly improved performance and fairness when spinlock contention is moderate to high on very large systems.
With this series including subsequent patches, on a 16 socket 1536 thread POWER9, a stress test such as same-file open/close from all CPUs gets big speedups: 11620 op/s aggregate with simple spinlocks vs 384158 op/s (33x faster), where the difference in throughput between the fastest and slowest thread goes from 7x to 1.4x.
Thanks to the fast path being identical in terms of atomics and barriers (after a subsequent optimisation patch), single threaded performance is not changed (no measurable difference).
On smaller systems, performance and fairness seem to be generally improved. Using dbench on tmpfs as a test (one that starts to run into kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was tested with bare metal and KVM guest configurations. Results can be found here:
https://github.com/linuxppc/issues/issues/305#issuecomment-663487453
Observations are:
- Queued spinlocks are equal when contention is insignificant, as expected and as measured with microbenchmarks.
- When there is contention, on bare metal queued spinlocks have better throughput and max latency at all points.
- When virtualised, queued spinlocks are slightly worse approaching peak throughput, but significantly better throughput and max latency at all points beyond peak, until queued spinlock maximum latency rises when clients are 2x vCPUs.
The regressions haven't been analysed very well yet; there are a lot of things that can be tuned, particularly the paravirtualised locking, but the numbers already look like a good net win even on relatively small systems.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724131423.1362108-4-npiggin@gmail.com
|
#
12d0b9d6 |
| 24-Jul-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc: Move spinlock implementation to simple_spinlock
To prepare for queued spinlocks. This is a simple rename except to update the preprocessor guard name and a file reference.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724131423.1362108-3-npiggin@gmail.com
|
#
20d444d0 |
| 24-Jul-2020 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/pseries: Move some PAPR paravirt functions to their own file
These functions will be used by the queued spinlock implementation, and may be useful elsewhere too, so move them out of spinlock.h.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724131423.1362108-2-npiggin@gmail.com
|
Revision tags: v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43 |
|
#
455531e9 |
| 21-May-2020 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: Remove IBM405 Erratum #77
This erratum is dedicated to IBM 405GP and STB03xxx which are now gone. Remove this erratum.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/44dbc08e9034681eb28324cbabc086e97044c36c.1590079969.git.christophe.leroy@csgroup.eu
|
Revision tags: v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7 |
|
#
6da3eced |
| 23-Dec-2019 |
Jason A. Donenfeld <Jason@zx2c4.com> |
powerpc/spinlocks: Include correct header for static key
Recently, the spinlock implementation grew a static key optimization, but the jump_label.h header include was left out, leading to build errors:
	linux/arch/powerpc/include/asm/spinlock.h:44:7: error: implicit declaration of function ‘static_branch_unlikely’
	   44 |         if (!static_branch_unlikely(&shared_processor))
This commit adds the missing header.
mpe: The build break is only seen with CONFIG_JUMP_LABEL=n.
Fixes: 656c21d6af5d ("powerpc/shared: Use static key to detect shared processor")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191223133147.129983-1-Jason@zx2c4.com
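The fix itself is just the missing include, presumably along these lines:

	#include <linux/jump_label.h>	/* static_branch_unlikely(), static keys */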
|
Revision tags: v5.4.6, v5.4.5, v5.4.4, v5.4.3 |
|
#
656c21d6 |
| 05-Dec-2019 |
Srikar Dronamraju <srikar@linux.vnet.ibm.com> |
powerpc/shared: Use static key to detect shared processor
With the static key shared_processor available, is_shared_processor() can return without having to query the lppaca structure.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Phil Auld <pauld@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191213035036.6913-2-mpe@ellerman.id.au
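A minimal sketch of the described helper (exact body assumed):

	static inline bool is_shared_processor(void)
	{
		return static_branch_unlikely(&shared_processor);
	}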
|