Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32 |
|
#
febe950d |
| 31-May-2023 |
Peter Zijlstra <peterz@infradead.org> |
arch: Remove cmpxchg_double
No moar users, remove the monster.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org
|
#
b23e139d |
| 31-May-2023 |
Peter Zijlstra <peterz@infradead.org> |
arch: Introduce arch_{,try_}_cmpxchg128{,_local}()
For all architectures that currently support cmpxchg_double() implement the cmpxchg128() family of functions that is basically the same but with a saner interface.
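For illustration, a rough sketch of the interface difference, using a hypothetical caller (struct demo and the demo_update_*() helpers are invented here, and the u128 type is assumed from the same series); the exact prototypes may differ by architecture:

  #include <linux/types.h>

  struct demo {
          union {
                  struct { u64 lo, hi; };
                  u128 pair;
          };
  } __aligned(16);

  /* Old interface: two adjacent words, six scalar arguments, nonzero on success. */
  static bool demo_update_old(struct demo *d, u64 ol, u64 oh, u64 nl, u64 nh)
  {
          return cmpxchg_double(&d->lo, &d->hi, ol, oh, nl, nh);
  }

  /* New interface: a single 128-bit value, the same shape as cmpxchg(),
   * returning the previous value. */
  static bool demo_update_new(struct demo *d, u128 old, u128 new)
  {
          return cmpxchg128(&d->pair, old, new) == old;
  }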
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.452120708@infradead.org
|
Revision tags: v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20 |
|
#
e5cacb54 |
| 14-Mar-2023 |
Mark Rutland <mark.rutland@arm.com> |
arm64: atomics: lse: improve cmpxchg implementation
For historical reasons, the LSE implementation of cmpxchg*() hard-codes the GPRs to use, and shuffles registers around with MOVs. This is no longer necessary, and can be simplified.
When the LSE cmpxchg implementation was added in commit:
c342f78217e822d2 ("arm64: cmpxchg: patch in lse instructions when supported by the CPU")
... the LL/SC implementation of cmpxchg() would be placed out-of-line, and the in-line assembly for cmpxchg would default to:
	NOP
	BL	<ll_sc_cmpxchg*_implementation>
	NOP
The LL/SC implementation of each cmpxchg() function accepted arguments as per AAPCS64 rules, so it was necessary to place the pointer in x0, the old value in x1, and the new value in x2, and to acquire the return value from x0. The LL/SC implementation required a temporary register (e.g. for the STXR status value). As the LL/SC implementation preserved the old value, the LSE implementation does likewise.
Since commit:
addfc38672c73efd ("arm64: atomics: avoid out-of-line ll/sc atomics")
... the LSE and LL/SC implementations of cmpxchg are inlined as separate asm blocks, with another branch choosing between the two. Due to this, it is no longer necessary for the LSE implementation to match the register constraints of the LL/SC implementation. This was partially dealt with by removing the hard-coded use of x30 in commit:
3337cb5aea594e40 ("arm64: avoid using hard-coded registers for LSE atomics")
... but we didn't clean up the hard-coding of x0, x1, and x2.
This patch simplifies the LSE implementation of cmpxchg, removing the register shuffling and directly clobbering the 'old' argument. This gives the compiler greater freedom for register allocation, and avoids redundant work.
The new constraints permit 'old' (Rs) and 'new' (Rt) to be allocated to the same register when the initial values of the two are the same, e.g. resulting in:
CAS X0, X0, [X1]
This is safe as Rs is only written back after the initial values of Rs and Rt are consumed, and there are no UNPREDICTABLE behaviours to avoid when Rs == Rt.
The new constraints also permit 'new' to be allocated to the zero register, avoiding a MOV in a few cases. The same cannot be done for 'old' as it is both an input and output, and any caller of cmpxchg() should care about the output value. Note that for CAS* the use of the zero register never affects the ordering (while for SWP* the use of the zero register for the 'old' value drops any ACQUIRE semantic).
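As a sketch of the resulting shape, here is a hand-expanded 32-bit acquire case (assuming the kernel's __LSE_PREAMBLE; this is a paraphrase, not the verbatim macro): 'old' is clobbered directly and 'new' may be allocated to the zero register via the 'rZ' constraint:

  static inline unsigned int cas_acq_32(unsigned int *ptr, unsigned int old,
                                        unsigned int new)
  {
          asm volatile(
          __LSE_PREAMBLE
          "       casa    %w[old], %w[new], %[v]\n"
          : [old] "+r" (old), [v] "+Q" (*ptr)
          : [new] "rZ" (new)
          : "memory");

          return old;
  }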
Compared to v6.2-rc4, a defconfig vmlinux is ~116KiB smaller, though the resulting Image is the same size due to internal alignment and padding:
  [mark@lakrids:~/src/linux]% ls -al vmlinux-*
  -rwxr-xr-x 1 mark mark 137269304 Jan 16 11:59 vmlinux-after
  -rwxr-xr-x 1 mark mark 137387936 Jan 16 10:54 vmlinux-before
  [mark@lakrids:~/src/linux]% ls -al Image-*
  -rw-r--r-- 1 mark mark 38711808 Jan 16 11:59 Image-after
  -rw-r--r-- 1 mark mark 38711808 Jan 16 10:54 Image-before
This patch does not touch cmpxchg_double*() as that requires contiguous register pairs, and separate patches will replace it with cmpxchg128*().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230314153700.787701-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
Revision tags: v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4 |
|
#
031af500 |
| 04-Jan-2023 |
Mark Rutland <mark.rutland@arm.com> |
arm64: cmpxchg_double*: hazard against entire exchange variable
The inline assembly for arm64's cmpxchg_double*() implementations uses a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a pointer to unsigned long, and thus the hazard only applies to the first 8 bytes of the location.
GCC can take advantage of this, assuming that other portions of the location are unchanged, leading to a number of potential problems.
This is similar to what we fixed back in commit:
fee960bed5e857eb ("arm64: xchg: hazard against entire exchange variable")
... but we forgot to adjust cmpxchg_double*() similarly at the same time.
The same problem applies, as demonstrated with the following test:
| struct big {
|         u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
|         u64 hi_old, hi_new;
|
|         hi_old = b->hi;
|         cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
|         hi_new = b->hi;
|
|         return hi_old ^ hi_new;
| }
... which GCC 12.1.0 compiles as:
| 0000000000000000 <foo>:
|    0: d503233f  paciasp
|    4: aa0003e4  mov   x4, x0
|    8: 1400000e  b     40 <foo+0x40>
|    c: d2800240  mov   x0, #0x12    // #18
|   10: d2800681  mov   x1, #0x34    // #52
|   14: aa0003e5  mov   x5, x0
|   18: aa0103e6  mov   x6, x1
|   1c: d2800ac2  mov   x2, #0x56    // #86
|   20: d2800f03  mov   x3, #0x78    // #120
|   24: 48207c82  casp  x0, x1, x2, x3, [x4]
|   28: ca050000  eor   x0, x0, x5
|   2c: ca060021  eor   x1, x1, x6
|   30: aa010000  orr   x0, x0, x1
|   34: d2800000  mov   x0, #0x0     // #0  <--- BANG
|   38: d50323bf  autiasp
|   3c: d65f03c0  ret
|   40: d2800240  mov   x0, #0x12    // #18
|   44: d2800681  mov   x1, #0x34    // #52
|   48: d2800ac2  mov   x2, #0x56    // #86
|   4c: d2800f03  mov   x3, #0x78    // #120
|   50: f9800091  prfm  pstl1strm, [x4]
|   54: c87f1885  ldxp  x5, x6, [x4]
|   58: ca0000a5  eor   x5, x5, x0
|   5c: ca0100c6  eor   x6, x6, x1
|   60: aa0600a6  orr   x6, x5, x6
|   64: b5000066  cbnz  x6, 70 <foo+0x70>
|   68: c8250c82  stxp  w5, x2, x3, [x4]
|   6c: 35ffff45  cbnz  w5, 54 <foo+0x54>
|   70: d2800000  mov   x0, #0x0     // #0  <--- BANG
|   74: d50323bf  autiasp
|   78: d65f03c0  ret
Notice that at the lines with "BANG" comments, GCC has assumed that the higher 8 bytes are unchanged by the cmpxchg_double() call, and that `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and LL/SC versions of cmpxchg_double().
This patch fixes the issue by passing a pointer to __uint128_t into the +Q constraint, ensuring that the compiler hazards against the entire 16 bytes being modified.
With this change, GCC 12.1.0 compiles the above test as:
| 0000000000000000 <foo>:
|    0: f9400407  ldr   x7, [x0, #8]
|    4: d503233f  paciasp
|    8: aa0003e4  mov   x4, x0
|    c: 1400000f  b     48 <foo+0x48>
|   10: d2800240  mov   x0, #0x12    // #18
|   14: d2800681  mov   x1, #0x34    // #52
|   18: aa0003e5  mov   x5, x0
|   1c: aa0103e6  mov   x6, x1
|   20: d2800ac2  mov   x2, #0x56    // #86
|   24: d2800f03  mov   x3, #0x78    // #120
|   28: 48207c82  casp  x0, x1, x2, x3, [x4]
|   2c: ca050000  eor   x0, x0, x5
|   30: ca060021  eor   x1, x1, x6
|   34: aa010000  orr   x0, x0, x1
|   38: f9400480  ldr   x0, [x4, #8]
|   3c: d50323bf  autiasp
|   40: ca0000e0  eor   x0, x7, x0
|   44: d65f03c0  ret
|   48: d2800240  mov   x0, #0x12    // #18
|   4c: d2800681  mov   x1, #0x34    // #52
|   50: d2800ac2  mov   x2, #0x56    // #86
|   54: d2800f03  mov   x3, #0x78    // #120
|   58: f9800091  prfm  pstl1strm, [x4]
|   5c: c87f1885  ldxp  x5, x6, [x4]
|   60: ca0000a5  eor   x5, x5, x0
|   64: ca0100c6  eor   x6, x6, x1
|   68: aa0600a6  orr   x6, x5, x6
|   6c: b5000066  cbnz  x6, 78 <foo+0x78>
|   70: c8250c82  stxp  w5, x2, x3, [x4]
|   74: 35ffff45  cbnz  w5, 5c <foo+0x5c>
|   78: f9400480  ldr   x0, [x4, #8]
|   7c: d50323bf  autiasp
|   80: ca0000e0  eor   x0, x7, x0
|   84: d65f03c0  ret
... sampling the high 8 bytes before and after the cmpxchg, and performing an EOR, as we'd expect.
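A minimal standalone illustration of the constraint difference (hypothetical helpers, not kernel code):

  #include <stdint.h>

  /* Buggy shape: the hazard covers only the first 8 bytes, so the
   * compiler may assume ptr[1] is unchanged across the asm block. */
  static inline void hazard_8(unsigned long *ptr)
  {
          asm volatile("" : "+Q" (*ptr));
  }

  /* Fixed shape: hazard against the full 16-byte location. */
  static inline void hazard_16(unsigned long *ptr)
  {
          asm volatile("" : "+Q" (*(__uint128_t *)ptr));
  }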
For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note that linux-4.9.y is the oldest currently supported stable release and mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run on my machines due to library incompatibilities.
I've also used a standalone test to check that we can use a __uint128_t pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM 3.9.1.
Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double") Fixes: e9a4b795652f ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Reported-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/ Reported-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/ Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
Revision tags: v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62 |
|
#
78f6f5c9 |
| 17-Aug-2022 |
Mark Rutland <mark.rutland@arm.com> |
arm64: atomic: always inline the assembly
The __lse_*() and __ll_sc_*() atomic implementations are marked as inline rather than __always_inline, permitting a compiler to generate out-of-line versions, which may be instrumented.
We marked the atomic wrappers as __always_inline in commit:
c35a824c31834d94 ("arm64: make atomic helpers __always_inline")
... but did not think to do the same for the underlying implementations.
If the compiler were to out-of-line an LSE or LL/SC atomic, this could break noinstr code. Ensure this doesn't happen by marking the underlying implementations as __always_inline.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220817155914.3975112-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Revision tags: v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15 |
|
#
3364c6ce |
| 12-Jan-2022 |
Kees Cook <keescook@chromium.org> |
arm64: atomics: lse: Dereference matching size
When building with -Warray-bounds, the following warning is generated:
In file included from ./arch/arm64/include/asm/lse.h:16,
                 from ./arch/arm64/include/asm/cmpxchg.h:14,
                 from ./arch/arm64/include/asm/atomic.h:16,
                 from ./include/linux/atomic.h:7,
                 from ./include/asm-generic/bitops/atomic.h:5,
                 from ./arch/arm64/include/asm/bitops.h:25,
                 from ./include/linux/bitops.h:33,
                 from ./include/linux/kernel.h:22,
                 from kernel/printk/printk.c:22:
./arch/arm64/include/asm/atomic_lse.h:247:9: warning: array subscript 'long unsigned int[0]' is partly outside array bounds of 'atomic_t[1]' [-Warray-bounds]
  247 |         asm volatile(                                           \
      |         ^~~
./arch/arm64/include/asm/atomic_lse.h:266:1: note: in expansion of macro '__CMPXCHG_CASE'
  266 | __CMPXCHG_CASE(w,  , acq_, 32, a, "memory")
      | ^~~~~~~~~~~~~~
kernel/printk/printk.c:3606:17: note: while referencing 'printk_cpulock_owner'
 3606 | static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
      |                 ^~~~~~~~~~~~~~~~~~~~
This is due to the compiler seeing an unsigned long * cast against something (atomic_t) that is int sized. Replace the cast with the matching size cast. This results in no change in binary output.
Note that __ll_sc__cmpxchg_case_##name##sz already uses the same constraint:
[v] "+Q" (*(u##sz *)ptr)
Which is why only the LSE form needs updating and not the LL/SC form, so this change is unlikely to be problematic.
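A simplified sketch of the idea (hypothetical helper, not the kernel macro): cast to a type whose size matches the access so the compiler's view of the object bounds stays intact:

  #include <stdint.h>

  typedef struct { int counter; } atomic_t;

  static inline void touch32(atomic_t *v)
  {
          /* before: "+Q" (*(unsigned long *)v) -- an 8-byte lvalue over a
           * 4-byte object, which -Warray-bounds flags */
          asm volatile("" : "+Q" (*(uint32_t *)&v->counter));
  }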
Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220112202259.3950286-1-keescook@chromium.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Revision tags: v5.16, v5.15.10, v5.15.9, v5.15.8 |
|
#
053f58ba |
| 10-Dec-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: atomics: lse: define RETURN ops in terms of FETCH ops
The FEAT_LSE atomic instructions include LD* instructions which return the original value of a memory location and can therefore be used to directly implement FETCH operations. Each RETURN op is implemented as a copy of the corresponding FETCH op with a trailing instruction to generate the new value of the memory location. We only directly implement *_fetch_add*(), for which we have a trailing `add` instruction.
As the compiler has no visibility of the `add`, this leads to less than optimal code generation when consuming the result.
For example, the compiler cannot constant-fold the addition into later operations, and currently GCC 11.1.0 will compile:
return __lse_atomic_sub_return(1, v) == 0;
As:
	mov	w1, #0xffffffff
	ldaddal	w1, w2, [x0]
	add	w1, w1, w2
	cmp	w1, #0x0
	cset	w0, eq  // eq = none
	ret
This patch improves this by replacing the `add` with C addition after the inline assembly block, e.g.
ret += i;
This allows the compiler to manipulate `i`. This permits the compiler to merge the `add` and `cmp` for the above, e.g.
	mov	w1, #0xffffffff
	ldaddal	w1, w1, [x0]
	cmp	w1, #0x1
	cset	w0, eq  // eq = none
	ret
With this change the assembly for each RETURN op is identical to the corresponding FETCH op (including barriers and clobbers) so I've removed the inline assembly and rewritten each RETURN op in terms of the corresponding FETCH op, e.g.
| static inline int __lse_atomic_add_return(int i, atomic_t *v)
| {
|         return __lse_atomic_fetch_add(i, v) + i;
| }
The new construction does not adversely affect the common case, and before and after this patch GCC 11.1.0 can compile:
__lse_atomic_add_return(i, v)
As:
	ldaddal	w0, w2, [x1]
	add	w0, w0, w2
... while having the freedom to do better elsewhere.
This is intended as an optimization and cleanup. There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211210151410.2782645-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
8a578a75 |
| 10-Dec-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: atomics: lse: improve constraints for simple ops
We have overly conservative assembly constraints for the basic FEAT_LSE atomic instructions, and using more accurate and permissive constraints will allow for better code generation.
The FEAT_LSE basic atomic instructions have come in two forms:
	LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
	ST{op}{order}{size} <Rs>, [<Rn>]
The ST* forms are aliases of the LD* forms where:
	ST{op}{order}{size} <Rs>, [<Rn>]
Is:
	LD{op}{order}{size} <Rs>, XZR, [<Rn>]
For either form, both <Rs> and <Rn> are read but not written back to, and <Rt> is written with the original value of the memory location. Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the other register value(s) are consumed. There are no UNPREDICTABLE or CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or <Rn> are the same register.
Our current inline assembly always uses <Rs> == <Rt>, treating this register as both an input and an output (using a '+r' constraint). This forces the compiler to do some unnecessary register shuffling and/or redundant value generation.
For example, the compiler cannot reuse the <Rs> value, and currently GCC 11.1.0 will compile:
__lse_atomic_add(1, a); __lse_atomic_add(1, b); __lse_atomic_add(1, c);
As:
	mov	w3, #0x1
	mov	w4, w3
	stadd	w4, [x0]
	mov	w0, w3
	stadd	w0, [x1]
	stadd	w3, [x2]
We can improve this with more accurate constraints, separating <Rs> and <Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an output-only value ('=r'). As <Rt> is written back after <Rs> is consumed, it does not need to be earlyclobber ('=&r'), leaving the compiler free to use the same register for both <Rs> and <Rt> where this is desirable.
At the same time, the redundant 'r' constraint for `v` is removed, as the `+Q` constraint is sufficient.
With this change, the above example becomes:
	mov	w3, #0x1
	stadd	w3, [x0]
	stadd	w3, [x1]
	stadd	w3, [x2]
I've made this change for the non-value-returning and FETCH ops. The RETURN ops have a multi-instruction sequence for which we cannot use the same constraints, and a subsequent patch will rewrite the RETURN ops in terms of the FETCH ops, relying on the ability for the compiler to reuse the <Rs> value.
This is intended as an optimization. There should be no functional change as a result of this patch.
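A sketch of the improved shape for a non-value-returning op (illustrative only, assuming the kernel's __LSE_PREAMBLE and atomic_t; not the verbatim macro): <Rs> becomes an input-only operand and the redundant "r" (v) constraint is gone:

  static inline void __lse_atomic_add(int i, atomic_t *v)
  {
          asm volatile(
          __LSE_PREAMBLE
          "       stadd   %w[i], %[v]\n"
          : [v] "+Q" (v->counter)
          : [i] "r" (i));
  }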
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211210151410.2782645-5-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
5e9e43c9 |
| 10-Dec-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: atomics: lse: define ANDs in terms of ANDNOTs
The FEAT_LSE atomic instructions include atomic bit-clear instructions (`ldclr*` and `stclr*`) which can be used to directly implement ANDNOT operations. Each AND op is implemented as a copy of the corresponding ANDNOT op with a leading `mvn` instruction to apply a bitwise NOT to the `i` argument.
As the compiler has no visibility of the `mvn`, this leads to less than optimal code generation when generating `i` into a register. For example, __lse_atomic_fetch_and(0xf, v) can be compiled to:
	mov	w1, #0xf
	mvn	w1, w1
	ldclral	w1, w1, [x2]
This patch improves this by replacing the `mvn` with NOT in C before the inline assembly block, e.g.
i = ~i;
This allows the compiler to generate `i` into a register more optimally, e.g.
	mov	w1, #0xfffffff0
	ldclral	w1, w1, [x2]
With this change the assembly for each AND op is identical to the corresponding ANDNOT op (including barriers and clobbers), so I've removed the inline assembly and rewritten each AND op in terms of the corresponding ANDNOT op, e.g.
| static inline void __lse_atomic_and(int i, atomic_t *v)
| {
|         return __lse_atomic_andnot(~i, v);
| }
This is intended as an optimization and cleanup. There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211210151410.2782645-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
ef532450 |
| 10-Dec-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: atomics lse: define SUBs in terms of ADDs
The FEAT_LSE atomic instructions include atomic ADD instructions (`stadd*` and `ldadd*`), but do not include atomic SUB instructions, so we must build all of the SUB operations using the ADD instructions. We open-code these today, with each SUB op implemented as a copy of the corresponding ADD op with a leading `neg` instruction in the inline assembly to negate the `i` argument.
As the compiler has no visibility of the `neg`, this leads to less than optimal code generation when generating `i` into a register. For example, __lse_atomic_fetch_sub(1, v) can be compiled to:
	mov	w1, #0x1
	neg	w1, w1
	ldaddal	w1, w1, [x2]
This patch improves this by replacing the `neg` with negation in C before the inline assembly block, e.g.
i = -i;
This allows the compiler to generate `i` into a register more optimally, e.g.
	mov	w1, #0xffffffff
	ldaddal	w1, w1, [x2]
With this change the assembly for each SUB op is identical to the corresponding ADD op (including barriers and clobbers), so I've removed the inline assembly and rewritten each SUB op in terms of the corresponding ADD op, e.g.
| static inline void __lse_atomic_sub(int i, atomic_t *v)
| {
|         __lse_atomic_add(-i, v);
| }
For clarity I've moved the definition of each SUB op immediately after the corresponding ADD op, and used a single macro to create the RETURN forms of both ops.
This is intended as an optimization and cleanup. There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211210151410.2782645-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
8e6082e9 |
| 10-Dec-2021 |
Mark Rutland <mark.rutland@arm.com> |
arm64: atomics: format whitespace consistently
The code for the atomic ops is formatted inconsistently, and while this is not a functional problem it is rather distracting when working on them.
Some ops have consistent indentation, e.g.
| #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)                          \
| static inline int __lse_atomic_add_return##name(int i, atomic_t *v)    \
| {                                                                       \
|         u32 tmp;                                                        \
|                                                                         \
|         asm volatile(                                                   \
|         __LSE_PREAMBLE                                                  \
|         "       ldadd" #mb "    %w[i], %w[tmp], %[v]\n"                 \
|         "       add     %w[i], %w[i], %w[tmp]"                          \
|         : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)        \
|         : "r" (v)                                                       \
|         : cl);                                                          \
|                                                                         \
|         return i;                                                       \
| }
While others have negative indentation for some lines, and/or have misaligned trailing backslashes, e.g.
| static inline void __lse_atomic_##op(int i, atomic_t *v)       \
| {                                                                      \
|         asm volatile(                                           \
|         __LSE_PREAMBLE                                  \
|         "       " #asm_op "     %w[i], %[v]\n"                  \
|         : [i] "+r" (i), [v] "+Q" (v->counter)           \
|         : "r" (v));                                             \
| }
This patch makes the indentation consistent and also aligns the trailing backslashes. This makes the code easier to read for those (like myself) who are easily distracted by these inconsistencies.
This is intended as a cleanup. There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211210151410.2782645-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Revision tags: v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9 |
|
#
e0d5896b |
| 31-Oct-2019 |
Sami Tolvanen <samitolvanen@google.com> |
arm64: lse: fix LSE atomics with LLVM's integrated assembler
Unlike gcc, clang considers each inline assembly block to be independent and therefore, when using the integrated assembler for inline assembly, any preambles that enable features must be repeated in each block.
This change defines __LSE_PREAMBLE and adds it to each inline assembly block that has LSE instructions, which allows them to be compiled also with clang's assembler.
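The preamble itself is a one-line directive (assumed definition, paraphrased from arch/arm64/include/asm/lse.h) that each LSE asm block now starts with, for example:

  #define __LSE_PREAMBLE  ".arch_extension lse\n"

  static inline void lse_stadd(int i, int *counter)
  {
          asm volatile(
          __LSE_PREAMBLE                  /* re-enable LSE for this block */
          "       stadd   %w[i], %[v]\n"
          : [v] "+Q" (*counter)
          : [i] "r" (i));
  }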
Link: https://github.com/ClangBuiltLinux/linux/issues/671 Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Andrew Murray <andrew.murray@arm.com> Tested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Will Deacon <will@kernel.org>
|
Revision tags: v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3 |
|
#
a48e61de |
| 01-Oct-2019 |
Will Deacon <will@kernel.org> |
arm64: Mark functions using explicit register variables as '__always_inline'
As of ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly"), inline functions are no longer annotated with '__always_inline', which allows the compiler to decide whether inlining is really a good idea or not. Although this is a great idea on paper, the reality is that AArch64 GCC prior to 9.1 has been shown to get confused when creating an out-of-line copy of a function passing explicit 'register' variables into an inline assembly block:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111
It's not clear whether this is specific to arm64 or not but, for now, ensure that all of our functions using 'register' variables are marked as '__always_inline' so that the old behaviour is effectively preserved.
Hopefully other architectures are luckier with their compilers.
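An illustrative example of the affected pattern (hypothetical function, not kernel code): an explicit register variable feeding an instruction with a fixed-register convention only works as intended if the function is never emitted out of line:

  static __always_inline unsigned long hvc_call(unsigned long function_id)
  {
          register unsigned long x0 asm("x0") = function_id;

          asm volatile("hvc       #0" : "+r" (x0) : : "memory");

          return x0;
  }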
Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Will Deacon <will@kernel.org>
|
Revision tags: v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11 |
|
#
3337cb5a |
| 28-Aug-2019 |
Andrew Murray <andrew.murray@arm.com> |
arm64: avoid using hard-coded registers for LSE atomics
Now that we have removed the out-of-line ll/sc atomics we can give the compiler the freedom to choose its own register allocation.
Remove the hard-coded use of x30.
Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
#
addfc386 |
| 28-Aug-2019 |
Andrew Murray <andrew.murray@arm.com> |
arm64: atomics: avoid out-of-line ll/sc atomics
When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware or toolchain doesn't support it the existing code will fall back to ll/sc atomics. It achieves this by branching from inline assembly to a function that is built with special compile flags. Further, this results in the clobbering of registers even when the fallback isn't used, increasing register pressure.
Improve this by providing inline implementations of both LSE and ll/sc and using a static key to select between them, which allows the compiler to generate better atomics code. Put the LL/SC fallback atomics in their own subsection to improve icache performance.
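Roughly, the selection now has the following shape (illustrative; the kernel wraps this in a helper macro rather than open-coding it per op):

  static __always_inline int atomic_fetch_add(int i, atomic_t *v)
  {
          if (system_uses_lse_atomics())          /* static-key backed */
                  return __lse_atomic_fetch_add(i, v);

          return __ll_sc_atomic_fetch_add(i, v);
  }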
Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
|
Revision tags: v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7 |
|
#
caab277b |
| 03-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 503 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
Revision tags: v5.1.6, v5.1.5 |
|
#
16f18688 |
| 22-May-2019 |
Mark Rutland <mark.rutland@arm.com> |
locking/atomic, arm64: Use s64 for atomic64
As a step towards making the atomic64 API use consistent types treewide, let's have the arm64 atomic64 implementation use s64 as the underlying type for atomic64_t, rather than long, matching the generated headers.
As atomic64_read() depends on the generic definition of atomic64_t, this still returns long. This will be converted in a subsequent patch.
Note that in arch_atomic64_dec_if_positive(), the x0 variable is left as long, as this variable is also used to hold the pointer to the atomic64_t.
Otherwise, there should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: aou@eecs.berkeley.edu Cc: arnd@arndb.de Cc: bp@alien8.de Cc: davem@davemloft.net Cc: fenghua.yu@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: ink@jurassic.park.msu.ru Cc: jhogan@kernel.org Cc: linux@armlinux.org.uk Cc: mattst88@gmail.com Cc: mpe@ellerman.id.au Cc: palmer@sifive.com Cc: paul.burton@mips.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: tony.luck@intel.com Cc: vgupta@synopsys.com Link: https://lkml.kernel.org/r/20190522132250.26499-8-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14, v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9 |
|
#
b4f9209b |
| 13-Sep-2018 |
Will Deacon <will.deacon@arm.com> |
arm64: Avoid masking "old" for LSE cmpxchg() implementation
The CAS instructions implicitly access only the relevant bits of the "old" argument, so there is no need for explicit masking via type-casting as there is in the LL/SC implementation.
Move the casting into the LL/SC code and remove it altogether for the LSE implementation.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
#
5ef3fe4c |
| 13-Sep-2018 |
Will Deacon <will.deacon@arm.com> |
arm64: Avoid redundant type conversions in xchg() and cmpxchg()
Our atomic instructions (either LSE atomics or LDXR/STXR sequences) natively support byte, half-word, word and double-word memory accesses so there is no need to mask the data register prior to being stored.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
Revision tags: v4.18.7, v4.18.6 |
|
#
c0df1081 |
| 04-Sep-2018 |
Mark Rutland <mark.rutland@arm.com> |
arm64, locking/atomics: Use instrumented atomics
Now that the generic atomic headers provide instrumented wrappers of all the atomics implemented by arm64, let's migrate arm64 over to these.
The additional instrumentation will help to find bugs (e.g. when fuzzing with Syzkaller).
Mostly this change involves adding an arch_ prefix to a number of function names and macro definitions. When LSE atomics are used, the out-of-line LL/SC atomics will be named __ll_sc_arch_atomic_${OP}.
Adding the arch_ prefix requires some whitespace fixups to keep things aligned. Some other unusual whitespace is fixed up at the same time (e.g. in the cmpxchg wrappers).
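For illustration, the instrumented wrapper shape picked up from the generic headers looks roughly like this (simplified; the exact instrumentation hooks vary by kernel version):

  static __always_inline void atomic_add(int i, atomic_t *v)
  {
          kasan_check_write(v, sizeof(*v));       /* instrumentation */
          arch_atomic_add(i, v);                  /* arm64 implementation */
  }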
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linuxdrivers@attotech.com Cc: dvyukov@google.com Cc: boqun.feng@gmail.com Cc: arnd@arndb.de Cc: aryabinin@virtuozzo.com Cc: glider@google.com Link: http://lkml.kernel.org/r/20180904104830.2975-7-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1, v4.17 |
|
#
32c3fa7c |
| 21-May-2018 |
Will Deacon <will.deacon@arm.com> |
arm64: lse: Add early clobbers to some input/output asm operands
For LSE atomics that read and write a register operand, we need to ensure that these operands are annotated as "early clobber" if the register is written before all of the input operands have been consumed. Failure to do so can result in the compiler allocating the same register to both operands, leading to splats such as:
  Unable to handle kernel paging request at virtual address 11111122222221
  [...]
  x1 : 1111111122222222 x0 : 1111111122222221
  Process swapper/0 (pid: 1, stack limit = 0x000000008209f908)
  Call trace:
   test_atomic64+0x1360/0x155c
where x0 has been allocated as both the value to be stored and also the atomic_t pointer.
This patch adds the missing clobbers.
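A generic illustration of the constraint rule (not the kernel code): when a multi-instruction sequence writes an output before all inputs have been consumed, the output must be marked early-clobber or the compiler may reuse an input register for it:

  static inline unsigned long load_then_add(unsigned long *ptr,
                                            unsigned long addend)
  {
          unsigned long res;

          asm volatile(
          "       ldr     %[res], %[v]\n"
          "       add     %[res], %[res], %[addend]\n"
          : [res] "=&r" (res)                     /* "=&r", not "=r" */
          : [v] "Q" (*ptr), [addend] "r" (addend));

          return res;
  }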
Cc: <stable@vger.kernel.org> Cc: Dave Martin <dave.martin@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Reported-by: Mark Salter <msalter@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
|
Revision tags: v4.16, v4.15, v4.13.16, v4.14, v4.13.5, v4.13 |
|
#
32fb5d73 |
| 07-Jul-2017 |
Will Deacon <will.deacon@arm.com> |
arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics
The lse implementation of atomic64_dec_if_positive uses the '+&' constraint, but the '&' is redundant and confusing in this case, since early clobber on a read/write operand is a strange concept.
Replace the constraint with '+'.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
Revision tags: v4.12, v4.10.17, v4.10.16, v4.10.15, v4.10.14 |
|
#
8997c934 |
| 03-May-2017 |
Mark Rutland <mark.rutland@arm.com> |
arm64: atomic_lse: match asm register sizes
The LSE atomic code uses asm register variables to ensure that parameters are allocated in specific registers. In the majority of cases we specifically ask for an x register when using 64-bit values, but in a couple of cases we use a w register for a 64-bit value.
For asm register variables, the compiler only cares about the register index, with wN and xN having the same meaning. The compiler determines the register size to use based on the type of the variable. Thus, this inconsistency is merely confusing, and not harmful to code generation.
For consistency, this patch updates those cases to use the x register alias. There should be no functional change as a result of this patch.
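An illustrative before/after of the cleanup (hypothetical function; the kernel's occurrences are in the LSE helpers):

  static inline void set_thread_pointer(unsigned long val)
  {
          /* before: register unsigned long w1 asm("w1") = val; */
          register unsigned long x1 asm("x1") = val;  /* 64-bit value: x alias */

          asm volatile("msr       tpidr_el0, %0" : : "r" (x1));
  }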
Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
Revision tags: v4.10.13, v4.10.12, v4.10.11, v4.10.10, v4.10.9, v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2, v4.10.1, v4.10, v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20 |
|
#
05492f2f |
| 06-Sep-2016 |
Will Deacon <will.deacon@arm.com> |
arm64: lse: convert lse alternatives NOP padding to use __nops
The LSE atomics are implemented using alternative code sequences of different lengths, and explicit NOP padding is used to ensure the patching works correctly.
This patch converts the bulk of the LSE code over to using the __nops macro, which makes it slightly clearer as to what is going on and also consolidates all of the padding at the end of the various sequences.
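The helper expands to a literal run of NOPs (assumed definition, paraphrased from the arm64 headers), so a sequence that is k instructions shorter than its alternative simply appends __nops(k) instead of k hand-written "nop\n" strings:

  #define __nops(n)       ".rept  " #n "\nnop\n.endr\n"
  #define nops(n)         asm volatile(__nops(n))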
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
Revision tags: v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1, v4.4.11, openbmc-20160518-1, v4.6, v4.4.10, openbmc-20160511-1, openbmc-20160505-1, v4.4.9 |
|
#
2efe95fe |
| 22-Apr-2016 |
Will Deacon <will.deacon@arm.com> |
locking/atomic, arch/arm64: Implement atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}() for LSE instructions
Implement FETCH-OP atomic primitives, these are very similar to the existing OP-RETURN primitives we already have, except they return the value of the atomic variable _before_ modification.
This is especially useful for irreversible operations -- such as bitops (because it becomes impossible to reconstruct the state prior to modification).
This patch implements the LSE variants.
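A usage sketch of the distinction (illustrative): the FETCH form returns the pre-modification value, which is exactly what irreversible operations such as bit-clears need for ownership-style tests:

  static bool claim_flag(atomic_t *flags)
  {
          int old = atomic_fetch_andnot(1, flags);  /* value before the clear */

          return old & 1;         /* true only for the caller that cleared it */
  }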
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1461344493-8262-2-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|