/openbmc/linux/include/linux/atomic/ |
H A D | atomic-arch-fallback.h | 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
/openbmc/linux/scripts/atomic/ |
H A D | gen-atomic-fallback.sh | 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 6d2779ecaeb56f92d7105c56772346c71c88c278 Tue Sep 19 12:14:29 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: fix fallback ifdeffery
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and atomic*_set_release() erroneously fall back to the implictly relaxed atomic*_read() and atomic*_set() variants respectively, without any additional barriers. This loses the ACQUIRE and RELEASE ordering semantics, which can result in a wide variety of problems, even on strongly-ordered architectures where the implementation of atomic*_read() and/or atomic*_set() allows the compiler to reorder those relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64, resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to be defined in terms of FULL ops, but where an op had RELAXED ordering by default, this unintentionally permitted the ACQUIRE/RELEASE ops to be defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly RELAXED ops, resulting in the same behaviour as prior to commit 9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined wrappers of the atomics. Prior to this patch the compiler generates sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldr x0, [x0] | ret | | <outlined_atomic64_set_release>: | str x1, [x0] | ret
With this patch applied the compiler generates sequences using the intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>: | ldar x0, [x0] | ret | | <outlined_atomic64_set_release>: | stlr x1, [x0] | ret
To make sure that there were no other victims of the ifdeffery rewrite, I generated outlined copies of all of the {atomic,atomic64,atomic_long} atomic operations before and after commit 9257959a6e5b4fca. A diff of the generated assembly on arm64 shows that only the read_acquire() and set_release() operations were changed, and only lost their intended ordering:
| [mark@lakrids:~/src/linux]% diff -u \ | <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o) | <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o) | --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100 | +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100 | @@ -1,5 +1,5 @@ | | -before-9257959a6e5b4fca.o: file format elf64-littleaarch64 | +after-9257959a6e5b4fca.o: file format elf64-littleaarch64 | | | Disassembly of section .text: | @@ -9,7 +9,7 @@ | 4: d65f03c0 ret | | 0000000000000008 <outlined_atomic_read_acquire>: | - 8: 88dffc00 ldar w0, [x0] | + 8: b9400000 ldr w0, [x0] | c: d65f03c0 ret | | 0000000000000010 <outlined_atomic_set>: | @@ -17,7 +17,7 @@ | 14: d65f03c0 ret | | 0000000000000018 <outlined_atomic_set_release>: | - 18: 889ffc01 stlr w1, [x0] | + 18: b9000001 str w1, [x0] | 1c: d65f03c0 ret | | 0000000000000020 <outlined_atomic_add>: | @@ -1230,7 +1230,7 @@ | 1070: d65f03c0 ret | | 0000000000001074 <outlined_atomic64_read_acquire>: | - 1074: c8dffc00 ldar x0, [x0] | + 1074: f9400000 ldr x0, [x0] | 1078: d65f03c0 ret | | 000000000000107c <outlined_atomic64_set>: | @@ -1238,7 +1238,7 @@ | 1080: d65f03c0 ret | | 0000000000001084 <outlined_atomic64_set_release>: | - 1084: c89ffc01 stlr x1, [x0] | + 1084: f9000001 str x1, [x0] | 1088: d65f03c0 ret | | 000000000000108c <outlined_atomic64_add>: | @@ -2427,7 +2427,7 @@ | 207c: d65f03c0 ret | | 0000000000002080 <outlined_atomic_long_read_acquire>: | - 2080: c8dffc00 ldar x0, [x0] | + 2080: f9400000 ldr x0, [x0] | 2084: d65f03c0 ret | | 0000000000002088 <outlined_atomic_long_set>: | @@ -2435,7 +2435,7 @@ | 208c: d65f03c0 ret | | 0000000000002090 <outlined_atomic_long_set_release>: | - 2090: c89ffc01 stlr x1, [x0] | + 2090: f9000001 str x1, [x0] | 2094: d65f03c0 ret | | 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64, csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv, s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I was unable to build test for ia64 and parisc due to existing build breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery") Reported-by: Ming Lei <ming.lei@redhat.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Baokun Li <libaokun1@huawei.com> Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | gen-atomics.sh | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
/openbmc/linux/scripts/atomic/fallbacks/ |
H A D | xchg | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | cmpxchg | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | dec | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | sub_and_test | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | add_unless | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | dec_and_test | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | acquire | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | dec_unless_positive | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | inc_not_zero | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | dec_if_positive | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | inc_and_test | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | inc_unless_negative | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | release | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | try_cmpxchg | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | fetch_add_unless | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | fence | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | inc | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | andnot | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | read_acquire | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
H A D | add_negative | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|
/openbmc/linux/include/linux/ |
H A D | atomic.h | 9257959a6e5b4fca6fc8e985790bff62c2046f20 Mon Jun 05 02:01:17 CDT 2023 Mark Rutland <mark.rutland@arm.com> locking/atomic: scripts: restructure fallback ifdeffery
Currently the various ordering variants of an atomic operation are defined in groups of full/acquire/release/relaxed ordering variants with some shared ifdeffery and several potential definitions of each ordering variant in different branches of the shared ifdeffery.
As an ordering variant can have several potential definitions down different branches of the shared ifdeffery, it can be painful for a human to find a relevant definition, and we don't have a good location to place anything common to all definitions of an ordering variant (e.g. kerneldoc).
Historically the grouping of full/acquire/release/relaxed ordering variants was necessary as we filled in the missing atomics in the same namespace as the architecture used. It would be easy to accidentally define one ordering fallback in terms of another ordering fallback with redundant barriers, and avoiding that would otherwise require a lot of baroque ifdeffery.
With recent changes we no longer need to fill in the missing atomics in the arch_atomic*_<op>() namespace, and only need to fill in the raw_atomic*_<op>() namespace. Due to this, there's no risk of a namespace collision, and we can define each raw_atomic*_<op> ordering variant with its own ifdeffery checking for the arch_atomic*_<op> ordering variants.
Restructure the fallbacks in this way, with each ordering variant having its own ifdeffery of the form:
| #if defined(arch_atomic_fetch_andnot_acquire) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire | #elif defined(arch_atomic_fetch_andnot_relaxed) | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | int ret = arch_atomic_fetch_andnot_relaxed(i, v); | __atomic_acquire_fence(); | return ret; | } | #elif defined(arch_atomic_fetch_andnot) | #define raw_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot | #else | static __always_inline int | raw_atomic_fetch_andnot_acquire(int i, atomic_t *v) | { | return raw_atomic_fetch_and_acquire(~i, v); | } | #endif
Note that where there's no relevant arch_atomic*_<op>() ordering variant, we'll define the operation in terms of a distinct raw_atomic*_<otherop>(), as this itself might have been filled in with a fallback.
As we now generate the raw_atomic*_<op>() implementations directly, we no longer need the trivial wrappers, so they are removed.
This makes the ifdeffery easier to follow, and will allow for further improvements in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-21-mark.rutland@arm.com
|