e69eb9c4 | 24-Jun-2022 |
Alexander Lobakin <alexandr.lobakin@intel.com> |
bitops: wrap non-atomic bitops with a transparent macro
In preparation for altering the non-atomic bitops with a macro, wrap them in a transparent definition. This requires prepending one more '_' to their names in order to be able to do that seamlessly. It is a simple change, given that all the non-prefixed definitions are now in asm-generic. sparc32 already has several triple-underscored functions, so I had to rename them ('___' -> 'sp32_').
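As a rough sketch of the pattern described above (the exact bodies are assumptions, not a quote of the patch), the extra underscore leaves room for a transparent wrapper:

	/* Renamed implementation: one more leading '_' than before. */
	static __always_inline void
	___set_bit(unsigned long nr, volatile unsigned long *addr)
	{
		unsigned long mask = BIT_MASK(nr);
		unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);

		*p |= mask;
	}

	/*
	 * Transparent wrapper: currently a plain alias, but a later
	 * change can redirect it without touching any callers.
	 */
	#define __set_bit(nr, addr) ___set_bit(nr, addr)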
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
bb7379bf | 24-Jun-2022 |
Alexander Lobakin <alexandr.lobakin@intel.com> |
bitops: define const_*() versions of the non-atomics
Define const_*() variants of the non-atomic bitops to be used when the input arguments are compile-time constants, so that the compiler will always be able to resolve those to compile-time constants as well. Those are mostly direct aliases for generic_*(), with one exception for const_test_bit(): the original one is declared atomic-safe and thus doesn't discard the `volatile` qualifier, so in order to let the compiler optimize the code, define it separately, disregarding the qualifier. Add them to the compile-time type checks as well, just in case.
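A minimal sketch of the const_test_bit() idea (the body is inferred from the description above; the key point is the cast that discards `volatile`):

	static __always_inline bool
	const_test_bit(unsigned long nr, const volatile unsigned long *addr)
	{
		/* Dropping `volatile` lets the compiler fold constant inputs. */
		const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
		unsigned long mask = BIT_MASK(nr);

		return !!(*p & mask);
	}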
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
0e862838 | 24-Jun-2022 |
Alexander Lobakin <alexandr.lobakin@intel.com> |
bitops: unify non-atomic bitops prototypes across architectures
Currently, there is a mess with the prototypes of the non-atomic bitops across the different architectures:
  ret   bool, int, unsigned long
  nr    int, long, unsigned int, unsigned long
  addr  volatile unsigned long *, volatile void *
Thankfully, it doesn't provoke any bugs, but it can sometimes make the compiler angry when it's least convenient. Adjust all the prototypes to the following standard:
  ret   bool                      retval can be only 0 or 1
  nr    unsigned long             native; signed makes no sense
  addr  volatile unsigned long *  bitmaps are arrays of ulongs
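Under that standard, every op ends up with a prototype of this shape (an illustrative sketch, not a quote of the patch):

	static __always_inline bool
	arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
	{
		unsigned long mask = BIT_MASK(nr);
		unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
		unsigned long old = *p;

		*p = old | mask;
		return !!(old & mask);
	}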
Next, some architectures don't define 'arch_' versions as they don't support instrumentation, others do. To make sure there is always the same set of callables present and to ease any potential future changes, make them all follow the rule:

  * architecture-specific files define only 'arch_' versions;
  * non-prefixed versions can be defined only in asm-generic files;

and place the non-prefixed definitions into a new file in asm-generic to be included by non-instrumented architectures.
Finally, add some static assertions in order to prevent people from making a mess in this room again. I also used the %__always_inline attribute consistently, so that they always get resolved to the actual operations.
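The assertions could look roughly like this (macro name and exact form are assumptions based on the description):

	/* Sketch: every variant must share the generic_*() prototype. */
	#define __check_bitop_pr(name)						\
		static_assert(__same_type(arch_##name, generic_##name) &&	\
			      __same_type(name, generic_##name))

	__check_bitop_pr(__set_bit);
	__check_bitop_pr(__test_and_set_bit);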
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
8f76f9c4 | 05-Aug-2021 |
Vineet Gupta <Vineet.Gupta1@synopsys.com> |
bitops/non-atomic: make @nr unsigned to avoid any DIV
Signed math causes generation of costlier instructions such as DIV when they could be done by the barrel shifter.
Worse, this is not caught by things like bloat-o-meter, since the instruction length / symbol sizes are typically the same.
e.g.
stock (signed math)
__________________

919b4614 <test_taint>:
919b4614:	div	r2,r0,0x20
		^^^
919b4618:	add2	r2,0x920f6050,r2
919b4620:	ld_s	r2,[r2,0]
919b4622:	lsr	r0,r2,r0
919b4626:	j_s.d	[blink]
919b4628:	bmsk_s	r0,r0,0
919b462a:	nop_s
(patched) unsigned math
__________________

919b4614 <test_taint>:
919b4614:	lsr	r2,r0,0x5	@nr/32
		^^^
919b4618:	add2	r2,0x920f6050,r2
919b4620:	ld_s	r2,[r2,0]
919b4622:	lsr	r0,r2,r0	#test_bit()
919b4626:	j_s.d	[blink]
919b4628:	bmsk_s	r0,r0,0
919b462a:	nop_s
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
9248e52f | 21-Jul-2021 |
Mark Rutland <mark.rutland@arm.com> |
locking/atomic: simplify non-atomic wrappers
Since the non-atomic arch_*() bitops use plain accesses, they are implicitly instrumented by the compiler, and we work around this in the instrumented wrappers to avoid double instrumentation.
It's simpler to avoid the wrappers entirely, and use the preprocessor to alias the arch_*() bitops to their regular versions, removing the need for checks in the instrumented wrappers.
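Sketched under the assumption that the aliasing is a plain preprocessor define (the real patch may spell the list differently):

	/*
	 * No wrapper functions: the arch_*() names become plain aliases,
	 * so the compiler's implicit instrumentation is used as-is.
	 */
	#define arch___set_bit			__set_bit
	#define arch___clear_bit		__clear_bit
	#define arch___change_bit		__change_bit
	#define arch___test_and_set_bit		__test_and_set_bit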
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20210721155813.17082-1-mark.rutland@arm.com
|
2cc7b6a4 | 06-May-2021 |
Yury Norov <yury.norov@gmail.com> |
lib: add fast path for find_first_*_bit() and find_last_bit()
Similarly to the bitmap functions, users would benefit if we handle the case of small-size bitmaps that fit into a single word.
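A sketch of the fast-path shape (helper names such as small_const_nbits() follow the existing bitmap convention; treat the exact spelling as an assumption):

	static inline unsigned long find_first_bit(const unsigned long *addr,
						   unsigned long size)
	{
		if (small_const_nbits(size)) {
			/* Single-word bitmap: resolve inline, no call. */
			unsigned long val = *addr & GENMASK(size - 1, 0);

			return val ? __ffs(val) : size;
		}

		return _find_first_bit(addr, size);
	}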
While here, move the find_last_bit() declaration to bitops/find.h where other find_*_bit() functions sit.
Link: https://lkml.kernel.org/r/20210401003153.97325-11-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
277a20a4 | 06-May-2021 |
Yury Norov <yury.norov@gmail.com> |
lib: add fast path for find_next_*_bit()
Similarly to the bitmap functions, find_next_*_bit() users will benefit if we handle the case of bitmaps that fit into a single word inline. In the very best case, the compiler may replace a function call with a few instructions.
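The inline fast path might look roughly like this (a sketch; small_const_nbits() and the out-of-line _find_next_bit() signature are assumptions, the latter simplified):

	static inline unsigned long find_next_bit(const unsigned long *addr,
						  unsigned long size,
						  unsigned long offset)
	{
		if (small_const_nbits(size)) {
			unsigned long val;

			if (unlikely(offset >= size))
				return size;

			/* Clear bits below @offset, scan the single word. */
			val = *addr & GENMASK(size - 1, offset);

			return val ? __ffs(val) : size;
		}

		return _find_next_bit(addr, size, offset);	/* slow path */
	}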
This is a quite typical find_next_bit() user:
unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
	/* -1 is a legal arg here. */
	if (n != -1)
		cpumask_check(n);
	return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
}
EXPORT_SYMBOL(cpumask_next);
Currently, on ARM64 the generated code looks like this:

0000000000000000 <cpumask_next>:
   0:	a9bf7bfd	stp	x29, x30, [sp, #-16]!
   4:	11000402	add	w2, w0, #0x1
   8:	aa0103e0	mov	x0, x1
   c:	d2800401	mov	x1, #0x40	// #64
  10:	910003fd	mov	x29, sp
  14:	93407c42	sxtw	x2, w2
  18:	94000000	bl	0 <find_next_bit>
  1c:	a8c17bfd	ldp	x29, x30, [sp], #16
  20:	d65f03c0	ret
  24:	d503201f	nop
After applying this patch:

0000000000000140 <cpumask_next>:
 140:	11000400	add	w0, w0, #0x1
 144:	93407c00	sxtw	x0, w0
 148:	f100fc1f	cmp	x0, #0x3f
 14c:	54000168	b.hi	178 <cpumask_next+0x38>	// b.pmore
 150:	f9400023	ldr	x3, [x1]
 154:	92800001	mov	x1, #0xffffffffffffffff	// #-1
 158:	9ac02020	lsl	x0, x1, x0
 15c:	52800802	mov	w2, #0x40	// #64
 160:	8a030001	and	x1, x0, x3
 164:	dac00020	rbit	x0, x1
 168:	f100003f	cmp	x1, #0x0
 16c:	dac01000	clz	x0, x0
 170:	1a800040	csel	w0, w2, w0, eq	// eq = none
 174:	d65f03c0	ret
 178:	52800800	mov	w0, #0x40	// #64
 17c:	d65f03c0	ret
The find_next_bit() call is replaced with 6 instructions, while find_next_bit() itself is 41 instructions plus the function call overhead.
Despite the inlining, scripts/bloat-o-meter reports a smaller .text size after applying the series:

  add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988)
Link: https://lkml.kernel.org/r/20210401003153.97325-10-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
068df053 | 13-Aug-2020 |
Marco Elver <elver@google.com> |
bitops, kcsan: Partially revert instrumentation for non-atomic bitops
Prior to the change to distinguish read-write accesses, when CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y is set, KCSAN would consider the non-atomic bitops as atomic. We want to partially revert to this behaviour, but with one important distinction: report racing modifications, since lost bits due to non-atomicity are certainly possible.
Given the operations here only modify a single bit, assuming atomicity of the writer is sufficient may be reasonable for certain usage (and follows the permissible nature of the "assume plain writes atomic" rule). In other words:
1. We want non-atomic read-modify-write races to be reported; this is accomplished by kcsan_check_read(), where any concurrent write (atomic or not) will generate a report.
2. We do not want to report races with marked readers, but -do- want to report races with unmarked readers; this is accomplished by the instrument_write() ("assume atomic write" with Kconfig option set).
With the above rules, when KCSAN_ASSUME_PLAIN_WRITES_ATOMIC is selected, it is hoped that KCSAN's reporting behaviour is better aligned with current expected permissible usage for non-atomic bitops.
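A sketch of how the two rules above might combine inside an instrumented wrapper (helper name and placement are assumptions based on the description):

	static __always_inline void
	__instrument_read_write_bitop(long nr, volatile unsigned long *addr)
	{
		if (IS_ENABLED(CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC)) {
			/* Rule 1: any concurrent write still gets reported. */
			kcsan_check_read(addr + BIT_WORD(nr), sizeof(long));
			/*
			 * Rule 2: "assume atomic write" -- races with unmarked
			 * readers are reported, races with marked readers not.
			 */
			instrument_write(addr + BIT_WORD(nr), sizeof(long));
		} else {
			instrument_read_write(addr + BIT_WORD(nr), sizeof(long));
		}
	}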
Note that a side-effect of not telling KCSAN that the accesses are read-writes is that this information is not displayed in the access summary in the report. It is, however, visible in inline-expanded stack traces. For now, it does not make sense to introduce yet another special case to KCSAN's runtime only to cater to the case here.
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|