Revision tags: v5.15.68 |
|
#
e79864f3 |
| 14-Sep-2022 |
Yury Norov <yury.norov@gmail.com> |
lib/find_bit: optimize find_next_bit() functions
Over the past couple years, the function _find_next_bit() was extended with parameters that modify its behavior to implement and- zero- and le- flavo
lib/find_bit: optimize find_next_bit() functions
Over the past couple years, the function _find_next_bit() was extended with parameters that modify its behavior to implement and- zero- and le- flavors. The parameters are passed at compile time, but current design prevents a compiler from optimizing out the conditionals.
As find_next_bit() API grows, I expect that more parameters will be added. Current design would require more conditional code in _find_next_bit(), which would bloat the helper even more and make it barely readable.
This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds a set of wrappers, so that the compile-time optimizations become possible.
The common logic is moved to the new macro, and all flavors may be generated by providing a FETCH macro parameter, like in this example:
#define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ...
find_next_xornot_and_bit(addr1, addr2, addr3, size, start) { return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] & addr3[idx], /* nop */, size, start); }
The FETCH may be of any complexity, as soon as it only refers the bitmap(s) and an iterator idx.
MUNGE is here to support _le code generation for BE builds. May be empty.
I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top of 6.0-rc2 + this series. The results for kvm/x86_64 are:
v6.0-rc2 Optimized Difference Z-score Random dense bitmap ns ns ns % find_next_bit: 787735 670546 117189 14.9 3.97 find_next_zero_bit: 777492 664208 113284 14.6 10.51 find_last_bit: 830925 687573 143352 17.3 2.35 find_first_bit: 3874366 3306635 567731 14.7 1.84 find_first_and_bit: 40677125 37739887 2937238 7.2 1.36 find_next_and_bit: 347865 304456 43409 12.5 1.35
Random sparse bitmap find_next_bit: 19816 14021 5795 29.2 6.10 find_next_zero_bit: 1318901 1223794 95107 7.2 1.41 find_last_bit: 14573 13514 1059 7.3 6.92 find_first_bit: 1313321 1249024 64297 4.9 1.53 find_first_and_bit: 8921 8098 823 9.2 4.56 find_next_and_bit: 9796 7176 2620 26.7 5.39
Where the statistics is significant (z-score > 3), the improvement is ~15%.
According to the bloat-o-meter, the Image size is 10-11K less:
x86_64/defconfig: add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177)
arm64/defconfig: add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yury Norov <yury.norov@gmail.com>
show more ...
|
#
14a99e13 |
| 14-Sep-2022 |
Yury Norov <yury.norov@gmail.com> |
lib/find_bit: create find_first_zero_bit_le()
find_first_zero_bit_le() is an alias to find_next_zero_bit_le(), despite that 'next' is known to be slower than 'first' version.
Now that we have commo
lib/find_bit: create find_first_zero_bit_le()
find_first_zero_bit_le() is an alias to find_next_zero_bit_le(), despite that 'next' is known to be slower than 'first' version.
Now that we have common FIND_FIRST_BIT() macro helper, it's trivial to implement find_first_zero_bit_le() as a real function.
Reviewed-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
show more ...
|
#
58414bbb |
| 14-Sep-2022 |
Yury Norov <yury.norov@gmail.com> |
lib/find_bit: introduce FIND_FIRST_BIT() macro
Now that we have many flavors of find_first_bit(), and expect even more, it's better to have one macro that generates optimal code for all and makes ma
lib/find_bit: introduce FIND_FIRST_BIT() macro
Now that we have many flavors of find_first_bit(), and expect even more, it's better to have one macro that generates optimal code for all and makes maintaining of slightly different functions simpler.
The logic common to all versions is moved to the new macro, and all the flavors are generated by providing an FETCH macro-parameter, like in this example:
#define FIND_FIRST_BIT(FETCH, MUNGE, size) ...
find_first_ornot_and_bit(addr1, addr2, addr3, size) { return FIND_FIRST_BIT(addr1[idx] | ~addr2[idx] & addr3[idx], /* nop */, size); }
The FETCH may be of any complexity, as soon as it only refers the bitmap(s) and an iterator idx.
MUNGE is here to support _le code generation for BE builds. May be empty.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
show more ...
|
Revision tags: v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45 |
|
#
03ab8e62 |
| 31-May-2022 |
Konstantin Komarov <almaz.alexandrovich@paragon-software.com> |
Merge tag 'v5.18'
Linux 5.18
|
Revision tags: v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33 |
|
#
de4fb176 |
| 01-Apr-2022 |
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> |
Merge branches 'fixes' and 'misc' into for-linus
|
Revision tags: v5.15.32 |
|
#
41237041 |
| 23-Mar-2022 |
Jiri Kosina <jkosina@suse.cz> |
Merge branch 'for-5.18/apple' into for-linus
- Apple magic keyboard support improvements for newer models (José Expósito) - Apple T2 Macs support improvements (Aun-Ali Zaidi, Paul Pawlowski)
|
Revision tags: v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26 |
|
#
1136fa0c |
| 01-Mar-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.17-rc4' into for-linus
Merge with mainline to get the Intel ASoC generic helpers header and other changes.
|
Revision tags: v5.15.25 |
|
#
986c6f7c |
| 18-Feb-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.17-rc4' into next
Sync up with mainline to get the latest changes in HID subsystem.
|
Revision tags: v5.15.24, v5.15.23, v5.15.22 |
|
#
542898c5 |
| 07-Feb-2022 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Merge remote-tracking branch 'drm/drm-next' into drm-misc-next
First backmerge into drm-misc-next. Required for more helpers backmerged, and to pull in 5.17 (rc2).
Signed-off-by: Maarten Lankhorst
Merge remote-tracking branch 'drm/drm-next' into drm-misc-next
First backmerge into drm-misc-next. Required for more helpers backmerged, and to pull in 5.17 (rc2).
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
show more ...
|
Revision tags: v5.15.21, v5.15.20 |
|
#
7e6a6b40 |
| 04-Feb-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
Merge tag 'kvmarm-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.17, take #2
- A couple of fixes when handling an exception while a SEr
Merge tag 'kvmarm-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.17, take #2
- A couple of fixes when handling an exception while a SError has been delivered
- Workaround for Cortex-A510's single-step[ erratum
show more ...
|
#
876f7a43 |
| 03-Feb-2022 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Backmerge to bring in 5.17-rc2 to introduce a common baseline to merge i915_regs changes from drm-intel-next.
Signed-off-by: Joonas Lahtinen <joonas.lahtin
Merge drm/drm-next into drm-intel-gt-next
Backmerge to bring in 5.17-rc2 to introduce a common baseline to merge i915_regs changes from drm-intel-next.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
Revision tags: v5.15.19 |
|
#
063565ac |
| 31-Jan-2022 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catch-up with 5.17-rc2 and trying to align with drm-intel-gt-next for a possible topic branch for merging the split of i915_regs...
Signed-off-by: Rodrigo Viv
Merge drm/drm-next into drm-intel-next
Catch-up with 5.17-rc2 and trying to align with drm-intel-gt-next for a possible topic branch for merging the split of i915_regs...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
show more ...
|
Revision tags: v5.15.18 |
|
#
72d044e4 |
| 27-Jan-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
Revision tags: v5.15.17 |
|
#
48ee4835 |
| 26-Jan-2022 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging drm/drm-fixes into drm-misc-fixes for v5.17-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
3689f9f8 |
| 22-Jan-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where p
Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly
show more ...
|
Revision tags: v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60 |
|
#
f68edc92 |
| 14-Aug-2021 |
Yury Norov <yury.norov@gmail.com> |
lib: add find_first_and_bit()
Currently find_first_and_bit() is an alias to find_next_and_bit(). However, it is widely used in cpumask, so it worth to optimize it. This patch adds its own implementa
lib: add find_first_and_bit()
Currently find_first_and_bit() is an alias to find_next_and_bit(). However, it is widely used in cpumask, so it worth to optimize it. This patch adds its own implementation for find_first_and_bit().
On x86_64 find_bit_benchmark says:
Before (#define find_first_and_bit(...) find_next_and_bit(..., 0): Start testing find_bit() with random-filled bitmap [ 140.291468] find_first_and_bit: 46890919 ns, 32671 iterations Start testing find_bit() with sparse bitmap [ 140.295028] find_first_and_bit: 7103 ns, 1 iterations
After: Start testing find_bit() with random-filled bitmap [ 162.574907] find_first_and_bit: 25045813 ns, 32846 iterations Start testing find_bit() with sparse bitmap [ 162.578458] find_first_and_bit: 4900 ns, 1 iterations
(Thanks to Alexey Klimov for thorough testing.)
Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Alexey Klimov <aklimov@redhat.com>
show more ...
|
#
8be98d2f |
| 05-Sep-2021 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 5.15 merge window.
|
Revision tags: v5.10.53, v5.10.52, v5.10.51 |
|
#
320424c7 |
| 18-Jul-2021 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.13' into next
Sync up with the mainline to get the latest parport API.
|
Revision tags: v5.10.50, v5.10.49 |
|
#
5a94296b |
| 30-Jun-2021 |
Jiri Kosina <jkosina@suse.cz> |
Merge branch 'for-5.14/amd-sfh' into for-linus
- support for Renoir and Cezanne SoCs - support for Ambient Light Sensor - support for Human Presence Detection sensor
all from Basavaraj Natikar
|
Revision tags: v5.13, v5.10.46, v5.10.43 |
|
#
c441bfb5 |
| 09-Jun-2021 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.13-rc3' into asoc-5.13
Linux 5.13-rc3
|
Revision tags: v5.10.42 |
|
#
942baad2 |
| 02-Jun-2021 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Pulling in -rc2 fixes and TTM changes that next upcoming patches depend on.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
Revision tags: v5.10.41, v5.10.40, v5.10.39 |
|
#
c37fe6af |
| 18-May-2021 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.13-rc2' into spi-5.13
Linux 5.13-rc2
|
#
85ebe5ae |
| 18-May-2021 |
Tony Lindgren <tony@atomide.com> |
Merge branch 'fixes-rc1' into fixes
|
#
d22fe808 |
| 17-May-2021 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Time to get back in sync...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
a4345a7c |
| 17-May-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
Merge tag 'kvmarm-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.13, take #1
- Fix regression with irqbypass not restarting the guest o
Merge tag 'kvmarm-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.13, take #1
- Fix regression with irqbypass not restarting the guest on failed connect - Fix regression with debug register decoding resulting in overlapping access - Commit exception state on exit to usrspace - Fix the MMU notifier return values - Add missing 'static' qualifiers in the new host stage-2 code
show more ...
|