f2091321 | 06-Sep-2023 |
WANG Xuerui <git@xen0n.name> |
raid6: Add LoongArch SIMD recovery implementation
Similar to the syndrome calculation, the recovery algorithms also work on 64 bytes at a time to align with the L1 cache line size of current and fut
raid6: Add LoongArch SIMD recovery implementation
Similar to the syndrome calculation, the recovery algorithms also work on 64 bytes at a time to align with the L1 cache line size of current and future LoongArch cores (that we care about). Which means unrolled-by-4 LSX and unrolled-by-2 LASX code.
The assembly is originally based on the x86 SSSE3/AVX2 ports, but register allocation has been redone to take advantage of LSX/LASX's 32 vector registers, and instruction sequence has been optimized to suit (e.g. LoongArch can perform per-byte srl and andi on vectors, but x86 cannot).
Performance numbers measured by instrumenting the raid6test code, on a 3A5000 system clocked at 2.5GHz:
> lasx 2data: 354.987 MiB/s > lasx datap: 350.430 MiB/s > lsx 2data: 340.026 MiB/s > lsx datap: 337.318 MiB/s > intx1 2data: 164.280 MiB/s > intx1 datap: 187.966 MiB/s
Because recovery algorithms are chosen solely based on priority and availability, lasx is marked as priority 2 and lsx priority 1. At least for the current generation of LoongArch micro-architectures, LASX should always be faster than LSX whenever supported, and have similar power consumption characteristics (because the only known LASX-capable uarch, the LA464, always compute the full 256-bit result for vector ops).
Acked-by: Song Liu <song@kernel.org> Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
show more ...
|
7b3c70c4 | 31-Jul-2023 |
WANG Xuerui <git@xen0n.name> |
raid6: test: only check for Altivec if building on powerpc hosts
Altivec is only available for powerpc hosts, so only check for its availability when the host is powerpc, to avoid error messages bei
raid6: test: only check for Altivec if building on powerpc hosts
Altivec is only available for powerpc hosts, so only check for its availability when the host is powerpc, to avoid error messages being shown on architectures other than x86, arm or powerpc.
Signed-off-by: WANG Xuerui <git@xen0n.name> Link: https://lore.kernel.org/r/20230731104911.411964-6-kernel@xen0n.name Signed-off-by: Song Liu <song@kernel.org>
show more ...
|
6601f5e1 | 31-Jul-2023 |
WANG Xuerui <git@xen0n.name> |
raid6: test: make sure all intermediate and artifact files are .gitignored
Currently when the raid6test utility is built, the resulting binary and an int.uc file are not being ignored, which can get
raid6: test: make sure all intermediate and artifact files are .gitignored
Currently when the raid6test utility is built, the resulting binary and an int.uc file are not being ignored, which can get inadvertently committed as a result when one works on the raid6 code. Ignore them to make `git status` clean at all times.
Signed-off-by: WANG Xuerui <git@xen0n.name> Link: https://lore.kernel.org/r/20230731104911.411964-5-kernel@xen0n.name Signed-off-by: Song Liu <song@kernel.org>
show more ...
|
2008d89f | 31-Jul-2023 |
WANG Xuerui <git@xen0n.name> |
raid6: test: cosmetic cleanups for the test Makefile
Use tabs/spaces consistently: hard tabs for marking recipe lines only, spaces for everything else.
Also, the OPTFLAGS declaration actually inclu
raid6: test: cosmetic cleanups for the test Makefile
Use tabs/spaces consistently: hard tabs for marking recipe lines only, spaces for everything else.
Also, the OPTFLAGS declaration actually included the tabs preceding the line comment, making compiler invocation lines unnecessarily long. As the entire block of declarations are meant for ad-hoc customization (otherwise they would probably make use of `?=` instead of `=`), move the "Adjust as desired" comment above the block too to fix the long invocation lines.
Signed-off-by: WANG Xuerui <git@xen0n.name> Link: https://lore.kernel.org/r/20230731104911.411964-4-kernel@xen0n.name Signed-off-by: Song Liu <song@kernel.org>
show more ...
|
9dd6e1da | 31-Jul-2023 |
WANG Xuerui <git@xen0n.name> |
raid6: guard the tables.c include of <linux/export.h> with __KERNEL__
The export directives for the tables are already emitted with __KERNEL__ guards, but the <linux/export.h> include is not, causin
raid6: guard the tables.c include of <linux/export.h> with __KERNEL__
The export directives for the tables are already emitted with __KERNEL__ guards, but the <linux/export.h> include is not, causing errors when building the raid6test program. Guard this include too to fix the raid6test build.
Signed-off-by: WANG Xuerui <git@xen0n.name> Link: https://lore.kernel.org/r/20230731104911.411964-3-kernel@xen0n.name Signed-off-by: Song Liu <song@kernel.org>
show more ...
|
5b401e4e | 08-Feb-2022 |
Paul Menzel <pmenzel@molgen.mpg.de> |
lib/raid6: Include <asm/ppc-opcode.h> for VPERMXOR
On Ubuntu 21.10 (ppc64le) building raid6test with gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0 fails with the error below.
gcc -I.. -I ../../../include
lib/raid6: Include <asm/ppc-opcode.h> for VPERMXOR
On Ubuntu 21.10 (ppc64le) building raid6test with gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0 fails with the error below.
gcc -I.. -I ../../../include -g -O2 \ -I../../../arch/powerpc/include -DCONFIG_ALTIVEC \ -c -o vpermxor1.o vpermxor1.c vpermxor1.c: In function ‘raid6_vpermxor1_gen_syndrome_real’: vpermxor1.c:64:29: error: expected string literal before ‘VPERMXOR’ 64 | asm(VPERMXOR(%0,%1,%2,%3):"=v"(wq0):"v"(gf_high), "v"(gf_low), "v"(wq0)); | ^~~~~~~~ make: *** [Makefile:58: vpermxor1.o] Error 1
So, include the header asm/ppc-opcode.h defining this macro also when not building the Linux kernel but only this too.
Cc: Matt Brown <matthew.brown.dev@gmail.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Song Liu <song@kernel.org>
show more ...
|
633174a7 | 08-Feb-2022 |
Paul Menzel <pmenzel@molgen.mpg.de> |
lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
Buidling raid6test on Ubuntu 21.10 (ppc64le) with GNU Make 4.3 shows the errors below:
$ cd lib/raid6/test/ $ make <stdi
lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
Buidling raid6test on Ubuntu 21.10 (ppc64le) with GNU Make 4.3 shows the errors below:
$ cd lib/raid6/test/ $ make <stdin>:1:1: error: stray ‘\’ in program <stdin>:1:2: error: stray ‘#’ in program <stdin>:1:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ \ before ‘<’ token
[...]
The errors come from the HAS_ALTIVEC test, which fails, and the POWER optimized versions are not built. That’s also reason nobody noticed on the other architectures.
GNU Make 4.3 does not remove the backslash anymore. From the 4.3 release announcment:
> * WARNING: Backward-incompatibility! > Number signs (#) appearing inside a macro reference or function invocation > no longer introduce comments and should not be escaped with backslashes: > thus a call such as: > foo := $(shell echo '#') > is legal. Previously the number sign needed to be escaped, for example: > foo := $(shell echo '\#') > Now this latter will resolve to "\#". If you want to write makefiles > portable to both versions, assign the number sign to a variable: > H := \# > foo := $(shell echo '$H') > This was claimed to be fixed in 3.81, but wasn't, for some reason. > To detect this change search for 'nocomment' in the .FEATURES variable.
So, do the same as commit 9564a8cf422d ("Kbuild: fix # escaping in .cmd files for future Make") and commit 929bef467771 ("bpf: Use $(pound) instead of \# in Makefiles") and define and use a $(pound) variable.
Reference for the change in make: https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57
Cc: Matt Brown <matthew.brown.dev@gmail.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Song Liu <song@kernel.org>
show more ...
|
36dacddb | 05-Jan-2022 |
Dirk Müller <dmueller@suse.de> |
lib/raid6: Use strict priority ranking for pq gen() benchmarking
On x86_64, currently 3 variants of AVX512, 3 variants of AVX2 and 3 variants of SSE2 are benchmarked on initialization, taking betwee
lib/raid6: Use strict priority ranking for pq gen() benchmarking
On x86_64, currently 3 variants of AVX512, 3 variants of AVX2 and 3 variants of SSE2 are benchmarked on initialization, taking between 144-153 jiffies. Testing across a hardware pool of various generations of intel cpus I could not find a single case where SSE2 won over AVX2 or AVX512. There are cases where AVX2 wins over AVX512 however.
Change "prefer" into an integer priority field (similar to how recov selection works) to have more than one ranking level available, which is backwards compatible with existing behavior.
Give AVX2/512 variants higher priority over SSE2 in order to skip SSE testing when AVX is available. in a AVX2/x86_64/HZ=250 case this saves in the order of 200ms of initialization time.
Signed-off-by: Dirk Müller <dmueller@suse.de> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Song Liu <song@kernel.org>
show more ...
|
92203b02 | 26-Mar-2020 |
Masahiro Yamada <masahiroy@kernel.org> |
x86: remove always-defined CONFIG_AS_SSSE3
CONFIG_AS_SSSE3 was introduced by commit 75aaf4c3e6a4 ("x86/raid6: correctly check for assembler capabilities").
We raise the minimal supported binutils v
x86: remove always-defined CONFIG_AS_SSSE3
CONFIG_AS_SSSE3 was introduced by commit 75aaf4c3e6a4 ("x86/raid6: correctly check for assembler capabilities").
We raise the minimal supported binutils version from time to time. The last bump was commit 1fb12b35e5ff ("kbuild: Raise the minimum required binutils version to 2.21").
I confirmed the code in $(call as-instr,...) can be assembled by the binutils 2.21 assembler and also by LLVM integrated assembler.
Remove CONFIG_AS_SSSE3, which is always defined.
I added ifdef CONFIG_X86 to lib/raid6/algos.c to avoid link errors on non-x86 architectures.
lib/raid6/algos.c is built not only for the kernel but also for testing the library code from userspace. I added -DCONFIG_X86 to lib/raid6/test/Makefile to cator to this usecase.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
f591df3c | 19-Dec-2019 |
Zhengyuan Liu <liuzhengyuan@kylinos.cn> |
md/raid6: fix algorithm choice under larger PAGE_SIZE
There are several algorithms available for raid6 to generate xor and syndrome parity, including basic int1, int2 ... int32 and SIMD optimized im
md/raid6: fix algorithm choice under larger PAGE_SIZE
There are several algorithms available for raid6 to generate xor and syndrome parity, including basic int1, int2 ... int32 and SIMD optimized implementation like sse and neon. To test and choose the best algorithms at the initial stage, we need provide enough disk data to feed the algorithms. However, the disk number we provided depends on page size and gfmul table, seeing bellow:
const int disks = (65536/PAGE_SIZE) + 2;
So when come to 64K PAGE_SIZE, there is only one data disk plus 2 parity disk, as a result the chosed algorithm is not reliable. For example, on my arm64 machine with 64K page enabled, it will choose intx32 as the best one, although the NEON implementation is better.
This patch tries to fix the problem by defining a constant raid6 disk number to supporting arbitrary page size.
Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
show more ...
|