ab0f7746 | 24-Feb-2023 |
Andrew Jones <ajones@ventanamicro.com> |
RISC-V: Use Zicboz in clear_page when available
Using memset() to zero a 4K page takes 563 total instructions, where 20 are branches. clear_page(), with Zicboz and a 64-byte block size, takes 169 total instructions, where 4 are branches and 33 are nops. Even though the block size is variable, thanks to alternatives we can still implement a Duff device without having to do any preliminary calculations. This is achieved by using the alternatives' cpufeature value (the upper 16 bits of patch_id). The value used is the maximum Zicboz block size order accepted at the patch site. This enables us to stop patching/unrolling once 4K bytes have been zeroed (we would loop and continue past 4K if the page size were larger).
For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to only loop a few times and larger block sizes to not loop at all. Since cbo.zero doesn't take an offset, we also need an 'add' after each instruction, making the loop body 112 to 160 bytes. Hopefully this is small enough to not cause icache misses.
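As a rough illustration only (the kernel's version is hand-written assembly patched through alternatives), a C model of the block-wise zeroing could look like the sketch below, with memset() standing in for cbo.zero and the 16x unrolling reduced to a plain loop:

  #include <stddef.h>
  #include <string.h>

  #define PAGE_SIZE 4096

  /* Models a single cbo.zero: zero one cache block of block_size bytes. */
  static inline void zero_block(void *p, size_t block_size)
  {
          memset(p, 0, block_size);
  }

  /* C model of clear_page(): zero a 4K page block by block.  The real
   * assembly unrolls this 16 times and lets alternatives turn unneeded
   * cbo.zero/add pairs into nops based on the detected block size.
   */
  void clear_page_model(void *page, size_t block_size)
  {
          char *p = page;
          char *end = p + PAGE_SIZE;

          while (p < end) {
                  zero_block(p, block_size);
                  p += block_size;  /* cbo.zero takes no offset, hence the 'add' */
          }
  }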
Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230224162631.405473-7-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
81a1dd10 | 28-Feb-2023 |
Björn Töpel <bjorn@rivosinc.com> |
riscv, lib: Fix Zbb strncmp
The Zbb-optimized strncmp has two parts: a fast path that does XLEN/8 bytes per iteration, and a slow path that does one byte per iteration.
The idea is to compare aligned XLEN-sized chunks for most of the strings, and handle the remainder tail in the slow path.
The Zbb strncmp has two issues in the fast path:
Incorrect remainder handling (wrong compare): Assume that the string length is 9. On 64-bit systems, the fast path should do one iteration, and the slow path one more. Instead, both were done in the fast path, which led to incorrect results. An example:
strncmp("/dev/vda", "/dev/", 5);
Correct by changing "bgt" to "bge".
Missing NULL checks in the second string: This could lead to incorrect results for:
strncmp("/dev/vda", "/dev/vda\0", 8);
Correct by adding an additional check.
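For reference, a hedged C model of the fixed fast-path/slow-path structure (the actual code is RISC-V assembly that uses orc.b for NUL detection; has_zero() below is a stand-in, and the model assumes both buffers are at least n bytes long, whereas the assembly relies on aligned loads to stay within mapped memory):

  #include <stddef.h>
  #include <string.h>

  /* Stand-in for the orc.b-based NUL detection: true if any byte of
   * chunk is zero (classic zero-byte bit trick).
   */
  static int has_zero(unsigned long chunk)
  {
          unsigned long ones = ~0UL / 255;   /* 0x01 repeated */
          unsigned long highs = ones << 7;   /* 0x80 repeated */

          return ((chunk - ones) & ~chunk & highs) != 0;
  }

  int strncmp_model(const char *s1, const char *s2, size_t n)
  {
          size_t i = 0;

          /* Fast path: only while a full XLEN-sized chunk fits in the
           * remaining length (the "bgt" -> "bge" fix: for n = 9 on 64-bit
           * this runs once and leaves the ninth byte to the slow path).
           */
          while (n - i >= sizeof(unsigned long)) {
                  unsigned long a, b;

                  memcpy(&a, s1 + i, sizeof(a));
                  memcpy(&b, s2 + i, sizeof(b));
                  if (a != b)
                          break;
                  /* The missing-NULL-check fix: stop if *either* string
                   * ends inside this chunk, not just the first one.
                   */
                  if (has_zero(a) || has_zero(b))
                          break;
                  i += sizeof(unsigned long);
          }

          /* Slow path: byte-wise compare of the remainder. */
          for (; i < n; i++) {
                  unsigned char c1 = s1[i], c2 = s2[i];

                  if (c1 != c2)
                          return c1 - c2;
                  if (!c1)
                          return 0;
          }
          return 0;
  }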
Fixes: b6fcdb191e36 ("RISC-V: add zbb support to string functions") Suggested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20230228184211.1585641-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
b6fcdb19 | 13-Jan-2023 |
Heiko Stuebner <heiko.stuebner@vrull.eu> |
RISC-V: add zbb support to string functions
Add handling for the Zbb extension and support for using it as a variant of the optimized string functions.
Support for the Zbb string variants is limited to the GNU assembler for now, as LLVM has not yet gained the ability to selectively change the arch option in assembler code. This is still under review at https://reviews.llvm.org/D123515.
Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu> Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu> Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230113212301.3534711-3-heiko@sntech.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
9d504f9a | 18-Nov-2021 |
Jisheng Zhang <jszhang@kernel.org> |
riscv: lib: uaccess: fold fixups into body
uaccess functions such as __asm_copy_to_user(), __arch_copy_from_user() and __clear_user() place their exception fixups in the `.fixup` section without any clear association with the functions themselves. If we backtrace the fixup code, it is symbolized as an offset from the nearest prior symbol.
As arm64 does, move the fixups into the body of the functions themselves, after the usual fast-path returns.
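As a loose C analogy only (the real functions are RISC-V assembly whose faulting instructions are wired to their fixups through the exception table; put_byte_may_fault() below is a hypothetical stand-in), the layout after this change keeps the fixup at the end of the function body, past the fast-path return:

  #include <stddef.h>

  /* Hypothetical stand-in for a store that may fault; here it always
   * succeeds, but in the kernel a fault would redirect to the fixup.
   */
  static int put_byte_may_fault(char *dst, char val)
  {
          *dst = val;
          return 0;
  }

  /* Loose model of __clear_user(): the fixup lives inside the function,
   * after the fast-path return, so a backtrace through it resolves to
   * this function rather than to a detached .fixup section.
   */
  size_t clear_user_model(char *to, size_t n)
  {
          size_t remaining = n;

          while (remaining) {
                  if (put_byte_may_fault(to, 0))
                          goto fixup;
                  to++;
                  remaining--;
          }
          return 0;                /* fast-path return */

  fixup:
          return remaining;        /* in-body fixup: bytes left unzeroed */
  }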
Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
ea196c54 | 20-Jul-2021 |
Akira Tsukamoto <akira.tsukamoto@gmail.com> |
riscv: __asm_copy_to-from_user: Fix: Typos in comments
Fix typos and grammar mistakes, and use a more intuitive label name.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
d4b3e010 | 20-Jul-2021 |
Akira Tsukamoto <akira.tsukamoto@gmail.com> |
riscv: __asm_copy_to-from_user: Remove unnecessary size check
Clean up:
A size of 0 is already handled in the next step, so the check is not required here.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
22b5f16f | 20-Jul-2021 |
Akira Tsukamoto <akira.tsukamoto@gmail.com> |
riscv: __asm_copy_to-from_user: Fix: fail on RV32
There was a bug when converting bytes to bits when the CPU was rv32.
The a3 register contains the number of bytes, and multiplying by 8 gives the number of bits. LGREG holds 2 for RV32 and 3 for RV64, so to multiply by 8 the shift amount must always be the constant 3. The 2 was mistakenly used for rv32.
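A small worked example of the arithmetic, in plain C rather than assembly (the LGREG_* constants below mirror the per-XLEN values of the kernel's LGREG macro):

  #include <stdio.h>

  /* LGREG is log2(register width in bytes): 2 on RV32, 3 on RV64. */
  #define LGREG_RV32 2
  #define LGREG_RV64 3

  int main(void)
  {
          unsigned long bytes = 5;

          /* Bytes to bits is always a shift by the constant 3 (x8). */
          printf("correct : %lu bytes -> %lu bits\n", bytes, bytes << 3);
          /* The bug: using LGREG on rv32 shifted by 2, multiplying by 4. */
          printf("rv32 bug: %lu bytes -> %lu \"bits\"\n", bytes, bytes << LGREG_RV32);
          /* On rv64, LGREG happens to equal 3, so the conversion worked there. */
          printf("rv64    : %lu bytes -> %lu bits\n", bytes, bytes << LGREG_RV64);
          return 0;
  }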
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com> Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|