/openbmc/linux/arch/arm64/crypto/ |
crct10dif-ce-core.S | 2fffee536c6875bdf546cee0045fed8faa5ea51f Mon Aug 27 10:38:12 CDT 2018 Ard Biesheuvel <ard.biesheuvel@linaro.org> crypto: arm64/crct10dif - implement non-Crypto Extensions alternative
The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit polynomial multiplication instructions, which are optional in the architecture, and if these instructions are not available, we fall back to the C routine which is slow and inefficient.
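For orientation, the checksum being accelerated can be written as a plain bit-at-a-time loop. The sketch below is an illustrative reference only, not the kernel's generic C routine; the function name `crc_t10dif_bitwise` is hypothetical. It assumes the standard CRC-T10DIF parameters (16-bit width, polynomial 0x8BB7, initial value 0, no bit reflection, no final XOR).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * CRC-T10DIF generator polynomial:
 * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
 * (the x^16 term is implicit in the 16-bit constant).
 */
#define CRC_T10DIF_POLY 0x8BB7u

/*
 * Hypothetical reference routine: processes the input MSB first,
 * one bit per inner iteration. Pass crc = 0 for a fresh checksum,
 * or a previous result to continue over a further buffer.
 */
uint16_t crc_t10dif_bitwise(uint16_t crc, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Fold the next input byte into the top of the register. */
        crc ^= (uint16_t)((uint16_t)buf[i] << 8);

        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ CRC_T10DIF_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

A loop like this is simple but processes a single bit per step, which is why folding many bytes at once with polynomial multiplication is attractive.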
So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that uses a sequence of ~40 instructions involving 8x8 bit PMULL and some shifting and masking. This is a lot slower than the original, but it is still twice as fast as the current [unoptimized] C code on Cortex-A53, and it is time invariant and much easier on the D-cache.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
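The idea of building a 64x64 bit polynomial (carry-less) multiplication out of 8x8 bit ones can be sketched in scalar C. This is a conceptual model under stated assumptions, not the NEON instruction sequence from the commit: the names `clmul8`, `clmul64_from_8x8`, `clmul64_bitwise` and the `u128` struct are hypothetical, and the loop over all 64 byte pairs is a plain schoolbook decomposition rather than the roughly 40-instruction vectorized form the message describes.

```c
#include <stdint.h>

/* 128-bit result held as two 64-bit halves (hypothetical helper type). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* 8x8 -> 16 bit carry-less product, modelling one lane of an 8-bit PMULL. */
uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;

    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            r ^= (uint16_t)((uint16_t)a << i);
    return r;
}

/*
 * 64x64 -> 128 bit carry-less product assembled from 8x8 products.
 * Byte i of a times byte j of b contributes at bit offset 8 * (i + j);
 * since addition of polynomials over GF(2) is XOR, the partial
 * products are combined with shifts and XORs only.
 */
u128 clmul64_from_8x8(uint64_t a, uint64_t b)
{
    u128 r = { 0, 0 };

    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            uint64_t p = clmul8((uint8_t)(a >> (8 * i)),
                                (uint8_t)(b >> (8 * j)));
            int shift = 8 * (i + j);        /* 0, 8, ..., 112 */

            if (shift < 64) {
                r.lo ^= p << shift;
                if (shift > 48)             /* product straddles bit 64 */
                    r.hi ^= p >> (64 - shift);
            } else {
                r.hi ^= p << (shift - 64);
            }
        }
    }
    return r;
}

/* Bit-at-a-time reference, used only to cross-check the version above. */
u128 clmul64_bitwise(uint64_t a, uint64_t b)
{
    u128 r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i)
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}
```

Because every step is a fixed sequence of multiplies, shifts and XORs with no data-dependent table lookups, a decomposition of this shape is consistent with the time-invariance and D-cache friendliness the commit message mentions.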
crct10dif-ce-glue.c | 2fffee536c6875bdf546cee0045fed8faa5ea51f Mon Aug 27 10:38:12 CDT 2018 Ard Biesheuvel <ard.biesheuvel@linaro.org> crypto: arm64/crct10dif - implement non-Crypto Extensions alternative