Searched hist:"2 fffee536c6875bdf546cee0045fed8faa5ea51f" (Results 1 – 2 of 2) sorted by relevance

/openbmc/linux/arch/arm64/crypto/
crct10dif-ce-core.S 2fffee536c6875bdf546cee0045fed8faa5ea51f Mon Aug 27 10:38:12 CDT 2018 Ard Biesheuvel <ard.biesheuvel@linaro.org> crypto: arm64/crct10dif - implement non-Crypto Extensions alternative

The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit
polynomial multiplication instructions, which are optional in the
architecture, and if these instructions are not available, we fall back
to the C routine which is slow and inefficient.
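
For orientation, here is a minimal bit-at-a-time sketch of what CRC-T10DIF computes. It assumes the standard T10 DIF parameters (16-bit width, polynomial 0x8BB7, zero initial value, no bit reflection). The function name crc_t10dif_bitwise is hypothetical, and this is not the kernel's generic C routine, which the commit message does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Generator polynomial x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1,
 * with the leading x^16 term implicit. */
#define CRC_T10DIF_POLY 0x8BB7u

/* Hypothetical reference routine: updates a running CRC with len bytes of buf,
 * processing one bit per inner iteration, most significant bit first. */
uint16_t crc_t10dif_bitwise(uint16_t crc, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Fold the next message byte into the top of the register. */
        crc ^= (uint16_t)((uint16_t)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ CRC_T10DIF_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Starting from crc = 0, the ASCII string "123456789" should yield 0xD0DB, the published check value for this CRC variant.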

So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that
uses a sequence of ~40 instructions involving 8x8 bit PMULL and some
shifting and masking. This is a lot slower than the original, but it is
still twice as fast as the current [unoptimized] C code on Cortex-A53,
and it is time invariant and much easier on the D-cache.
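
The idea of building a 64x64 bit polynomial multiplication out of 8x8 bit ones can be illustrated in portable C. The sketch below is a plain schoolbook decomposition with hypothetical names (clmul8, clmul64_from_8x8, struct u128). It is not the ~40 instruction NEON sequence from the GHASH driver, and the data-dependent branch in clmul8 means this scalar version is not time invariant; it only shows the arithmetic.

```c
#include <stdint.h>

/* 128-bit result of a 64x64 carry-less multiplication (hypothetical type). */
struct u128 {
    uint64_t lo;
    uint64_t hi;
};

/* 8x8 -> 16 bit carry-less (polynomial) multiply over GF(2).
 * Stands in for one lane of an 8-bit PMULL. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++) {
        if ((b >> i) & 1)
            r ^= (uint16_t)((uint16_t)a << i);
    }
    return r;
}

/* 64x64 -> 128 bit carry-less multiply assembled from 64 byte-wise
 * partial products, each shifted into place and XORed together. */
struct u128 clmul64_from_8x8(uint64_t a, uint64_t b)
{
    struct u128 r = { 0, 0 };

    for (int i = 0; i < 8; i++) {
        uint8_t ai = (uint8_t)(a >> (8 * i));
        for (int j = 0; j < 8; j++) {
            uint8_t bj = (uint8_t)(b >> (8 * j));
            uint64_t p = clmul8(ai, bj);   /* at most 15 significant bits */
            int shift = 8 * (i + j);       /* 0, 8, ..., 112 */

            if (shift < 64) {
                r.lo ^= p << shift;
                /* Only shift == 56 lets a 16-bit product spill into hi. */
                if (shift > 48)
                    r.hi ^= p >> (64 - shift);
            } else {
                r.hi ^= p << (shift - 64);
            }
        }
    }
    return r;
}
```

For example, multiplying 3 by 3 gives 5, since (x + 1)^2 = x^2 + 1 when coefficients are reduced modulo 2.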

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crct10dif-ce-glue.c 2fffee536c6875bdf546cee0045fed8faa5ea51f Mon Aug 27 10:38:12 CDT 2018 Ard Biesheuvel <ard.biesheuvel@linaro.org> crypto: arm64/crct10dif - implement non-Crypto Extensions alternative
