History log of /openbmc/linux/arch/x86/crypto/Makefile (Results 76 – 100 of 181)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v3.5-rc6, v3.5-rc5, v3.5-rc4
# 596d8750 18-Jun-2012 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

crypto: serpent-sse2 - split generic glue code to new helper module

Now that serpent-sse2 glue code has been made generic, it can be split to
separate module.

Signed-off-by: Jussi Kivilinna <jussi.

crypto: serpent-sse2 - split generic glue code to new helper module

Now that serpent-sse2 glue code has been made generic, it can be split to
separate module.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


# ffaf9156 18-Jun-2012 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

crypto: ablk_helper - move ablk_* functions from serpent-sse2/avx glue code to shared module

Move ablk-* functions to separate module to share common code between cipher
implementations.

Signed-off

crypto: ablk_helper - move ablk_* functions from serpent-sse2/avx glue code to shared module

Move ablk-* functions to separate module to share common code between cipher
implementations.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v3.5-rc3
# 7efe4076 12-Jun-2012 Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>

crypto: serpent - add x86_64/avx assembler implementation

This patch adds a x86_64/avx assembler implementation of the Serpent block
cipher. The implementation is very similar to the sse2 implementa

crypto: serpent - add x86_64/avx assembler implementation

This patch adds a x86_64/avx assembler implementation of the Serpent block
cipher. The implementation is very similar to the sse2 implementation and
processes eight blocks in parallel. Because of the new non-destructive three
operand syntax all move-instructions can be removed and therefore a little
performance increase is provided.

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results:

Intel Core i5-2500 CPU (fam:6, model:42, step:7)

serpent-avx-x86_64 vs. serpent-sse2-x86_64
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.03x 1.01x 1.01x 1.01x 1.00x 1.00x 1.00x 1.00x 1.00x 1.01x
64B 1.00x 1.00x 1.00x 1.00x 1.00x 0.99x 1.00x 1.01x 1.00x 1.00x
256B 1.05x 1.03x 1.00x 1.02x 1.05x 1.06x 1.05x 1.02x 1.05x 1.02x
1024B 1.05x 1.02x 1.00x 1.02x 1.05x 1.06x 1.05x 1.03x 1.05x 1.02x
8192B 1.05x 1.02x 1.00x 1.02x 1.06x 1.06x 1.04x 1.03x 1.04x 1.02x

256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.01x 1.00x 1.01x 1.01x 1.00x 1.00x 0.99x 1.03x 1.01x 1.01x
64B 1.00x 1.00x 1.00x 1.00x 1.00x 1.00x 1.00x 1.01x 1.00x 1.02x
256B 1.05x 1.02x 1.00x 1.02x 1.05x 1.02x 1.04x 1.05x 1.05x 1.02x
1024B 1.06x 1.02x 1.00x 1.02x 1.07x 1.06x 1.05x 1.04x 1.05x 1.02x
8192B 1.05x 1.02x 1.00x 1.02x 1.06x 1.06x 1.04x 1.05x 1.05x 1.02x

serpent-avx-x86_64 vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.26x 1.73x
ecb-dec 1.20x 1.64x
cbc-enc 0.33x 0.45x
cbc-dec 1.24x 1.67x
ctr-enc 1.32x 1.76x
ctr-dec 1.32x 1.76x
lrw-enc 1.20x 1.60x
lrw-dec 1.15x 1.54x
xts-enc 1.22x 1.64x
xts-dec 1.17x 1.57x

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v3.5-rc2, v3.5-rc1
# 107778b5 28-May-2012 Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>

crypto: twofish - add x86_64/avx assembler implementation

This patch adds a x86_64/avx assembler implementation of the Twofish block
cipher. The implementation processes eight blocks in parallel (tw

crypto: twofish - add x86_64/avx assembler implementation

This patch adds a x86_64/avx assembler implementation of the Twofish block
cipher. The implementation processes eight blocks in parallel (two 4 block
chunk AVX operations). The table-lookups are done in general-purpose registers.
For small blocksizes the 3way-parallel functions from the twofish-x86_64-3way
module are called. A good performance increase is provided for blocksizes
greater or equal to 128B.

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results:

Intel Core i5-2500 CPU (fam:6, model:42, step:7)

twofish-avx-x86_64 vs. twofish-x86_64-3way
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.96x 0.97x 1.00x 0.95x 0.97x 0.97x 0.96x 0.95x 0.95x 0.98x
64B 0.99x 0.99x 1.00x 0.99x 0.98x 0.98x 0.99x 0.98x 0.99x 0.98x
256B 1.20x 1.21x 1.00x 1.19x 1.15x 1.14x 1.19x 1.20x 1.18x 1.19x
1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.24x 1.26x 1.28x 1.26x 1.27x
8192B 1.31x 1.32x 1.00x 1.31x 1.25x 1.25x 1.28x 1.29x 1.28x 1.30x

256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 0.96x 0.96x 1.00x 0.96x 0.97x 0.98x 0.95x 0.95x 0.95x 0.96x
64B 1.00x 0.99x 1.00x 0.98x 0.98x 1.01x 0.98x 0.98x 0.98x 0.98x
256B 1.20x 1.21x 1.00x 1.21x 1.15x 1.15x 1.19x 1.20x 1.18x 1.19x
1024B 1.29x 1.30x 1.00x 1.28x 1.23x 1.23x 1.26x 1.27x 1.26x 1.27x
8192B 1.31x 1.33x 1.00x 1.31x 1.26x 1.26x 1.29x 1.29x 1.28x 1.30x

twofish-avx-x86_64 vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.19x 1.63x
ecb-dec 1.18x 1.62x
cbc-enc 0.75x 1.03x
cbc-dec 1.23x 1.67x
ctr-enc 1.24x 1.65x
ctr-dec 1.24x 1.65x
lrw-enc 1.15x 1.53x
lrw-dec 1.14x 1.52x
xts-enc 1.16x 1.56x
xts-dec 1.16x 1.56x

Signed-off-by: Johannes Goetzfried <Johannes.Goetzfried@informatik.stud.uni-erlangen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


# 65df5774 24-May-2012 Mathias Krause <minipli@googlemail.com>

crypto: sha1 - use Kbuild supplied flags for AVX test

Commit ea4d26ae ("raid5: add AVX optimized RAID5 checksumming")
introduced x86/ arch wide defines for AFLAGS and CFLAGS indicating AVX
support i

crypto: sha1 - use Kbuild supplied flags for AVX test

Commit ea4d26ae ("raid5: add AVX optimized RAID5 checksumming")
introduced x86/ arch wide defines for AFLAGS and CFLAGS indicating AVX
support in binutils based on the same test we have in x86/crypto/ right
now. To minimize duplication drop our implementation in favour to the
one in x86/.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v3.4, v3.4-rc7, v3.4-rc6, v3.4-rc5, v3.4-rc4, v3.4-rc3, v3.4-rc2, v3.4-rc1, v3.3, v3.3-rc7
# 0b95ec56 05-Mar-2012 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

crypto: camellia - add assembler implementation for x86_64

Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
functions are provided. First set is regular 'one-block at

crypto: camellia - add assembler implementation for x86_64

Patch adds x86_64 assembler implementation of Camellia block cipher. Two set of
functions are provided. First set is regular 'one-block at time' encrypt/decrypt
functions. Second is 'two-block at time' functions that gain performance increase
on out-of-order CPUs. Performance of 2-way functions should be equal to 1-way
functions with in-order CPUs.

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmark results:

AMD Phenom II 1055T (fam:16, model:10):

camellia-asm vs camellia_generic:
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.27x 1.22x 1.30x 1.42x 1.30x 1.34x 1.19x 1.05x 1.23x 1.24x
64B 1.74x 1.79x 1.43x 1.87x 1.81x 1.87x 1.48x 1.38x 1.55x 1.62x
256B 1.90x 1.87x 1.43x 1.94x 1.94x 1.95x 1.63x 1.62x 1.67x 1.70x
1024B 1.96x 1.93x 1.43x 1.95x 1.98x 2.01x 1.67x 1.69x 1.74x 1.80x
8192B 1.96x 1.96x 1.39x 1.93x 2.01x 2.03x 1.72x 1.64x 1.71x 1.76x

256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.23x 1.23x 1.33x 1.39x 1.34x 1.38x 1.04x 1.18x 1.21x 1.29x
64B 1.72x 1.69x 1.42x 1.78x 1.81x 1.89x 1.57x 1.52x 1.56x 1.65x
256B 1.85x 1.88x 1.42x 1.86x 1.93x 1.96x 1.69x 1.65x 1.70x 1.75x
1024B 1.88x 1.86x 1.45x 1.95x 1.96x 1.95x 1.77x 1.71x 1.77x 1.78x
8192B 1.91x 1.86x 1.42x 1.91x 2.03x 1.98x 1.73x 1.71x 1.78x 1.76x

camellia-asm vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.15x 1.22x
ecb-dec 1.16x 1.16x
cbc-enc 0.85x 0.90x
cbc-dec 1.20x 1.23x
ctr-enc 1.28x 1.30x
ctr-dec 1.27x 1.28x
lrw-enc 1.12x 1.16x
lrw-dec 1.08x 1.10x
xts-enc 1.11x 1.15x
xts-dec 1.14x 1.15x

Intel Core2 T8100 (fam:6, model:23, step:6):

camellia-asm vs camellia_generic:
128bit key: (lrw:256bit) (xts:256bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.10x 1.12x 1.14x 1.16x 1.16x 1.15x 1.02x 1.02x 1.08x 1.08x
64B 1.61x 1.60x 1.17x 1.68x 1.67x 1.66x 1.43x 1.42x 1.44x 1.42x
256B 1.65x 1.73x 1.17x 1.77x 1.81x 1.80x 1.54x 1.53x 1.58x 1.54x
1024B 1.76x 1.74x 1.18x 1.80x 1.85x 1.85x 1.60x 1.59x 1.65x 1.60x
8192B 1.77x 1.75x 1.19x 1.81x 1.85x 1.86x 1.63x 1.61x 1.66x 1.62x

256bit key: (lrw:384bit) (xts:512bit)
size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec lrw-enc lrw-dec xts-enc xts-dec
16B 1.10x 1.07x 1.13x 1.16x 1.11x 1.16x 1.03x 1.02x 1.08x 1.07x
64B 1.61x 1.62x 1.15x 1.66x 1.63x 1.68x 1.47x 1.46x 1.47x 1.44x
256B 1.71x 1.70x 1.16x 1.75x 1.69x 1.79x 1.58x 1.57x 1.59x 1.55x
1024B 1.78x 1.72x 1.17x 1.75x 1.80x 1.80x 1.63x 1.62x 1.65x 1.62x
8192B 1.76x 1.73x 1.17x 1.78x 1.80x 1.81x 1.64x 1.62x 1.68x 1.64x

camellia-asm vs aes-asm (8kB block):
128bit 256bit
ecb-enc 1.17x 1.21x
ecb-dec 1.17x 1.20x
cbc-enc 0.80x 0.82x
cbc-dec 1.22x 1.24x
ctr-enc 1.25x 1.26x
ctr-dec 1.25x 1.26x
lrw-enc 1.14x 1.18x
lrw-dec 1.13x 1.17x
xts-enc 1.14x 1.18x
xts-dec 1.14x 1.17x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v3.3-rc6, v3.3-rc5, v3.3-rc4, v3.3-rc3, v3.3-rc2, v3.3-rc1, v3.2, v3.2-rc7, v3.2-rc6, v3.2-rc5, v3.2-rc4, v3.2-rc3, v3.2-rc2
# 251496db 09-Nov-2011 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

crypto: serpent - add 4-way parallel i586/SSE2 assembler implementation

Patch adds i586/SSE2 assembler implementation of serpent cipher. Assembler
functions crypt data in four block chunks.

Patch h

crypto: serpent - add 4-way parallel i586/SSE2 assembler implementation

Patch adds i586/SSE2 assembler implementation of serpent cipher. Assembler
functions crypt data in four block chunks.

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

Intel Atom N270:

size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16 0.95x 1.12x 1.02x 1.07x 0.97x 0.98x
64 1.73x 1.82x 1.08x 1.82x 1.72x 1.73x
256 2.08x 2.00x 1.04x 2.07x 1.99x 2.01x
1024 2.28x 2.18x 1.05x 2.23x 2.17x 2.20x
8192 2.28x 2.13x 1.05x 2.23x 2.18x 2.20x

Full output:
http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-generic.txt
http://koti.mbnet.fi/axh/kernel/crypto/atom-n270/serpent-sse2.txt

Userspace test results:

Encryption/decryption of sse2-i586 vs generic on Intel Atom N270:
encrypt: 2.35x
decrypt: 2.54x

Encryption/decryption of sse2-i586 vs generic on AMD Phenom II:
encrypt: 1.82x
decrypt: 2.51x

Encryption/decryption of sse2-i586 vs generic on Intel Xeon E7330:
encrypt: 2.99x
decrypt: 3.48x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


# 937c30d7 09-Nov-2011 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

crypto: serpent - add 8-way parallel x86_64/SSE2 assembler implementation

Patch adds x86_64/SSE2 assembler implementation of serpent cipher. Assembler
functions crypt data in eigth block chunks (two

crypto: serpent - add 8-way parallel x86_64/SSE2 assembler implementation

Patch adds x86_64/SSE2 assembler implementation of serpent cipher. Assembler
functions crypt data in eigth block chunks (two 4 block chunk SSE2 operations
in parallel to improve performance on out-of-order CPUs). Glue code is based
on one from AES-NI implementation, so requests from irq context are redirected
to cryptd.

v2:
- add missing include of linux/module.h
(appearently crypto.h used to include module.h, which changed for 3.2 by
commit 7c926402a7e8c9b279968fd94efec8700ba3859e)

Patch has been tested with tcrypt and automated filesystem tests.

Tcrypt benchmarks results (serpent-sse2/serpent_generic speed ratios):

AMD Phenom II 1055T (fam:16, model:10):

size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B 1.03x 1.01x 1.03x 1.05x 1.00x 0.99x
64B 1.00x 1.01x 1.02x 1.04x 1.02x 1.01x
256B 2.34x 2.41x 0.99x 2.43x 2.39x 2.40x
1024B 2.51x 2.57x 1.00x 2.59x 2.56x 2.56x
8192B 2.50x 2.54x 1.00x 2.55x 2.57x 2.57x

Intel Celeron T1600 (fam:6, model:15, step:13):

size ecb-enc ecb-dec cbc-enc cbc-dec ctr-enc ctr-dec
16B 0.97x 0.97x 1.01x 1.01x 1.01x 1.02x
64B 1.00x 1.00x 1.00x 1.02x 1.01x 1.01x
256B 3.41x 3.35x 1.00x 3.39x 3.42x 3.44x
1024B 3.75x 3.72x 0.99x 3.74x 3.75x 3.75x
8192B 3.70x 3.68x 0.99x 3.68x 3.69x 3.69x

Full output:
http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-generic.txt
http://koti.mbnet.fi/axh/kernel/crypto/phenom-ii-1055t/serpent-sse2.txt
http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-generic.txt
http://koti.mbnet.fi/axh/kernel/crypto/celeron-t1600/serpent-sse2.txt

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v3.2-rc1, v3.1, v3.1-rc10, v3.1-rc9, v3.1-rc8
# 8280daad 26-Sep-2011 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

crypto: twofish - add 3-way parallel x86_64 assembler implemention

Patch adds 3-way parallel x86_64 assembly implementation of twofish as new
module. New assembler functions crypt data in three bloc

crypto: twofish - add 3-way parallel x86_64 assembler implemention

Patch adds 3-way parallel x86_64 assembly implementation of twofish as new
module. New assembler functions crypt data in three blocks chunks, improving
cipher performance on out-of-order CPUs.

Patch has been tested with tcrypt and automated filesystem tests.

Summary of the tcrypt benchmarks:

Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB)
encrypt: 1.3x speed
decrypt: 1.3x speed

Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC)
encrypt: 1.07x speed
decrypt: 1.4x speed

Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR)
encrypt: 1.4x speed

Twofish 3-way-asm vs AES asm (128bit 8kb block ECB)
encrypt: 1.0x speed
decrypt: 1.0x speed

Twofish 3-way-asm vs AES asm (128bit 8kb block CBC)
encrypt: 0.84x speed
decrypt: 1.09x speed

Twofish 3-way-asm vs AES asm (128bit 8kb block CTR)
encrypt: 1.15x speed

Full output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt

Tests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor

Also userspace test were run on:
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz
stepping : 11

Userspace test results:

Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II:
encrypt: 1.27x
decrypt: 1.25x

Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330:
encrypt: 1.36x
decrypt: 1.36x

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v3.1-rc7, v3.1-rc6, v3.1-rc5
# 64b94cea 01-Sep-2011 Jussi Kivilinna <jussi.kivilinna@mbnet.fi>

crypto: blowfish - add x86_64 assembly implementation

Patch adds x86_64 assembly implementation of blowfish. Two set of assembler
functions are provided. First set is regular 'one-block at time'
enc

crypto: blowfish - add x86_64 assembly implementation

Patch adds x86_64 assembly implementation of blowfish. Two set of assembler
functions are provided. First set is regular 'one-block at time'
encrypt/decrypt functions. Second is 'four-block at time' functions that
gain performance increase on out-of-order CPUs. Performance of 4-way
functions should be equal to 1-way functions with in-order CPUs.

Summary of the tcrypt benchmarks:

Blowfish assembler vs blowfish C (256bit 8kb block ECB)
encrypt: 2.2x speed
decrypt: 2.3x speed

Blowfish assembler vs blowfish C (256bit 8kb block CBC)
encrypt: 1.12x speed
decrypt: 2.5x speed

Blowfish assembler vs blowfish C (256bit 8kb block CTR)
encrypt: 2.5x speed

Full output:
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-asm-x86_64.txt
http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-blowfish-c-x86_64.txt

Tests were run on:
vendor_id : AuthenticAMD
cpu family : 16
model : 10
model name : AMD Phenom(tm) II X6 1055T Processor
stepping : 0

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v3.1-rc4, v3.1-rc3, v3.1-rc2, v3.1-rc1
# 66be8951 04-Aug-2011 Mathias Krause <minipli@googlemail.com>

crypto: sha1 - SSSE3 based SHA1 implementation for x86-64

This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced

crypto: sha1 - SSSE3 based SHA1 implementation for x86-64

This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, meassuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on a IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with a EXFO Packet Blazer
FTB-8510:

frame size sha1-generic sha1-ssse3 delta
64 byte 37.5 MBit/s 37.5 MBit/s 0.0%
128 byte 56.3 MBit/s 62.5 MBit/s +11.0%
256 byte 87.5 MBit/s 100.0 MBit/s +14.3%
512 byte 131.3 MBit/s 150.0 MBit/s +14.2%
1024 byte 162.5 MBit/s 193.8 MBit/s +19.3%
1280 byte 175.0 MBit/s 212.5 MBit/s +21.4%
1420 byte 175.0 MBit/s 218.7 MBit/s +25.0%
1518 byte 150.0 MBit/s 181.2 MBit/s +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make there way through the IPsec tunnel.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v3.0, v3.0-rc7, v3.0-rc6, v3.0-rc5, v3.0-rc4, v3.0-rc3, v3.0-rc2, v3.0-rc1, v2.6.39
# b23b6451 16-May-2011 Andy Lutomirski <luto@mit.edu>

crypto: aesni-intel - Merge with fpu.ko

Loading fpu without aesni-intel does nothing. Loading aesni-intel
without fpu causes modes like xts to fail. (Unloading
aesni-intel will restore those modes

crypto: aesni-intel - Merge with fpu.ko

Loading fpu without aesni-intel does nothing. Loading aesni-intel
without fpu causes modes like xts to fail. (Unloading
aesni-intel will restore those modes.)

One solution would be to make aesni-intel depend on fpu, but it
seems cleaner to just combine the modules.

This is probably responsible for bugs like:
https://bugzilla.redhat.com/show_bug.cgi?id=589390

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.39-rc7, v2.6.39-rc6, v2.6.39-rc5, v2.6.39-rc4, v2.6.39-rc3, v2.6.39-rc2, v2.6.39-rc1, v2.6.38, v2.6.38-rc8, v2.6.38-rc7, v2.6.38-rc6, v2.6.38-rc5, v2.6.38-rc4, v2.6.38-rc3, v2.6.38-rc2, v2.6.38-rc1, v2.6.37, v2.6.37-rc8, v2.6.37-rc7, v2.6.37-rc6, v2.6.37-rc5, v2.6.37-rc4, v2.6.37-rc3, v2.6.37-rc2, v2.6.37-rc1, v2.6.36, v2.6.36-rc8, v2.6.36-rc7, v2.6.36-rc6, v2.6.36-rc5, v2.6.36-rc4, v2.6.36-rc3, v2.6.36-rc2, v2.6.36-rc1, v2.6.35, v2.6.35-rc6, v2.6.35-rc5, v2.6.35-rc4, v2.6.35-rc3, v2.6.35-rc2, v2.6.35-rc1, v2.6.34, v2.6.34-rc7, v2.6.34-rc6, v2.6.34-rc5, v2.6.34-rc4, v2.6.34-rc3, v2.6.34-rc2, v2.6.34-rc1, v2.6.33, v2.6.33-rc8, v2.6.33-rc7, v2.6.33-rc6, v2.6.33-rc5, v2.6.33-rc4, v2.6.33-rc3, v2.6.33-rc2, v2.6.33-rc1, v2.6.32, v2.6.32-rc8, v2.6.32-rc7, v2.6.32-rc6
# 0e1227d3 18-Oct-2009 Huang Ying <ying.huang@intel.com>

crypto: ghash - Add PCLMULQDQ accelerated implementation

PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
carry-less multiplication. More information about PCLMULQDQ can be
fou

crypto: ghash - Add PCLMULQDQ accelerated implementation

PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
carry-less multiplication. More information about PCLMULQDQ can be
found at:

http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/

Because PCLMULQDQ changes XMM state, its usage must be enclosed with
kernel_fpu_begin/end, which can be used only in process context, the
acceleration is implemented as crypto_ahash. That is, request in soft
IRQ context will be defered to the cryptd kernel thread.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.32-rc5, v2.6.32-rc4, v2.6.32-rc3, v2.6.32-rc1, v2.6.32-rc2, v2.6.31, v2.6.31-rc9, v2.6.31-rc8, v2.6.31-rc7, v2.6.31-rc6, v2.6.31-rc5, v2.6.31-rc4, v2.6.31-rc3, v2.6.31-rc2, v2.6.31-rc1, v2.6.30, v2.6.30-rc8, v2.6.30-rc7, v2.6.30-rc6, v2.6.30-rc5, v2.6.30-rc4, v2.6.30-rc3, v2.6.30-rc2, v2.6.30-rc1
# 150c7e85 29-Mar-2009 Huang Ying <ying.huang@intel.com>

crypto: fpu - Add template for blkcipher touching FPU

Blkcipher touching FPU need to be enclosed by kernel_fpu_begin() and
kernel_fpu_end(). If they are invoked in cipher algorithm
implementation, t

crypto: fpu - Add template for blkcipher touching FPU

Blkcipher touching FPU need to be enclosed by kernel_fpu_begin() and
kernel_fpu_end(). If they are invoked in cipher algorithm
implementation, they will be invoked for each block, so that
performance will be hurt, because they are "slow" operations. This
patch implements "fpu" template, which makes these operations to be
invoked for each request.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.29, v2.6.29-rc8, v2.6.29-rc7, v2.6.29-rc6, v2.6.29-rc5, v2.6.29-rc4, v2.6.29-rc3
# 54b6a1bd 17-Jan-2009 Huang Ying <ying.huang@intel.com>

crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
instructions that are going to be introduced in the

crypto: aes-ni - Add support to Intel AES-NI instructions for x86_64 platform

Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
instructions that are going to be introduced in the next generation of
Intel processor, as of 2009. These instructions enable fast and secure
data encryption and decryption, using the Advanced Encryption Standard
(AES), defined by FIPS Publication number 197. The architecture
introduces six instructions that offer full hardware support for
AES. Four of them support high performance data encryption and
decryption, and the other two instructions support the AES key
expansion procedure.

The white paper can be downloaded from:

http://softwarecommunity.intel.com/isn/downloads/intelavx/AES-Instructions-Set_WP.pdf

AES may be used in soft_irq context, but MMX/SSE context can not be
touched safely in soft_irq context. So in_interrupt() is checked, if
in IRQ or soft_irq context, the general x86_64 implementation are used
instead.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.29-rc2, v2.6.29-rc1, v2.6.28, v2.6.28-rc9, v2.6.28-rc8, v2.6.28-rc7, v2.6.28-rc6, v2.6.28-rc5, v2.6.28-rc4, v2.6.28-rc3, v2.6.28-rc2, v2.6.28-rc1, v2.6.27, v2.6.27-rc9, v2.6.27-rc8, v2.6.27-rc7, v2.6.27-rc6, v2.6.27-rc5, v2.6.27-rc4, v2.6.27-rc3
# 8cb51ba8 06-Aug-2008 Austin Zhang <austin.zhang@intel.com>

crypto: crc32c - Use Intel CRC32 instruction

From NHM processor onward, Intel processors can support hardware accelerated
CRC32c algorithm with the new CRC32 instruction in SSE 4.2 instruction set.

crypto: crc32c - Use Intel CRC32 instruction

From NHM processor onward, Intel processors can support hardware accelerated
CRC32c algorithm with the new CRC32 instruction in SSE 4.2 instruction set.
The patch detects the availability of the feature, and chooses the most proper
way to calculate CRC32c checksum.
Byte code instructions are used for compiler compatibility.
No MMX / XMM registers is involved in the implementation.

Signed-off-by: Austin Zhang <austin.zhang@intel.com>
Signed-off-by: Kent Liu <kent.liu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.27-rc2, v2.6.27-rc1, v2.6.26, v2.6.26-rc9, v2.6.26-rc8, v2.6.26-rc7, v2.6.26-rc6, v2.6.26-rc5, v2.6.26-rc4, v2.6.26-rc3, v2.6.26-rc2, v2.6.26-rc1, v2.6.25, v2.6.25-rc9, v2.6.25-rc8, v2.6.25-rc7, v2.6.25-rc6, v2.6.25-rc5, v2.6.25-rc4, v2.6.25-rc3, v2.6.25-rc2, v2.6.25-rc1, v2.6.24, v2.6.24-rc8
# 15e7b445 14-Jan-2008 Sebastian Siewior <sebastian@breakpoint.cc>

[CRYPTO] twofish: Merge common glue code

There is almost no difference between 32 & 64 bit glue code.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@g

[CRYPTO] twofish: Merge common glue code

There is almost no difference between 32 & 64 bit glue code.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.24-rc7, v2.6.24-rc6
# 9a7dafbb 17-Dec-2007 Tan Swee Heng <thesweeheng@gmail.com>

[CRYPTO] salsa20: Add x86-64 assembly version

This is the x86-64 version of the Salsa20 stream cipher algorithm. The
original assembly code came from
<http://cr.yp.to/snuffle/salsa20/amd64-3/salsa20

[CRYPTO] salsa20: Add x86-64 assembly version

This is the x86-64 version of the Salsa20 stream cipher algorithm. The
original assembly code came from
<http://cr.yp.to/snuffle/salsa20/amd64-3/salsa20.s>. It has been
reformatted for clarity.

Signed-off-by: Tan Swee Heng <thesweeheng@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.24-rc5
# 974e4b75 10-Dec-2007 Tan Swee Heng <thesweeheng@gmail.com>

[CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version)

This patch contains the salsa20-i586 implementation. The original
assembly code came from
<http://cr.yp.to/snuffle/salsa20/x86-p

[CRYPTO] salsa20_i586: Salsa20 stream cipher algorithm (i586 version)

This patch contains the salsa20-i586 implementation. The original
assembly code came from
<http://cr.yp.to/snuffle/salsa20/x86-pm/salsa20.s>. I have reformatted
it (added indents) so that it matches the other algorithms in
arch/x86/crypto.

Signed-off-by: Tan Swee Heng <thesweeheng@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.24-rc4
# 06e1a8f0 29-Nov-2007 Sebastian Siewior <sebastian@breakpoint.cc>

[CRYPTO] aes-asm: Merge common glue code

32 bit and 64 bit glue code is using (now) the same
piece code. This patch unifies them.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-o

[CRYPTO] aes-asm: Merge common glue code

32 bit and 64 bit glue code is using (now) the same
piece code. This patch unifies them.

Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


Revision tags: v2.6.24-rc3, v2.6.24-rc2, v2.6.24-rc1
# 9b58aebc 23-Oct-2007 Thomas Gleixner <tglx@linutronix.de>

x86: merge arch/x86/crypto Makefiles

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 987c75d7 11-Oct-2007 Thomas Gleixner <tglx@linutronix.de>

x86_64: move crypto

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 9c201942 11-Oct-2007 Thomas Gleixner <tglx@linutronix.de>

i386: move crypto

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


Revision tags: v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6
# e6abef61 26-Mar-2020 Jason A. Donenfeld <Jason@zx2c4.com>

x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2

Now that the kernel specifies binutils 2.23 as the minimum version, we
can remove ifdefs for AVX2 and ADX throughout.

x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2

Now that the kernel specifies binutils 2.23 as the minimum version, we
can remove ifdefs for AVX2 and ADX throughout.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

show more ...


# d7e40ea8 26-Mar-2020 Masahiro Yamada <masahiroy@kernel.org>

crypto: x86 - clean up poly1305-x86_64-cryptogams.S by 'make clean'

poly1305-x86_64-cryptogams.S is a generated file, so it should be
cleaned up by 'make clean'.

Assigning it to

crypto: x86 - clean up poly1305-x86_64-cryptogams.S by 'make clean'

poly1305-x86_64-cryptogams.S is a generated file, so it should be
cleaned up by 'make clean'.

Assigning it to the variable 'targets' teaches Kbuild that it is a
generated file. However, this line is not evaluated when cleaning
because scripts/Makefile.clean does not include include/config/auto.conf.

Remove the ifneq-conditional, so this file is correctly cleaned up.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ingo Molnar <mingo@kernel.org>

show more ...


12345678