History log of /openbmc/linux/arch/x86/lib/csum-partial_64.c (Results 1 – 25 of 151)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.67, v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23
# 1b0b4c42 10-Feb-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.13' into dev-6.6

This is the 6.6.13 stable release


Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36
# 2f09679b 27-Jun-2023 Linus Torvalds <torvalds@linux-foundation.org>

x86/csum: clean up `csum_partial' further

[ Upstream commit a476aae3f1dc78a162a0d2e7945feea7d2b29401 ]

Commit 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`")
ended up improving the

x86/csum: clean up `csum_partial' further

[ Upstream commit a476aae3f1dc78a162a0d2e7945feea7d2b29401 ]

Commit 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`")
ended up improving the code generation for the IP csum calculations, and
in particular special-casing the 40-byte case that is a hot case for
IPv6 headers.

It then had _another_ special case for the 64-byte unrolled loop, which
did two chains of 32-byte blocks, which allows modern CPU's to improve
performance by doing the chains in parallel thanks to renaming the carry
flag.

This just unifies the special cases and combines them into just one
single helper the 40-byte csum case, and replaces the 64-byte case by a
80-byte case that just does that single helper twice. It avoids having
all these different versions of inline assembly, and actually improved
performance further in my tests.

There was never anything magical about the 64-byte unrolled case, even
though it happens to be a common size (and typically is the cacheline
size).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# 1078f257 24-Sep-2023 Noah Goldstein <goldstein.w.n@gmail.com>

x86/csum: Remove unnecessary odd handling

[ Upstream commit 5d4acb62853abac1da2deebcb1c1c5b79219bf3b ]

The special case for odd aligned buffers is unnecessary and mostly
just adds overhead. Aligned

x86/csum: Remove unnecessary odd handling

[ Upstream commit 5d4acb62853abac1da2deebcb1c1c5b79219bf3b ]

The special case for odd aligned buffers is unnecessary and mostly
just adds overhead. Aligned buffers is the expectations, and even for
unaligned buffer, the only case that was helped is if the buffer was
1-byte from word aligned which is ~1/7 of the cases. Overall it seems
highly unlikely to be worth to extra branch.

It was left in the previous perf improvement patch because I was
erroneously comparing the exact output of `csum_partial(...)`, but
really we only need `csum_fold(csum_partial(...))` to match so its
safe to remove.

All csum kunit tests pass.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Laight <david.laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# 2612e3bb 07-Aug-2023 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Catching-up with drm-next and drm-intel-gt-next.
It will unblock a code refactor around the platform
definitions (names vs acronyms).

Signed-off-by: Rodrigo V

Merge drm/drm-next into drm-intel-next

Catching-up with drm-next and drm-intel-gt-next.
It will unblock a code refactor around the platform
definitions (names vs acronyms).

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

show more ...


# 9f771739 07-Aug-2023 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Merge drm/drm-next into drm-intel-gt-next

Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as
a dependency for https://patchwork.freedesktop.org/series/1

Merge drm/drm-next into drm-intel-gt-next

Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as
a dependency for https://patchwork.freedesktop.org/series/121735/

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

show more ...


# 61b73694 24-Jul-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to get v6.5-rc2.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 0791faeb 17-Jul-2023 Mark Brown <broonie@kernel.org>

ASoC: Merge v6.5-rc2

Get a similar baseline to my other branches, and fixes for people using
the branch.


# 2f98e686 11-Jul-2023 Maxime Ripard <mripard@kernel.org>

Merge v6.5-rc1 into drm-misc-fixes

Boris needs 6.5-rc1 in drm-misc-fixes to prevent a conflict.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


# 44f10dbe 30-Jun-2023 Andrew Morton <akpm@linux-foundation.org>

Merge branch 'master' into mm-hotfixes-stable


# 4baa098a 27-Jun-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

- Remove the local symbols prefix of the get/put_user() exception

Merge tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Borislav Petkov:

- Remove the local symbols prefix of the get/put_user() exception
handling symbols so that tools do not get confused by the presence of
code belonging to the wrong symbol/not belonging to any symbol

- Improve csum_partial()'s performance

- Some improvements to the kcpuid tool

* tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/lib: Make get/put_user() exception handling a visible symbol
x86/csum: Fix clang -Wuninitialized in csum_partial()
x86/csum: Improve performance of `csum_partial`
tools/x86/kcpuid: Add .gitignore
tools/x86/kcpuid: Dump the correct CPUID function in error

show more ...


Revision tags: v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31
# 2fe1e67e 26-May-2023 Nathan Chancellor <nathan@kernel.org>

x86/csum: Fix clang -Wuninitialized in csum_partial()

Clang warns:

arch/x86/lib/csum-partial_64.c:74:20: error: variable 'result' is uninitialized when used here [-Werror,-Wuninitialized]

x86/csum: Fix clang -Wuninitialized in csum_partial()

Clang warns:

arch/x86/lib/csum-partial_64.c:74:20: error: variable 'result' is uninitialized when used here [-Werror,-Wuninitialized]
return csum_tail(result, temp64, odd);
^~~~~~
arch/x86/lib/csum-partial_64.c:48:22: note: initialize the variable 'result' to silence this warning
unsigned odd, result;
^
= 0
1 error generated.

The only initialization and uses of result in csum_partial() were moved
into csum_tail() but result is still being passed by value to
csum_tail() (clang's -Wuninitialized does not do interprocedural
analysis to realize that result is always assigned in csum_tail()
however). Sink the declaration of result into csum_tail() to clear up
the warning.

Closes: https://lore.kernel.org/202305262039.3HUYjWJk-lkp@intel.com/
Fixes: 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230526-csum_partial-wuninitialized-v1-1-ebc0108dcec1%40kernel.org

show more ...


Revision tags: v6.1.30, v6.1.29, v6.1.28
# 688eb819 10-May-2023 Noah Goldstein <goldstein.w.n@gmail.com>

x86/csum: Improve performance of `csum_partial`

1) Add special case for len == 40 as that is the hottest value. The
nets a ~8-9% latency improvement and a ~30% throughput improvement
in the le

x86/csum: Improve performance of `csum_partial`

1) Add special case for len == 40 as that is the hottest value. The
nets a ~8-9% latency improvement and a ~30% throughput improvement
in the len == 40 case.

2) Use multiple accumulators in the 64-byte loop. This dramatically
improves ILP and results in up to a 40% latency/throughput
improvement (better for more iterations).

Results from benchmarking on Icelake. Times measured with rdtsc()
len lat_new lat_old r tput_new tput_old r
8 3.58 3.47 1.032 3.58 3.51 1.021
16 4.14 4.02 1.028 3.96 3.78 1.046
24 4.99 5.03 0.992 4.23 4.03 1.050
32 5.09 5.08 1.001 4.68 4.47 1.048
40 5.57 6.08 0.916 3.05 4.43 0.690
48 6.65 6.63 1.003 4.97 4.69 1.059
56 7.74 7.72 1.003 5.22 4.95 1.055
64 6.65 7.22 0.921 6.38 6.42 0.994
96 9.43 9.96 0.946 7.46 7.54 0.990
128 9.39 12.15 0.773 8.90 8.79 1.012
200 12.65 18.08 0.699 11.63 11.60 1.002
272 15.82 23.37 0.677 14.43 14.35 1.005
440 24.12 36.43 0.662 21.57 22.69 0.951
952 46.20 74.01 0.624 42.98 53.12 0.809
1024 47.12 78.24 0.602 46.36 58.83 0.788
1552 72.01 117.30 0.614 71.92 96.78 0.743
2048 93.07 153.25 0.607 93.28 137.20 0.680
2600 114.73 194.30 0.590 114.28 179.32 0.637
3608 156.34 268.41 0.582 154.97 254.02 0.610
4096 175.01 304.03 0.576 175.89 292.08 0.602

There is no such thing as a free lunch, however, and the special case
for len == 40 does add overhead to the len != 40 cases. This seems to
amount to be ~5% throughput and slightly less in terms of latency.

Testing:
Part of this change is a new kunit test. The tests check all
alignment X length pairs in [0, 64) X [0, 512).
There are three cases.
1) Precomputed random inputs/seed. The expected results where
generated use the generic implementation (which is assumed to be
non-buggy).
2) An input of all 1s. The goal of this test is to catch any case
a carry is missing.
3) An input that never carries. The goal of this test si to catch
any case of incorrectly carrying.

More exhaustive tests that test all alignment X length pairs in
[0, 8192) X [0, 8192] on random data are also available here:
https://github.com/goldsteinn/csum-reproduction

The reposity also has the code for reproducing the above benchmark
numbers.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230511011002.935690-1-goldstein.w.n%40gmail.com

show more ...


Revision tags: v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13
# 4f2c0a4a 13-Dec-2022 Nick Terrell <terrelln@fb.com>

Merge branch 'main' into zstd-linus


Revision tags: v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4
# 14e77332 21-Oct-2022 Nick Terrell <terrelln@fb.com>

Merge branch 'main' into zstd-next


Revision tags: v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59
# 8bb5e7f4 02-Aug-2022 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 5.20 (or 6.0) merge window.


Revision tags: v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45
# 03ab8e62 31-May-2022 Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

Merge tag 'v5.18'

Linux 5.18


Revision tags: v5.15.44
# 690e1790 27-May-2022 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v5.18' into next

Sync up with mainline to get updates to OMAP4 keypad driver and other
upstream goodies.


Revision tags: v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34
# 651a8879 13-Apr-2022 Takashi Iwai <tiwai@suse.de>

Merge branch 'topic/cs35l41' into for-next

Pull CS35L41 codec updates

Signed-off-by: Takashi Iwai <tiwai@suse.de>


# c16c8bfa 12-Apr-2022 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Merge drm/drm-next into drm-intel-gt-next

Pull in TTM changes needed for DG2 CCS enabling from Ram.

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 83970cd6 11-Apr-2022 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync up with v5.18-rc1, in particular to get 5e3094cfd9fb
("drm/i915/xehpsdv: Add has_flat_ccs to device info").

Signed-off-by: Jani Nikula <jani.nikula@intel

Merge drm/drm-next into drm-intel-next

Sync up with v5.18-rc1, in particular to get 5e3094cfd9fb
("drm/i915/xehpsdv: Add has_flat_ccs to device info").

Signed-off-by: Jani Nikula <jani.nikula@intel.com>

show more ...


Revision tags: v5.15.33
# 9cbbd694 05-Apr-2022 Maxime Ripard <maxime@cerno.tech>

Merge drm/drm-next into drm-misc-next

Let's start the 5.19 development cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>


# 0aea30a0 19-Apr-2022 Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v5.18-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.18

A collection of fixes that came in since the merge window, plus

Merge tag 'asoc-fix-v5.18-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.18

A collection of fixes that came in since the merge window, plus one new
device ID for an x86 laptop. Nothing that really stands out with
particularly big impact outside of the affected device.

show more ...


# cf5c5763 05-Apr-2022 Maxime Ripard <maxime@cerno.tech>

Merge drm/drm-fixes into drm-misc-fixes

Let's start the 5.18 fixes cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>


# 88e6c020 01-Apr-2022 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:
"Assorted bits and pieces"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/ker

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:
"Assorted bits and pieces"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: drop needless assignment in aio_read()
clean overflow checks in count_mounts() a bit
seq_file: fix NULL pointer arithmetic warning
uml/x86: use x86 load_unaligned_zeropad()
asm/user.h: killed unused macros
constify struct path argument of finish_automount()/do_add_mount()
fs: Remove FIXME comment in generic_write_checks()

show more ...


# de4fb176 01-Apr-2022 Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Merge branches 'fixes' and 'misc' into for-linus


1234567