Revision tags: v6.6.67, v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23 |
|
#
1b0b4c42 |
| 10-Feb-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.13' into dev-6.6
This is the 6.6.13 stable release
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36 |
|
#
2f09679b |
| 27-Jun-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
x86/csum: clean up `csum_partial' further
[ Upstream commit a476aae3f1dc78a162a0d2e7945feea7d2b29401 ]
Commit 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`") ended up improving the
x86/csum: clean up `csum_partial' further
[ Upstream commit a476aae3f1dc78a162a0d2e7945feea7d2b29401 ]
Commit 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`") ended up improving the code generation for the IP csum calculations, and in particular special-casing the 40-byte case that is a hot case for IPv6 headers.
It then had _another_ special case for the 64-byte unrolled loop, which did two chains of 32-byte blocks, which allows modern CPU's to improve performance by doing the chains in parallel thanks to renaming the carry flag.
This just unifies the special cases and combines them into just one single helper the 40-byte csum case, and replaces the 64-byte case by a 80-byte case that just does that single helper twice. It avoids having all these different versions of inline assembly, and actually improved performance further in my tests.
There was never anything magical about the 64-byte unrolled case, even though it happens to be a common size (and typically is the cacheline size).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
1078f257 |
| 24-Sep-2023 |
Noah Goldstein <goldstein.w.n@gmail.com> |
x86/csum: Remove unnecessary odd handling
[ Upstream commit 5d4acb62853abac1da2deebcb1c1c5b79219bf3b ]
The special case for odd aligned buffers is unnecessary and mostly just adds overhead. Aligned
x86/csum: Remove unnecessary odd handling
[ Upstream commit 5d4acb62853abac1da2deebcb1c1c5b79219bf3b ]
The special case for odd aligned buffers is unnecessary and mostly just adds overhead. Aligned buffers is the expectations, and even for unaligned buffer, the only case that was helped is if the buffer was 1-byte from word aligned which is ~1/7 of the cases. Overall it seems highly unlikely to be worth to extra branch.
It was left in the previous perf improvement patch because I was erroneously comparing the exact output of `csum_partial(...)`, but really we only need `csum_fold(csum_partial(...))` to match so its safe to remove.
All csum kunit tests pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Laight <david.laight@aculab.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
2612e3bb |
| 07-Aug-2023 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catching-up with drm-next and drm-intel-gt-next. It will unblock a code refactor around the platform definitions (names vs acronyms).
Signed-off-by: Rodrigo V
Merge drm/drm-next into drm-intel-next
Catching-up with drm-next and drm-intel-gt-next. It will unblock a code refactor around the platform definitions (names vs acronyms).
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
show more ...
|
#
9f771739 |
| 07-Aug-2023 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as a dependency for https://patchwork.freedesktop.org/series/1
Merge drm/drm-next into drm-intel-gt-next
Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as a dependency for https://patchwork.freedesktop.org/series/121735/
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
#
61b73694 |
| 24-Jul-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging to get v6.5-rc2.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
0791faeb |
| 17-Jul-2023 |
Mark Brown <broonie@kernel.org> |
ASoC: Merge v6.5-rc2
Get a similar baseline to my other branches, and fixes for people using the branch.
|
#
2f98e686 |
| 11-Jul-2023 |
Maxime Ripard <mripard@kernel.org> |
Merge v6.5-rc1 into drm-misc-fixes
Boris needs 6.5-rc1 in drm-misc-fixes to prevent a conflict.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
#
44f10dbe |
| 30-Jun-2023 |
Andrew Morton <akpm@linux-foundation.org> |
Merge branch 'master' into mm-hotfixes-stable
|
#
4baa098a |
| 27-Jun-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- Remove the local symbols prefix of the get/put_user() exception
Merge tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- Remove the local symbols prefix of the get/put_user() exception handling symbols so that tools do not get confused by the presence of code belonging to the wrong symbol/not belonging to any symbol
- Improve csum_partial()'s performance
- Some improvements to the kcpuid tool
* tag 'x86_misc_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/lib: Make get/put_user() exception handling a visible symbol x86/csum: Fix clang -Wuninitialized in csum_partial() x86/csum: Improve performance of `csum_partial` tools/x86/kcpuid: Add .gitignore tools/x86/kcpuid: Dump the correct CPUID function in error
show more ...
|
Revision tags: v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31 |
|
#
2fe1e67e |
| 26-May-2023 |
Nathan Chancellor <nathan@kernel.org> |
x86/csum: Fix clang -Wuninitialized in csum_partial()
Clang warns:
arch/x86/lib/csum-partial_64.c:74:20: error: variable 'result' is uninitialized when used here [-Werror,-Wuninitialized]
x86/csum: Fix clang -Wuninitialized in csum_partial()
Clang warns:
arch/x86/lib/csum-partial_64.c:74:20: error: variable 'result' is uninitialized when used here [-Werror,-Wuninitialized] return csum_tail(result, temp64, odd); ^~~~~~ arch/x86/lib/csum-partial_64.c:48:22: note: initialize the variable 'result' to silence this warning unsigned odd, result; ^ = 0 1 error generated.
The only initialization and uses of result in csum_partial() were moved into csum_tail() but result is still being passed by value to csum_tail() (clang's -Wuninitialized does not do interprocedural analysis to realize that result is always assigned in csum_tail() however). Sink the declaration of result into csum_tail() to clear up the warning.
Closes: https://lore.kernel.org/202305262039.3HUYjWJk-lkp@intel.com/ Fixes: 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230526-csum_partial-wuninitialized-v1-1-ebc0108dcec1%40kernel.org
show more ...
|
Revision tags: v6.1.30, v6.1.29, v6.1.28 |
|
#
688eb819 |
| 10-May-2023 |
Noah Goldstein <goldstein.w.n@gmail.com> |
x86/csum: Improve performance of `csum_partial`
1) Add special case for len == 40 as that is the hottest value. The nets a ~8-9% latency improvement and a ~30% throughput improvement in the le
x86/csum: Improve performance of `csum_partial`
1) Add special case for len == 40 as that is the hottest value. The nets a ~8-9% latency improvement and a ~30% throughput improvement in the len == 40 case.
2) Use multiple accumulators in the 64-byte loop. This dramatically improves ILP and results in up to a 40% latency/throughput improvement (better for more iterations).
Results from benchmarking on Icelake. Times measured with rdtsc() len lat_new lat_old r tput_new tput_old r 8 3.58 3.47 1.032 3.58 3.51 1.021 16 4.14 4.02 1.028 3.96 3.78 1.046 24 4.99 5.03 0.992 4.23 4.03 1.050 32 5.09 5.08 1.001 4.68 4.47 1.048 40 5.57 6.08 0.916 3.05 4.43 0.690 48 6.65 6.63 1.003 4.97 4.69 1.059 56 7.74 7.72 1.003 5.22 4.95 1.055 64 6.65 7.22 0.921 6.38 6.42 0.994 96 9.43 9.96 0.946 7.46 7.54 0.990 128 9.39 12.15 0.773 8.90 8.79 1.012 200 12.65 18.08 0.699 11.63 11.60 1.002 272 15.82 23.37 0.677 14.43 14.35 1.005 440 24.12 36.43 0.662 21.57 22.69 0.951 952 46.20 74.01 0.624 42.98 53.12 0.809 1024 47.12 78.24 0.602 46.36 58.83 0.788 1552 72.01 117.30 0.614 71.92 96.78 0.743 2048 93.07 153.25 0.607 93.28 137.20 0.680 2600 114.73 194.30 0.590 114.28 179.32 0.637 3608 156.34 268.41 0.582 154.97 254.02 0.610 4096 175.01 304.03 0.576 175.89 292.08 0.602
There is no such thing as a free lunch, however, and the special case for len == 40 does add overhead to the len != 40 cases. This seems to amount to be ~5% throughput and slightly less in terms of latency.
Testing: Part of this change is a new kunit test. The tests check all alignment X length pairs in [0, 64) X [0, 512). There are three cases. 1) Precomputed random inputs/seed. The expected results where generated use the generic implementation (which is assumed to be non-buggy). 2) An input of all 1s. The goal of this test is to catch any case a carry is missing. 3) An input that never carries. The goal of this test si to catch any case of incorrectly carrying.
More exhaustive tests that test all alignment X length pairs in [0, 8192) X [0, 8192] on random data are also available here: https://github.com/goldsteinn/csum-reproduction
The reposity also has the code for reproducing the above benchmark numbers.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230511011002.935690-1-goldstein.w.n%40gmail.com
show more ...
|
Revision tags: v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13 |
|
#
4f2c0a4a |
| 13-Dec-2022 |
Nick Terrell <terrelln@fb.com> |
Merge branch 'main' into zstd-linus
|
Revision tags: v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4 |
|
#
14e77332 |
| 21-Oct-2022 |
Nick Terrell <terrelln@fb.com> |
Merge branch 'main' into zstd-next
|
Revision tags: v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59 |
|
#
8bb5e7f4 |
| 02-Aug-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 5.20 (or 6.0) merge window.
|
Revision tags: v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45 |
|
#
03ab8e62 |
| 31-May-2022 |
Konstantin Komarov <almaz.alexandrovich@paragon-software.com> |
Merge tag 'v5.18'
Linux 5.18
|
Revision tags: v5.15.44 |
|
#
690e1790 |
| 27-May-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.18' into next
Sync up with mainline to get updates to OMAP4 keypad driver and other upstream goodies.
|
Revision tags: v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34 |
|
#
651a8879 |
| 13-Apr-2022 |
Takashi Iwai <tiwai@suse.de> |
Merge branch 'topic/cs35l41' into for-next
Pull CS35L41 codec updates
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
#
c16c8bfa |
| 12-Apr-2022 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Pull in TTM changes needed for DG2 CCS enabling from Ram.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
#
83970cd6 |
| 11-Apr-2022 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync up with v5.18-rc1, in particular to get 5e3094cfd9fb ("drm/i915/xehpsdv: Add has_flat_ccs to device info").
Signed-off-by: Jani Nikula <jani.nikula@intel
Merge drm/drm-next into drm-intel-next
Sync up with v5.18-rc1, in particular to get 5e3094cfd9fb ("drm/i915/xehpsdv: Add has_flat_ccs to device info").
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
show more ...
|
Revision tags: v5.15.33 |
|
#
9cbbd694 |
| 05-Apr-2022 |
Maxime Ripard <maxime@cerno.tech> |
Merge drm/drm-next into drm-misc-next
Let's start the 5.19 development cycle.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
#
0aea30a0 |
| 19-Apr-2022 |
Takashi Iwai <tiwai@suse.de> |
Merge tag 'asoc-fix-v5.18-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.18
A collection of fixes that came in since the merge window, plus
Merge tag 'asoc-fix-v5.18-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.18
A collection of fixes that came in since the merge window, plus one new device ID for an x86 laptop. Nothing that really stands out with particularly big impact outside of the affected device.
show more ...
|
#
cf5c5763 |
| 05-Apr-2022 |
Maxime Ripard <maxime@cerno.tech> |
Merge drm/drm-fixes into drm-misc-fixes
Let's start the 5.18 fixes cycle.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
#
88e6c020 |
| 01-Apr-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro: "Assorted bits and pieces"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/ker
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro: "Assorted bits and pieces"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: drop needless assignment in aio_read() clean overflow checks in count_mounts() a bit seq_file: fix NULL pointer arithmetic warning uml/x86: use x86 load_unaligned_zeropad() asm/user.h: killed unused macros constify struct path argument of finish_automount()/do_add_mount() fs: Remove FIXME comment in generic_write_checks()
show more ...
|
#
de4fb176 |
| 01-Apr-2022 |
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> |
Merge branches 'fixes' and 'misc' into for-linus
|