#
917bda9a |
| 29-Aug-2022 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync drm-intel-next with v6.0-rc as well as recent drm-intel-gt-next.
Since drm-next does not have commit f0c70d41e4e8 ("drm/i915/guc: remove runtime info pri
Merge drm/drm-next into drm-intel-next
Sync drm-intel-next with v6.0-rc as well as recent drm-intel-gt-next.
Since drm-next does not have commit f0c70d41e4e8 ("drm/i915/guc: remove runtime info printing from time stamp logging") yet, only drm-intel-gt-next, will need to do that as part of the merge here to build.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
show more ...
|
Revision tags: v5.15.63 |
|
#
35ffb665 |
| 23-Aug-2022 |
Richard Gobert <richardbgobert@gmail.com> |
net: gro: skb_gro_header helper function
Introduce a simple helper function to replace a common pattern. When accessing the GRO header, we fetch the pointer from frag0, then test its validity and fe
net: gro: skb_gro_header helper function
Introduce a simple helper function to replace a common pattern. When accessing the GRO header, we fetch the pointer from frag0, then test its validity and fetch it from the skb when necessary.
This leads to the pattern skb_gro_header_fast -> skb_gro_header_hard -> skb_gro_header_slow recurring many times throughout GRO code.
This patch replaces these patterns with a single inlined function call, improving code readability.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220823071034.GA56142@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
c21e1bf4 |
| 24-Aug-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'add-a-second-bind-table-hashed-by-port-and-address'
Joanne Koong says:
==================== Add a second bind table hashed by port and address
Currently, there is one bind hashtable
Merge branch 'add-a-second-bind-table-hashed-by-port-and-address'
Joanne Koong says:
==================== Add a second bind table hashed by port and address
Currently, there is one bind hashtable (bhash) that hashes by port only. This patchset adds a second bind table (bhash2) that hashes by port and address.
The motivation for adding bhash2 is to expedite bind requests in situations where the port has many sockets in its bhash table entry (eg a large number of sockets bound to different addresses on the same port), which makes checking bind conflicts costly especially given that we acquire the table entry spinlock while doing so, which can cause softirq cpu lockups and can prevent new tcp connections.
We ran into this problem at Meta where the traffic team binds a large number of IPs to port 443 and the bind() call took a significant amount of time which led to cpu softirq lockups, which caused packet drops and other failures on the machine.
When experimentally testing this on a local server for ~24k sockets bound to the port, the results seen were:
ipv4: before - 0.002317 seconds with bhash2 - 0.000020 seconds
ipv6: before - 0.002431 seconds with bhash2 - 0.000021 seconds
The additions to the initial bhash2 submission [0] are: * Updating bhash2 in the cases where a socket's rcv saddr changes after it has * been bound * Adding locks for bhash2 hashbuckets
[0] https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/
v3: https://lore.kernel.org/netdev/20220722195406.1304948-2-joannelkoong@gmail.com/ v2: https://lore.kernel.org/netdev/20220712235310.1935121-1-joannelkoong@gmail.com/ v1: https://lore.kernel.org/netdev/20220623234242.2083895-2-joannelkoong@gmail.com/ ====================
Link: https://lore.kernel.org/r/20220822181023.3979645-1-joannelkoong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
28044fc1 |
| 22-Aug-2022 |
Joanne Koong <joannelkoong@gmail.com> |
net: Add a bhash2 table hashed by port and address
The current bind hashtable (bhash) is hashed by port only. In the socket bind path, we have to check for bind conflicts by traversing the specified
net: Add a bhash2 table hashed by port and address
The current bind hashtable (bhash) is hashed by port only. In the socket bind path, we have to check for bind conflicts by traversing the specified port's inet_bind_bucket while holding the hashbucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()). In instances where there are tons of sockets hashed to the same port at different addresses, the bind conflict check is time-intensive and can cause softirq cpu lockups, as well as stops new tcp connections since __inet_inherit_port() also contests for the spinlock.
This patch adds a second bind table, bhash2, that hashes by port and sk->sk_rcv_saddr (ipv4) and sk->sk_v6_rcv_saddr (ipv6). Searching the bhash2 table leads to significantly faster conflict resolution and less time holding the hashbucket spinlock.
Please note a few things: * There can be the case where the a socket's address changes after it has been bound. There are two cases where this happens:
1) The case where there is a bind() call on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6) and then a connect() call. The kernel will assign the socket an address when it handles the connect()
2) In inet_sk_reselect_saddr(), which is called when rebuilding the sk header and a few pre-conditions are met (eg rerouting fails).
In these two cases, we need to update the bhash2 table by removing the entry for the old address, and add a new entry reflecting the updated address.
* The bhash2 table must have its own lock, even though concurrent accesses on the same port are protected by the bhash lock. Bhash2 must have its own lock to protect against cases where sockets on different ports hash to different bhash hashbuckets but to the same bhash2 hashbucket.
This brings up a few stipulations: 1) When acquiring both the bhash and the bhash2 lock, the bhash2 lock will always be acquired after the bhash lock and released before the bhash lock is released.
2) There are no nested bhash2 hashbucket locks. A bhash2 lock is always acquired+released before another bhash2 lock is acquired+released.
* The bhash table cannot be superseded by the bhash2 table because for bind requests on INADDR_ANY (ipv4) or IPV6_ADDR_ANY (ipv6), every socket bound to that port must be checked for a potential conflict. The bhash table is the only source of port->socket associations.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.62 |
|
#
93fbff11 |
| 17-Aug-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'i2c/make_remove_callback_void-immutable' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into next
Sync up with the latest I2C code base to get updated prototype of I2C bus
Merge branch 'i2c/make_remove_callback_void-immutable' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into next
Sync up with the latest I2C code base to get updated prototype of I2C bus remove() method.
show more ...
|
Revision tags: v5.15.61 |
|
#
cf36ae3e |
| 17-Aug-2022 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging for v6.0-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v5.15.60 |
|
#
44627916 |
| 05-Aug-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
Merge part of branch 'for-next.instantiate' into for-next
|
#
fc30eea1 |
| 04-Aug-2022 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync up. In special to get the drm-intel-gt-next stuff.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
f86d1fbb |
| 03-Aug-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Paolo Abeni: "Core:
- Refactor the forward memory allocation to better cop
Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Paolo Abeni: "Core:
- Refactor the forward memory allocation to better cope with memory pressure with many open sockets, moving from a per socket cache to a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source file with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent netdev_* schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF:
- Improve socket map performance, avoiding skb cloning on read operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the eBPF used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function.
Protocols:
- Introduce per network namespace lookup tables for unix sockets, increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space tools.
- Significant performance improvement for the TLS 1.3 receive path, both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup status
- Introduce virtually contingus buffers for sockets over RDMA, to cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed features.
Driver API:
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers:
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers:
- Intel Ethernet NICs: - i40e: add support for vlan pruning - i40e: add support for XDP framented packets - ice: improved vlan offload support - ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5) - refactor packet steering offload for performance and scalability - extend support for TC offload - refactor devlink code to clean-up the locking schema - support stacked vlans for bridge offloads - use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp): - extend support for IPv6 fields mangling offload - add support for vepa mode in HW bridge - better support for virtio data path acceleration (VDPA) - enable TSO by default
- Microsoft vNIC driver (mana) - add support for XDP redirect
- Others Ethernet drivers: - bonding: add per-port priority support - microchip lan743x: extend phy support - Fungible funeth: support UDP segmentation offload and XDP xmit - Solarflare EF100: add support for virtual function representors - MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw): - dropped support for unreleased H/W (XM router). - improved stats accuracy - unified bridge model coversion improving scalability (parts 1-6) - support for PTP in Spectrum-2 asics
- Broadcom PHYs - add PTP support for BCM54210E - add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera): - implement support for multicast forwarding offload
- Embedded Ethernet switches: - refactor OcteonTx MAC filter for better scalability - improve TC H/W offload for the Felix driver - refactor the Microchip ksz8 and ksz9477 drivers to share the probe code (parts 1, 2), add support for phylink mac configuration
- Other WiFi: - Microchip wilc1000: diable WEP support and enable WPA3 - Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than 10 years"
* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits) doc: sfp-phylink: Fix a broken reference wireguard: selftests: support UML wireguard: allowedips: don't corrupt stack when detecting overflow wireguard: selftests: update config fragments wireguard: ratelimiter: use hrtimer in selftest net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ net: usb: ax88179_178a: Bind only to vendor-specific interface selftests: net: fix IOAM test skip return code net: usb: make USB_RTL8153_ECM non user configurable net: marvell: prestera: remove reduntant code octeontx2-pf: Reduce minimum mtu size to 60 net: devlink: Fix missing mutex_unlock() call net/tls: Remove redundant workqueue flush before destroy net: txgbe: Fix an error handling path in txgbe_probe() net: dsa: Fix spelling mistakes and cleanup code Documentation: devlink: add add devlink-selftests to the table of contents dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock net: ionic: fix error check for vlan flags in ionic_set_nic_features() net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr() nfp: flower: add support for tunnel offload without key ID ...
show more ...
|
Revision tags: v5.15.59 |
|
#
8bb5e7f4 |
| 02-Aug-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 5.20 (or 6.0) merge window.
|
Revision tags: v5.19 |
|
#
82b6c2e7 |
| 29-Jul-2022 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
Merge branches 'pm-cpufreq' and 'pm-cpuidle'
Merge processor power management changes for v5.20-rc1:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplu
Merge branches 'pm-cpufreq' and 'pm-cpuidle'
Merge processor power management changes for v5.20-rc1:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq policy which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on Sapphire Rapids (Artem Bityutskiy).
* pm-cpufreq: cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask cpufreq: loongson2: fix Kconfig "its" grammar cpufreq: Warn users while freeing active policy cpufreq: ACPI: Add Zhaoxin/Centaur turbo boost control interface support cpufreq: Drop unnecessary cpus locking from store() cpufreq: Optimize cpufreq_show_cpus()
* pm-cpuidle: intel_idle: make SPR C1 and C1E be independent cpuidle: haltpoll: Add trace points for guest_halt_poll_ns grow/shrink
show more ...
|
Revision tags: v5.15.58 |
|
#
40d02efa |
| 27-Jul-2022 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'torvalds/master' into perf/core
To get upstream fixes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
3c69a99b |
| 24-Jul-2022 |
Michael Ellerman <mpe@ellerman.id.au> |
Merge tag 'v5.19-rc7' into fixes
Merge v5.19-rc7 into fixes to bring in: d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine")
|
#
4effe18f |
| 24-Jul-2022 |
Jens Axboe <axboe@kernel.dk> |
Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-send
* for-5.20/io_uring: (716 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_m
Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-send
* for-5.20/io_uring: (716 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.57 |
|
#
6e0e846e |
| 21-Jul-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
Revision tags: v5.15.56 |
|
#
7ca433dc |
| 21-Jul-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from can.
Still no major regressions, most of the ch
Merge tag 'net-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from can.
Still no major regressions, most of the changes are still due to data races fixes, plus the usual bunch of drivers fixes.
Previous releases - regressions:
- tcp/udp: make early_demux back namespacified.
- dsa: fix issues with vlan_filtering_is_global
Previous releases - always broken:
- ip: fix data-races around ipv4_net_table (round 2, 3 & 4)
- amt: fix validation and synchronization bugs
- can: fix detection of mcp251863
- eth: iavf: fix handling of dummy receive descriptors
- eth: lan966x: fix issues with MAC table
- eth: stmmac: dwmac-mediatek: fix clock issue
Misc:
- dsa: update documentation"
* tag 'net-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits) mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication net/sched: cls_api: Fix flow action initialization tcp: Fix data-races around sysctl_tcp_max_reordering. tcp: Fix a data-race around sysctl_tcp_abort_on_overflow. tcp: Fix a data-race around sysctl_tcp_rfc1337. tcp: Fix a data-race around sysctl_tcp_stdurg. tcp: Fix a data-race around sysctl_tcp_retrans_collapse. tcp: Fix data-races around sysctl_tcp_slow_start_after_idle. tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts. tcp: Fix data-races around sysctl_tcp_recovery. tcp: Fix a data-race around sysctl_tcp_early_retrans. tcp: Fix data-races around sysctl knobs related to SYN option. udp: Fix a data-race around sysctl_udp_l3mdev_accept. ip: Fix data-races around sysctl_ip_prot_sock. ipv4: Fix data-races around sysctl_fib_multipath_hash_fields. ipv4: Fix data-races around sysctl_fib_multipath_hash_policy. ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh. can: rcar_canfd: Add missing of_node_put() in rcar_canfd_probe() can: mcp251xfd: fix detection of mcp251863 Documentation: fix udp_wmem_min in ip-sysctl.rst ...
show more ...
|
#
dc14036f |
| 18-Jul-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 5.19-rc7 into usb-next
We need the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
0698461a |
| 18-Jul-2022 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'torvalds/master' into perf/core
To update the perf/core codebase.
Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end of evsel__config(), after what
Merge remote-tracking branch 'torvalds/master' into perf/core
To update the perf/core codebase.
Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end of evsel__config(), after what was added in:
49c692b7dfc9b6c0 ("perf offcpu: Accept allowed sample types only")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
c9f21106 |
| 18-Jul-2022 |
David S. Miller <davem@davemloft.net> |
Merge branch 'net-ipv4-sysctl-races-part-3'
Kuniyuki Iwashima says:
==================== sysctl: Fix data-races around ipv4_net_table (Round 3).
This series fixes data-races around 21 knobs after
Merge branch 'net-ipv4-sysctl-races-part-3'
Kuniyuki Iwashima says:
==================== sysctl: Fix data-races around ipv4_net_table (Round 3).
This series fixes data-races around 21 knobs after igmp_link_local_mcast_reports in ipv4_net_table.
These 4 knobs are skipped because they are safe.
- tcp_congestion_control: Safe with RCU and xchg(). - tcp_available_congestion_control: Read only. - tcp_allowed_congestion_control: Safe with RCU and spinlock(). - tcp_fastopen_key: Safe with RCU and xchg()
So, round 4 will start with fib_multipath_use_neigh. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
5a542133 |
| 15-Jul-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
tcp: Fix data-races around sysctl_tcp_fastopen.
While reading sysctl_tcp_fastopen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 2100c8d2d9db ("net-tcp: Fa
tcp: Fix data-races around sysctl_tcp_fastopen.
While reading sysctl_tcp_fastopen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 2100c8d2d9db ("net-tcp: Fast Open base") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.15.55 |
|
#
11052589 |
| 13-Jul-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
tcp/udp: Make early_demux back namespacified.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we intr
tcp/udp: Make early_demux back namespacified.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns.
We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob.
This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time.
Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
782d86fe |
| 15-Jul-2022 |
David S. Miller <davem@davemloft.net> |
Merge branch 'net-sysctl-races-round2'
Kuniyuki Iwashima says:
==================== sysctl: Fix data-races around ipv4_net_table (Round 2).
This series fixes data-races around 15 knobs after ip_de
Merge branch 'net-sysctl-races-round2'
Kuniyuki Iwashima says:
==================== sysctl: Fix data-races around ipv4_net_table (Round 2).
This series fixes data-races around 15 knobs after ip_default_ttl in ipv4_net_table.
These two knobs are skipped. - ip_local_port_range is safe with its own lock. - ip_local_reserved_ports uses proc_do_large_bitmap(), which will need an additional lock and can be fixed later.
So, the next round will start with igmp_link_local_mcast_reports. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
0968d2a4 |
| 13-Jul-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
ip: Fix data-races around sysctl_ip_no_pmtu_disc.
While reading sysctl_ip_no_pmtu_disc, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-
ip: Fix data-races around sysctl_ip_no_pmtu_disc.
While reading sysctl_ip_no_pmtu_disc, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
816cd168 |
| 14-Jul-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h 310731e2f161 ("net: Fix data-races around sysctl_mem.") e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096""
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h 310731e2f161 ("net: Fix data-races around sysctl_mem.") e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/
net/ipv4/fib_semantics.c 747c14307214 ("ip: fix dflt addr selection for connected nexthop") d62607c3fe45 ("net: rename reference+tracking helpers")
net/tls/tls.h include/net/tls.h 3d8c51b25a23 ("net/tls: Check for errors in tls_device_init") 587903142308 ("tls: create an internal header")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
9bd572ec |
| 14-Jul-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, bpf and wireless.
Still no major
Merge tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, bpf and wireless.
Still no major regressions, the release continues to be calm. An uptick of fixes this time around due to trivial data race fixes and patches flowing down from subtrees.
There has been a few driver fixes (particularly a few fixes for false positives due to 66e4c8d95008 which went into -next in May!) that make me worry the wide testing is not exactly fully through.
So "calm" but not "let's just cut the final ASAP" vibes over here.
Current release - regressions:
- wifi: rtw88: fix write to const table of channel parameters
Current release - new code bugs:
- mac80211: add gfp_t arg to ieeee80211_obss_color_collision_notify
- mlx5: - TC, allow offload from uplink to other PF's VF - Lag, decouple FDB selection and shared FDB - Lag, correct get the port select mode str
- bnxt_en: fix and simplify XDP transmit path
- r8152: fix accessing unset transport header
Previous releases - regressions:
- conntrack: fix crash due to confirmed bit load reordering (after atomic -> refcount conversion)
- stmmac: dwc-qos: disable split header for Tegra194
Previous releases - always broken:
- mlx5e: ring the TX doorbell on DMA errors
- bpf: make sure mac_header was set before using it
- mac80211: do not wake queues on a vif that is being stopped
- mac80211: fix queue selection for mesh/OCB interfaces
- ip: fix dflt addr selection for connected nexthop
- seg6: fix skb checksums for SRH encapsulation/insertion
- xdp: fix spurious packet loss in generic XDP TX path
- bunch of sysctl data race fixes
- nf_log: incorrect offset to network header
Misc:
- bpf: add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs"
* tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) nfp: flower: configure tunnel neighbour on cmsg rx net/tls: Check for errors in tls_device_init MAINTAINERS: Add an additional maintainer to the AMD XGBE driver xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue selftests/net: test nexthop without gw ip: fix dflt addr selection for connected nexthop net: atlantic: remove aq_nic_deinit() when resume net: atlantic: remove deep parameter on suspend/resume functions sfc: fix kernel panic when creating VF seg6: bpf: fix skb checksum in bpf_push_seg6_encap() seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors seg6: fix skb checksum evaluation in SRH encapsulation/insertion sfc: fix use after free when disabling sriov net: sunhme: output link status with a single print. r8152: fix accessing unset transport header net: stmmac: fix leaks in probe net: ftgmac100: Hold reference returned by of_get_child_by_name() nexthop: Fix data-races around nexthop_compat_mode. ipv4: Fix data-races around sysctl_ip_dynaddr. tcp: Fix a data-race around sysctl_tcp_ecn_fallback. ...
show more ...
|