#
98dbec0a |
| 19-Dec-2022 |
David S. Miller <davem@davemloft.net> |
Merge branch 'rxrpc-fixes'
David Howells says:
==================== rxrpc: Fixes for I/O thread conversion/SACK table expansion
Here are some fixes for AF_RXRPC:
(1) Fix missing unlock in rxrpc'
Merge branch 'rxrpc-fixes'
David Howells says:
==================== rxrpc: Fixes for I/O thread conversion/SACK table expansion
Here are some fixes for AF_RXRPC:
(1) Fix missing unlock in rxrpc's sendmsg.
(2) Fix (lack of) propagation of security settings to rxrpc_call.
(3) Fix NULL ptr deref in rxrpc_unuse_local().
(4) Fix problem with kthread_run() not invoking the I/O thread function if the kthread gets stopped first. Possibly this should actually be fixed in the kthread code.
(5) Fix locking problem as putting a peer (which may be done from RCU) may now invoke kthread_stop().
(6) Fix switched parameters in a couple of trace calls.
(7) Fix I/O thread's checking for kthread stop to make sure it completes all outstanding work before returning so that calls are cleaned up.
(8) Fix an uninitialised var in the new rxperf test server.
(9) Fix the return value of rxrpc_new_incoming_call() so that the checks on it work correctly.
The patches fix at least one syzbot bug[1] and probably some others that don't have reproducers[2][3][4]. I think it also fixes another[5], but that showed another failure during testing that was different to the original.
There's also an outstanding bug in rxrpc_put_peer()[6] that is fixed by a combination of several patches in my rxrpc-next branch, but I haven't included that here. ====================
Tested-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: kafs-testing+fedora36_64checkkafs-build-164@auristor.com Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
fdb99487 |
| 15-Dec-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix security setting propagation
Fix the propagation of the security settings from sendmsg to the rxrpc_call struct.
Fixes: f3441d4125fc ("rxrpc: Copy client call parameters into rxrpc_call
rxrpc: Fix security setting propagation
Fix the propagation of the security settings from sendmsg to the rxrpc_call struct.
Fixes: f3441d4125fc ("rxrpc: Copy client call parameters into rxrpc_call earlier") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
1a931707 |
| 16-Dec-2022 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'torvalds/master' into perf/core
To resolve a trivial merge conflict with c302378bc157f6a7 ("libbpf: Hashmap interface update to allow both long and void* keys/values"),
Merge remote-tracking branch 'torvalds/master' into perf/core
To resolve a trivial merge conflict with c302378bc157f6a7 ("libbpf: Hashmap interface update to allow both long and void* keys/values"), where a function present upstream was removed in the perf tools development tree.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
4f2c0a4a |
| 13-Dec-2022 |
Nick Terrell <terrelln@fb.com> |
Merge branch 'main' into zstd-linus
|
#
7e68dd7d |
| 13-Dec-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core:
- Allow live renaming when an interface is up
- Ad
Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the performances of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing data races
- De-duplicate common code between OVS and conntrack offloading infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate own objects, build own object hierarchies and use the building blocks to build own data structures flexibly, for example, linked lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF programs
- Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting, and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error reporting
- RXRPC: increase SACK table size and move processing into a per-local endpoint kernel thread, reducing considerably the required locking
- IEEE 802154: synchronous send frame and extended filtering support, initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and migratable
- Extend devlink-rate to support strict prioriry and weighted fair queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet: - Marvel Octeon CNF95N and CN10KB Ethernet Switches - Marvel Prestera AC5X Ethernet Switch - WangXun 10 Gigabit NIC - Motorcomm yt8521 Gigabit Ethernet - Microchip ksz9563 Gigabit Ethernet Switch - Microsoft Azure Network Adapter - Linux Automation 10Base-T1L adapter
- PHY: - Aquantia AQR112 and AQR412 - Motorcomm YT8531S
- PTP: - Orolia ART-CARD
- WiFi: - MediaTek Wi-Fi 7 (802.11be) devices - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB devices
- Bluetooth: - Broadcom BCM4377/4378/4387 Bluetooth chipsets - Realtek RTL8852BE and RTL8723DS - Cypress.CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN: - gs_usb: bus error reporting support - kvaser_usb: listen only and bus error reporting support
- Ethernet NICs: - Intel (100G): - extend action skbedit to RX queue mapping - implement devlink-rate support - support direct read from memory - nVidia/Mellanox (mlx5): - SW steering improvements, increasing rules update rate - Support for enhanced events compression - extend H/W offload packet manipulation capabilities - implement IPSec packet offload mode - nVidia/Mellanox (mlx4): - better big TCP support - Netronome Ethernet NICs (nfp): - IPsec offload support - add support for multicast filter - Broadcom: - RSS and PTP support improvements - AMD/SolarFlare: - netlink extened ack improvements - add basic flower matches to offload, and related stats - Virtual NICs: - ibmvnic: introduce affinity hint support - small / embedded: - FreeScale fec: add initial XDP support - Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood - TI am65-cpsw: add suspend/resume support - Mediatek MT7986: add RX wireless wthernet dispatch support - Realtek 8169: enable GRO software interrupt coalescing per default
- Ethernet high-speed switches: - Microchip (sparx5): - add support for Sparx5 TC/flower H/W offload via VCAP - Mellanox mlxsw: - add 802.1X and MAC Authentication Bypass offload support - add ip6gre support
- Embedded Ethernet switches: - Mediatek (mtk_eth_soc): - improve PCS implementation, add DSA untag support - enable flow offload support - Renesas: - add rswitch R-Car Gen4 gPTP support - Microchip (lan966x): - add full XDP support - add TC H/W offload via VCAP - enable PTP on bridge interfaces - Microchip (ksz8): - add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k): - support configuring channel dwell time during scan
- MediaTek WiFi (mt76): - enable Wireless Ethernet Dispatch (WED) offload support - add ack signal support - enable coredump support - remain_on_channel support
- Intel WiFi (iwlwifi): - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities - 320 MHz channels support
- RealTek WiFi (rtw89): - new dynamic header firmware format support - wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits) ipvs: fix type warning in do_div() on 32 bit net: lan966x: Remove a useless test in lan966x_ptp_add_trap() net: ipa: add IPA v4.7 support dt-bindings: net: qcom,ipa: Add SM6350 compatible bnxt: Use generic HBH removal helper in tx path IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver selftests: forwarding: Add bridge MDB test selftests: forwarding: Rename bridge_mdb test bridge: mcast: Support replacement of MDB port group entries bridge: mcast: Allow user space to specify MDB entry routing protocol bridge: mcast: Allow user space to add (*, G) with a source list and filter mode bridge: mcast: Add support for (*, G) with a source list and filter mode bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source bridge: mcast: Add a flag for user installed source entries bridge: mcast: Expose __br_multicast_del_group_src() bridge: mcast: Expose br_multicast_new_group_src() bridge: mcast: Add a centralized error path bridge: mcast: Place netlink policy before validation functions bridge: mcast: Split (*, G) and (S, G) addition into different functions bridge: mcast: Do not derive entry type from its filter mode ...
show more ...
|
#
d69e8c63 |
| 09-Dec-2022 |
Jason Gunthorpe <jgg@nvidia.com> |
Merge tag 'v6.1-rc8' into rdma.git for-next
For dependencies in following patches
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
d9f26ae7 |
| 05-Dec-2022 |
Ard Biesheuvel <ardb@kernel.org> |
Merge tag 'v6.1-rc8' into efi/next
Linux 6.1-rc8
|
#
27e521c5 |
| 05-Dec-2022 |
David S. Miller <davem@davemloft.net> |
Merge tag 'rxrpc-next-20221201-b' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
==================== rxrpc: Increasing SACK size and moving away from softir
Merge tag 'rxrpc-next-20221201-b' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
==================== rxrpc: Increasing SACK size and moving away from softirq, parts 2 & 3
Here are the second and third parts of patches in the process of moving rxrpc from doing a lot of its stuff in softirq context to doing it in an I/O thread in process context and thereby making it easier to support a larger SACK table.
The full description is in the description for the first part[1] which is already in net-next.
The second part includes some cleanups, adds some testing and overhauls some tracing:
(1) Remove declaration of rxrpc_kernel_call_is_complete() as the definition is no longer present.
(2) Remove the knet() and kproto() macros in favour of using tracepoints.
(3) Remove handling of duplicate packets from recvmsg. The input side isn't now going to insert overlapping/duplicate packets into the recvmsg queue.
(4) Don't use the rxrpc_conn_parameters struct in the rxrpc_connection or rxrpc_bundle structs - rather put the members in directly.
(5) Extract the abort code from a received abort packet right up front rather than doing it in multiple places later.
(6) Use enums and symbol lists rather than __builtin_return_address() to indicate where a tracepoint was triggered for local, peer, conn, call and skbuff tracing.
(7) Add a refcount tracepoint for the rxrpc_bundle struct.
(8) Implement an in-kernel server for the AFS rxperf testing program to talk to (enabled by a Kconfig option).
This is tagged as rxrpc-next-20221201-a.
The third part introduces the I/O thread and switches various bits over to running there:
(1) Fix call timers and call and connection workqueues to not hold refs on the rxrpc_call and rxrpc_connection structs to thereby avoid messy cleanup when the last ref is put in softirq mode.
(2) Split input.c so that the call packet processing bits are separate from the received packet distribution bits. Call packet processing gets bumped over to the call event handler.
(3) Create a per-local endpoint I/O thread. Barring some tiny bits that still get done in softirq context, all packet reception, processing and transmission is done in this thread. That will allow a load of locking to be removed.
(4) Perform packet processing and error processing from the I/O thread.
(5) Provide a mechanism to process call event notifications in the I/O thread rather than queuing a work item for that call.
(6) Move data and ACK transmission into the I/O thread. ACKs can then be transmitted at the point they're generated rather than getting delegated from softirq context to some process context somewhere.
(7) Move call and local processor event handling into the I/O thread.
(8) Move cwnd degradation to after packets have been transmitted so that they don't shorten the window too quickly.
A bunch of simplifications can then be done:
(1) The input_lock is no longer necessary as exclusion is achieved by running the code in the I/O thread only.
(2) Don't need to use sk->sk_receive_queue.lock to guard socket state changes as the socket mutex should suffice.
(3) Don't take spinlocks in RCU callback functions as they get run in softirq context and thus need _bh annotations.
(4) RCU is then no longer needed for the peer's error_targets list.
(5) Simplify the skbuff handling in the receive path by dropping the ref in the basic I/O thread loop and getting an extra ref as and when we need to queue the packet for recvmsg or another context.
(6) Get the peer address earlier in the input process and pass it to the users so that we only do it once.
This is tagged as rxrpc-next-20221201-b.
Changes: ======== ver #2) - Added a patch to change four assertions into warnings in rxrpc_read() and fixed a checker warning from a __user annotation that should have been removed.. - Change a min() to min_t() in rxperf as PAGE_SIZE doesn't seem to match type size_t on i386. - Three error handling issues in rxrpc_new_incoming_call(): - If not DATA or not seq #1, should drop the packet, not abort. - Fix a goto that went to the wrong place, dropping a non-held lock. - Fix an rcu_read_lock that should've been an unlock.
Tested-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: kafs-testing+fedora36_64checkkafs-build-144@auristor.com Link: https://lore.kernel.org/r/166794587113.2389296.16484814996876530222.stgit@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/166982725699.621383.2358362793992993374.stgit@warthog.procyon.org.uk/ # v1 ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
90337f52 |
| 29-Nov-2022 |
Jason Gunthorpe <jgg@nvidia.com> |
Merge tag 'v6.1-rc7' into iommufd.git for-next
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version. The rc fix was done a different way when iommufd patches reworked this code.
Merge tag 'v6.1-rc7' into iommufd.git for-next
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version. The rc fix was done a different way when iommufd patches reworked this code.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15 |
|
#
3dd9c8b5 |
| 24-Jan-2020 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove the _bh annotation from all the spinlocks
None of the spinlocks in rxrpc need a _bh annotation now as the RCU callback routines no longer take spinlocks and the bulk of the packet wran
rxrpc: Remove the _bh annotation from all the spinlocks
None of the spinlocks in rxrpc need a _bh annotation now as the RCU callback routines no longer take spinlocks and the bulk of the packet wrangling code is now run in the I/O thread, not softirq context.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
29fb4ec3 |
| 12-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove RCU from peer->error_targets list
Remove the RCU requirements from the peer's list of error targets so that the error distributor can call sleeping functions.
Signed-off-by: David How
rxrpc: Remove RCU from peer->error_targets list
Remove the RCU requirements from the peer's list of error targets so that the error distributor can call sleeping functions.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
f3441d41 |
| 20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Copy client call parameters into rxrpc_call earlier
Copy client call parameters into rxrpc_call earlier so that that can be used to convey them to the connection code - which can then be offl
rxrpc: Copy client call parameters into rxrpc_call earlier
Copy client call parameters into rxrpc_call earlier so that that can be used to convey them to the connection code - which can then be offloaded to the I/O thread.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
3cec055c |
| 25-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Don't hold a ref for connection workqueue
Currently, rxrpc gives the connection's work item a ref on the connection when it queues it - and this is called from the timer expiration function.
rxrpc: Don't hold a ref for connection workqueue
Currently, rxrpc gives the connection's work item a ref on the connection when it queues it - and this is called from the timer expiration function. The problem comes when queue_work() fails (ie. the work item is already queued): the timer routine must put the ref - but this may cause the cleanup code to run.
This has the unfortunate effect that the cleanup code may then be run in softirq context - which means that any spinlocks it might need to touch have to be guarded to disable softirqs (ie. they need a "_bh" suffix).
(1) Don't give a ref to the work item.
(2) Simplify handling of service connections by adding a separate active count so that the refcount isn't also used for this.
(3) Connection destruction for both client and service connections can then be cleaned up by putting rxrpc_put_connection() out of line and making a tidy progression through the destruction code (offloaded to a workqueue if put from softirq or processor function context). The RCU part of the cleanup then only deals with the freeing at the end.
(4) Make rxrpc_queue_conn() return immediately if it sees the active count is -1 rather then queuing the connection.
(5) Make sure that the cleanup routine waits for the work item to complete.
(6) Stash the rxrpc_net pointer in the conn struct so that the rcu free routine can use it, even if the local endpoint has been freed.
Unfortunately, neither the timer nor the work item can simply get around the problem by just using refcount_inc_not_zero() as the waits would still have to be done, and there would still be the possibility of having to put the ref in the expiration function.
Note the connection work item is mostly going to go away with the main event work being transferred to the I/O thread, so the wait in (6) will become obsolete.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
fa3492ab |
| 02-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace rxrpc_bundle refcount
Add a tracepoint for the rxrpc_bundle refcounting.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lis
rxrpc: Trace rxrpc_bundle refcount
Add a tracepoint for the rxrpc_bundle refcounting.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
cb0fc0c9 |
| 21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_call
rxrpc: trace: Don't use __builtin_return_address for rxrpc_call tracing
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_call tracepoint
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
7fa25105 |
| 21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_conn
rxrpc: trace: Don't use __builtin_return_address for rxrpc_conn tracing
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_conn tracepoint
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
47c810a7 |
| 21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_peer
rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_peer tracepoint
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
0fde882f |
| 21-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: trace: Don't use __builtin_return_address for rxrpc_local tracing
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_loca
rxrpc: trace: Don't use __builtin_return_address for rxrpc_local tracing
In rxrpc tracing, use enums to generate lists of points of interest rather than __builtin_return_address() for the rxrpc_local tracepoint
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
2cc80086 |
| 19-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundle
Remove the rxrpc_conn_parameters struct from the rxrpc_connection and rxrpc_bundle structs and emplace the members directly.
rxrpc: Drop rxrpc_conn_parameters from rxrpc_connection and rxrpc_bundle
Remove the rxrpc_conn_parameters struct from the rxrpc_connection and rxrpc_bundle structs and emplace the members directly. These are going to get filled in from the rxrpc_call struct in future.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
e969c92c |
| 20-Oct-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove the [_k]net() debugging macros
Remove the _net() and knet() debugging macros in favour of tracepoints.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@
rxrpc: Remove the [_k]net() debugging macros
Remove the _net() and knet() debugging macros in favour of tracepoints.
Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
show more ...
|
#
f2bb566f |
| 29-Nov-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed cod
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
9d1566e1 |
| 28-Nov-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 6.1-rc7 into usb-next
We need the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
08ad43d5 |
| 24-Nov-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from rxrpc, netfilter and xfrm.
Current release - reg
Merge tag 'net-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from rxrpc, netfilter and xfrm.
Current release - regressions:
- dccp/tcp: fix bhash2 issues related to WARN_ON() in inet_csk_get_port()
- l2tp: don't sleep and disable BH under writer-side sk_callback_lock
- eth: ice: fix handling of burst tx timestamps
Current release - new code bugs:
- xfrm: squelch kernel warning in case XFRM encap type is not available
- eth: mlx5e: fix possible race condition in macsec extended packet number update routine
Previous releases - regressions:
- neigh: decrement the family specific qlen
- netfilter: fix ipset regression
- rxrpc: fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]
- eth: iavf: do not restart tx queues after reset task failure
- eth: nfp: add port from netdev validation for EEPROM access
- eth: mtk_eth_soc: fix potential memory leak in mtk_rx_alloc()
Previous releases - always broken:
- tipc: set con sock in tipc_conn_alloc
- nfc: - fix potential memory leaks - fix incorrect sizing calculations in EVT_TRANSACTION
- eth: octeontx2-af: fix pci device refcount leak
- eth: bonding: fix ICMPv6 header handling when receiving IPv6 messages
- eth: prestera: add missing unregister_netdev() in prestera_port_create()
- eth: tsnep: fix rotten packets
Misc:
- usb: qmi_wwan: add support for LARA-L6"
* tag 'net-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: thunderx: Fix the ACPI memory leak octeontx2-af: Fix reference count issue in rvu_sdp_init() net: altera_tse: release phylink resources in tse_shutdown() virtio_net: Fix probe failed when modprobe virtio_net net: wwan: t7xx: Fix the ACPI memory leak octeontx2-pf: Add check for devm_kcalloc net: enetc: preserve TX ring priority across reconfiguration net: marvell: prestera: add missing unregister_netdev() in prestera_port_create() nfc: st-nci: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st-nci: fix memory leaks in EVT_TRANSACTION nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTION Documentation: networking: Update generic_netlink_howto URL net/cdc_ncm: Fix multicast RX support for CDC NCM devices with ZLP net: usb: qmi_wwan: add u-blox 0x1342 composition l2tp: Don't sleep and disable BH under writer-side sk_callback_lock net: dm9051: Fix missing dev_kfree_skb() in dm9051_loop_rx() arcnet: fix potential memory leak in com20020_probe() ipv4: Fix error return code in fib_table_insert() net: ethernet: mtk_eth_soc: fix memory leak in error path net: ethernet: mtk_eth_soc: fix resource leak in error path ...
show more ...
|
#
3bcd6c7e |
| 16-Nov-2022 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]
After rxrpc_unbundle_conn() has removed a connection from a bundle, it checks to see if there are any conns with availab
rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]
After rxrpc_unbundle_conn() has removed a connection from a bundle, it checks to see if there are any conns with available channels and, if not, removes and attempts to destroy the bundle.
Whilst it does check after grabbing client_bundles_lock that there are no connections attached, this races with rxrpc_look_up_bundle() retrieving the bundle, but not attaching a connection for the connection to be attached later.
There is therefore a window in which the bundle can get destroyed before we manage to attach a new connection to it.
Fix this by adding an "active" counter to struct rxrpc_bundle:
(1) rxrpc_connect_call() obtains an active count by prepping/looking up a bundle and ditches it before returning.
(2) If, during rxrpc_connect_call(), a connection is added to the bundle, this obtains an active count, which is held until the connection is discarded.
(3) rxrpc_deactivate_bundle() is created to drop an active count on a bundle and destroy it when the active count reaches 0. The active count is checked inside client_bundles_lock() to prevent a race with rxrpc_look_up_bundle().
(4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle().
Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975 Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: zdi-disclosures@trendmicro.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
3ca6c3b4 |
| 09-Nov-2022 |
David S. Miller <davem@davemloft.net> |
Merge tag 'rxrpc-next-20221108' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
rxrpc changes
David Howells says:
==================== rxrpc: Increasing SACK size and moving awa
Merge tag 'rxrpc-next-20221108' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
rxrpc changes
David Howells says:
==================== rxrpc: Increasing SACK size and moving away from softirq, part 1
AF_RXRPC has some issues that need addressing:
(1) The SACK table has a maximum capacity of 255, but for modern networks that isn't sufficient. This is hard to increase in the upstream code because of the way the application thread is coupled to the softirq and retransmission side through a ring buffer. Adjustments to the rx protocol allows a capacity of up to 8192, and having a ring sufficiently large to accommodate that would use an excessive amount of memory as this is per-call.
(2) Processing ACKs in softirq mode causes the ACKs get conflated, with only the most recent being considered. Whilst this has the upside that the retransmission algorithm only needs to deal with the most recent ACK, it causes DATA transmission for a call to be very bursty because DATA packets cannot be transmitted in softirq mode. Rather transmission must be delegated to either the application thread or a workqueue, so there tend to be sudden bursts of traffic for any particular call due to scheduling delays.
(3) All crypto in a single call is done in series; however, each DATA packet is individually encrypted so encryption and decryption of large calls could be parallelised if spare CPU resources are available.
This is the first of a number of sets of patches that try and address them. The overall aims of these changes include:
(1) To get rid of the TxRx ring and instead pass the packets round in queues (eg. sk_buff_head). On the Tx side, each ACK packet comes with a SACK table that can be parsed as-is, so there's no particular need to maintain our own; we just have to refer to the ACK.
On the Rx side, we do need to maintain a SACK table with one bit per entry - but only if packets go missing - and we don't want to have to perform a complex transformation to get the information into an ACK packet.
(2) To try and move almost all processing of received packets out of the softirq handler and into a high-priority kernel I/O thread. Only the transferral of packets would be left there. I would still use the encap_rcv hook to receive packets as there's a noticeable performance drop from letting the UDP socket put the packets into its own queue and then getting them out of there.
(3) To make the I/O thread also do all the transmission. The app thread would be responsible for packaging the data into packets and then buffering them for the I/O thread to transmit. This would make it easier for the app thread to run ahead of the I/O thread, and would mean the I/O thread is less likely to have to wait around for a new packet to come available for transmission.
(4) To logically partition the socket/UAPI/KAPI side of things from the I/O side of things. The local endpoint, connection, peer and call objects would belong to the I/O side. The socket side would not then touch the private internals of calls and suchlike and would not change their states. It would only look at the send queue, receive queue and a way to pass a message to cause an abort.
(5) To remove as much locking, synchronisation, barriering and atomic ops as possible from the I/O side. Exclusion would be achieved by limiting modification of state to the I/O thread only. Locks would still need to be used in communication with the UDP socket and the AF_RXRPC socket API.
(6) To provide crypto offload kernel threads that, when there's slack in the system, can see packets that need crypting and provide parallelisation in dealing with them.
(7) To remove the use of system timers. Since each timer would then send a poke to the I/O thread, which would then deal with it when it had the opportunity, there seems no point in using system timers if, instead, a list of timeouts can be sensibly consulted. An I/O thread only then needs to schedule with a timeout when it is idle.
(8) To use zero-copy sendmsg to send packets. This would make use of the I/O thread being the sole transmitter on the socket to manage the dead-reckoning sequencing of the completion notifications. There is a problem with zero-copy, though: the UDP socket doesn't handle running out of option memory very gracefully.
With regard to this first patchset, the changes made include:
(1) Some fixes, including a fallback for proc_create_net_single_write(), setting ack.bufferSize to 0 in ACK packets and a fix for rxrpc congestion management, which shouldn't be saving the cwnd value between calls.
(2) Improvements in rxrpc tracepoints, including splitting the timer tracepoint into a set-timer and a timer-expired trace.
(3) Addition of a new proc file to display some stats.
(4) Some code cleanups, including removing some unused bits and unnecessary header inclusions.
(5) A change to the recently added UDP encap_err_rcv hook so that it has the same signature as {ip,ipv6}_icmp_error(), and then just have rxrpc point its UDP socket's hook directly at those.
(6) Definition of a new struct, rxrpc_txbuf, that is used to hold transmissible packets of DATA and ACK type in a single 2KiB block rather than using an sk_buff. This allows the buffer to be on a number of queues simultaneously more easily, and also guarantees that the entire block is in a single unit for zerocopy purposes and that the data payload is aligned for in-place crypto purposes.
(7) ACK txbufs are allocated at proposal and queued for later transmission rather than being stored in a single place in the rxrpc_call struct, which means only a single ACK can be pending transmission at a time. The queue is then drained at various points. This allows the ACK generation code to be simplified.
(8) The Rx ring buffer is removed. When a jumbo packet is received (which comprises a number of ordinary DATA packets glued together), it used to be pointed to by the ring multiple times, with an annotation in a side ring indicating which subpacket was in that slot - but this is no longer possible. Instead, the packet is cloned once for each subpacket, barring the last, and the range of data is set in the skb private area. This makes it easier for the subpackets in a jumbo packet to be decrypted in parallel.
(9) The Tx ring buffer is removed. The side annotation ring that held the SACK information is also removed. Instead, in the event of packet loss, the SACK data attached an ACK packet is parsed.
(10) Allocate an skcipher request when needed in the rxkad security class rather than caching one in the rxrpc_call struct. This deals with a race between externally-driven call disconnection getting rid of the skcipher request and sendmsg/recvmsg trying to use it because they haven't seen the completion yet. This is also needed to support parallelisation as the skcipher request cannot be used by two or more threads simultaneously.
(11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going through kernel_sendmsg() so that we can provide our own iterator (zerocopy explicitly doesn't work with a KVEC iterator). This also lets us avoid the overhead of the security hook. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|