Revision tags: v6.1.31 |
|
#
34dfde4a |
| 26-May-2023 |
Cambda Zhu <cambda@linux.alibaba.com> |
tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set
This patch replaces the tp->mss_cache check in getting TCP_MAXSEG with tp->rx_opt.user_mss check for CLOSE/LISTEN sock. Sinc
tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set
This patch replaces the tp->mss_cache check in getting TCP_MAXSEG with tp->rx_opt.user_mss check for CLOSE/LISTEN sock. Since tp->mss_cache is initialized with TCP_MSS_DEFAULT, checking if it's zero is probably a bug.
With this change, getting TCP_MAXSEG before connecting will return default MSS normally, and return user_mss if user_mss is set.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Jack Yang <mingliang@linux.alibaba.com> Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89i+3kL9pYtkxkwxwNMzvC_w3LNUum_2=3u+UyLBmGmifHA@mail.gmail.com/#t Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Link: https://lore.kernel.org/netdev/14D45862-36EA-4076-974C-EA67513C92F6@linux.alibaba.com/ Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230527040317.68247-1-cambda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
4faeee0c |
| 26-May-2023 |
Eric Dumazet <edumazet@google.com> |
tcp: deny tcp_disconnect() when threads are waiting
Historically connect(AF_UNSPEC) has been abused by syzkaller and other fuzzers to trigger various bugs.
A recent one triggers a divide-by-zero [1
tcp: deny tcp_disconnect() when threads are waiting
Historically connect(AF_UNSPEC) has been abused by syzkaller and other fuzzers to trigger various bugs.
A recent one triggers a divide-by-zero [1], and Paolo Abeni was able to diagnose the issue.
tcp_recvmsg_locked() has tests about sk_state being not TCP_LISTEN and TCP REPAIR mode being not used.
Then later if socket lock is released in sk_wait_data(), another thread can call connect(AF_UNSPEC), then make this socket a TCP listener.
When recvmsg() is resumed, it can eventually call tcp_cleanup_rbuf() and attempt a divide by 0 in tcp_rcv_space_adjust() [1]
This patch adds a new socket field, counting number of threads blocked in sk_wait_event() and inet_wait_for_connect().
If this counter is not zero, tcp_disconnect() returns an error.
This patch adds code in blocking socket system calls, thus should not hurt performance of non blocking ones.
Note that we probably could revert commit 499350a5a6e7 ("tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0") to restore original tcpi_rcv_mss meaning (was 0 if no payload was ever received on a socket)
[1] divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 13832 Comm: syz-executor.5 Not tainted 6.3.0-rc4-syzkaller-00224-g00c7b5f4ddc5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 RIP: 0010:tcp_rcv_space_adjust+0x36e/0x9d0 net/ipv4/tcp_input.c:740 Code: 00 00 00 00 fc ff df 4c 89 64 24 48 8b 44 24 04 44 89 f9 41 81 c7 80 03 00 00 c1 e1 04 44 29 f0 48 63 c9 48 01 e9 48 0f af c1 <49> f7 f6 48 8d 04 41 48 89 44 24 40 48 8b 44 24 30 48 c1 e8 03 48 RSP: 0018:ffffc900033af660 EFLAGS: 00010206 RAX: 4a66b76cbade2c48 RBX: ffff888076640cc0 RCX: 00000000c334e4ac RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000001 RBP: 00000000c324e86c R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880766417f8 R13: ffff888028fbb980 R14: 0000000000000000 R15: 0000000000010344 FS: 00007f5bffbfe700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32f25000 CR3: 000000007ced0000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_recvmsg_locked+0x100e/0x22e0 net/ipv4/tcp.c:2616 tcp_recvmsg+0x117/0x620 net/ipv4/tcp.c:2681 inet6_recvmsg+0x114/0x640 net/ipv6/af_inet6.c:670 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg+0xe2/0x160 net/socket.c:1038 ____sys_recvmsg+0x210/0x5a0 net/socket.c:2720 ___sys_recvmsg+0xf2/0x180 net/socket.c:2762 do_recvmmsg+0x25e/0x6e0 net/socket.c:2856 __sys_recvmmsg net/socket.c:2935 [inline] __do_sys_recvmmsg net/socket.c:2958 [inline] __se_sys_recvmmsg net/socket.c:2951 [inline] __x64_sys_recvmmsg+0x20f/0x260 net/socket.c:2951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f5c0108c0f9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f5bffbfe168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 00007f5c011ac050 RCX: 00007f5c0108c0f9 RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000003 RBP: 00007f5c010e7b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5c012cfb1f R14: 00007f5bffbfe300 R15: 0000000000022000 </TASK>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Paolo Abeni <pabeni@redhat.com> Diagnosed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20230526163458.2880232-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
7e530d32 |
| 28-May-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 6.4-rc4 into usb-next
We need the USB fixes in here and this resolves merge conflicts in: drivers/usb/dwc3/gadget.c drivers/usb/gadget/udc/core.c
Signed-off-by: Greg Kroah-Hartman <gregkh@l
Merge 6.4-rc4 into usb-next
We need the USB fixes in here and this resolves merge conflicts in: drivers/usb/dwc3/gadget.c drivers/usb/gadget/udc/core.c
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
8a29f74b |
| 28-May-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge v6.4-rc4 into char-misc-next
We need the binder fixes in here for future changes and testing.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
0e4daea3 |
| 27-May-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 6.4-rc3 into tty-next
We need the serial/tty fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
75455b90 |
| 26-May-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2023-05-26
We've added 54 non-merge commits
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2023-05-26
We've added 54 non-merge commits during the last 10 day(s) which contain a total of 76 files changed, 2729 insertions(+), 1003 deletions(-).
The main changes are:
1) Add the capability to destroy sockets in BPF through a new kfunc, from Aditi Ghag.
2) Support O_PATH fds in BPF_OBJ_PIN and BPF_OBJ_GET commands, from Andrii Nakryiko.
3) Add capability for libbpf to resize datasec maps when backed via mmap, from JP Kobryn.
4) Move all the test kfuncs for CI out of the kernel and into bpf_testmod, from Jiri Olsa.
5) Big batch of xsk selftest improvements to prep for multi-buffer testing, from Magnus Karlsson.
6) Show the target_{obj,btf}_id in tracing link's fdinfo and dump it via bpftool, from Yafang Shao.
7) Various misc BPF selftest improvements to work with upcoming LLVM 17, from Yonghong Song.
8) Extend bpftool to specify netdevice for resolving XDP hints, from Larysa Zaremba.
9) Document masking in shift operations for the insn set document, from Dave Thaler.
10) Extend BPF selftests to check xdp_feature support for bond driver, from Lorenzo Bianconi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits) bpf: Fix bad unlock balance on freeze_mutex libbpf: Ensure FD >= 3 during bpf_map__reuse_fd() libbpf: Ensure libbpf always opens files with O_CLOEXEC selftests/bpf: Check whether to run selftest libbpf: Change var type in datasec resize func bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command libbpf: Selftests for resizing datasec maps libbpf: Add capability for resizing datasec maps selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands libbpf: Start v1.3 development cycle bpf: Validate BPF object in BPF_OBJ_PIN before calling LSM bpftool: Specify XDP Hints ifname when loading program selftests/bpf: Add xdp_feature selftest for bond device selftests/bpf: Test bpf_sock_destroy selftests/bpf: Add helper to get port using getsockname bpf: Add bpf_sock_destroy kfunc bpf: Add kfunc filter function to 'struct btf_kfunc_id_set' bpf: udp: Implement batching for sockets iterator ... ====================
Link: https://lore.kernel.org/r/20230526222747.17775-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
d4031ec8 |
| 25-May-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.
Conflicts:
net/ipv4/raw.c 3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.
Conflicts:
net/ipv4/raw.c 3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol") c85be08fc4fa ("raw: Stop using RTO_ONLINK.") https://lore.kernel.org/all/20230525110037.2b532b83@canb.auug.org.au/
Adjacent changes:
drivers/net/ethernet/freescale/fec_main.c 9025944fddfe ("net: fec: add dma_wmb to ensure correct descriptor values") 144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
50fb587e |
| 25-May-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and bpf.
Current release - regressions
Merge tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and bpf.
Current release - regressions:
- net: fix skb leak in __skb_tstamp_tx()
- eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
Current release - new code bugs:
- handshake: - fix sock->file allocation - fix handshake_dup() ref counting
- bluetooth: - fix potential double free caused by hci_conn_unlink - fix UAF in hci_conn_hash_flush
Previous releases - regressions:
- core: fix stack overflow when LRO is disabled for virtual interfaces
- tls: fix strparser rx issues
- bpf: - fix many sockmap/TCP related issues - fix a memory leak in the LRU and LRU_PERCPU hash maps - init the offload table earlier
- eth: mlx5e: - do as little as possible in napi poll when budget is 0 - fix using eswitch mapping in nic mode - fix deadlock in tc route query code
Previous releases - always broken:
- udplite: fix NULL pointer dereference in __sk_mem_raise_allocated()
- raw: fix output xfrm lookup wrt protocol
- smc: reset connection when trying to use SMCRv2 fails
- phy: mscc: enable VSC8501/2 RGMII RX clock
- eth: octeontx2-pf: fix TSOv6 offload
- eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize"
* tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits) udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). net: phy: mscc: enable VSC8501/2 RGMII RX clock net: phy: mscc: remove unnecessary phydev locking net: phy: mscc: add support for VSC8501 net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE net/handshake: Enable the SNI extension to work properly net/handshake: Unpin sock->file if a handshake is cancelled net/handshake: handshake_genl_notify() shouldn't ignore @flags net/handshake: Fix uninitialized local variable net/handshake: Fix handshake_dup() ref counting net/handshake: Remove unneeded check from handshake_dup() ipv6: Fix out-of-bounds access in ipv6_find_tlv() net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs docs: netdev: document the existence of the mail bot net: fix skb leak in __skb_tstamp_tx() r8169: Use a raw_spinlock_t for the register locks. page_pool: fix inconsistency for page_pool_ring_[un]lock() bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer ...
show more ...
|
#
0c615f1c |
| 24-May-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
==================== pull-request: bpf 2023-05-24
We've added 19 non-merge commits during th
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
==================== pull-request: bpf 2023-05-24
We've added 19 non-merge commits during the last 10 day(s) which contain a total of 20 files changed, 738 insertions(+), 448 deletions(-).
The main changes are:
1) Batch of BPF sockmap fixes found when running against NGINX TCP tests, from John Fastabend.
2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails, from Anton Protopopov.
3) Init the BPF offload table earlier than just late_initcall, from Jakub Kicinski.
4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields, from Will Deacon.
5) Remove a now unsupported __fallthrough in BPF samples, from Andrii Nakryiko.
6) Fix a typo in pkg-config call for building sign-file, from Jeremy Sowden.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0 bpf, sockmap: Build helper to create connected socket pair bpf, sockmap: Pull socket helpers out of listen test for general use bpf, sockmap: Incorrectly handling copied_seq bpf, sockmap: Wake up polling after data copy bpf, sockmap: TCP data stall on recv before accept bpf, sockmap: Handle fin correctly bpf, sockmap: Improved check for empty queue bpf, sockmap: Reschedule is now done through backlog bpf, sockmap: Convert schedule_work into delayed_work bpf, sockmap: Pass skb ownership through read_skb bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields samples/bpf: Drop unnecessary fallthrough bpf: netdev: init the offload table earlier selftests/bpf: Fix pkg-config call building sign-file ====================
Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.1.30 |
|
#
01bc4ac9 |
| 24-May-2023 |
Mark Brown <broonie@kernel.org> |
spi: Merge up v6.4-rc3
Merge up v6.4-rc3 in order to get fixes to improve the stability of my CI.
|
#
6c594a82 |
| 24-May-2023 |
Mark Brown <broonie@kernel.org> |
regulator: Merge up v6.4-rc3
Merge up v6.4-rc3 in order to get fixes to improve the stability of my CI.
|
#
51c78a4d |
| 23-May-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'splice-net-replace-sendpage-with-sendmsg-msg_splice_pages-part-1'
David Howells says:
==================== splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1
Here's
Merge branch 'splice-net-replace-sendpage-with-sendmsg-msg_splice_pages-part-1'
David Howells says:
==================== splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1
Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES internal sendmsg flag that is intended to replace the ->sendpage() op with calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol that it should splice the pages supplied if it can and copy them if not.
This will allow splice to pass multiple pages in a single call and allow certain parts of higher protocols (e.g. sunrpc, iwarp) to pass an entire message in one go rather than having to send them piecemeal. This should also make it easier to handle the splicing of multipage folios.
A helper, skb_splice_from_iter() is provided to do the work of splicing or copying data from an iterator. If a page is determined to be unspliceable (such as being in the slab), then the helper will give an error.
Note that this facility is not made available to userspace and does not provide any sort of callback.
This set consists of the following parts:
(1) Define the MSG_SPLICE_PAGES flag and prevent sys_sendmsg() from being able to set it.
(2) Add an extra argument to skb_append_pagefrags() so that something other than MAX_SKB_FRAGS can be used (sysctl_max_skb_frags for example).
(3) Add the skb_splice_from_iter() helper to handle splicing pages into skbuffs for MSG_SPLICE_PAGES that can be shared by TCP, IP/UDP and AF_UNIX.
(4) Implement MSG_SPLICE_PAGES support in TCP.
(5) Make do_tcp_sendpages() just wrap sendmsg() and then fold it in to its various callers.
(6) Implement MSG_SPLICE_PAGES support in IP and make udp_sendpage() just a wrapper around sendmsg().
(7) Implement MSG_SPLICE_PAGES support in IP6/UDP6.
(8) Implement MSG_SPLICE_PAGES support in AF_UNIX.
(9) Make AF_UNIX copy unspliceable pages.
Link: https://lore.kernel.org/r/20230316152618.711970-1-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20230329141354.516864-1-dhowells@redhat.com/ # v2 Link: https://lore.kernel.org/r/20230331160914.1608208-1-dhowells@redhat.com/ # v3 Link: https://lore.kernel.org/r/20230405165339.3468808-1-dhowells@redhat.com/ # v4 Link: https://lore.kernel.org/r/20230406094245.3633290-1-dhowells@redhat.com/ # v5 Link: https://lore.kernel.org/r/20230411160902.4134381-1-dhowells@redhat.com/ # v6 Link: https://lore.kernel.org/r/20230515093345.396978-1-dhowells@redhat.com/ # v7 Link: https://lore.kernel.org/r/20230518113453.1350757-1-dhowells@redhat.com/ # v8 ====================
Link: https://lore.kernel.org/r/20230522121125.2595254-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
5367f9bb |
| 22-May-2023 |
David Howells <dhowells@redhat.com> |
tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
Fold do_tcp_sendpages() into its last remaining caller, tcp_sendpage_locked().
Signed-off-by: David Howells <dhowells@redhat.com> cc: David A
tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()
Fold do_tcp_sendpages() into its last remaining caller, tcp_sendpage_locked().
Signed-off-by: David Howells <dhowells@redhat.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
c5c37af6 |
| 22-May-2023 |
David Howells <dhowells@redhat.com> |
tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES
Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. do_tcp_sendpages() can t
tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES
Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. do_tcp_sendpages() can then be inlined in subsequent patches into its callers.
This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
270a1c3d |
| 22-May-2023 |
David Howells <dhowells@redhat.com> |
tcp: Support MSG_SPLICE_PAGES
Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced or copied (if it cannot be spliced) from the source iterator.
This allows ->sendpage()
tcp: Support MSG_SPLICE_PAGES
Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced or copied (if it cannot be spliced) from the source iterator.
This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
90d0d600 |
| 23-May-2023 |
Mark Brown <broonie@kernel.org> |
regmap: Merge up v6.4-rc3
Merge up v6.4-rc3 to get fixes which make my CI more stable.
|
#
e5c6de5f |
| 22-May-2023 |
John Fastabend <john.fastabend@gmail.com> |
bpf, sockmap: Incorrectly handling copied_seq
The read_skb() logic is incrementing the tcp->copied_seq which is used for among other things calculating how many outstanding bytes can be read by the
bpf, sockmap: Incorrectly handling copied_seq
The read_skb() logic is incrementing the tcp->copied_seq which is used for among other things calculating how many outstanding bytes can be read by the application. This results in application errors, if the application does an ioctl(FIONREAD) we return zero because this is calculated from the copied_seq value.
To fix this we move tcp->copied_seq accounting into the recv handler so that we update these when the recvmsg() hook is called and data is in fact copied into user buffers. This gives an accurate FIONREAD value as expected and improves ACK handling. Before we were calling the tcp_rcv_space_adjust() which would update 'number of bytes copied to user in last RTT' which is wrong for programs returning SK_PASS. The bytes are only copied to the user when recvmsg is handled.
Doing the fix for recvmsg is straightforward, but fixing redirect and SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then call this from skmsg handlers. This fixes another issue where a broken socket with a BPF program doing a resubmit could hang the receiver. This happened because although read_skb() consumed the skb through sock_drop() it did not update the copied_seq. Now if a single reccv socket is redirecting to many sockets (for example for lb) the receiver sk will be hung even though we might expect it to continue. The hang comes from not updating the copied_seq numbers and memory pressure resulting from that.
We have a slight layer problem of calling tcp_eat_skb even if its not a TCP socket. To fix we could refactor and create per type receiver handlers. I decided this is more work than we want in the fix and we already have some small tweaks depending on caller that use the helper skb_bpf_strparser(). So we extend that a bit and always set the strparser bit when it is in use and then we can gate the seq_copied updates on this.
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
show more ...
|
#
78fa0d61 |
| 22-May-2023 |
John Fastabend <john.fastabend@gmail.com> |
bpf, sockmap: Pass skb ownership through read_skb
The read_skb hook calls consume_skb() now, but this means that if the recv_actor program wants to use the skb it needs to inc the ref cnt so that th
bpf, sockmap: Pass skb ownership through read_skb
The read_skb hook calls consume_skb() now, but this means that if the recv_actor program wants to use the skb it needs to inc the ref cnt so that the consume_skb() doesn't kfree the sk_buff.
This is problematic because in some error cases under memory pressure we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue(). Then we get this,
skb_linearize() __pskb_pull_tail() pskb_expand_head() BUG_ON(skb_shared(skb))
Because we incremented users refcnt from sk_psock_verdict_recv() we hit the bug on with refcnt > 1 and trip it.
To fix lets simply pass ownership of the sk_buff through the skb_read call. Then we can drop the consume from read_skb handlers and assume the verdict recv does any required kfree.
Bug found while testing in our CI which runs in VMs that hit memory constraints rather regularly. William tested TCP read_skb handlers.
[ 106.536188] ------------[ cut here ]------------ [ 106.536197] kernel BUG at net/core/skbuff.c:1693! [ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1 [ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014 [ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330 [ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202 [ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20 [ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8 [ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000 [ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8 [ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8 [ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0 [ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.542255] Call Trace: [ 106.542383] <IRQ> [ 106.542487] __pskb_pull_tail+0x4b/0x3e0 [ 106.542681] skb_ensure_writable+0x85/0xa0 [ 106.542882] sk_skb_pull_data+0x18/0x20 [ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9 [ 106.543536] ? migrate_disable+0x66/0x80 [ 106.543871] sk_psock_verdict_recv+0xe2/0x310 [ 106.544258] ? sk_psock_write_space+0x1f0/0x1f0 [ 106.544561] tcp_read_skb+0x7b/0x120 [ 106.544740] tcp_data_queue+0x904/0xee0 [ 106.544931] tcp_rcv_established+0x212/0x7c0 [ 106.545142] tcp_v4_do_rcv+0x174/0x2a0 [ 106.545326] tcp_v4_rcv+0xe70/0xf60 [ 106.545500] ip_protocol_deliver_rcu+0x48/0x290 [ 106.545744] ip_local_deliver_finish+0xa7/0x150
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reported-by: William Findlay <will@isovalent.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com
show more ...
|
#
03a58514 |
| 23-May-2023 |
Takashi Iwai <tiwai@suse.de> |
Merge branch 'topic/midi20' into for-next
This is a (largish) patch set for adding the support of MIDI 2.0 functionality, mainly targeted for USB devices. MIDI 2.0 is a complete overhaul of the 40-
Merge branch 'topic/midi20' into for-next
This is a (largish) patch set for adding the support of MIDI 2.0 functionality, mainly targeted for USB devices. MIDI 2.0 is a complete overhaul of the 40-years old MIDI 1.0. Unlike MIDI 1.0 byte stream, MIDI 2.0 uses packets in 32bit words for Universal MIDI Packet (UMP) protocol. It supports both MIDI 1.0 commands for compatibility and the extended MIDI 2.0 commands for higher resolutions and more functions.
For supporting the UMP, the patch set extends the existing ALSA rawmidi and sequencer interfaces, and adds the USB MIDI 2.0 support to the standard USB-audio driver.
The rawmidi for UMP has a different device name (/dev/snd/umpC*D*) and it reads/writes UMP packet data in 32bit CPU-native endianness. For the old MIDI 1.0 applications, the legacy rawmidi interface is provided, too.
As default, USB-audio driver will take the alternate setting for MIDI 2.0 interface, and the compatibility with MIDI 1.0 is provided via the rawmidi common layer. However, user may let the driver falling back to the old MIDI 1.0 interface by a module option, too.
A UMP-capable rawmidi device can create the corresponding ALSA sequencer client(s) to support the UMP Endpoint and UMP Group connections. As a nature of ALSA sequencer, arbitrary connections between clients/ports are allowed, and the ALSA sequencer core performs the automatic conversions for the connections between a new UMP sequencer client and a legacy MIDI 1.0 sequencer client. It allows the existing application to use MIDI 2.0 devices without changes.
The MIDI-CI, which is another major extension in MIDI 2.0, isn't covered by this patch set. It would be implemented rather in user-space.
Roughly speaking, the first half of this patch set is for extending the rawmidi and USB-audio, and the second half is for extending the ALSA sequencer interface.
The patch set is based on 6.4-rc2 kernel, but all patches can be cleanly applicable on 6.2 and 6.3 kernels, too (while 6.1 and older kernels would need minor adjustment for uapi header changes).
The updates for alsa-lib and alsa-utils will follow shortly later.
The author thanks members of MIDI Association OS/API Working Group, especially Andrew Mee, for great helps for the initial design and debugging / testing the drivers.
Link: https://lore.kernel.org/r/20230523075358.9672-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
show more ...
|
#
18f55887 |
| 19-May-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
Merge branch 'bpf: Add socket destroy capability'
Aditi Ghag says:
====================
This patch set adds the capability to destroy sockets in BPF. We plan to use the capability in Cilium to for
Merge branch 'bpf: Add socket destroy capability'
Aditi Ghag says:
====================
This patch set adds the capability to destroy sockets in BPF. We plan to use the capability in Cilium to force client sockets to reconnect when their remote load-balancing backends are deleted. The other use case is on-the-fly policy enforcement where existing socket connections prevented by policies need to be terminated.
The use cases, and more details around the selected approach were presented at LPC 2022 - https://lpc.events/event/16/contributions/1358/. RFC discussion - https://lore.kernel.org/netdev/CABG=zsBEh-P4NXk23eBJw7eajB5YJeRS7oPXnTAzs=yob4EMoQ@mail.gmail.com/T/#u. v8 patch series - https://lore.kernel.org/bpf/20230517175359.527917-1-aditi.ghag@isovalent.com/
v9 highlights: Address review comments: Martin: - Rearranged the kfunc filter patch, and added the missing break statement. - Squashed the extended selftest/bpf patch. Yonghong: - Revised commit message for patch 1.
(Below notes are same as v8 patch series that are still relevant. Refer to earlier patch series versions for other notes.) - I hit a snag while writing the kfunc where verifier complained about the `sock_common` type passed from TCP iterator. With kfuncs, there don't seem to be any options available to pass BTF type hints to the verifier (equivalent of `ARG_PTR_TO_BTF_ID_SOCK_COMMON`, as was the case with the helper). As a result, I changed the argument type of the sock_destory kfunc to `sock_common`. ====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
show more ...
|
#
4ddbcb88 |
| 19-May-2023 |
Aditi Ghag <aditi.ghag@isovalent.com> |
bpf: Add bpf_sock_destroy kfunc
The socket destroy kfunc is used to forcefully terminate sockets from certain BPF contexts. We plan to use the capability in Cilium load-balancing to terminate client
bpf: Add bpf_sock_destroy kfunc
The socket destroy kfunc is used to forcefully terminate sockets from certain BPF contexts. We plan to use the capability in Cilium load-balancing to terminate client sockets that continue to connect to deleted backends. The other use case is on-the-fly policy enforcement where existing socket connections prevented by policies need to be forcefully terminated. The kfunc also allows terminating sockets that may or may not be actively sending traffic.
The kfunc can currently be called only from BPF TCP and UDP iterators where users can filter, and terminate selected sockets. More specifically, it can only be called from BPF contexts that ensure socket locking in order to allow synchronous execution of protocol specific `diag_destroy` handlers. The previous commit that batches UDP sockets during iteration facilitated a synchronous invocation of the UDP destroy callback from BPF context by skipping socket locks in `udp_abort`. TCP iterator already supported batching of sockets being iterated. To that end, `tracing_iter_filter` callback filter is added so that verifier can restrict the kfunc to programs with `BPF_TRACE_ITER` attach type, and reject other programs.
The kfunc takes `sock_common` type argument, even though it expects, and casts them to a `sock` pointer. This enables the verifier to allow the sock_destroy kfunc to be called for TCP with `sock_common` and UDP with `sock` structs. Furthermore, as `sock_common` only has a subset of certain fields of `sock`, casting pointer to the latter type might not always be safe for certain sockets like request sockets, but these have a special handling in the diag_destroy handlers.
Additionally, the kfunc is defined with `KF_TRUSTED_ARGS` flag to avoid the cases where a `PTR_TO_BTF_ID` sk is obtained by following another pointer. eg. getting a sk pointer (may be even NULL) by following another sk pointer. The pointer socket argument passed in TCP and UDP iterators is tagged as `PTR_TRUSTED` in {tcp,udp}_reg_info. The TRUSTED arg changes are contributed by Martin KaFai Lau <martin.lau@kernel.org>.
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-8-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
show more ...
|
#
af53b00f |
| 18-May-2023 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v6.4-rc2' into asoc-6.5 to get fixes for CI
Linux 6.4-rc2
|
#
8c1688e8 |
| 18-May-2023 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
Merge tag 'v6.4-rc2' into v4l_for_linus
Linux 6.4-rc2
* tag 'v6.4-rc2': (162 commits) Linux 6.4-rc2 parisc: Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag ext4: bail out of ext4_xa
Merge tag 'v6.4-rc2' into v4l_for_linus
Linux 6.4-rc2
* tag 'v6.4-rc2': (162 commits) Linux 6.4-rc2 parisc: Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag ext4: bail out of ext4_xattr_ibody_get() fails for any reason ext4: add bounds checking in get_max_inline_xattr_value_size() ext4: add indication of ro vs r/w mounts in the mount message ext4: fix deadlock when converting an inline directory in nojournal mode ext4: improve error recovery code paths in __ext4_remount() ext4: improve error handling from ext4_dirhash() ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled ext4: check iomap type only if ext4_iomap_begin() does not fail ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum ext4: fix data races when using cached status extents ext4: avoid deadlock in fs reclaim with page writeback ext4: fix invalid free tracking in ext4_xattr_move_to_block() ext4: remove a BUG_ON in ext4_mb_release_group_pa() ext4: allow ext4_get_group_info() to fail cxl: Add missing return to cdat read error path tools/testing/cxl: Use DEFINE_STATIC_SRCU() x86/retbleed: Fix return thunk alignment Documentation/block: drop the request.rst file ...
show more ...
|
#
9c3a985f |
| 17-May-2023 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Backmerge to get some hwmon dependencies.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
Revision tags: v6.1.29 |
|
#
81cf1ade |
| 17-May-2023 |
David S. Miller <davem@davemloft.net> |
Merge branch 'tcp-io_uring-zc-opts'
Merge branch 'tcp-io_uring-zc-opts'
Pavel Begunkov says:
==================== minor tcp io_uring zc optimisations
Patch 1 is a simple cleanup, patch 2 gives re
Merge branch 'tcp-io_uring-zc-opts'
Merge branch 'tcp-io_uring-zc-opts'
Pavel Begunkov says:
==================== minor tcp io_uring zc optimisations
Patch 1 is a simple cleanup, patch 2 gives removes 2 atomics from the io_uring zc TCP submission path, which yielded extra 0.5% for my throughput CPU bound tests based on liburing/examples/send-zerocopy.c ====================
Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|