History log of /openbmc/linux/net/ipv4/tcp.c (Results 126 – 150 of 4298)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.1.31
# 34dfde4a 26-May-2023 Cambda Zhu <cambda@linux.alibaba.com>

tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set

This patch replaces the tp->mss_cache check in getting TCP_MAXSEG
with tp->rx_opt.user_mss check for CLOSE/LISTEN sock. Sinc

tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set

This patch replaces the tp->mss_cache check in getting TCP_MAXSEG
with tp->rx_opt.user_mss check for CLOSE/LISTEN sock. Since
tp->mss_cache is initialized with TCP_MSS_DEFAULT, checking if
it's zero is probably a bug.

With this change, getting TCP_MAXSEG before connecting will return
default MSS normally, and return user_mss if user_mss is set.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Jack Yang <mingliang@linux.alibaba.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/CANn89i+3kL9pYtkxkwxwNMzvC_w3LNUum_2=3u+UyLBmGmifHA@mail.gmail.com/#t
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Link: https://lore.kernel.org/netdev/14D45862-36EA-4076-974C-EA67513C92F6@linux.alibaba.com/
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230527040317.68247-1-cambda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 4faeee0c 26-May-2023 Eric Dumazet <edumazet@google.com>

tcp: deny tcp_disconnect() when threads are waiting

Historically connect(AF_UNSPEC) has been abused by syzkaller
and other fuzzers to trigger various bugs.

A recent one triggers a divide-by-zero [1

tcp: deny tcp_disconnect() when threads are waiting

Historically connect(AF_UNSPEC) has been abused by syzkaller
and other fuzzers to trigger various bugs.

A recent one triggers a divide-by-zero [1], and Paolo Abeni
was able to diagnose the issue.

tcp_recvmsg_locked() has tests about sk_state being not TCP_LISTEN
and TCP REPAIR mode being not used.

Then later if socket lock is released in sk_wait_data(),
another thread can call connect(AF_UNSPEC), then make this
socket a TCP listener.

When recvmsg() is resumed, it can eventually call tcp_cleanup_rbuf()
and attempt a divide by 0 in tcp_rcv_space_adjust() [1]

This patch adds a new socket field, counting number of threads
blocked in sk_wait_event() and inet_wait_for_connect().

If this counter is not zero, tcp_disconnect() returns an error.

This patch adds code in blocking socket system calls, thus should
not hurt performance of non blocking ones.

Note that we probably could revert commit 499350a5a6e7 ("tcp:
initialize rcv_mss to TCP_MIN_MSS instead of 0") to restore
original tcpi_rcv_mss meaning (was 0 if no payload was ever
received on a socket)

[1]
divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 13832 Comm: syz-executor.5 Not tainted 6.3.0-rc4-syzkaller-00224-g00c7b5f4ddc5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
RIP: 0010:tcp_rcv_space_adjust+0x36e/0x9d0 net/ipv4/tcp_input.c:740
Code: 00 00 00 00 fc ff df 4c 89 64 24 48 8b 44 24 04 44 89 f9 41 81 c7 80 03 00 00 c1 e1 04 44 29 f0 48 63 c9 48 01 e9 48 0f af c1 <49> f7 f6 48 8d 04 41 48 89 44 24 40 48 8b 44 24 30 48 c1 e8 03 48
RSP: 0018:ffffc900033af660 EFLAGS: 00010206
RAX: 4a66b76cbade2c48 RBX: ffff888076640cc0 RCX: 00000000c334e4ac
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000001
RBP: 00000000c324e86c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880766417f8
R13: ffff888028fbb980 R14: 0000000000000000 R15: 0000000000010344
FS: 00007f5bffbfe700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32f25000 CR3: 000000007ced0000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_recvmsg_locked+0x100e/0x22e0 net/ipv4/tcp.c:2616
tcp_recvmsg+0x117/0x620 net/ipv4/tcp.c:2681
inet6_recvmsg+0x114/0x640 net/ipv6/af_inet6.c:670
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg+0xe2/0x160 net/socket.c:1038
____sys_recvmsg+0x210/0x5a0 net/socket.c:2720
___sys_recvmsg+0xf2/0x180 net/socket.c:2762
do_recvmmsg+0x25e/0x6e0 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0x20f/0x260 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5c0108c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5bffbfe168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007f5c011ac050 RCX: 00007f5c0108c0f9
RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000003
RBP: 00007f5c010e7b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5c012cfb1f R14: 00007f5bffbfe300 R15: 0000000000022000
</TASK>

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Diagnosed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230526163458.2880232-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 7e530d32 28-May-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.4-rc4 into usb-next

We need the USB fixes in here and this resolves merge conflicts in:
drivers/usb/dwc3/gadget.c
drivers/usb/gadget/udc/core.c

Signed-off-by: Greg Kroah-Hartman <gregkh@l

Merge 6.4-rc4 into usb-next

We need the USB fixes in here and this resolves merge conflicts in:
drivers/usb/dwc3/gadget.c
drivers/usb/gadget/udc/core.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# 8a29f74b 28-May-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge v6.4-rc4 into char-misc-next

We need the binder fixes in here for future changes and testing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0e4daea3 27-May-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.4-rc3 into tty-next

We need the serial/tty fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 75455b90 26-May-2023 Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-05-26

We've added 54 non-merge commits

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-05-26

We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 76 files changed, 2729 insertions(+), 1003 deletions(-).

The main changes are:

1) Add the capability to destroy sockets in BPF through a new kfunc,
from Aditi Ghag.

2) Support O_PATH fds in BPF_OBJ_PIN and BPF_OBJ_GET commands,
from Andrii Nakryiko.

3) Add capability for libbpf to resize datasec maps when backed via mmap,
from JP Kobryn.

4) Move all the test kfuncs for CI out of the kernel and into bpf_testmod,
from Jiri Olsa.

5) Big batch of xsk selftest improvements to prep for multi-buffer testing,
from Magnus Karlsson.

6) Show the target_{obj,btf}_id in tracing link's fdinfo and dump it
via bpftool, from Yafang Shao.

7) Various misc BPF selftest improvements to work with upcoming LLVM 17,
from Yonghong Song.

8) Extend bpftool to specify netdevice for resolving XDP hints,
from Larysa Zaremba.

9) Document masking in shift operations for the insn set document,
from Dave Thaler.

10) Extend BPF selftests to check xdp_feature support for bond driver,
from Lorenzo Bianconi.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
bpf: Fix bad unlock balance on freeze_mutex
libbpf: Ensure FD >= 3 during bpf_map__reuse_fd()
libbpf: Ensure libbpf always opens files with O_CLOEXEC
selftests/bpf: Check whether to run selftest
libbpf: Change var type in datasec resize func
bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command
libbpf: Selftests for resizing datasec maps
libbpf: Add capability for resizing datasec maps
selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests
libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd
bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands
libbpf: Start v1.3 development cycle
bpf: Validate BPF object in BPF_OBJ_PIN before calling LSM
bpftool: Specify XDP Hints ifname when loading program
selftests/bpf: Add xdp_feature selftest for bond device
selftests/bpf: Test bpf_sock_destroy
selftests/bpf: Add helper to get port using getsockname
bpf: Add bpf_sock_destroy kfunc
bpf: Add kfunc filter function to 'struct btf_kfunc_id_set'
bpf: udp: Implement batching for sockets iterator
...
====================

Link: https://lore.kernel.org/r/20230526222747.17775-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# d4031ec8 25-May-2023 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/raw.c
3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/raw.c
3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol")
c85be08fc4fa ("raw: Stop using RTO_ONLINK.")
https://lore.kernel.org/all/20230525110037.2b532b83@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/freescale/fec_main.c
9025944fddfe ("net: fec: add dma_wmb to ensure correct descriptor values")
144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 50fb587e 25-May-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and bpf.

Current release - regressions

Merge tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and bpf.

Current release - regressions:

- net: fix skb leak in __skb_tstamp_tx()

- eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs

Current release - new code bugs:

- handshake:
- fix sock->file allocation
- fix handshake_dup() ref counting

- bluetooth:
- fix potential double free caused by hci_conn_unlink
- fix UAF in hci_conn_hash_flush

Previous releases - regressions:

- core: fix stack overflow when LRO is disabled for virtual
interfaces

- tls: fix strparser rx issues

- bpf:
- fix many sockmap/TCP related issues
- fix a memory leak in the LRU and LRU_PERCPU hash maps
- init the offload table earlier

- eth: mlx5e:
- do as little as possible in napi poll when budget is 0
- fix using eswitch mapping in nic mode
- fix deadlock in tc route query code

Previous releases - always broken:

- udplite: fix NULL pointer dereference in __sk_mem_raise_allocated()

- raw: fix output xfrm lookup wrt protocol

- smc: reset connection when trying to use SMCRv2 fails

- phy: mscc: enable VSC8501/2 RGMII RX clock

- eth: octeontx2-pf: fix TSOv6 offload

- eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize"

* tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated().
net: phy: mscc: enable VSC8501/2 RGMII RX clock
net: phy: mscc: remove unnecessary phydev locking
net: phy: mscc: add support for VSC8501
net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE
net/handshake: Enable the SNI extension to work properly
net/handshake: Unpin sock->file if a handshake is cancelled
net/handshake: handshake_genl_notify() shouldn't ignore @flags
net/handshake: Fix uninitialized local variable
net/handshake: Fix handshake_dup() ref counting
net/handshake: Remove unneeded check from handshake_dup()
ipv6: Fix out-of-bounds access in ipv6_find_tlv()
net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs
docs: netdev: document the existence of the mail bot
net: fix skb leak in __skb_tstamp_tx()
r8169: Use a raw_spinlock_t for the register locks.
page_pool: fix inconsistency for page_pool_ring_[un]lock()
bpf, sockmap: Test progs verifier error with latest clang
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
...

show more ...


# 0c615f1c 24-May-2023 Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-05-24

We've added 19 non-merge commits during th

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-05-24

We've added 19 non-merge commits during the last 10 day(s) which contain
a total of 20 files changed, 738 insertions(+), 448 deletions(-).

The main changes are:

1) Batch of BPF sockmap fixes found when running against NGINX TCP tests,
from John Fastabend.

2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails,
from Anton Protopopov.

3) Init the BPF offload table earlier than just late_initcall,
from Jakub Kicinski.

4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields,
from Will Deacon.

5) Remove a now unsupported __fallthrough in BPF samples,
from Andrii Nakryiko.

6) Fix a typo in pkg-config call for building sign-file,
from Jeremy Sowden.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Test progs verifier error with latest clang
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0
bpf, sockmap: Build helper to create connected socket pair
bpf, sockmap: Pull socket helpers out of listen test for general use
bpf, sockmap: Incorrectly handling copied_seq
bpf, sockmap: Wake up polling after data copy
bpf, sockmap: TCP data stall on recv before accept
bpf, sockmap: Handle fin correctly
bpf, sockmap: Improved check for empty queue
bpf, sockmap: Reschedule is now done through backlog
bpf, sockmap: Convert schedule_work into delayed_work
bpf, sockmap: Pass skb ownership through read_skb
bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
samples/bpf: Drop unnecessary fallthrough
bpf: netdev: init the offload table earlier
selftests/bpf: Fix pkg-config call building sign-file
====================

Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v6.1.30
# 01bc4ac9 24-May-2023 Mark Brown <broonie@kernel.org>

spi: Merge up v6.4-rc3

Merge up v6.4-rc3 in order to get fixes to improve the stability of my
CI.


# 6c594a82 24-May-2023 Mark Brown <broonie@kernel.org>

regulator: Merge up v6.4-rc3

Merge up v6.4-rc3 in order to get fixes to improve the stability of my
CI.


# 51c78a4d 23-May-2023 Jakub Kicinski <kuba@kernel.org>

Merge branch 'splice-net-replace-sendpage-with-sendmsg-msg_splice_pages-part-1'

David Howells says:

====================
splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1

Here's

Merge branch 'splice-net-replace-sendpage-with-sendmsg-msg_splice_pages-part-1'

David Howells says:

====================
splice, net: Replace sendpage with sendmsg(MSG_SPLICE_PAGES), part 1

Here's the first tranche of patches towards providing a MSG_SPLICE_PAGES
internal sendmsg flag that is intended to replace the ->sendpage() op with
calls to sendmsg(). MSG_SPLICE_PAGES is a hint that tells the protocol
that it should splice the pages supplied if it can and copy them if not.

This will allow splice to pass multiple pages in a single call and allow
certain parts of higher protocols (e.g. sunrpc, iwarp) to pass an entire
message in one go rather than having to send them piecemeal. This should
also make it easier to handle the splicing of multipage folios.

A helper, skb_splice_from_iter() is provided to do the work of splicing or
copying data from an iterator. If a page is determined to be unspliceable
(such as being in the slab), then the helper will give an error.

Note that this facility is not made available to userspace and does not
provide any sort of callback.

This set consists of the following parts:

(1) Define the MSG_SPLICE_PAGES flag and prevent sys_sendmsg() from being
able to set it.

(2) Add an extra argument to skb_append_pagefrags() so that something
other than MAX_SKB_FRAGS can be used (sysctl_max_skb_frags for
example).

(3) Add the skb_splice_from_iter() helper to handle splicing pages into
skbuffs for MSG_SPLICE_PAGES that can be shared by TCP, IP/UDP and
AF_UNIX.

(4) Implement MSG_SPLICE_PAGES support in TCP.

(5) Make do_tcp_sendpages() just wrap sendmsg() and then fold it in to its
various callers.

(6) Implement MSG_SPLICE_PAGES support in IP and make udp_sendpage() just
a wrapper around sendmsg().

(7) Implement MSG_SPLICE_PAGES support in IP6/UDP6.

(8) Implement MSG_SPLICE_PAGES support in AF_UNIX.

(9) Make AF_UNIX copy unspliceable pages.

Link: https://lore.kernel.org/r/20230316152618.711970-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230329141354.516864-1-dhowells@redhat.com/ # v2
Link: https://lore.kernel.org/r/20230331160914.1608208-1-dhowells@redhat.com/ # v3
Link: https://lore.kernel.org/r/20230405165339.3468808-1-dhowells@redhat.com/ # v4
Link: https://lore.kernel.org/r/20230406094245.3633290-1-dhowells@redhat.com/ # v5
Link: https://lore.kernel.org/r/20230411160902.4134381-1-dhowells@redhat.com/ # v6
Link: https://lore.kernel.org/r/20230515093345.396978-1-dhowells@redhat.com/ # v7
Link: https://lore.kernel.org/r/20230518113453.1350757-1-dhowells@redhat.com/ # v8
====================

Link: https://lore.kernel.org/r/20230522121125.2595254-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 5367f9bb 22-May-2023 David Howells <dhowells@redhat.com>

tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()

Fold do_tcp_sendpages() into its last remaining caller,
tcp_sendpage_locked().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David A

tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()

Fold do_tcp_sendpages() into its last remaining caller,
tcp_sendpage_locked().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# c5c37af6 22-May-2023 David Howells <dhowells@redhat.com>

tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES

Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself. do_tcp_sendpages() can t

tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES

Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself. do_tcp_sendpages() can then be
inlined in subsequent patches into its callers.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 270a1c3d 22-May-2023 David Howells <dhowells@redhat.com>

tcp: Support MSG_SPLICE_PAGES

Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced or copied (if it cannot be spliced) from the source iterator.

This allows ->sendpage()

tcp: Support MSG_SPLICE_PAGES

Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced or copied (if it cannot be spliced) from the source iterator.

This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 90d0d600 23-May-2023 Mark Brown <broonie@kernel.org>

regmap: Merge up v6.4-rc3

Merge up v6.4-rc3 to get fixes which make my CI more stable.


# e5c6de5f 22-May-2023 John Fastabend <john.fastabend@gmail.com>

bpf, sockmap: Incorrectly handling copied_seq

The read_skb() logic is incrementing the tcp->copied_seq which is used for
among other things calculating how many outstanding bytes can be read by
the

bpf, sockmap: Incorrectly handling copied_seq

The read_skb() logic is incrementing the tcp->copied_seq which is used for
among other things calculating how many outstanding bytes can be read by
the application. This results in application errors, if the application
does an ioctl(FIONREAD) we return zero because this is calculated from
the copied_seq value.

To fix this we move tcp->copied_seq accounting into the recv handler so
that we update these when the recvmsg() hook is called and data is in
fact copied into user buffers. This gives an accurate FIONREAD value
as expected and improves ACK handling. Before we were calling the
tcp_rcv_space_adjust() which would update 'number of bytes copied to
user in last RTT' which is wrong for programs returning SK_PASS. The
bytes are only copied to the user when recvmsg is handled.

Doing the fix for recvmsg is straightforward, but fixing redirect and
SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then
call this from skmsg handlers. This fixes another issue where a broken
socket with a BPF program doing a resubmit could hang the receiver. This
happened because although read_skb() consumed the skb through sock_drop()
it did not update the copied_seq. Now if a single reccv socket is
redirecting to many sockets (for example for lb) the receiver sk will be
hung even though we might expect it to continue. The hang comes from
not updating the copied_seq numbers and memory pressure resulting from
that.

We have a slight layer problem of calling tcp_eat_skb even if its not
a TCP socket. To fix we could refactor and create per type receiver
handlers. I decided this is more work than we want in the fix and we
already have some small tweaks depending on caller that use the
helper skb_bpf_strparser(). So we extend that a bit and always set
the strparser bit when it is in use and then we can gate the
seq_copied updates on this.

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com

show more ...


# 78fa0d61 22-May-2023 John Fastabend <john.fastabend@gmail.com>

bpf, sockmap: Pass skb ownership through read_skb

The read_skb hook calls consume_skb() now, but this means that if the
recv_actor program wants to use the skb it needs to inc the ref cnt
so that th

bpf, sockmap: Pass skb ownership through read_skb

The read_skb hook calls consume_skb() now, but this means that if the
recv_actor program wants to use the skb it needs to inc the ref cnt
so that the consume_skb() doesn't kfree the sk_buff.

This is problematic because in some error cases under memory pressure
we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
Then we get this,

skb_linearize()
__pskb_pull_tail()
pskb_expand_head()
BUG_ON(skb_shared(skb))

Because we incremented users refcnt from sk_psock_verdict_recv() we
hit the bug on with refcnt > 1 and trip it.

To fix lets simply pass ownership of the sk_buff through the skb_read
call. Then we can drop the consume from read_skb handlers and assume
the verdict recv does any required kfree.

Bug found while testing in our CI which runs in VMs that hit memory
constraints rather regularly. William tested TCP read_skb handlers.

[ 106.536188] ------------[ cut here ]------------
[ 106.536197] kernel BUG at net/core/skbuff.c:1693!
[ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.542255] Call Trace:
[ 106.542383] <IRQ>
[ 106.542487] __pskb_pull_tail+0x4b/0x3e0
[ 106.542681] skb_ensure_writable+0x85/0xa0
[ 106.542882] sk_skb_pull_data+0x18/0x20
[ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
[ 106.543536] ? migrate_disable+0x66/0x80
[ 106.543871] sk_psock_verdict_recv+0xe2/0x310
[ 106.544258] ? sk_psock_write_space+0x1f0/0x1f0
[ 106.544561] tcp_read_skb+0x7b/0x120
[ 106.544740] tcp_data_queue+0x904/0xee0
[ 106.544931] tcp_rcv_established+0x212/0x7c0
[ 106.545142] tcp_v4_do_rcv+0x174/0x2a0
[ 106.545326] tcp_v4_rcv+0xe70/0xf60
[ 106.545500] ip_protocol_deliver_rcu+0x48/0x290
[ 106.545744] ip_local_deliver_finish+0xa7/0x150

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Reported-by: William Findlay <will@isovalent.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com

show more ...


# 03a58514 23-May-2023 Takashi Iwai <tiwai@suse.de>

Merge branch 'topic/midi20' into for-next

This is a (largish) patch set for adding the support of MIDI 2.0
functionality, mainly targeted for USB devices. MIDI 2.0 is a
complete overhaul of the 40-

Merge branch 'topic/midi20' into for-next

This is a (largish) patch set for adding the support of MIDI 2.0
functionality, mainly targeted for USB devices. MIDI 2.0 is a
complete overhaul of the 40-years old MIDI 1.0. Unlike MIDI 1.0 byte
stream, MIDI 2.0 uses packets in 32bit words for Universal MIDI Packet
(UMP) protocol. It supports both MIDI 1.0 commands for compatibility
and the extended MIDI 2.0 commands for higher resolutions and more
functions.

For supporting the UMP, the patch set extends the existing ALSA
rawmidi and sequencer interfaces, and adds the USB MIDI 2.0 support to
the standard USB-audio driver.

The rawmidi for UMP has a different device name (/dev/snd/umpC*D*) and
it reads/writes UMP packet data in 32bit CPU-native endianness. For
the old MIDI 1.0 applications, the legacy rawmidi interface is
provided, too.

As default, USB-audio driver will take the alternate setting for MIDI
2.0 interface, and the compatibility with MIDI 1.0 is provided via the
rawmidi common layer. However, user may let the driver falling back
to the old MIDI 1.0 interface by a module option, too.

A UMP-capable rawmidi device can create the corresponding ALSA
sequencer client(s) to support the UMP Endpoint and UMP Group
connections. As a nature of ALSA sequencer, arbitrary connections
between clients/ports are allowed, and the ALSA sequencer core
performs the automatic conversions for the connections between a new
UMP sequencer client and a legacy MIDI 1.0 sequencer client. It
allows the existing application to use MIDI 2.0 devices without
changes.

The MIDI-CI, which is another major extension in MIDI 2.0, isn't
covered by this patch set. It would be implemented rather in
user-space.

Roughly speaking, the first half of this patch set is for extending
the rawmidi and USB-audio, and the second half is for extending the
ALSA sequencer interface.

The patch set is based on 6.4-rc2 kernel, but all patches can be
cleanly applicable on 6.2 and 6.3 kernels, too (while 6.1 and older
kernels would need minor adjustment for uapi header changes).

The updates for alsa-lib and alsa-utils will follow shortly later.

The author thanks members of MIDI Association OS/API Working Group,
especially Andrew Mee, for great helps for the initial design and
debugging / testing the drivers.

Link: https://lore.kernel.org/r/20230523075358.9672-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

show more ...


# 18f55887 19-May-2023 Martin KaFai Lau <martin.lau@kernel.org>

Merge branch 'bpf: Add socket destroy capability'

Aditi Ghag says:

====================

This patch set adds the capability to destroy sockets in BPF. We plan to
use the capability in Cilium to for

Merge branch 'bpf: Add socket destroy capability'

Aditi Ghag says:

====================

This patch set adds the capability to destroy sockets in BPF. We plan to
use the capability in Cilium to force client sockets to reconnect when
their remote load-balancing backends are deleted. The other use case is
on-the-fly policy enforcement where existing socket connections
prevented by policies need to be terminated.

The use cases, and more details around
the selected approach were presented at LPC 2022 -
https://lpc.events/event/16/contributions/1358/.
RFC discussion -
https://lore.kernel.org/netdev/CABG=zsBEh-P4NXk23eBJw7eajB5YJeRS7oPXnTAzs=yob4EMoQ@mail.gmail.com/T/#u.
v8 patch series -
https://lore.kernel.org/bpf/20230517175359.527917-1-aditi.ghag@isovalent.com/

v9 highlights:
Address review comments:
Martin:
- Rearranged the kfunc filter patch, and added the missing break
statement.
- Squashed the extended selftest/bpf patch.
Yonghong:
- Revised commit message for patch 1.

(Below notes are same as v8 patch series that are still relevant. Refer to
earlier patch series versions for other notes.)
- I hit a snag while writing the kfunc where verifier complained about the
`sock_common` type passed from TCP iterator. With kfuncs, there don't
seem to be any options available to pass BTF type hints to the verifier
(equivalent of `ARG_PTR_TO_BTF_ID_SOCK_COMMON`, as was the case with the
helper). As a result, I changed the argument type of the sock_destory
kfunc to `sock_common`.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

show more ...


# 4ddbcb88 19-May-2023 Aditi Ghag <aditi.ghag@isovalent.com>

bpf: Add bpf_sock_destroy kfunc

The socket destroy kfunc is used to forcefully terminate sockets from
certain BPF contexts. We plan to use the capability in Cilium
load-balancing to terminate client

bpf: Add bpf_sock_destroy kfunc

The socket destroy kfunc is used to forcefully terminate sockets from
certain BPF contexts. We plan to use the capability in Cilium
load-balancing to terminate client sockets that continue to connect to
deleted backends. The other use case is on-the-fly policy enforcement
where existing socket connections prevented by policies need to be
forcefully terminated. The kfunc also allows terminating sockets that may
or may not be actively sending traffic.

The kfunc can currently be called only from BPF TCP and UDP iterators
where users can filter, and terminate selected sockets. More
specifically, it can only be called from BPF contexts that ensure
socket locking in order to allow synchronous execution of protocol
specific `diag_destroy` handlers. The previous commit that batches UDP
sockets during iteration facilitated a synchronous invocation of the UDP
destroy callback from BPF context by skipping socket locks in
`udp_abort`. TCP iterator already supported batching of sockets being
iterated. To that end, `tracing_iter_filter` callback filter is added so
that verifier can restrict the kfunc to programs with `BPF_TRACE_ITER`
attach type, and reject other programs.

The kfunc takes `sock_common` type argument, even though it expects, and
casts them to a `sock` pointer. This enables the verifier to allow the
sock_destroy kfunc to be called for TCP with `sock_common` and UDP with
`sock` structs. Furthermore, as `sock_common` only has a subset of
certain fields of `sock`, casting pointer to the latter type might not
always be safe for certain sockets like request sockets, but these have a
special handling in the diag_destroy handlers.

Additionally, the kfunc is defined with `KF_TRUSTED_ARGS` flag to avoid the
cases where a `PTR_TO_BTF_ID` sk is obtained by following another pointer.
eg. getting a sk pointer (may be even NULL) by following another sk
pointer. The pointer socket argument passed in TCP and UDP iterators is
tagged as `PTR_TRUSTED` in {tcp,udp}_reg_info. The TRUSTED arg changes
are contributed by Martin KaFai Lau <martin.lau@kernel.org>.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-8-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

show more ...


# af53b00f 18-May-2023 Mark Brown <broonie@kernel.org>

Merge tag 'v6.4-rc2' into asoc-6.5 to get fixes for CI

Linux 6.4-rc2


# 8c1688e8 18-May-2023 Mauro Carvalho Chehab <mchehab@kernel.org>

Merge tag 'v6.4-rc2' into v4l_for_linus

Linux 6.4-rc2

* tag 'v6.4-rc2': (162 commits)
Linux 6.4-rc2
parisc: Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag
ext4: bail out of ext4_xa

Merge tag 'v6.4-rc2' into v4l_for_linus

Linux 6.4-rc2

* tag 'v6.4-rc2': (162 commits)
Linux 6.4-rc2
parisc: Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag
ext4: bail out of ext4_xattr_ibody_get() fails for any reason
ext4: add bounds checking in get_max_inline_xattr_value_size()
ext4: add indication of ro vs r/w mounts in the mount message
ext4: fix deadlock when converting an inline directory in nojournal mode
ext4: improve error recovery code paths in __ext4_remount()
ext4: improve error handling from ext4_dirhash()
ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled
ext4: check iomap type only if ext4_iomap_begin() does not fail
ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
ext4: fix data races when using cached status extents
ext4: avoid deadlock in fs reclaim with page writeback
ext4: fix invalid free tracking in ext4_xattr_move_to_block()
ext4: remove a BUG_ON in ext4_mb_release_group_pa()
ext4: allow ext4_get_group_info() to fail
cxl: Add missing return to cdat read error path
tools/testing/cxl: Use DEFINE_STATIC_SRCU()
x86/retbleed: Fix return thunk alignment
Documentation/block: drop the request.rst file
...

show more ...


# 9c3a985f 17-May-2023 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Backmerge to get some hwmon dependencies.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


Revision tags: v6.1.29
# 81cf1ade 17-May-2023 David S. Miller <davem@davemloft.net>

Merge branch 'tcp-io_uring-zc-opts'

Merge branch 'tcp-io_uring-zc-opts'

Pavel Begunkov says:

====================
minor tcp io_uring zc optimisations

Patch 1 is a simple cleanup, patch 2 gives re

Merge branch 'tcp-io_uring-zc-opts'

Merge branch 'tcp-io_uring-zc-opts'

Pavel Begunkov says:

====================
minor tcp io_uring zc optimisations

Patch 1 is a simple cleanup, patch 2 gives removes 2 atomics from the
io_uring zc TCP submission path, which yielded extra 0.5% for my
throughput CPU bound tests based on liburing/examples/send-zerocopy.c
====================

Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


12345678910>>...172