Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13 |
|
#
168e7e59 |
| 17-Jan-2024 |
Zhengchao Shao <shaozhengchao@huawei.com> |
tcp: make sure init the accept_queue's spinlocks once
[ Upstream commit 198bc90e0e734e5f98c3d2833e8390cac3df61b2 ]
When I run syz's reproduction C program locally, it causes the following issue: pv
tcp: make sure init the accept_queue's spinlocks once
[ Upstream commit 198bc90e0e734e5f98c3d2833e8390cac3df61b2 ]
When I run syz's reproduction C program locally, it causes the following issue: pvqspinlock: lock 0xffff9d181cd5c660 has corrupted value 0x0! WARNING: CPU: 19 PID: 21160 at __pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Code: 73 56 3a ff 90 c3 cc cc cc cc 8b 05 bb 1f 48 01 85 c0 74 05 c3 cc cc cc cc 8b 17 48 89 fe 48 c7 c7 30 20 ce 8f e8 ad 56 42 ff <0f> 0b c3 cc cc cc cc 0f 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffa8d200604cb8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9d1ef60e0908 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9d1ef60e0900 RBP: ffff9d181cd5c280 R08: 0000000000000000 R09: 00000000ffff7fff R10: ffffa8d200604b68 R11: ffffffff907dcdc8 R12: 0000000000000000 R13: ffff9d181cd5c660 R14: ffff9d1813a3f330 R15: 0000000000001000 FS: 00007fa110184640(0000) GS:ffff9d1ef60c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000011f65e000 CR4: 00000000000006f0 Call Trace: <IRQ> _raw_spin_unlock (kernel/locking/spinlock.c:186) inet_csk_reqsk_queue_add (net/ipv4/inet_connection_sock.c:1321) inet_csk_complete_hashdance (net/ipv4/inet_connection_sock.c:1358) tcp_check_req (net/ipv4/tcp_minisocks.c:868) tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2260) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205) ip_local_deliver_finish (net/ipv4/ip_input.c:234) __netif_receive_skb_one_core (net/core/dev.c:5529) process_backlog (./include/linux/rcupdate.h:779) __napi_poll (net/core/dev.c:6533) net_rx_action (net/core/dev.c:6604) __do_softirq (./arch/x86/include/asm/jump_label.h:27) do_softirq (kernel/softirq.c:454 kernel/softirq.c:441) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:381) __dev_queue_xmit (net/core/dev.c:4374) ip_finish_output2 (./include/net/neighbour.h:540 net/ipv4/ip_output.c:235) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) tcp_rcv_synsent_state_process (net/ipv4/tcp_input.c:6469) tcp_rcv_state_process (net/ipv4/tcp_input.c:6657) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1929) __release_sock (./include/net/sock.h:1121 net/core/sock.c:2968) release_sock (net/core/sock.c:3536) inet_wait_for_connect (net/ipv4/af_inet.c:609) __inet_stream_connect (net/ipv4/af_inet.c:702) inet_stream_connect (net/ipv4/af_inet.c:748) __sys_connect (./include/linux/file.h:45 net/socket.c:2064) __x64_sys_connect (net/socket.c:2073 net/socket.c:2070 net/socket.c:2070) do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7fa10ff05a3d Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48 RSP: 002b:00007fa110183de8 EFLAGS: 00000202 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000020000054 RCX: 00007fa10ff05a3d RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003 RBP: 00007fa110183e20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fa110184640 R13: 0000000000000000 R14: 00007fa10fe8b060 R15: 00007fff73e23b20 </TASK>
The issue triggering process is analyzed as follows: Thread A Thread B tcp_v4_rcv //receive ack TCP packet inet_shutdown tcp_check_req tcp_disconnect //disconnect sock ... tcp_set_state(sk, TCP_CLOSE) inet_csk_complete_hashdance ... inet_csk_reqsk_queue_add inet_listen //start listen spin_lock(&queue->rskq_lock) inet_csk_listen_start ... reqsk_queue_alloc ... spin_lock_init spin_unlock(&queue->rskq_lock) //warning
When the socket receives the ACK packet during the three-way handshake, it will hold spinlock. And then the user actively shutdowns the socket and listens to the socket immediately, the spinlock will be initialized. When the socket is going to release the spinlock, a warning is generated. Also the same issue to fastopenq.lock.
Move init spinlock to inet_create and inet_accept to make sure init the accept_queue's spinlocks once.
Fixes: fff1f3001cc5 ("tcp: add a spinlock to protect struct request_sock_queue") Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Reported-by: Ming Shu <sming56@aliyun.com> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240118012019.1751966-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6 |
|
#
d983ea6f |
| 10-Oct-2019 |
Eric Dumazet <edumazet@google.com> |
tcp: add rcu protection around tp->fastopen_rsk
Both tcp_v4_err() and tcp_v6_err() do the following operations while they do not own the socket lock :
fastopen = tp->fastopen_rsk; snd_una = fast
tcp: add rcu protection around tp->fastopen_rsk
Both tcp_v4_err() and tcp_v6_err() do the following operations while they do not own the socket lock :
fastopen = tp->fastopen_rsk; snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;
The problem is that without appropriate barrier, the compiler might reload tp->fastopen_rsk and trigger a NULL deref.
request sockets are protected by RCU, we can simply add the missing annotations and barriers to solve the issue.
Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6 |
|
#
2874c5fd |
| 27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of th
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14, v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1, v4.17, v4.16, v4.15, v4.13.16, v4.14, v4.13.5, v4.13, v4.12, v4.10.17, v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10, v4.10.9, v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2, v4.10.1, v4.10 |
|
#
fee83d09 |
| 28-Dec-2016 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
ipv4: Namespaceify tcp_max_syn_backlog knob
Different namespace application might require different maximal number of remembered connection requests.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss
ipv4: Namespaceify tcp_max_syn_backlog knob
Different namespace application might require different maximal number of remembered connection requests.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1, v4.4.11, openbmc-20160518-1, v4.6, v4.4.10, openbmc-20160511-1, openbmc-20160505-1, v4.4.9, v4.4.8, v4.4.7, openbmc-20160329-2, openbmc-20160329-1, openbmc-20160321-1, v4.4.6, v4.5, v4.4.5, v4.4.4, v4.4.3, openbmc-20160222-1, v4.4.2, openbmc-20160212-1, openbmc-20160210-1, openbmc-20160202-2, openbmc-20160202-1, v4.4.1, openbmc-20160127-1, openbmc-20160120-1, v4.4, openbmc-20151217-1, openbmc-20151210-1, openbmc-20151202-1, openbmc-20151123-1, openbmc-20151118-1, openbmc-20151104-1, v4.3, openbmc-20151102-1, openbmc-20151028-1 |
|
#
ac8cfc7b |
| 30-Sep-2015 |
Eric Dumazet <edumazet@google.com> |
tcp: restore fastopen operations
I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc() while max_qlen can be set before listen() is called, using TCP_FASTOPEN socket option for example.
tcp: restore fastopen operations
I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc() while max_qlen can be set before listen() is called, using TCP_FASTOPEN socket option for example.
Fixes: 0536fcc039a8 ("tcp: prepare fastopen code for upcoming listener changes") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
ef547f2a |
| 02-Oct-2015 |
Eric Dumazet <edumazet@google.com> |
tcp: remove max_qlen_log
This control variable was set at first listen(fd, backlog) call, but not updated if application tried to increase or decrease backlog. It made sense at the time listener had
tcp: remove max_qlen_log
This control variable was set at first listen(fd, backlog) call, but not updated if application tried to increase or decrease backlog. It made sense at the time listener had a non resizeable hash table.
Also rounding to powers of two was not very friendly.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
10cbc8f1 |
| 02-Oct-2015 |
Eric Dumazet <edumazet@google.com> |
tcp/dccp: remove struct listen_sock
It is enough to check listener sk_state, no need for an extra condition.
max_qlen_log can be moved into struct request_sock_queue
We can remove syn_wait_lock an
tcp/dccp: remove struct listen_sock
It is enough to check listener sk_state, no need for an extra condition.
max_qlen_log can be moved into struct request_sock_queue
We can remove syn_wait_lock and the alignment it enforced.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
81b496b3 |
| 02-Oct-2015 |
Eric Dumazet <edumazet@google.com> |
tcp/dccp: shrink struct listen_sock
We no longer use hash_rnd, nr_table_entries and syn_table[]
For a listener with a backlog of 10 millions sockets, this saves 80 MBytes of vmalloced memory.
Sign
tcp/dccp: shrink struct listen_sock
We no longer use hash_rnd, nr_table_entries and syn_table[]
For a listener with a backlog of 10 millions sockets, this saves 80 MBytes of vmalloced memory.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
079096f1 |
| 02-Oct-2015 |
Eric Dumazet <edumazet@google.com> |
tcp/dccp: install syn_recv requests into ehash table
In this patch, we insert request sockets into TCP/DCCP regular ehash table (where ESTABLISHED and TIMEWAIT sockets are) instead of using the per
tcp/dccp: install syn_recv requests into ehash table
In this patch, we insert request sockets into TCP/DCCP regular ehash table (where ESTABLISHED and TIMEWAIT sockets are) instead of using the per listener hash table.
ACK packets find SYN_RECV pseudo sockets without having to find and lock the listener.
In nominal conditions, this halves pressure on listener lock.
Note that this will allow for SO_REUSEPORT refinements, so that we can select a listener using cpu/numa affinities instead of the prior 'consistent hash', since only SYN packets will apply this selection logic.
We will shrink listen_sock in the following patch to ease code review.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ying Cai <ycai@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
aac065c5 |
| 02-Oct-2015 |
Eric Dumazet <edumazet@google.com> |
tcp: move qlen/young out of struct listen_sock
qlen_inc & young_inc were protected by listener lock, while qlen_dec & young_dec were atomic fields.
Everything needs to be atomic for upcoming lockle
tcp: move qlen/young out of struct listen_sock
qlen_inc & young_inc were protected by listener lock, while qlen_dec & young_dec were atomic fields.
Everything needs to be atomic for upcoming lockless listener.
Also move qlen/young in request_sock_queue as we'll get rid of struct listen_sock eventually.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
fff1f300 |
| 02-Oct-2015 |
Eric Dumazet <edumazet@google.com> |
tcp: add a spinlock to protect struct request_sock_queue
struct request_sock_queue fields are currently protected by the listener 'lock' (not a real spinlock)
We need to add a private spinlock inst
tcp: add a spinlock to protect struct request_sock_queue
struct request_sock_queue fields are currently protected by the listener 'lock' (not a real spinlock)
We need to add a private spinlock instead, so that softirq handlers creating children do not have to worry with backlog notion that the listener 'lock' carries.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
0536fcc0 |
| 29-Sep-2015 |
Eric Dumazet <edumazet@google.com> |
tcp: prepare fastopen code for upcoming listener changes
While auditing TCP stack for upcoming 'lockless' listener changes, I found I had to change fastopen_init_queue() to properly init the object
tcp: prepare fastopen code for upcoming listener changes
While auditing TCP stack for upcoming 'lockless' listener changes, I found I had to change fastopen_init_queue() to properly init the object before publishing it.
Otherwise an other cpu could try to lock the spinlock before it gets properly initialized.
Instead of adding appropriate barriers, just remove dynamic memory allocations : - Structure is 28 bytes on 64bit arches. Using additional 8 bytes for holding a pointer seems overkill. - Two listeners can share same cache line and performance would suffer.
If we really want to save few bytes, we would instead dynamically allocate whole struct request_sock_queue in the future.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.3-rc1, v4.2, v4.2-rc8, v4.2-rc7 |
|
#
2235f2ac |
| 10-Aug-2015 |
Eric Dumazet <edumazet@google.com> |
inet: fix races with reqsk timers
reqsk_queue_destroy() and reqsk_queue_unlink() should use del_timer_sync() instead of del_timer() before calling reqsk_put(), otherwise we could free a req still us
inet: fix races with reqsk timers
reqsk_queue_destroy() and reqsk_queue_unlink() should use del_timer_sync() instead of del_timer() before calling reqsk_put(), otherwise we could free a req still used by another cpu.
But before doing so, reqsk_queue_destroy() must release syn_wait_lock spinlock or risk a dead lock, as reqsk_timer_handler() might need to take this same spinlock from reqsk_queue_unlink() (called from inet_csk_reqsk_queue_drop())
Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2, v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2, v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5 |
|
#
b2827053 |
| 22-Mar-2015 |
Eric Dumazet <edumazet@google.com> |
net: convert syn_wait_lock to a spinlock
This is a low hanging fruit, as we'll get rid of syn_wait_lock eventually.
We hold syn_wait_lock for such small sections, that it makes no sense to use a re
net: convert syn_wait_lock to a spinlock
This is a low hanging fruit, as we'll get rid of syn_wait_lock eventually.
We hold syn_wait_lock for such small sections, that it makes no sense to use a read/write lock. A spin lock is simply faster.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
fa76ce73 |
| 19-Mar-2015 |
Eric Dumazet <edumazet@google.com> |
inet: get rid of central tcp/dccp listener timer
One of the major issue for TCP is the SYNACK rtx handling, done by inet_csk_reqsk_queue_prune(), fired by the keepalive timer of a TCP_LISTEN socket.
inet: get rid of central tcp/dccp listener timer
One of the major issue for TCP is the SYNACK rtx handling, done by inet_csk_reqsk_queue_prune(), fired by the keepalive timer of a TCP_LISTEN socket.
This function runs for awful long times, with socket lock held, meaning that other cpus needing this lock have to spin for hundred of ms.
SYNACK are sent in huge bursts, likely to cause severe drops anyway.
This model was OK 15 years ago when memory was very tight.
We now can afford to have a timer per request sock.
Timer invocations no longer need to lock the listener, and can be run from all cpus in parallel.
With following patch increasing somaxconn width to 32 bits, I tested a listener with more than 4 million active request sockets, and a steady SYNFLOOD of ~200,000 SYN per second. Host was sending ~830,000 SYNACK per second.
This is ~100 times more what we could achieve before this patch.
Later, we will get rid of the listener hash and use ehash instead.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9439ce00 |
| 17-Mar-2015 |
Eric Dumazet <edumazet@google.com> |
tcp: rename struct tcp_request_sock listener
The listener field in struct tcp_request_sock is a pointer back to the listener. We now have req->rsk_listener, so TCP only needs one boolean and not a f
tcp: rename struct tcp_request_sock listener
The listener field in struct tcp_request_sock is a pointer back to the listener. We now have req->rsk_listener, so TCP only needs one boolean and not a full pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
13854e5a |
| 15-Mar-2015 |
Eric Dumazet <edumazet@google.com> |
inet: add proper refcounting to request sock
reqsk_put() is the generic function that should be used to release a refcount (and automatically call reqsk_free())
reqsk_free() might be called if refc
inet: add proper refcounting to request sock
reqsk_put() is the generic function that should be used to release a refcount (and automatically call reqsk_free())
reqsk_free() might be called if refcount is known to be 0 or undefined.
refcnt is set to one in inet_csk_reqsk_queue_add()
As request socks are not yet in global ehash table, I added temporary debugging checks in reqsk_put() and reqsk_free()
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1, v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3, v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1, v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3 |
|
#
f6d8cb2e |
| 24-Jun-2014 |
Eric Dumazet <edumazet@google.com> |
inet: reduce TLB pressure for listeners
It seems overkill to use vmalloc() for typical listeners with less than 2048 hash buckets. Try kmalloc() and fallback to vmalloc() to reduce TLB pressure.
Us
inet: reduce TLB pressure for listeners
It seems overkill to use vmalloc() for typical listeners with less than 2048 hash buckets. Try kmalloc() and fallback to vmalloc() to reduce TLB pressure.
Use kvfree() helper as it is now available. Use ilog2() instead of a loop.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.16-rc2, v3.16-rc1, v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8, v3.14-rc7, v3.14-rc6, v3.14-rc5, v3.14-rc4, v3.14-rc3 |
|
#
2045ceae |
| 12-Feb-2014 |
stephen hemminger <stephen@networkplumber.org> |
net: remove unnecessary return's
One of my pet coding style peeves is the practice of adding extra return; at the end of function. Kill several instances of this in network code.
I suppose some coc
net: remove unnecessary return's
One of my pet coding style peeves is the practice of adding extra return; at the end of function. Kill several instances of this in network code.
I suppose some coccinelle wizardy could do this automatically.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.14-rc2, v3.14-rc1, v3.13, v3.13-rc8, v3.13-rc7, v3.13-rc6, v3.13-rc5, v3.13-rc4, v3.13-rc3, v3.13-rc2, v3.13-rc1, v3.12, v3.12-rc7, v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1, v3.11, v3.11-rc7, v3.11-rc6, v3.11-rc5, v3.11-rc4, v3.11-rc3, v3.11-rc2, v3.11-rc1, v3.10, v3.10-rc7, v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2, v3.10-rc1, v3.9, v3.9-rc8, v3.9-rc7, v3.9-rc6, v3.9-rc5, v3.9-rc4, v3.9-rc3, v3.9-rc2, v3.9-rc1, v3.8, v3.8-rc7, v3.8-rc6, v3.8-rc5, v3.8-rc4 |
|
#
cce894bb |
| 13-Jan-2013 |
Eric Dumazet <edumazet@google.com> |
tcp: fix a panic on UP machines in reqsk_fastopen_remove
spin_is_locked() on a non !SMP build is kind of useless.
BUG_ON(!spin_is_locked(xx)) is guaranteed to crash.
Just remove this check in reqs
tcp: fix a panic on UP machines in reqsk_fastopen_remove
spin_is_locked() on a non !SMP build is kind of useless.
BUG_ON(!spin_is_locked(xx)) is guaranteed to crash.
Just remove this check in reqsk_fastopen_remove() as the callers do hold the socket lock.
Reported-by: Ketan Kulkarni <ketkulka@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Dave Taht <dave.taht@gmail.com> Acked-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.8-rc3, v3.8-rc2, v3.8-rc1, v3.7, v3.7-rc8, v3.7-rc7, v3.7-rc6, v3.7-rc5, v3.7-rc4, v3.7-rc3, v3.7-rc2, v3.7-rc1, v3.6, v3.6-rc7, v3.6-rc6, v3.6-rc5, v3.6-rc4 |
|
#
8336886f |
| 31-Aug-2012 |
Jerry Chu <hkchu@google.com> |
tcp: TCP Fast Open Server - support TFO listeners
This patch builds on top of the previous patch to add the support for TFO listeners. This includes -
1. allocating, properly initializing, and mana
tcp: TCP Fast Open Server - support TFO listeners
This patch builds on top of the previous patch to add the support for TFO listeners. This includes -
1. allocating, properly initializing, and managing the per listener fastopen_queue structure when TFO is enabled
2. changes to the inet_csk_accept code to support TFO. E.g., the request_sock can no longer be freed upon accept(), not until 3WHS finishes
3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg() if it's a TFO socket
4. properly closing a TFO listener, and a TFO socket before 3WHS finishes
5. supporting TCP_FASTOPEN socket option
6. modifying tcp_check_req() to use to check a TFO socket as well as request_sock
7. supporting TCP's TFO cookie option
8. adding a new SYN-ACK retransmit handler to use the timer directly off the TFO socket rather than the listener socket. Note that TFO server side will not retransmit anything other than SYN-ACK until the 3WHS is completed.
The patch also contains an important function "reqsk_fastopen_remove()" to manage the somewhat complex relation between a listener, its request_sock, and the corresponding child socket. See the comment above the function for the detail.
Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.6-rc3, v3.6-rc2, v3.6-rc1, v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2, v3.5-rc1, v3.4, v3.4-rc7, v3.4-rc6, v3.4-rc5, v3.4-rc4, v3.4-rc3, v3.4-rc2, v3.4-rc1, v3.3, v3.3-rc7, v3.3-rc6, v3.3-rc5, v3.3-rc4, v3.3-rc3, v3.3-rc2, v3.3-rc1, v3.2, v3.2-rc7, v3.2-rc6, v3.2-rc5 |
|
#
99b53bdd |
| 05-Dec-2011 |
Peter Pan(潘卫平) <panweiping3@gmail.com> |
ipv4:correct description for tcp_max_syn_backlog
Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning), sysctl_max_syn_backlog is determined by tcp_hashinfo->ehash_mask, and the minimal value
ipv4:correct description for tcp_max_syn_backlog
Since commit c5ed63d66f24(tcp: fix three tcp sysctls tuning), sysctl_max_syn_backlog is determined by tcp_hashinfo->ehash_mask, and the minimal value is 128, and it will increase in proportion to the memory of machine. The original description for tcp_max_syn_backlog and sysctl_max_syn_backlog are out of date.
Changelog: V2: update description for sysctl_max_syn_backlog
Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: Shan Wei <shanwei88@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v3.2-rc4, v3.2-rc3, v3.2-rc2, v3.2-rc1, v3.1, v3.1-rc10, v3.1-rc9, v3.1-rc8, v3.1-rc7, v3.1-rc6, v3.1-rc5, v3.1-rc4, v3.1-rc3, v3.1-rc2, v3.1-rc1, v3.0, v3.0-rc7, v3.0-rc6, v3.0-rc5, v3.0-rc4, v3.0-rc3, v3.0-rc2, v3.0-rc1, v2.6.39, v2.6.39-rc7, v2.6.39-rc6, v2.6.39-rc5, v2.6.39-rc4, v2.6.39-rc3, v2.6.39-rc2, v2.6.39-rc1, v2.6.38, v2.6.38-rc8, v2.6.38-rc7, v2.6.38-rc6, v2.6.38-rc5, v2.6.38-rc4, v2.6.38-rc3, v2.6.38-rc2, v2.6.38-rc1, v2.6.37, v2.6.37-rc8, v2.6.37-rc7, v2.6.37-rc6, v2.6.37-rc5 |
|
#
493f377d |
| 02-Dec-2010 |
David S. Miller <davem@davemloft.net> |
tcp: Add timewait recycling bits to ipv6 connect code.
This will also improve handling of ipv6 tcp socket request backlog when syncookies are not enabled. When backlog becomes very deep, last quart
tcp: Add timewait recycling bits to ipv6 connect code.
This will also improve handling of ipv6 tcp socket request backlog when syncookies are not enabled. When backlog becomes very deep, last quarter of backlog is limited to validated destinations. Previously only ipv4 implemented this logic, but now ipv6 does too.
Now we are only one step away from enabling timewait recycling for ipv6, and that step is simply filling in the implementation of tcp_v6_get_peer() and tcp_v6_tw_get_peer().
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v2.6.37-rc4, v2.6.37-rc3 |
|
#
7a1c8e5a |
| 20-Nov-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: allow GFP_HIGHMEM in __vmalloc()
We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.
In ceph, add the missing flag.
In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
net: allow GFP_HIGHMEM in __vmalloc()
We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.
In ceph, add the missing flag.
In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is cleaner and allows using HIGHMEM pages as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v2.6.37-rc2, v2.6.37-rc1, v2.6.36, v2.6.36-rc8, v2.6.36-rc7, v2.6.36-rc6, v2.6.36-rc5, v2.6.36-rc4, v2.6.36-rc3, v2.6.36-rc2, v2.6.36-rc1, v2.6.35, v2.6.35-rc6, v2.6.35-rc5, v2.6.35-rc4, v2.6.35-rc3, v2.6.35-rc2, v2.6.35-rc1, v2.6.34, v2.6.34-rc7, v2.6.34-rc6, v2.6.34-rc5, v2.6.34-rc4, v2.6.34-rc3, v2.6.34-rc2, v2.6.34-rc1, v2.6.33, v2.6.33-rc8, v2.6.33-rc7, v2.6.33-rc6, v2.6.33-rc5, v2.6.33-rc4, v2.6.33-rc3, v2.6.33-rc2, v2.6.33-rc1, v2.6.32, v2.6.32-rc8, v2.6.32-rc7, v2.6.32-rc6, v2.6.32-rc5, v2.6.32-rc4, v2.6.32-rc3, v2.6.32-rc1, v2.6.32-rc2, v2.6.31, v2.6.31-rc9, v2.6.31-rc8, v2.6.31-rc7, v2.6.31-rc6, v2.6.31-rc5, v2.6.31-rc4, v2.6.31-rc3, v2.6.31-rc2, v2.6.31-rc1, v2.6.30, v2.6.30-rc8, v2.6.30-rc7, v2.6.30-rc6, v2.6.30-rc5, v2.6.30-rc4, v2.6.30-rc3, v2.6.30-rc2, v2.6.30-rc1, v2.6.29, v2.6.29-rc8, v2.6.29-rc7, v2.6.29-rc6, v2.6.29-rc5, v2.6.29-rc4, v2.6.29-rc3, v2.6.29-rc2, v2.6.29-rc1, v2.6.28, v2.6.28-rc9, v2.6.28-rc8, v2.6.28-rc7, v2.6.28-rc6, v2.6.28-rc5, v2.6.28-rc4, v2.6.28-rc3, v2.6.28-rc2, v2.6.28-rc1, v2.6.27, v2.6.27-rc9, v2.6.27-rc8, v2.6.27-rc7, v2.6.27-rc6, v2.6.27-rc5, v2.6.27-rc4, v2.6.27-rc3, v2.6.27-rc2, v2.6.27-rc1 |
|
#
547b792c |
| 25-Jul-2008 |
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> |
net: convert BUG_TRAP to generic WARN_ON
Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), a
net: convert BUG_TRAP to generic WARN_ON
Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future.
I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|