Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
43c28172 |
| 21-Aug-2023 |
Zhengchao Shao <shaozhengchao@huawei.com> |
net: remove unnecessary input parameter 'how' in ifdown function
When the ifdown function in the dst_ops structure is referenced, the input parameter 'how' is always true. In the current implementat
net: remove unnecessary input parameter 'how' in ifdown function
When the ifdown function in the dst_ops structure is referenced, the input parameter 'how' is always true. In the current implementation of the ifdown interface, ip6_dst_ifdown does not use the input parameter 'how', xfrm6_dst_ifdown and xfrm4_dst_ifdown functions use the input parameter 'unregister'. But false judgment on 'unregister' in xfrm6_dst_ifdown and xfrm4_dst_ifdown is false, so remove the input parameter 'how' in ifdown function.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
Revision tags: v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3 |
|
#
418a7307 |
| 20-Apr-2023 |
Maxime Bizon <mbizon@freebox.fr> |
net: dst: fix missing initialization of rt_uncached
xfrm_alloc_dst() followed by xfrm4_dst_destroy(), without a xfrm4_fill_dst() call in between, causes the following BUG:
BUG: spinlock bad magic
net: dst: fix missing initialization of rt_uncached
xfrm_alloc_dst() followed by xfrm4_dst_destroy(), without a xfrm4_fill_dst() call in between, causes the following BUG:
BUG: spinlock bad magic on CPU#0, fbxhostapd/732 lock: 0x890b7668, .magic: 890b7668, .owner: <none>/-1, .owner_cpu: 0 CPU: 0 PID: 732 Comm: fbxhostapd Not tainted 6.3.0-rc6-next-20230414-00613-ge8de66369925-dirty #9 Hardware name: Marvell Kirkwood (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x28/0x30 dump_stack_lvl from do_raw_spin_lock+0x20/0x80 do_raw_spin_lock from rt_del_uncached_list+0x30/0x64 rt_del_uncached_list from xfrm4_dst_destroy+0x3c/0xbc xfrm4_dst_destroy from dst_destroy+0x5c/0xb0 dst_destroy from rcu_process_callbacks+0xc4/0xec rcu_process_callbacks from __do_softirq+0xb4/0x22c __do_softirq from call_with_stack+0x1c/0x24 call_with_stack from do_softirq+0x60/0x6c do_softirq from __local_bh_enable_ip+0xa0/0xcc
Patch "net: dst: Prevent false sharing vs. dst_entry:: __refcnt" moved rt_uncached and rt_uncached_list fields from rtable struct to dst struct, so they are more zeroed by memset_after(xdst, 0, u.dst) in xfrm_alloc_dst().
Note that rt_uncached (list_head) was never properly initialized at alloc time, but xfrm[46]_dst_destroy() is written in such a way that it was not an issue thanks to the memset:
if (xdst->u.rt.dst.rt_uncached_list) rt_del_uncached_list(&xdst->u.rt);
The route code does it the other way around: rt_uncached_list is assumed to be valid IIF rt_uncached list_head is not empty:
void rt_del_uncached_list(struct rtable *rt) { if (!list_empty(&rt->dst.rt_uncached)) { struct uncached_list *ul = rt->dst.rt_uncached_list;
spin_lock_bh(&ul->lock); list_del_init(&rt->dst.rt_uncached); spin_unlock_bh(&ul->lock); } }
This patch adds mandatory rt_uncached list_head initialization in generic dst_init(), and adapt xfrm[46]_dst_destroy logic to match the rest of the code.
Fixes: d288a162dd1c ("net: dst: Prevent false sharing vs. dst_entry:: __refcnt") Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202304162125.18b7bcdd-oliver.sang@intel.com Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> CC: Leon Romanovsky <leon@kernel.org> Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Link: https://lore.kernel.org/r/20230420182508.2417582-1-mbizon@freebox.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.1.25, v6.1.24, v6.1.23, v6.1.22 |
|
#
bc9d3a9f |
| 23-Mar-2023 |
Thomas Gleixner <tglx@linutronix.de> |
net: dst: Switch to rcuref_t reference counting
Under high contention dst_entry::__refcnt becomes a significant bottleneck.
atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes in
net: dst: Switch to rcuref_t reference counting
Under high contention dst_entry::__refcnt becomes a significant bottleneck.
atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into high retry rates on contention.
Switch the reference count to rcuref_t which results in a significant performance gain. Rename the reference count member to __rcuref to reflect the change.
The gain depends on the micro-architecture and the number of concurrent operations and has been measured in the range of +25% to +130% with a localhost memtier/memcached benchmark which amplifies the problem massively.
Running the memtier/memcached benchmark over a real (1Gb) network connection the conversion on top of the false sharing fix for struct dst_entry::__refcnt results in a total gain in the 2%-5% range over the upstream baseline.
Reported-by: Wangyang Guo <wangyang.guo@intel.com> Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230307125538.989175656@linutronix.de Link: https://lore.kernel.org/r/20230323102800.215027837@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19 |
|
#
af6d1034 |
| 11-Jan-2023 |
Jon Maxwell <jmaxwell37@gmail.com> |
ipv6: remove max_size check inline with ipv4
In ip6_dst_gc() replace:
if (entries > gc_thresh)
With:
if (entries > ops->gc_thresh)
Sending Ipv6 packets in a loop via a raw socket triggers an
ipv6: remove max_size check inline with ipv4
In ip6_dst_gc() replace:
if (entries > gc_thresh)
With:
if (entries > ops->gc_thresh)
Sending Ipv6 packets in a loop via a raw socket triggers an issue where a route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly consumes the Ipv6 max_size threshold which defaults to 4096 resulting in these warnings:
[1] 99.187805] dst_alloc: 7728 callbacks suppressed [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size. . . [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
When this happens the packet is dropped and sendto() gets a network is unreachable error:
remaining pkt 200557 errno 101 remaining pkt 196462 errno 101 . . remaining pkt 126821 errno 101
Implement David Aherns suggestion to remove max_size check seeing that Ipv6 has a GC to manage memory usage. Ipv4 already does not check max_size.
Here are some memory comparisons for Ipv4 vs Ipv6 with the patch:
Test by running 5 instances of a program that sends UDP packets to a raw socket 5000000 times. Compare Ipv4 and Ipv6 performance with a similar program.
Ipv4:
Before test:
MemFree: 29427108 kB Slab: 237612 kB
ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 2881 3990 192 42 2 : tunables 0 0 0
During test:
MemFree: 29417608 kB Slab: 247712 kB
ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 44394 44394 192 42 2 : tunables 0 0 0
After test:
MemFree: 29422308 kB Slab: 238104 kB
ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0
Ipv6 with patch:
Errno 101 errors are not observed anymore with the patch.
Before test:
MemFree: 29422308 kB Slab: 238104 kB
ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0
During Test:
MemFree: 29431516 kB Slab: 240940 kB
ip6_dst_cache 11980 12064 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0
After Test:
MemFree: 29441816 kB Slab: 238132 kB
ip6_dst_cache 1902 2432 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0
Tested-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230112012532.311021-1-jmaxwell37@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12 |
|
#
94151f5a |
| 03-Dec-2022 |
Eyal Birger <eyal.birger@gmail.com> |
xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
This change adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows s
xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
This change adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs.
This object is built based on the availability of BTF debug info.
When setting the xfrm metadata, percpu metadata dsts are used in order to avoid allocating a metadata dst per packet.
In order to guarantee safe module unload, the percpu dsts are allocated on first use and never freed. The percpu pointer is stored in net/core/filter.c so that it can be reused on module reload.
The metadata percpu dsts take ownership of the original skb dsts so that they may be used as part of the xfrm transmission logic - e.g. for MTU calculations.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-3-eyal.birger@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
show more ...
|
Revision tags: v6.0.11, v6.0.10, v5.15.80 |
|
#
483c26ff |
| 18-Nov-2022 |
Joel Fernandes (Google) <joel@joelfernandes.org> |
net: Use call_rcu_hurry() for dst_release()
In a networking test on ChromeOS, kernels built with the new CONFIG_RCU_LAZY=y Kconfig option fail a networking test in the teardown phase.
This failure
net: Use call_rcu_hurry() for dst_release()
In a networking test on ChromeOS, kernels built with the new CONFIG_RCU_LAZY=y Kconfig option fail a networking test in the teardown phase.
This failure may be reproduced as follows: ip netns del <name>
The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits in this series for the benefit of certain battery-powered systems. This Kconfig option causes call_rcu() to delay its callbacks in order to batch them. This means that a given RCU grace period covers more callbacks, thus reducing the number of grace periods, in turn reducing the amount of energy consumed, which increases battery lifetime which can be a very good thing. This is not a subtle effect: In some important use cases, the battery lifetime is increased by more than 10%.
This CONFIG_RCU_LAZY=y option is available only for CPUs that offload callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
Delaying callbacks is normally not a problem because most callbacks do nothing but free memory. If the system is short on memory, a shrinker will kick all currently queued lazy callbacks out of their laziness, thus freeing their memory in short order. Similarly, the rcu_barrier() function, which blocks until all currently queued callbacks are invoked, will also kick lazy callbacks, thus enabling rcu_barrier() to complete in a timely manner.
However, there are some cases where laziness is not a good option. For example, synchronize_rcu() invokes call_rcu(), and blocks until the newly queued callback is invoked. It would not be a good for synchronize_rcu() to block for ten seconds, even on an idle system. Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a given CPU kicks any lazy callbacks that might be already queued on that CPU. After all, if there is going to be a grace period, all callbacks might as well get full benefit from it.
Yes, this could be done the other way around by creating a call_rcu_lazy(), but earlier experience with this approach and feedback at the 2022 Linux Plumbers Conference shifted the approach to call_rcu() being lazy with call_rcu_hurry() for the few places where laziness is inappropriate.
Returning to the test failure, use of ftrace showed that this failure cause caused by the aadded delays due to this new lazy behavior of call_rcu() in kernels built with CONFIG_RCU_LAZY=y.
Therefore, make dst_release() use call_rcu_hurry() in order to revert to the old test-failure-free behavior.
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: David Ahern <dsahern@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: <netdev@vger.kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
show more ...
|
Revision tags: v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46 |
|
#
d62607c3 |
| 07-Jun-2022 |
Jakub Kicinski <kuba@kernel.org> |
net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which ar
net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code.
Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace()
Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7 |
|
#
9038c320 |
| 04-Dec-2021 |
Eric Dumazet <edumazet@google.com> |
net: dst: add net device refcount tracking to dst_entry
We want to track all dev_hold()/dev_put() to ease leak hunting.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicins
net: dst: add net device refcount tracking to dst_entry
We want to track all dev_hold()/dev_put() to ease leak hunting.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60 |
|
#
1160dfa1 |
| 05-Aug-2021 |
Yajun Deng <yajun.deng@linux.dev> |
net: Remove redundant if statements
The 'if (dev)' statement already move into dev_{put , hold}, so remove redundant if statements.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: D
net: Remove redundant if statements
The 'if (dev)' statement already move into dev_{put , hold}, so remove redundant if statements.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23 |
|
#
a188bb56 |
| 09-Mar-2021 |
Daniel Borkmann <daniel@iogearbox.net> |
net, bpf: Fix ip6ip6 crash with collect_md populated skbs
I ran into a crash where setting up a ip6ip6 tunnel device which was /not/ set to collect_md mode was receiving collect_md populated skbs fo
net, bpf: Fix ip6ip6 crash with collect_md populated skbs
I ran into a crash where setting up a ip6ip6 tunnel device which was /not/ set to collect_md mode was receiving collect_md populated skbs for xmit.
The BPF prog was populating the skb via bpf_skb_set_tunnel_key() which is assigning special metadata dst entry and then redirecting the skb to the device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() -> ip6_tnl_xmit() and in the latter it performs a neigh lookup based on skb_dst(skb) where we trigger a NULL pointer dereference on dst->ops->neigh_lookup() since the md_dst_ops do not populate neigh_lookup callback with a fake handler.
Transform the md_dst_ops into generic dst_blackhole_ops that can also be reused elsewhere when needed, and use them for the metadata dst entries as callback ops.
Also, remove the dst_md_discard{,_out}() ops and rely on dst_discard{,_out}() from dst_init() which free the skb the same way modulo the splat. Given we will be able to recover just fine from there, avoid any potential splats iff this gets ever triggered in future (or worse, panic on warns when set).
Fixes: f38a9eb1f77b ("dst: Metadata destinations") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
c4c877b2 |
| 09-Mar-2021 |
Daniel Borkmann <daniel@iogearbox.net> |
net: Consolidate common blackhole dst ops
Move generic blackhole dst ops to the core and use them from both ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No functional change othe
net: Consolidate common blackhole dst ops
Move generic blackhole dst ops to the core and use them from both ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No functional change otherwise. We need these also in other locations and having to define them over and over again is not great.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
c7eb3e12 |
| 09-Mar-2021 |
Daniel Borkmann <daniel@iogearbox.net> |
net, bpf: Fix ip6ip6 crash with collect_md populated skbs
[ Upstream commit a188bb5638d41aa99090ebf2f85d3505ab13fba5 ]
I ran into a crash where setting up a ip6ip6 tunnel device which was /not/ set
net, bpf: Fix ip6ip6 crash with collect_md populated skbs
[ Upstream commit a188bb5638d41aa99090ebf2f85d3505ab13fba5 ]
I ran into a crash where setting up a ip6ip6 tunnel device which was /not/ set to collect_md mode was receiving collect_md populated skbs for xmit.
The BPF prog was populating the skb via bpf_skb_set_tunnel_key() which is assigning special metadata dst entry and then redirecting the skb to the device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() -> ip6_tnl_xmit() and in the latter it performs a neigh lookup based on skb_dst(skb) where we trigger a NULL pointer dereference on dst->ops->neigh_lookup() since the md_dst_ops do not populate neigh_lookup callback with a fake handler.
Transform the md_dst_ops into generic dst_blackhole_ops that can also be reused elsewhere when needed, and use them for the metadata dst entries as callback ops.
Also, remove the dst_md_discard{,_out}() ops and rely on dst_discard{,_out}() from dst_init() which free the skb the same way modulo the splat. Given we will be able to recover just fine from there, avoid any potential splats iff this gets ever triggered in future (or worse, panic on warns when set).
Fixes: f38a9eb1f77b ("dst: Metadata destinations") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
0a245acb |
| 09-Mar-2021 |
Daniel Borkmann <daniel@iogearbox.net> |
net: Consolidate common blackhole dst ops
[ Upstream commit c4c877b2732466b4c63217baad05c96f775912c7 ]
Move generic blackhole dst ops to the core and use them from both ipv4_dst_blackhole_ops and i
net: Consolidate common blackhole dst ops
[ Upstream commit c4c877b2732466b4c63217baad05c96f775912c7 ]
Move generic blackhole dst ops to the core and use them from both ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No functional change otherwise. We need these also in other locations and having to define them over and over again is not great.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9 |
|
#
1be107de |
| 10-Sep-2020 |
Miaohe Lin <linmiaohe@huawei.com> |
net: Correct the comment of dst_dev_put()
Since commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries"), we use blackhole_netdev to invalidate dst entries instead of
net: Correct the comment of dst_dev_put()
Since commit 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries"), we use blackhole_netdev to invalidate dst entries instead of loopback device anymore.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40 |
|
#
cf86a086 |
| 07-May-2020 |
Eric Dumazet <edumazet@google.com> |
net/dst: use a smaller percpu_counter batch for dst entries accounting
percpu_counter_add() uses a default batch size which is quite big on platforms with 256 cpus. (2*256 -> 512)
This means dst_en
net/dst: use a smaller percpu_counter batch for dst entries accounting
percpu_counter_add() uses a default batch size which is quite big on platforms with 256 cpus. (2*256 -> 512)
This means dst_entries_get_fast() can be off by +/- 2*(nr_cpus^2) (131072 on servers with 256 cpus)
Reduce the batch size to something more reasonable, and add logic to ip6_dst_gc() to call dst_entries_get_slow() before calling the _very_ expensive fib6_run_gc() function.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2 |
|
#
adecda5b |
| 24-Sep-2019 |
Jason A. Donenfeld <Jason@zx2c4.com> |
net: print proper warning on dst underflow
Proper warnings with stack traces make it much easier to figure out what's doing the double free and create more meaningful bug reports from users.
Signed
net: print proper warning on dst underflow
Proper warnings with stack traces make it much easier to figure out what's doing the double free and create more meaningful bug reports from users.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16 |
|
#
8d7017fd |
| 01-Jul-2019 |
Mahesh Bandewar <maheshb@google.com> |
blackhole_netdev: use blackhole_netdev to invalidate dst entries
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead".
Signed-off-by: Mahesh Bandewar <maheshb@google.c
blackhole_netdev: use blackhole_netdev to invalidate dst entries
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead".
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Tested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4 |
|
#
457c8996 |
| 19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was use
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5, v5.0.4 |
|
#
02afc7ad |
| 20-Mar-2019 |
Julian Wiedmann <jwi@linux.ibm.com> |
net: dst: remove gc leftovers
Get rid of some obsolete gc-related documentation and macros that were missed in commit 5b7c9a8ff828 ("net: remove dst gc related code").
CC: Wei Wang <weiwan@google.c
net: dst: remove gc leftovers
Get rid of some obsolete gc-related documentation and macros that were missed in commit 5b7c9a8ff828 ("net: remove dst gc related code").
CC: Wei Wang <weiwan@google.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16 |
|
#
22c2ad61 |
| 16-Jan-2019 |
Peter Oskolkov <posk@google.com> |
net: add a route cache full diagnostic message
In some testing scenarios, dst/route cache can fill up so quickly that even an explicit GC call occasionally fails to clean it up. This leads to sporad
net: add a route cache full diagnostic message
In some testing scenarios, dst/route cache can fill up so quickly that even an explicit GC call occasionally fails to clean it up. This leads to sporadically failing calls to dst_alloc and "network unreachable" errors to the user, which is confusing.
This patch adds a diagnostic message to make the cause of the failure easier to determine.
Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14, v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12 |
|
#
af308b94 |
| 02-Aug-2018 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: nf_tables: add tunnel support
This patch implements the tunnel object type that can be used to configure tunnels via metadata template through the existing lightweight API from the ingres
netfilter: nf_tables: add tunnel support
This patch implements the tunnel object type that can be used to configure tunnels via metadata template through the existing lightweight API from the ingress path.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1, v4.17 |
|
#
d4ead6b3 |
| 17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: move metrics from dst to rt6_info
Similar to IPv4, add fib metrics to the fib struct, which at the moment is rt6_info. Will be moved to fib6_info in a later patch. Copy metrics into dst by
net/ipv6: move metrics from dst to rt6_info
Similar to IPv4, add fib metrics to the fib struct, which at the moment is rt6_info. Will be moved to fib6_info in a later patch. Copy metrics into dst by reference using refcount.
To make the transition: - add dst_metrics to rt6_info. Default to dst_default_metrics if no metrics are passed during route add. No need for a separate pmtu entry; it can reference the MTU slot in fib6_metrics
- ip6_convert_metrics allocates memory in the FIB entry and uses ip_metrics_convert to copy from netlink attribute to metrics entry
- the convert metrics call is done in ip6_route_info_create simplifying the route add path + fib6_commit_metrics and fib6_copy_metrics and the temporary mx6_config are no longer needed
- add fib6_metric_set helper to change the value of a metric in the fib entry since dst_metric_set can no longer be used
- cow_metrics for IPv6 can drop to dst_cow_metrics_generic
- rt6_dst_from_metrics_check is no longer needed
- rt6_fill_node needs the FIB entry and dst as separate arguments to keep compatibility with existing output. Current dst address is renamed to dest. (to be consistent with IPv4 rt6_fill_node really should be split into 2 functions similar to fib_dump_info and rt_fill_info)
- rt6_fill_node no longer needs the temporary metrics variable
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.16, v4.15 |
|
#
7149f813 |
| 28-Nov-2017 |
David Miller <davem@davemloft.net> |
net: Remove dst->next
There are no more users.
Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
|
#
0f6c480f |
| 28-Nov-2017 |
David Miller <davem@davemloft.net> |
xfrm: Move dst->path into struct xfrm_dst
The first member of an IPSEC route bundle chain sets it's dst->path to the underlying ipv4/ipv6 route that carries the bundle.
Stated another way, if one w
xfrm: Move dst->path into struct xfrm_dst
The first member of an IPSEC route bundle chain sets it's dst->path to the underlying ipv4/ipv6 route that carries the bundle.
Stated another way, if one were to follow the xfrm_dst->child chain of the bundle, the final non-NULL pointer would be the path and point to either an ipv4 or an ipv6 route.
This is largely used to make sure that PMTU events propagate down to the correct ipv4 or ipv6 route.
When we don't have the top of an IPSEC bundle 'dst->path == dst'.
Move it down into xfrm_dst and key off of dst->xfrm.
Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
show more ...
|
#
3a2232e9 |
| 28-Nov-2017 |
David Miller <davem@davemloft.net> |
ipv6: Move dst->from into struct rt6_info.
The dst->from value is only used by ipv6 routes to track where a route "came from".
Any time we clone or copy a core ipv6 route in the ipv6 routing tables
ipv6: Move dst->from into struct rt6_info.
The dst->from value is only used by ipv6 routes to track where a route "came from".
Any time we clone or copy a core ipv6 route in the ipv6 routing tables, we have the copy/clone's ->from point to the base route.
This is used to handle route expiration properly.
Only ipv6 uses this mechanism, and only ipv6 code references it. So it is safe to move it into rt6_info.
Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
show more ...
|