Revision tags: v5.15.45 |
|
#
3e0b8f52 |
| 30-May-2022 |
Arun Ajith S <aajith@arista.com> |
net/ipv6: Expand and rename accept_unsolicited_na to accept_untracked_na
RFC 9131 changes default behaviour of handling RX of NA messages when the corresponding entry is absent in the neighbour cach
net/ipv6: Expand and rename accept_unsolicited_na to accept_untracked_na
RFC 9131 changes default behaviour of handling RX of NA messages when the corresponding entry is absent in the neighbour cache. The current implementation is limited to accept just unsolicited NAs. However, the RFC is more generic where it also accepts solicited NAs. Both types should result in adding a STALE entry for this case.
Expand accept_untracked_na behaviour to also accept solicited NAs to be compliant with the RFC and rename the sysctl knob to accept_untracked_na.
Fixes: f9a2fb73318e ("net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131") Signed-off-by: Arun Ajith S <aajith@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220530101414.65439-1-aajith@arista.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
Revision tags: v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39 |
|
#
d4150779 |
| 11-May-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
random32: use real rng for non-deterministic randomness
random32.c has two random number generators in it: one that is meant to be used deterministically, with some predefined seed, and one that doe
random32: use real rng for non-deterministic randomness
random32.c has two random number generators in it: one that is meant to be used deterministically, with some predefined seed, and one that does the same exact thing as random.c, except does it poorly. The first one has some use cases. The second one no longer does and can be replaced with calls to random.c's proper random number generator.
The relatively recent siphash-based bad random32.c code was added in response to concerns that the prior random32.c was too deterministic. Out of fears that random.c was (at the time) too slow, this code was anonymously contributed. Then out of that emerged a kind of shadow entropy gathering system, with its own tentacles throughout various net code, added willy nilly.
Stopmakingbespokerandomnumbergenerators.
Fortunately, recent advances in random.c mean that we can stop playing with this sketchiness, and just use get_random_u32(), which is now fast enough. In micro benchmarks using RDPMC, I'm seeing the same median cycle count between the two functions, with the mean being _slightly_ higher due to batches refilling (which we can optimize further need be). However, when doing *real* benchmarks of the net functions that actually use these random numbers, the mean cycles actually *decreased* slightly (with the median still staying the same), likely because the additional prandom code means icache misses and complexity, whereas random.c is generally already being used by something else nearby.
The biggest benefit of this is that there are many users of prandom who probably should be using cryptographically secure random numbers. This makes all of those accidental cases become secure by just flipping a switch. Later on, we can do a tree-wide cleanup to remove the static inline wrapper functions that this commit adds.
There are also some low-ish hanging fruits for making this even faster in the future: a get_random_u16() function for use in the networking stack will give a 2x performance boost there, using SIMD for ChaCha20 will let us compute 4 or 8 or 16 blocks of output in parallel, instead of just one, giving us large buffers for cheap, and introducing a get_random_*_bh() function that assumes irqs are already disabled will shave off a few cycles for ordinary calls. These are things we can chip away at down the road.
Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
show more ...
|
Revision tags: v5.15.38 |
|
#
425b9c7f |
| 02-May-2022 |
Vasily Averin <vvs@openvz.org> |
memcg: accounting for objects allocated for new netdevice
Creating a new netdevice allocates at least ~50Kb of memory for various kernel objects, but only ~5Kb of them are accounted to memcg. As a r
memcg: accounting for objects allocated for new netdevice
Creating a new netdevice allocates at least ~50Kb of memory for various kernel objects, but only ~5Kb of them are accounted to memcg. As a result, creating an unlimited number of netdevice inside a memcg-limited container does not fall within memcg restrictions, consumes a significant part of the host's memory, can cause global OOM and lead to random kills of host processes.
The main consumers of non-accounted memory are: ~10Kb 80+ kernfs nodes ~6Kb ipv6_add_dev() allocations 6Kb __register_sysctl_table() allocations 4Kb neigh_sysctl_register() allocations 4Kb __devinet_sysctl_register() allocations 4Kb __addrconf_sysctl_register() allocations
Accounting of these objects allows to increase the share of memcg-related memory up to 60-70% (~38Kb accounted vs ~54Kb total for dummy netdevice on typical VM with default Fedora 35 kernel) and this should be enough to somehow protect the host from misuse inside container.
Other related objects are quite small and may not be taken into account to minimize the expected performance degradation.
It should be separately mentonied ~300 bytes of percpu allocation of struct ipstats_mib in snmp6_alloc_dev(), on huge multi-cpu nodes it can become the main consumer of memory.
This patch does not enables kernfs accounting as it affects other parts of the kernel and should be discussed separately. However, even without kernfs, this patch significantly improves the current situation and allows to take into account more than half of all netdevice allocations.
Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/354a0a5f-9ec3-a25c-3215-304eab2157bc@openvz.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.37 |
|
#
b52e1cce |
| 29-Apr-2022 |
jianghaoran <jianghaoran@kylinos.cn> |
ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL
ARPHRD_TUNNEL interface can't process rs packets and will generate TX errors
ex: ip tunnel add ethn mode ipip local 192.168.1.1 remote
ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL
ARPHRD_TUNNEL interface can't process rs packets and will generate TX errors
ex: ip tunnel add ethn mode ipip local 192.168.1.1 remote 192.168.1.2 ifconfig ethn x.x.x.x
ethn: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1480 inet x.x.x.x netmask 255.255.255.255 destination x.x.x.x inet6 fe80::5efe:ac1e:3cdb prefixlen 64 scopeid 0x20<link> tunnel txqueuelen 1000 (IPIP Tunnel) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 3 dropped 0 overruns 0 carrier 0 collisions 0
Signed-off-by: jianghaoran <jianghaoran@kylinos.cn> Link: https://lore.kernel.org/r/20220429053802.246681-1-jianghaoran@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.36, v5.15.35 |
|
#
d09d3ec0 |
| 19-Apr-2022 |
Arun Ajith S <aajith@arista.com> |
net/ipv6: Enforce limits for accept_unsolicited_na sysctl
Fix mistake in the original patch where limits were specified but the handler didn't take care of the limits.
Signed-off-by: Arun Ajith S <
net/ipv6: Enforce limits for accept_unsolicited_na sysctl
Fix mistake in the original patch where limits were specified but the handler didn't take care of the limits.
Signed-off-by: Arun Ajith S <aajith@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f9a2fb73 |
| 15-Apr-2022 |
Arun Ajith S <aajith@arista.com> |
net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131
Add a new neighbour cache entry in STALE state for routers on receiving an unsolicited (gratuitous) neighb
net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131
Add a new neighbour cache entry in STALE state for routers on receiving an unsolicited (gratuitous) neighbour advertisement with target link-layer-address option specified. This is similar to the arp_accept configuration for IPv4. A new sysctl endpoint is created to turn on this behaviour: /proc/sys/net/ipv6/conf/interface/accept_unsolicited_na.
Signed-off-by: Arun Ajith S <aajith@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.15.34, v5.15.33 |
|
#
51454ea4 |
| 03-Apr-2022 |
Niels Dossche <dossche.niels@gmail.com> |
ipv6: fix locking issues with loops over idev->addr_list
idev->addr_list needs to be protected by idev->lock. However, it is not always possible to do so while iterating and performing actions on in
ipv6: fix locking issues with loops over idev->addr_list
idev->addr_list needs to be protected by idev->lock. However, it is not always possible to do so while iterating and performing actions on inet6_ifaddr instances. For example, multiple functions (like addrconf_{join,leave}_anycast) eventually call down to other functions that acquire the idev->lock. The current code temporarily unlocked the idev->lock during the loops, which can cause race conditions. Moving the locks up is also not an appropriate solution as the ordering of lock acquisition will be inconsistent with for example mc_lock.
This solution adds an additional field to inet6_ifaddr that is used to temporarily add the instances to a temporary list while holding idev->lock. The temporary list can then be traversed without holding idev->lock. This change was done in two places. In addrconf_ifdown, the list_for_each_entry_safe variant of the list loop is also no longer necessary as there is no deletion within that specific loop.
Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26 |
|
#
9995b408 |
| 24-Feb-2022 |
j.nixdorf@avm.de <j.nixdorf@avm.de> |
net: ipv6: ensure we call ipv6_mc_down() at most once
There are two reasons for addrconf_notify() to be called with NETDEV_DOWN: either the network device is actually going down, or IPv6 was disable
net: ipv6: ensure we call ipv6_mc_down() at most once
There are two reasons for addrconf_notify() to be called with NETDEV_DOWN: either the network device is actually going down, or IPv6 was disabled on the interface.
If either of them stays down while the other is toggled, we repeatedly call the code for NETDEV_DOWN, including ipv6_mc_down(), while never calling the corresponding ipv6_mc_up() in between. This will cause a new entry in idev->mc_tomb to be allocated for each multicast group the interface is subscribed to, which in turn leaks one struct ifmcaddr6 per nontrivial multicast group the interface is subscribed to.
The following reproducer will leak at least $n objects:
ip addr add ff2e::4242/32 dev eth0 autojoin sysctl -w net.ipv6.conf.eth0.disable_ipv6=1 for i in $(seq 1 $n); do ip link set up eth0; ip link set down eth0 done
Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2) can also be used to create a nontrivial idev->mc_list, which will the leak objects with the right up-down-sequence.
Based on both sources for NETDEV_DOWN events the interface IPv6 state should be considered:
- not ready if the network interface is not ready OR IPv6 is disabled for it - ready if the network interface is ready AND IPv6 is enabled for it
The functions ipv6_mc_up() and ipv6_down() should only be run when this state changes.
Implement this by remembering when the IPv6 state is ready, and only run ipv6_mc_down() if it actually changed from ready to not ready.
The other direction (not ready -> ready) already works correctly, as:
- the interface notification triggered codepath for NETDEV_UP / NETDEV_CHANGE returns early if ipv6 is disabled, and - the disable_ipv6=0 triggered codepath skips fully initializing the interface as long as addrconf_link_ready(dev) returns false - calling ipv6_mc_up() repeatedly does not leak anything
Fixes: 3ce62a84d53c ("ipv6: exit early in addrconf_notify() if IPv6 is disabled") Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
6c0d8833 |
| 23-Feb-2022 |
Niels Dossche <dossche.niels@gmail.com> |
ipv6: prevent a possible race condition with lifetimes
valid_lft, prefered_lft and tstamp are always accessed under the lock "lock" in other places. Reading these without taking the lock may result
ipv6: prevent a possible race condition with lifetimes
valid_lft, prefered_lft and tstamp are always accessed under the lock "lock" in other places. Reading these without taking the lock may result in inconsistencies regarding the calculation of the valid and preferred variables since decisions are taken on these fields for those variables.
Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Niels Dossche <niels.dossche@ugent.be> Link: https://lore.kernel.org/r/20220223131954.6570-1-niels.dossche@ugent.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.25 |
|
#
47f0bd50 |
| 17-Feb-2022 |
Jacques de Laval <Jacques.De.Laval@westermo.com> |
net: Add new protocol attribute to IP addresses
This patch adds a new protocol attribute to IPv4 and IPv6 addresses. Inspiration was taken from the protocol attribute of routes. User space applicati
net: Add new protocol attribute to IP addresses
This patch adds a new protocol attribute to IPv4 and IPv6 addresses. Inspiration was taken from the protocol attribute of routes. User space applications like iproute2 can set/get the protocol with the Netlink API.
The attribute is stored as an 8-bit unsigned integer.
The protocol attribute is set by kernel for these categories:
- IPv4 and IPv6 loopback addresses - IPv6 addresses generated from router announcements - IPv6 link local addresses
User space may pass custom protocols, not defined by the kernel.
Grouping addresses on their origin is useful in scenarios where you want to distinguish between addresses based on who added them, e.g. kernel vs. user space.
Tagging addresses with a string label is an existing feature that could be used as a solution. Unfortunately the max length of a label is 15 characters, and for compatibility reasons the label must be prefixed with the name of the device followed by a colon. Since device names also have a max length of 15 characters, only -1 characters is guaranteed to be available for any origin tag, which is not that much.
A reference implementation of user space setting and getting protocols is available for iproute2:
https://github.com/westermo/iproute2/commit/9a6ea18bd79f47f293e5edc7780f315ea42ff540
Signed-off-by: Jacques de Laval <Jacques.De.Laval@westermo.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220217150202.80802-1-Jacques.De.Laval@westermo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
be6b41c1 |
| 16-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: ensure addrconf_verify_rtnl() has completed
Before freeing the hash table in addrconf_exit_net(), we need to make sure the work queue has completed, or risk NULL dereference or UAF.
ipv6/addrconf: ensure addrconf_verify_rtnl() has completed
Before freeing the hash table in addrconf_exit_net(), we need to make sure the work queue has completed, or risk NULL dereference or UAF.
Thus, use cancel_delayed_work_sync() to enforce this. We do not hold RTNL in addrconf_exit_net(), making this safe.
Fixes: 8805d13ff1b2 ("ipv6/addrconf: use one delayed work per netns") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220216182037.3742-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.24 |
|
#
dd263a8c |
| 13-Feb-2022 |
Ido Schimmel <idosch@nvidia.com> |
ipv6: blackhole_netdev needs snmp6 counters
Whenever rt6_uncached_list_flush_dev() swaps rt->rt6_idev to the blackhole device, parts of IPv6 stack might still need to increment one SNMP counter.
Ro
ipv6: blackhole_netdev needs snmp6 counters
Whenever rt6_uncached_list_flush_dev() swaps rt->rt6_idev to the blackhole device, parts of IPv6 stack might still need to increment one SNMP counter.
Root cause, patch from Ido, changelog from Eric :)
This bug suggests that we need to audit rt->rt6_idev usages and make sure they are properly using RCU protection.
Fixes: e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
26394fc1 |
| 11-Feb-2022 |
Ignat Korchagin <ignat@cloudflare.com> |
ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
Some time ago 8965779d2c0e ("ipv6,mcast: always hold idev->lock before mca_lock") switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is r
ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
Some time ago 8965779d2c0e ("ipv6,mcast: always hold idev->lock before mca_lock") switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is rcu-unsafe version. That was OK, because idev->lock was held for these codepaths.
In 88e2ca308094 ("mld: convert ifmcaddr6 to RCU") these external locks were removed, so we probably need to restore the original rcu-safe call.
Otherwise, we occasionally get a machine crashed/stalled with the following in dmesg:
[ 3405.966610][T230589] general protection fault, probably for non-canonical address 0xdead00000000008c: 0000 [#1] SMP NOPTI [ 3405.982083][T230589] CPU: 44 PID: 230589 Comm: kworker/44:3 Tainted: G O 5.15.19-cloudflare-2022.2.1 #1 [ 3405.998061][T230589] Hardware name: SUPA-COOL-SERV [ 3406.009552][T230589] Workqueue: mld mld_ifc_work [ 3406.017224][T230589] RIP: 0010:__ipv6_get_lladdr+0x34/0x60 [ 3406.025780][T230589] Code: 57 10 48 83 c7 08 48 89 e5 48 39 d7 74 3e 48 8d 82 38 ff ff ff eb 13 48 8b 90 d0 00 00 00 48 8d 82 38 ff ff ff 48 39 d7 74 22 <66> 83 78 32 20 77 1b 75 e4 89 ca 23 50 2c 75 dd 48 8b 50 08 48 8b [ 3406.055748][T230589] RSP: 0018:ffff94e4b3fc3d10 EFLAGS: 00010202 [ 3406.065617][T230589] RAX: dead00000000005a RBX: ffff94e4b3fc3d30 RCX: 0000000000000040 [ 3406.077477][T230589] RDX: dead000000000122 RSI: ffff94e4b3fc3d30 RDI: ffff8c3a31431008 [ 3406.089389][T230589] RBP: ffff94e4b3fc3d10 R08: 0000000000000000 R09: 0000000000000000 [ 3406.101445][T230589] R10: ffff8c3a31430000 R11: 000000000000000b R12: ffff8c2c37887100 [ 3406.113553][T230589] R13: ffff8c3a39537000 R14: 00000000000005dc R15: ffff8c3a31431000 [ 3406.125730][T230589] FS: 0000000000000000(0000) GS:ffff8c3b9fc80000(0000) knlGS:0000000000000000 [ 3406.138992][T230589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3406.149895][T230589] CR2: 00007f0dfea1db60 CR3: 000000387b5f2000 CR4: 0000000000350ee0 [ 3406.162421][T230589] Call Trace: [ 3406.170235][T230589] <TASK> [ 3406.177736][T230589] mld_newpack+0xfe/0x1a0 [ 3406.186686][T230589] add_grhead+0x87/0xa0 [ 3406.195498][T230589] add_grec+0x485/0x4e0 [ 3406.204310][T230589] ? newidle_balance+0x126/0x3f0 [ 3406.214024][T230589] mld_ifc_work+0x15d/0x450 [ 3406.223279][T230589] process_one_work+0x1e6/0x380 [ 3406.232982][T230589] worker_thread+0x50/0x3a0 [ 3406.242371][T230589] ? rescuer_thread+0x360/0x360 [ 3406.252175][T230589] kthread+0x127/0x150 [ 3406.261197][T230589] ? set_kthread_struct+0x40/0x40 [ 3406.271287][T230589] ret_from_fork+0x22/0x30 [ 3406.280812][T230589] </TASK> [ 3406.288937][T230589] Modules linked in: ... [last unloaded: kheaders] [ 3406.476714][T230589] ---[ end trace 3525a7655f2f3b9e ]---
Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU") Reported-by: David Pinilla Caparros <dpini@cloudflare.com> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.15.23 |
|
#
e5f80fcf |
| 10-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6: give an IPv6 dev to blackhole_netdev
IPv6 addrconf notifiers wants the loopback device to be the last device being dismantled at netns deletion.
This caused many limitations and work arounds.
ipv6: give an IPv6 dev to blackhole_netdev
IPv6 addrconf notifiers wants the loopback device to be the last device being dismantled at netns deletion.
This caused many limitations and work arounds.
Back in linux-5.3, Mahesh added a per host blackhole_netdev that can be used whenever we need to make sure objects no longer refer to a disappearing device.
If we attach to blackhole_netdev an ip6_ptr (allocate an idev), then we can use this special device (which is never freed) in place of the loopback_dev (which can be freed).
This will permit improvements in netdev_run_todo() and other parts of the stack where had steps to make sure loopback_dev was the last device to disappear.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.15.22 |
|
#
e66d1172 |
| 07-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: switch to per netns inet6_addr_lst hash table
IPv6 does not scale very well with the number of IPv6 addresses. It uses a global (shared by all netns) hash table with 256 buckets.
Som
ipv6/addrconf: switch to per netns inet6_addr_lst hash table
IPv6 does not scale very well with the number of IPv6 addresses. It uses a global (shared by all netns) hash table with 256 buckets.
Some functions like addrconf_verify_rtnl() and addrconf_ifdown() have to iterate all addresses in the hash table.
I have seen addrconf_verify_rtnl() holding the cpu for 10ms or more.
Switch to the per netns hashtable (and spinlock) added in prior patches.
This considerably speeds up netns dismantle times on hosts with thousands of netns. This also has an impact on regular (fast path) IPv6 processing.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
8805d13f |
| 07-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: use one delayed work per netns
Next step for using per netns inet6_addr_lst is to have per netns work item to ultimately call addrconf_verify_rtnl() and addrconf_verify() with a new '
ipv6/addrconf: use one delayed work per netns
Next step for using per netns inet6_addr_lst is to have per netns work item to ultimately call addrconf_verify_rtnl() and addrconf_verify() with a new 'struct net*' argument.
Everything is still using the global inet6_addr_lst[] table.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
21a216a8 |
| 07-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: allocate a per netns hash table
Add a per netns hash table and a dedicated spinlock, first step to get rid of the global inet6_addr_lst[] one.
Signed-off-by: Eric Dumazet <edumazet@g
ipv6/addrconf: allocate a per netns hash table
Add a per netns hash table and a dedicated spinlock, first step to get rid of the global inet6_addr_lst[] one.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
7d9b1b57 |
| 06-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ip6mr: fix use-after-free in ip6mr_sk_done()
Apparently addrconf_exit_net() is called before igmp6_net_exit() and ndisc_net_exit() at netns dismantle time:
net_namespace: call ip6table_mangle_net_
ip6mr: fix use-after-free in ip6mr_sk_done()
Apparently addrconf_exit_net() is called before igmp6_net_exit() and ndisc_net_exit() at netns dismantle time:
net_namespace: call ip6table_mangle_net_exit() net_namespace: call ip6_tables_net_exit() net_namespace: call ipv6_sysctl_net_exit() net_namespace: call ioam6_net_exit() net_namespace: call seg6_net_exit() net_namespace: call ping_v6_proc_exit_net() net_namespace: call tcpv6_net_exit() ip6mr_sk_done sk=ffffa354c78a74c0 net_namespace: call ipv6_frags_exit_net() net_namespace: call addrconf_exit_net() net_namespace: call ip6addrlbl_net_exit() net_namespace: call ip6_flowlabel_net_exit() net_namespace: call ip6_route_net_exit_late() net_namespace: call fib6_rules_net_exit() net_namespace: call xfrm6_net_exit() net_namespace: call fib6_net_exit() net_namespace: call ip6_route_net_exit() net_namespace: call ipv6_inetpeer_exit() net_namespace: call if6_proc_net_exit() net_namespace: call ipv6_proc_exit_net() net_namespace: call udplite6_proc_exit_net() net_namespace: call raw6_exit_net() net_namespace: call igmp6_net_exit() ip6mr_sk_done sk=ffffa35472b2a180 ip6mr_sk_done sk=ffffa354c78a7980 net_namespace: call ndisc_net_exit() ip6mr_sk_done sk=ffffa35472b2ab00 net_namespace: call ip6mr_net_exit() net_namespace: call inet6_net_exit()
This was fine because ip6mr_sk_done() would not reach the point decreasing net->ipv6.devconf_all->mc_forwarding until my patch in ip6mr_sk_done().
To fix this without changing struct pernet_operations ordering, we can clear net->ipv6.devconf_dflt and net->ipv6.devconf_all when they are freed from addrconf_exit_net()
BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] BUG: KASAN: use-after-free in ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578 Read of size 4 at addr ffff88801ff08688 by task kworker/u4:4/963
CPU: 0 PID: 963 Comm: kworker/u4:4 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578 rawv6_close+0x58/0x80 net/ipv6/raw.c:1201 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478 __sock_release net/socket.c:650 [inline] sock_release+0x87/0x1b0 net/socket.c:678 inet_ctl_sock_destroy include/net/inet_common.h:65 [inline] igmp6_net_exit+0x6b/0x170 net/ipv6/mcast.c:3173 ops_exit_list+0xb0/0x170 net/core/net_namespace.c:168 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:600 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK>
Fixes: f2f2325ec799 ("ip6mr: ip6mr_sk_done() can exit early in common cases") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.15.21, v5.15.20 |
|
#
145c7a79 |
| 04-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6: make mc_forwarding atomic
This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6()
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David
ipv6: make mc_forwarding atomic
This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6()
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.15.19, v5.15.18, v5.15.17 |
|
#
36268983 |
| 26-Jan-2022 |
Guillaume Nault <gnault@redhat.com> |
Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
This reverts commit b75326c201242de9495ff98e5d5cff41d7fc0d9d.
This commit breaks Linux compatibility with USGv6 tests. The RFC this commit wa
Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
This reverts commit b75326c201242de9495ff98e5d5cff41d7fc0d9d.
This commit breaks Linux compatibility with USGv6 tests. The RFC this commit was based on is actually an expired draft: no published RFC currently allows the new behaviour it introduced.
Without full IETF endorsement, the flash renumbering scenario this patch was supposed to enable is never going to work, as other IPv6 equipements on the same LAN will keep the 2 hours limit.
Fixes: b75326c20124 ("ipv6: Honor all IPv6 PIO Valid Lifetime values") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7 |
|
#
8c727003 |
| 04-Dec-2021 |
Eric Dumazet <edumazet@google.com> |
ipv6: add net device refcount tracker to struct inet6_dev
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
Revision tags: v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1 |
|
#
18ac597a |
| 01-Nov-2021 |
James Prestwood <prestwoj@gmail.com> |
net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
In most situations the neighbor discovery cache should be cleared on a NOCARRIER event which is currently done unconditionally. But for w
net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
In most situations the neighbor discovery cache should be cleared on a NOCARRIER event which is currently done unconditionally. But for wireless roams the neighbor discovery cache can and should remain intact since the underlying network has not changed.
This patch introduces a sysctl option ndisc_evict_nocarrier which can be disabled by a wireless supplicant during a roam. This allows packets to be sent after a roam immediately without having to wait for neighbor discovery.
A user reported roughly a 1 second delay after a roam before packets could be sent out (note, on IPv4). This delay was due to the ARP cache being cleared. During testing of this same scenario using IPv6 no delay was noticed, but regardless there is no reason to clear the ndisc cache for wireless roams.
Signed-off-by: James Prestwood <prestwoj@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15 |
|
#
61e18ce7 |
| 20-Oct-2021 |
Stephen Suryaputra <ssuryaextr@gmail.com> |
gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE
When addr_gen_mode is set to IN6_ADDR_GEN_MODE_NONE, the link-local addr should not be generated. But it isn't the
gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE
When addr_gen_mode is set to IN6_ADDR_GEN_MODE_NONE, the link-local addr should not be generated. But it isn't the case for GRE (as well as GRE6) and SIT tunnels. Make it so that tunnels consider the addr_gen_mode, especially for IN6_ADDR_GEN_MODE_NONE.
Do this in add_v4_addrs() to cover both GRE and SIT only if the addr scope is link.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Acked-by: Antonio Quartulli <a@unstable.cc> Link: https://lore.kernel.org/r/20211020200618.467342-1-ssuryaextr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.14.14, v5.14.13, v5.14.12 |
|
#
1a8a23d2 |
| 12-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
ipv6: constify dev_addr passing
In preparation for netdev->dev_addr being constant make all relevant arguments in ndisc constant.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b340f83d |
| 23-Aug-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
net: Fix data-races around sysctl_devconf_inherit_init_net.
[ Upstream commit a5612ca10d1aa05624ebe72633e0c8c792970833 ]
While reading sysctl_devconf_inherit_init_net, it can be changed concurrentl
net: Fix data-races around sysctl_devconf_inherit_init_net.
[ Upstream commit a5612ca10d1aa05624ebe72633e0c8c792970833 ]
While reading sysctl_devconf_inherit_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|