Revision tags: v6.6.25, v6.6.24, v6.6.23 |
|
#
80cd0487 |
| 27-Feb-2024 |
Florian Westphal <fw@strlen.de> |
netfilter: bridge: confirm multicast packets before passing them up the stack
[ Upstream commit 62e7151ae3eb465e0ab52a20c941ff33bb6332e9 ]
conntrack nf_confirm logic cannot handle cloned skbs refer
netfilter: bridge: confirm multicast packets before passing them up the stack
[ Upstream commit 62e7151ae3eb465e0ab52a20c941ff33bb6332e9 ]
conntrack nf_confirm logic cannot handle cloned skbs referencing the same nf_conn entry, which will happen for multicast (broadcast) frames on bridges.
Example: macvlan0 | br0 / \ ethX ethY
ethX (or Y) receives a L2 multicast or broadcast packet containing an IP packet, flow is not yet in conntrack table.
1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. -> skb->_nfct now references a unconfirmed entry 2. skb is broad/mcast packet. bridge now passes clones out on each bridge interface. 3. skb gets passed up the stack. 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb and schedules a work queue to send them out on the lower devices.
The clone skb->_nfct is not a copy, it is the same entry as the original skb. The macvlan rx handler then returns RX_HANDLER_PASS. 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
The Macvlan broadcast worker and normal confirm path will race.
This race will not happen if step 2 already confirmed a clone. In that case later steps perform skb_clone() with skb->_nfct already confirmed (in hash table). This works fine.
But such confirmation won't happen when eb/ip/nftables rules dropped the packets before they reached the nf_confirm step in postrouting.
Pablo points out that nf_conntrack_bridge doesn't allow use of stateful nat, so we can safely discard the nf_conn entry and let inet call conntrack again.
This doesn't work for bridge netfilter: skb could have a nat transformation. Also bridge nf prevents re-invocation of inet prerouting via 'sabotage_in' hook.
Work around this problem by explicit confirmation of the entry at LOCAL_IN time, before upper layer has a chance to clone the unconfirmed entry.
The downside is that this disables NAT and conntrack helpers.
Alternative fix would be to add locking to all code parts that deal with unconfirmed packets, but even if that could be done in a sane way this opens up other problems, for example:
-m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4 -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5
For multicast case, only one of such conflicting mappings will be created, conntrack only handles 1:1 NAT mappings.
Users should set create a setup that explicitly marks such traffic NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass them, ruleset might have accept rules for untracked traffic already, so user-visible behaviour would change.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39 |
|
#
4914109a |
| 16-Jul-2023 |
Xin Long <lucien.xin@gmail.com> |
netfilter: allow exp not to be removed in nf_ct_find_expectation
Currently nf_conntrack_in() calling nf_ct_find_expectation() will remove the exp from the hash table. However, in some scenario, we e
netfilter: allow exp not to be removed in nf_ct_find_expectation
Currently nf_conntrack_in() calling nf_ct_find_expectation() will remove the exp from the hash table. However, in some scenario, we expect the exp not to be removed when the created ct will not be confirmed, like in OVS and TC conntrack in the following patches.
This patch allows exp not to be removed by setting IPS_CONFIRMED in the status of the tmpl.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
Revision tags: v6.1.38 |
|
#
eaf9e719 |
| 04-Jul-2023 |
Florian Westphal <fw@strlen.de> |
netfilter: conntrack: don't fold port numbers into addresses before hashing
Originally this used jhash2() over tuple and folded the zone id, the pernet hash value, destination port and l4 protocol n
netfilter: conntrack: don't fold port numbers into addresses before hashing
Originally this used jhash2() over tuple and folded the zone id, the pernet hash value, destination port and l4 protocol number into the 32bit seed value.
When the switch to siphash was done, I used an on-stack temporary buffer to build a suitable key to be hashed via siphash().
But this showed up as performance regression, so I got rid of the temporary copy and collected to-be-hashed data in 4 u64 variables.
This makes it easy to build tuples that produce the same hash, which isn't desirable even though chain lengths are limited.
Switch back to plain siphash, but just like with jhash2(), take advantage of the fact that most of to-be-hashed data is already in a suitable order.
Use an empty struct as annotation in 'struct nf_conntrack_tuple' to mark last member that can be used as hash input.
The only remaining data that isn't present in the tuple structure are the zone identifier and the pernet hash: fold those into the key.
Fixes: d2c806abcf0b ("netfilter: conntrack: use siphash_4u64") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31 |
|
#
e1f543dc |
| 25-May-2023 |
Tijs Van Buggenhout <tijs.van.buggenhout@axsguard.com> |
netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
An nf_conntrack_helper from nf_conn_help may become NULL after DNAT.
Observed when TCP port 1720 (Q931_PORT), associated wi
netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
An nf_conntrack_helper from nf_conn_help may become NULL after DNAT.
Observed when TCP port 1720 (Q931_PORT), associated with h323 conntrack helper, is DNAT'ed to another destination port (e.g. 1730), while nfqueue is being used for final acceptance (e.g. snort).
This happenned after transition from kernel 4.14 to 5.10.161.
Workarounds: * keep the same port (1720) in DNAT * disable nfqueue * disable/unload h323 NAT helper
$ linux-5.10/scripts/decode_stacktrace.sh vmlinux < /tmp/kernel.log BUG: kernel NULL pointer dereference, address: 0000000000000084 [..] RIP: 0010:nf_conntrack_update (net/netfilter/nf_conntrack_core.c:2080 net/netfilter/nf_conntrack_core.c:2134) nf_conntrack [..] nfqnl_reinject (net/netfilter/nfnetlink_queue.c:237) nfnetlink_queue nfqnl_recv_verdict (net/netfilter/nfnetlink_queue.c:1230) nfnetlink_queue nfnetlink_rcv_msg (net/netfilter/nfnetlink.c:241) nfnetlink [..]
Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again") Signed-off-by: Tijs Van Buggenhout <tijs.van.buggenhout@axsguard.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25 |
|
#
2cdaa3ee |
| 18-Apr-2023 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()
e6d57e9ff0ae ("netfilter: conntrack: fix rmmod double-free race") consolidates IPS_CONFIRMED bit set in nf_conntra
netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()
e6d57e9ff0ae ("netfilter: conntrack: fix rmmod double-free race") consolidates IPS_CONFIRMED bit set in nf_conntrack_hash_check_insert(). However, this breaks ctnetlink:
# conntrack -I -p tcp --timeout 123 --src 1.2.3.4 --dst 5.6.7.8 --state ESTABLISHED --sport 1 --dport 4 -u SEEN_REPLY conntrack v1.4.6 (conntrack-tools): Operation failed: Device or resource busy
This is a partial revert of the aforementioned commit to restore IPS_CONFIRMED.
Fixes: e6d57e9ff0ae ("netfilter: conntrack: fix rmmod double-free race") Reported-by: Stéphane Graber <stgraber@stgraber.org> Tested-by: Stéphane Graber <stgraber@stgraber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16 |
|
#
e5d015a1 |
| 07-Mar-2023 |
Jeremy Sowden <jeremy@azazel.net> |
netfilter: conntrack: fix typo
There's a spelling mistake in a comment. Fix it.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
|
#
c77737b7 |
| 06-Mar-2023 |
Eric Dumazet <edumazet@google.com> |
netfilter: conntrack: adopt safer max chain length
Customers using GKE 1.25 and 1.26 are facing conntrack issues root caused to commit c9c3b6811f74 ("netfilter: conntrack: make max chain length rand
netfilter: conntrack: adopt safer max chain length
Customers using GKE 1.25 and 1.26 are facing conntrack issues root caused to commit c9c3b6811f74 ("netfilter: conntrack: make max chain length random").
Even if we assume Uniform Hashing, a bucket often reachs 8 chained items while the load factor of the hash table is smaller than 0.5
With a limit of 16, we reach load factors of 3. With a limit of 32, we reach load factors of 11. With a limit of 40, we reach load factors of 15. With a limit of 50, we reach load factors of 24.
This patch changes MIN_CHAINLEN to 50, to minimize risks.
Ideally, we could in the future add a cushion based on expected load factor (2 * nf_conntrack_max / nf_conntrack_buckets), because some setups might expect unusual values.
Fixes: c9c3b6811f74 ("netfilter: conntrack: make max chain length random") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12 |
|
#
e6d57e9f |
| 14-Feb-2023 |
Florian Westphal <fw@strlen.de> |
netfilter: conntrack: fix rmmod double-free race
nf_conntrack_hash_check_insert() callers free the ct entry directly, via nf_conntrack_free.
This isn't safe anymore because nf_conntrack_hash_check_
netfilter: conntrack: fix rmmod double-free race
nf_conntrack_hash_check_insert() callers free the ct entry directly, via nf_conntrack_free.
This isn't safe anymore because nf_conntrack_hash_check_insert() might place the entry into the conntrack table and then delteted the entry again because it found that a conntrack extension has been removed at the same time.
In this case, the just-added entry is removed again and an error is returned to the caller.
Problem is that another cpu might have picked up this entry and incremented its reference count.
This results in a use-after-free/double-free, once by the other cpu and once by the caller of nf_conntrack_hash_check_insert().
Fix this by making nf_conntrack_hash_check_insert() not fail anymore after the insertion, just like before the 'Fixes' commit.
This is safe because a racing nf_ct_iterate() has to wait for us to release the conntrack hash spinlocks.
While at it, make the function return -EAGAIN in the rmmod (genid changed) case, this makes nfnetlink replay the command (suggested by Pablo Neira).
Fixes: c56716c69ce1 ("netfilter: extensions: introduce extension genid count") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v6.1.11, v6.1.10 |
|
#
2954fe60 |
| 01-Feb-2023 |
Florian Westphal <fw@strlen.de> |
netfilter: let reset rules clean out conntrack entries
iptables/nftables support responding to tcp packets with tcp resets.
The generated tcp reset packet passes through both output and postrouting
netfilter: let reset rules clean out conntrack entries
iptables/nftables support responding to tcp packets with tcp resets.
The generated tcp reset packet passes through both output and postrouting netfilter hooks, but conntrack will never see them because the generated skb has its ->nfct pointer copied over from the packet that triggered the reset rule.
If the reset rule is used for established connections, this may result in the conntrack entry to be around for a very long time (default timeout is 5 days).
One way to avoid this would be to not copy the nf_conn pointer so that the rest packet passes through conntrack too.
Problem is that output rules might not have the same conntrack zone setup as the prerouting ones, so its possible that the reset skb won't find the correct entry. Generating a template entry for the skb seems error prone as well.
Add an explicit "closing" function that switches a confirmed conntrack entry to closed state and wire this up for tcp.
If the entry isn't confirmed, no action is needed because the conntrack entry will never be committed to the table.
Reported-by: Russel King <linux@armlinux.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
#
df25455e |
| 01-Feb-2023 |
Vlad Buslov <vladbu@nvidia.com> |
netfilter: nf_conntrack: allow early drop of offloaded UDP conns
Both synchronous early drop algorithm and asynchronous gc worker completely ignore connections with IPS_OFFLOAD_BIT status bit set. W
netfilter: nf_conntrack: allow early drop of offloaded UDP conns
Both synchronous early drop algorithm and asynchronous gc worker completely ignore connections with IPS_OFFLOAD_BIT status bit set. With new functionality that enabled UDP NEW connection offload in action CT malicious user can flood the conntrack table with offloaded UDP connections by just sending a single packet per 5tuple because such connections can no longer be deleted by early drop algorithm.
To mitigate the issue allow both early drop and gc to consider offloaded UDP connections for deletion.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14 |
|
#
2a2fa2ef |
| 15-Dec-2022 |
Florian Westphal <fw@strlen.de> |
netfilter: conntrack: move rcu read lock to nf_conntrack_find_get
Move rcu_read_lock/unlock to nf_conntrack_find_get(), this avoids nested rcu_read_lock call from resolve_normal_ct().
Signed-off-by
netfilter: conntrack: move rcu read lock to nf_conntrack_find_get
Move rcu_read_lock/unlock to nf_conntrack_find_get(), this avoids nested rcu_read_lock call from resolve_normal_ct().
Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
#
4883ec51 |
| 02-Jan-2023 |
Florian Westphal <fw@strlen.de> |
netfilter: conntrack: avoid reload of ct->status
Compiler can't merge the two test_bit() calls, so load ct->status once and use non-atomic accesses.
This is fine because IPS_EXPECTED or NAT_CLASH a
netfilter: conntrack: avoid reload of ct->status
Compiler can't merge the two test_bit() calls, so load ct->status once and use non-atomic accesses.
This is fine because IPS_EXPECTED or NAT_CLASH are either set at ct creation time or not at all, but compiler can't know that.
Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
#
50bfbb89 |
| 02-Jan-2023 |
Florian Westphal <fw@strlen.de> |
netfilter: conntrack: remove pr_debug calls
Those are all useless or dubious. getorigdst() is called via setsockopt, so return value/errno will already indicate an appropriate error.
For other pr_d
netfilter: conntrack: remove pr_debug calls
Those are all useless or dubious. getorigdst() is called via setsockopt, so return value/errno will already indicate an appropriate error.
For other pr_debug calls there are better replacements, such as slab/slub debugging or 'conntrack -E' (ctnetlink events).
Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
Revision tags: v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80 |
|
#
9464d0b6 |
| 24-Nov-2022 |
Xin Long <lucien.xin@gmail.com> |
netfilter: conntrack: fix using __this_cpu_add in preemptible
Currently in nf_conntrack_hash_check_insert(), when it fails in nf_ct_ext_valid_pre/post(), NF_CT_STAT_INC() will be called in the preem
netfilter: conntrack: fix using __this_cpu_add in preemptible
Currently in nf_conntrack_hash_check_insert(), when it fails in nf_ct_ext_valid_pre/post(), NF_CT_STAT_INC() will be called in the preemptible context, a call trace can be triggered:
BUG: using __this_cpu_add() in preemptible [00000000] code: conntrack/1636 caller is nf_conntrack_hash_check_insert+0x45/0x430 [nf_conntrack] Call Trace: <TASK> dump_stack_lvl+0x33/0x46 check_preemption_disabled+0xc3/0xf0 nf_conntrack_hash_check_insert+0x45/0x430 [nf_conntrack] ctnetlink_create_conntrack+0x3cd/0x4e0 [nf_conntrack_netlink] ctnetlink_new_conntrack+0x1c0/0x450 [nf_conntrack_netlink] nfnetlink_rcv_msg+0x277/0x2f0 [nfnetlink] netlink_rcv_skb+0x50/0x100 nfnetlink_rcv+0x65/0x144 [nfnetlink] netlink_unicast+0x1ae/0x290 netlink_sendmsg+0x257/0x4f0 sock_sendmsg+0x5f/0x70
This patch is to fix it by changing to use NF_CT_STAT_INC_ATOMIC() for nf_ct_ext_valid_pre/post() check in nf_conntrack_hash_check_insert(), as well as nf_ct_ext_valid_post() in __nf_conntrack_confirm().
Note that nf_ct_ext_valid_pre() check in __nf_conntrack_confirm() is safe to use NF_CT_STAT_INC(), as it's under local_bh_disable().
Fixes: c56716c69ce1 ("netfilter: extensions: introduce extension genid count") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v6.0.9, v5.15.79, v6.0.8, v5.15.78 |
|
#
52d1aa8b |
| 09-Nov-2022 |
Daniel Xu <dxu@dxuuu.xyz> |
netfilter: conntrack: Fix data-races around ct mark
nf_conn:mark can be read from and written to in parallel. Use READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted compiler optimizat
netfilter: conntrack: Fix data-races around ct mark
nf_conn:mark can be read from and written to in parallel. Use READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted compiler optimizations.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1 |
|
#
8032bf12 |
| 09-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32_below() instead of deprecated function
This is a simple mechanical transformation done by:
@@ expression E; @@ - prandom_u32_max + get_random_u32_below (E)
Reviewed-
treewide: use get_random_u32_below() instead of deprecated function
This is a simple mechanical transformation done by:
@@ expression E; @@ - prandom_u32_max + get_random_u32_below (E)
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
show more ...
|
#
d2c806ab |
| 02-Nov-2022 |
Florian Westphal <fw@strlen.de> |
netfilter: conntrack: use siphash_4u64
This function is used for every packet, siphash_4u64 is noticeably faster than using local buffer + siphash:
Before: 1.23% kpktgend_0 [kernel.vmlinux
netfilter: conntrack: use siphash_4u64
This function is used for every packet, siphash_4u64 is noticeably faster than using local buffer + siphash:
Before: 1.23% kpktgend_0 [kernel.vmlinux] [k] __siphash_unaligned 0.14% kpktgend_0 [nf_conntrack] [k] hash_conntrack_raw After: 0.79% kpktgend_0 [kernel.vmlinux] [k] siphash_4u64 0.15% kpktgend_0 [nf_conntrack] [k] hash_conntrack_raw
In the pktgen test this gives about ~2.4% performance improvement.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69 |
|
#
2aa19275 |
| 16-Sep-2022 |
Antoine Tenart <atenart@kernel.org> |
netfilter: conntrack: revisit the gc initial rescheduling bias
The previous commit changed the way the rescheduling delay is computed which has a side effect: the bias is now represented as much as
netfilter: conntrack: revisit the gc initial rescheduling bias
The previous commit changed the way the rescheduling delay is computed which has a side effect: the bias is now represented as much as the other entries in the rescheduling delay which makes the logic to kick in only with very large sets, as the initial interval is very large (INT_MAX).
Revisit the GC initial bias to allow more frequent GC for smaller sets while still avoiding wakeups when a machine is mostly idle. We're moving from a large initial value to pretending we have 100 entries expiring at the upper bound. This way only a few entries having a small timeout won't impact much the rescheduling delay and non-idle machines will have enough entries to lower the delay when needed. This also improves readability as the initial bias is now linked to what is computed instead of being an arbitrary large value.
Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning") Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
#
95eabdd2 |
| 16-Sep-2022 |
Antoine Tenart <atenart@kernel.org> |
netfilter: conntrack: fix the gc rescheduling delay
Commit 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning") changed the eviction rescheduling to the use average expiry of scanned entries
netfilter: conntrack: fix the gc rescheduling delay
Commit 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning") changed the eviction rescheduling to the use average expiry of scanned entries (within 1-60s) by doing:
for (...) { expires = clamp(nf_ct_expires(tmp), ...); next_run += expires; next_run /= 2; }
The issue is the above will make the average ('next_run' here) more dependent on the last expiration values than the firsts (for sets > 2). Depending on the expiration values used to compute the average, the result can be quite different than what's expected. To fix this we can do the following:
for (...) { expires = clamp(nf_ct_expires(tmp), ...); next_run += (expires - next_run) / ++count; }
Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning") Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
show more ...
|
Revision tags: v5.15.68, v5.15.67, v5.15.66 |
|
#
864b656f |
| 07-Sep-2022 |
Daniel Xu <dxu@dxuuu.xyz> |
bpf: Add support for writing to nf_conn:mark
Support direct writes to nf_conn:mark from TC and XDP prog types. This is useful when applications want to store per-connection metadata. This is also pa
bpf: Add support for writing to nf_conn:mark
Support direct writes to nf_conn:mark from TC and XDP prog types. This is useful when applications want to store per-connection metadata. This is also particularly useful for applications that run both bpf and iptables/nftables because the latter can trivially access this metadata.
One example use case would be if a bpf prog is responsible for advanced packet classification and iptables/nftables is later used for routing due to pre-existing/legacy code.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/ebca06dea366e3e7e861c12f375a548cc4c61108.1662568410.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
Revision tags: v5.15.65, v5.15.64 |
|
#
b1185090 |
| 26-Aug-2022 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: remove nf_conntrack_helper sysctl and modparam toggles
__nf_ct_try_assign_helper() remains in place but it now requires a template to configure the helper.
A toggle to disable automatic
netfilter: remove nf_conntrack_helper sysctl and modparam toggles
__nf_ct_try_assign_helper() remains in place but it now requires a template to configure the helper.
A toggle to disable automatic helper assignment was added by:
a9006892643a ("netfilter: nf_ct_helper: allow to disable automatic helper assignment")
in 2012 to address the issues described in "Secure use of iptables and connection tracking helpers". Automatic conntrack helper assignment was disabled by:
3bb398d925ec ("netfilter: nf_ct_helper: disable automatic helper assignment")
back in 2016.
This patch removes the sysctl and modparam toggles, users now have to rely on explicit conntrack helper configuration via ruleset.
Update tools/testing/selftests/netfilter/nft_conntrack_helper.sh to check that auto-assignment does not happen anymore.
Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58 |
|
#
6e116280 |
| 25-Jul-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
net: netfilter: Remove ifdefs for code shared by BPF and ctnetlink
The current ifdefry for code shared by the BPF and ctnetlink side looks ugly. As per Pablo's request, simplify this by unconditiona
net: netfilter: Remove ifdefs for code shared by BPF and ctnetlink
The current ifdefry for code shared by the BPF and ctnetlink side looks ugly. As per Pablo's request, simplify this by unconditionally compiling in the code. This can be revisited when the shared code between the two grows further.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220725085130.11553-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
Revision tags: v5.15.57, v5.15.56 |
|
#
ef69aa3a |
| 21-Jul-2022 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net: netfilter: Add kfuncs to set and change CT status
Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in order to set nf_conn field of allocated entry or update nf_conn status fi
net: netfilter: Add kfuncs to set and change CT status
Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in order to set nf_conn field of allocated entry or update nf_conn status field of existing inserted entry. Use nf_ct_change_status_common to share the permitted status field changes between netlink and BPF side by refactoring ctnetlink_change_status.
It is required to introduce two kfuncs taking nf_conn___init and nf_conn instead of sharing one because KF_TRUSTED_ARGS flag causes strict type checking. This would disallow passing nf_conn___init to kfunc taking nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we only want to accept refcounted pointers and not e.g. ct->master.
Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and bpf_ct_change_* kfuncs are meant to be used on inserted or looked up CT entry.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
0b389236 |
| 21-Jul-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
net: netfilter: Add kfuncs to set and change CT timeout
Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in order to change nf_conn timeout. This is same as ctnetlink_change_time
net: netfilter: Add kfuncs to set and change CT timeout
Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in order to change nf_conn timeout. This is same as ctnetlink_change_timeout, hence code is shared between both by extracting it out to __nf_ct_change_timeout. It is also updated to return an error when it sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing.
It is required to introduce two kfuncs taking nf_conn___init and nf_conn instead of sharing one because KF_TRUSTED_ARGS flag causes strict type checking. This would disallow passing nf_conn___init to kfunc taking nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we only want to accept refcounted pointers and not e.g. ct->master.
Apart from this, bpf_ct_set_timeout is only called for newly allocated CT so it doesn't need to inspect the status field just yet. Sharing the helpers even if it was possible would make timeout setting helper sensitive to order of setting status and timeout after allocation.
Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and bpf_ct_change_* kfuncs are meant to be used on inserted or looked up CT entry.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
Revision tags: v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42 |
|
#
6be79156 |
| 24-May-2022 |
Jackie Liu <liuyun01@kylinos.cn> |
netfilter: conntrack: use fallthrough to cleanup
These cases all use the same function. we can simplify the code through fallthrough.
$ size net/netfilter/nf_conntrack_core.o
text data
netfilter: conntrack: use fallthrough to cleanup
These cases all use the same function. we can simplify the code through fallthrough.
$ size net/netfilter/nf_conntrack_core.o
text data bss dec hex filename before 81601 81430 768 163799 27fd7 net/netfilter/nf_conntrack_core.o after 80361 81430 768 162559 27aff net/netfilter/nf_conntrack_core.o
Arch: aarch64 Gcc : gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|