Revision tags: v6.6.67, v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46 |
|
#
0db00e5d |
| 11-Aug-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.45' into for/openbmc/dev-6.6
This is the 6.6.45 stable release
|
Revision tags: v6.6.45, v6.6.44, v6.6.43 |
|
#
d7cc186d |
| 25-Jul-2024 |
Eric Dumazet <edumazet@google.com> |
sched: act_ct: take care of padding in struct zones_ht_key
[ Upstream commit 2191a54f63225b548fd8346be3611c3219a24738 ]
Blamed commit increased lookup key size from 2 bytes to 16 bytes, because zon
sched: act_ct: take care of padding in struct zones_ht_key
[ Upstream commit 2191a54f63225b548fd8346be3611c3219a24738 ]
Blamed commit increased lookup key size from 2 bytes to 16 bytes, because zones_ht_key got a struct net pointer.
Make sure rhashtable_lookup() is not using the padding bytes which are not initialized.
BUG: KMSAN: uninit-value in rht_ptr_rcu include/linux/rhashtable.h:376 [inline] BUG: KMSAN: uninit-value in __rhashtable_lookup include/linux/rhashtable.h:607 [inline] BUG: KMSAN: uninit-value in rhashtable_lookup include/linux/rhashtable.h:646 [inline] BUG: KMSAN: uninit-value in rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline] BUG: KMSAN: uninit-value in tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329 rht_ptr_rcu include/linux/rhashtable.h:376 [inline] __rhashtable_lookup include/linux/rhashtable.h:607 [inline] rhashtable_lookup include/linux/rhashtable.h:646 [inline] rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline] tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329 tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408 tcf_action_init_1+0x6cc/0xb30 net/sched/act_api.c:1425 tcf_action_init+0x458/0xf00 net/sched/act_api.c:1488 tcf_action_add net/sched/act_api.c:2061 [inline] tc_ctl_action+0x4be/0x19d0 net/sched/act_api.c:2118 rtnetlink_rcv_msg+0x12fc/0x1410 net/core/rtnetlink.c:6647 netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2550 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6665 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1357 netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2597 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2651 __sys_sendmsg net/socket.c:2680 [inline] __do_sys_sendmsg net/socket.c:2689 [inline] __se_sys_sendmsg net/socket.c:2687 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2687 x64_sys_call+0x2dd6/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Local variable key created at: tcf_ct_flow_table_get+0x4a/0x2260 net/sched/act_ct.c:324 tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408
Fixes: 88c67aeb1407 ("sched: act_ct: add netns into the key of tcf_ct_flow_table") Reported-by: syzbot+1b5e4e187cc586d05ea0@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.42 |
|
#
6e4f6b5e |
| 18-Jul-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.41' into dev-6.6
This is the 6.6.41 stable release
|
Revision tags: v6.6.41, v6.6.40, v6.6.39 |
|
#
799a3490 |
| 10-Jul-2024 |
Chengen Du <chengen.du@canonical.com> |
net/sched: Fix UAF when resolving a clash
[ Upstream commit 26488172b0292bed837b95a006a3f3431d1898c3 ]
KASAN reports the following UAF:
BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_proces
net/sched: Fix UAF when resolving a clash
[ Upstream commit 26488172b0292bed837b95a006a3f3431d1898c3 ]
KASAN reports the following UAF:
BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469
Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> <TASK> asm_common_interrupt+0x27/0x40
Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491
Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491
The ct may be dropped if a clash has been resolved but is still passed to the tcf_ct_flow_table_process_conn function for further usage. This issue can be fixed by retrieving ct from skb again after confirming conntrack.
Fixes: 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action") Co-developed-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Chengen Du <chengen.du@canonical.com> Link: https://patch.msgid.link/20240710053747.13223-1-chengen.du@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.38, v6.6.37 |
|
#
57904291 |
| 27-Jun-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.36' into dev-6.6
This is the 6.6.36 stable release
|
Revision tags: v6.6.36, v6.6.35, v6.6.34 |
|
#
9126fd82 |
| 15-Jun-2024 |
Xin Long <lucien.xin@gmail.com> |
sched: act_ct: add netns into the key of tcf_ct_flow_table
[ Upstream commit 88c67aeb14070bab61d3dd8be96c8b42ebcaf53a ]
zones_ht is a global hashtable for flow_table with zone as key. However, it d
sched: act_ct: add netns into the key of tcf_ct_flow_table
[ Upstream commit 88c67aeb14070bab61d3dd8be96c8b42ebcaf53a ]
zones_ht is a global hashtable for flow_table with zone as key. However, it does not consider netns when getting a flow_table from zones_ht in tcf_ct_init(), and it means an act_ct action in netns A may get a flow_table that belongs to netns B if it has the same zone value.
In Shuang's test with the TOPO:
tcf2_c <---> tcf2_sw1 <---> tcf2_sw2 <---> tcf2_s
tcf2_sw1 and tcf2_sw2 saw the same flow and used the same flow table, which caused their ct entries entering unexpected states and the TCP connection not able to end normally.
This patch fixes the issue simply by adding netns into the key of tcf_ct_flow_table so that an act_ct action gets a flow_table that belongs to its own netns in tcf_ct_init().
Note that for easy coding we don't use tcf_ct_flow_table.nf_ft.net, as the ct_ft is initialized after inserting it to the hashtable in tcf_ct_flow_table_get() and also it requires to implement several functions in rhashtable_params including hashfn, obj_hashfn and obj_cmpfn.
Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1db5b6cc6902c5fc6f8c6cbd85494a2008087be5.1718488050.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23 |
|
#
1188f7f1 |
| 10-Feb-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.14' into dev-6.6
This is the 6.6.14 stable release
|
#
e6eff205 |
| 10-Feb-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.8' into dev-6.6
This is the 6.6.8 stable release
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9 |
|
#
73f7da5f |
| 28-Dec-2023 |
Tao Liu <taoliu828@163.com> |
net/sched: act_ct: fix skb leak and crash on ooo frags
[ Upstream commit 3f14b377d01d8357eba032b4cabc8c1149b458b6 ]
act_ct adds skb->users before defragmentation. If frags arrive in order, the last
net/sched: act_ct: fix skb leak and crash on ooo frags
[ Upstream commit 3f14b377d01d8357eba032b4cabc8c1149b458b6 ]
act_ct adds skb->users before defragmentation. If frags arrive in order, the last frag's reference is reset in:
inet_frag_reasm_prepare skb_morph
which is not straightforward.
However when frags arrive out of order, nobody unref the last frag, and all frags are leaked. The situation is even worse, as initiating packet capture can lead to a crash[0] when skb has been cloned and shared at the same time.
Fix the issue by removing skb_get() before defragmentation. act_ct returns TC_ACT_CONSUMED when defrag failed or in progress.
[0]: [ 843.804823] ------------[ cut here ]------------ [ 843.809659] kernel BUG at net/core/skbuff.c:2091! [ 843.814516] invalid opcode: 0000 [#1] PREEMPT SMP [ 843.819296] CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G S 6.7.0-rc3 #2 [ 843.824107] Hardware name: XFUSION 1288H V6/BC13MBSBD, BIOS 1.29 11/25/2022 [ 843.828953] RIP: 0010:pskb_expand_head+0x2ac/0x300 [ 843.833805] Code: 8b 70 28 48 85 f6 74 82 48 83 c6 08 bf 01 00 00 00 e8 38 bd ff ff 8b 83 c0 00 00 00 48 03 83 c8 00 00 00 e9 62 ff ff ff 0f 0b <0f> 0b e8 8d d0 ff ff e9 b3 fd ff ff 81 7c 24 14 40 01 00 00 4c 89 [ 843.843698] RSP: 0018:ffffc9000cce07c0 EFLAGS: 00010202 [ 843.848524] RAX: 0000000000000002 RBX: ffff88811a211d00 RCX: 0000000000000820 [ 843.853299] RDX: 0000000000000640 RSI: 0000000000000000 RDI: ffff88811a211d00 [ 843.857974] RBP: ffff888127d39518 R08: 00000000bee97314 R09: 0000000000000000 [ 843.862584] R10: 0000000000000000 R11: ffff8881109f0000 R12: 0000000000000880 [ 843.867147] R13: ffff888127d39580 R14: 0000000000000640 R15: ffff888170f7b900 [ 843.871680] FS: 0000000000000000(0000) GS:ffff889ffffc0000(0000) knlGS:0000000000000000 [ 843.876242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 843.880778] CR2: 00007fa42affcfb8 CR3: 000000011433a002 CR4: 0000000000770ef0 [ 843.885336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 843.889809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 843.894229] PKRU: 55555554 [ 843.898539] Call Trace: [ 843.902772] <IRQ> [ 843.906922] ? __die_body+0x1e/0x60 [ 843.911032] ? die+0x3c/0x60 [ 843.915037] ? do_trap+0xe2/0x110 [ 843.918911] ? pskb_expand_head+0x2ac/0x300 [ 843.922687] ? do_error_trap+0x65/0x80 [ 843.926342] ? pskb_expand_head+0x2ac/0x300 [ 843.929905] ? exc_invalid_op+0x50/0x60 [ 843.933398] ? pskb_expand_head+0x2ac/0x300 [ 843.936835] ? asm_exc_invalid_op+0x1a/0x20 [ 843.940226] ? pskb_expand_head+0x2ac/0x300 [ 843.943580] inet_frag_reasm_prepare+0xd1/0x240 [ 843.946904] ip_defrag+0x5d4/0x870 [ 843.950132] nf_ct_handle_fragments+0xec/0x130 [nf_conntrack] [ 843.953334] tcf_ct_act+0x252/0xd90 [act_ct] [ 843.956473] ? tcf_mirred_act+0x516/0x5a0 [act_mirred] [ 843.959657] tcf_action_exec+0xa1/0x160 [ 843.962823] fl_classify+0x1db/0x1f0 [cls_flower] [ 843.966010] ? skb_clone+0x53/0xc0 [ 843.969173] tcf_classify+0x24d/0x420 [ 843.972333] tc_run+0x8f/0xf0 [ 843.975465] __netif_receive_skb_core+0x67a/0x1080 [ 843.978634] ? dev_gro_receive+0x249/0x730 [ 843.981759] __netif_receive_skb_list_core+0x12d/0x260 [ 843.984869] netif_receive_skb_list_internal+0x1cb/0x2f0 [ 843.987957] ? mlx5e_handle_rx_cqe_mpwrq_rep+0xfa/0x1a0 [mlx5_core] [ 843.991170] napi_complete_done+0x72/0x1a0 [ 843.994305] mlx5e_napi_poll+0x28c/0x6d0 [mlx5_core] [ 843.997501] __napi_poll+0x25/0x1b0 [ 844.000627] net_rx_action+0x256/0x330 [ 844.003705] __do_softirq+0xb3/0x29b [ 844.006718] irq_exit_rcu+0x9e/0xc0 [ 844.009672] common_interrupt+0x86/0xa0 [ 844.012537] </IRQ> [ 844.015285] <TASK> [ 844.017937] asm_common_interrupt+0x26/0x40 [ 844.020591] RIP: 0010:acpi_safe_halt+0x1b/0x20 [ 844.023247] Code: ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 65 48 8b 04 25 00 18 03 00 48 8b 00 a8 08 75 0c 66 90 0f 00 2d 81 d0 44 00 fb f4 <fa> c3 0f 1f 00 89 fa ec 48 8b 05 ee 88 ed 00 a9 00 00 00 80 75 11 [ 844.028900] RSP: 0018:ffffc90000533e70 EFLAGS: 00000246 [ 844.031725] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0000000000000000 [ 844.034553] RDX: ffff889ffffc0000 RSI: ffffffff828b7f20 RDI: ffff88a090f45c64 [ 844.037368] RBP: ffff88a0901a2800 R08: ffff88a090f45c00 R09: 00000000000317c0 [ 844.040155] R10: 00ec812281150475 R11: ffff889fffff0e04 R12: ffffffff828b7fa0 [ 844.042962] R13: ffffffff828b7f20 R14: 0000000000000001 R15: 0000000000000000 [ 844.045819] acpi_idle_enter+0x7b/0xc0 [ 844.048621] cpuidle_enter_state+0x7f/0x430 [ 844.051451] cpuidle_enter+0x2d/0x40 [ 844.054279] do_idle+0x1d4/0x240 [ 844.057096] cpu_startup_entry+0x2a/0x30 [ 844.059934] start_secondary+0x104/0x130 [ 844.062787] secondary_startup_64_no_verify+0x16b/0x16b [ 844.065674] </TASK>
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Tao Liu <taoliu828@163.com> Link: https://lore.kernel.org/r/20231228081457.936732-1-taoliu828@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.8, v6.6.7, v6.6.6, v6.6.5 |
|
#
15f300ed |
| 05-Dec-2023 |
Vlad Buslov <vladbu@nvidia.com> |
net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table
[ Upstream commit 125f1c7f26ffcdbf96177abe75b70c1a6ceb17bc ]
The referenced change added custom cleanup code to act_ct to delete any ca
net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table
[ Upstream commit 125f1c7f26ffcdbf96177abe75b70c1a6ceb17bc ]
The referenced change added custom cleanup code to act_ct to delete any callbacks registered on the parent block when deleting the tcf_ct_flow_table instance. However, the underlying issue is that the drivers don't obtain the reference to the tcf_ct_flow_table instance when registering callbacks which means that not only driver callbacks may still be on the table when deleting it but also that the driver can still have pointers to its internal nf_flowtable and can use it concurrently which results either warning in netfilter[0] or use-after-free.
Fix the issue by taking a reference to the underlying struct tcf_ct_flow_table instance when registering the callback and release the reference when unregistering. Expose new API required for such reference counting by adding two new callbacks to nf_flowtable_type and implementing them for act_ct flowtable_ct type. This fixes the issue by extending the lifetime of nf_flowtable until all users have unregistered.
[0]: [106170.938634] ------------[ cut here ]------------ [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis try overlay mlx5_core [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1 [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00 [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202 [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48 [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000 [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700 [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58 [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000 [106170.951616] FS: 0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000 [106170.952329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0 [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [106170.954766] Call Trace: [106170.955057] <TASK> [106170.955315] ? __warn+0x79/0x120 [106170.955648] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.956172] ? report_bug+0x17c/0x190 [106170.956537] ? handle_bug+0x3c/0x60 [106170.956891] ? exc_invalid_op+0x14/0x70 [106170.957264] ? asm_exc_invalid_op+0x16/0x20 [106170.957666] ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core] [106170.958172] ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core] [106170.958788] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.959339] ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core] [106170.959854] ? mapping_remove+0x154/0x1d0 [mlx5_core] [106170.960342] ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core] [106170.960927] mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core] [106170.961441] mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core] [106170.962001] mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core] [106170.962524] mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core] [106170.963034] mlx5e_flow_put+0x73/0xe0 [mlx5_core] [106170.963506] mlx5e_put_flow_list+0x38/0x70 [mlx5_core] [106170.964002] mlx5e_rep_update_flows+0xec/0x290 [mlx5_core] [106170.964525] mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core] [106170.965056] process_one_work+0x13a/0x2c0 [106170.965443] worker_thread+0x2e5/0x3f0 [106170.965808] ? rescuer_thread+0x410/0x410 [106170.966192] kthread+0xc6/0xf0 [106170.966515] ? kthread_complete_and_exit+0x20/0x20 [106170.966970] ret_from_fork+0x2d/0x50 [106170.967332] ? kthread_complete_and_exit+0x20/0x20 [106170.967774] ret_from_fork_asm+0x11/0x20 [106170.970466] </TASK> [106170.970726] ---[ end trace 0000000000000000 ]---
Fixes: 77ac5e40c44e ("net/sched: act_ct: remove and free nf_table callbacks") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
b97d6790 |
| 13-Dec-2023 |
Joel Stanley <joel@jms.id.au> |
Merge tag 'v6.6.6' into dev-6.6
This is the 6.6.6 stable release
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
Revision tags: v6.6.4, v6.6.3, v6.6.2 |
|
#
0a7c9d1f |
| 13-Nov-2023 |
Xin Long <lucien.xin@gmail.com> |
net: sched: do not offload flows with a helper in act_ct
[ Upstream commit 7cd5af0e937a197295f3aa3721031f0fbae49cff ]
There is no hardware supporting ct helper offload. However, prior to this patch
net: sched: do not offload flows with a helper in act_ct
[ Upstream commit 7cd5af0e937a197295f3aa3721031f0fbae49cff ]
There is no hardware supporting ct helper offload. However, prior to this patch, a flower filter with a helper in the ct action can be successfully set into the HW, for example (eth1 is a bnxt NIC):
# tc qdisc add dev eth1 ingress_block 22 ingress # tc filter add block 22 proto ip flower skip_sw ip_proto tcp \ dst_port 21 ct_state -trk action ct helper ipv4-tcp-ftp # tc filter show dev eth1 ingress
filter block 22 protocol ip pref 49152 flower chain 0 handle 0x1 eth_type ipv4 ip_proto tcp dst_port 21 ct_state -trk skip_sw in_hw in_hw_count 1 <---- action order 1: ct zone 0 helper ipv4-tcp-ftp pipe index 2 ref 1 bind 1 used_hw_stats delayed
This might cause the flower filter not to work as expected in the HW.
This patch avoids this problem by simply returning -EOPNOTSUPP in tcf_ct_offload_act_setup() to not allow to offload flows with a helper in act_ct.
Fixes: a21b06e73191 ("net: sched: add helper support in act_ct") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/f8685ec7702c4a448a1371a8b34b43217b583b9d.1699898008.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.11, v6.6.1 |
|
#
d5a116db |
| 03-Nov-2023 |
Vlad Buslov <vladbu@nvidia.com> |
net/sched: act_ct: Always fill offloading tuple iifidx
[ Upstream commit 9bc64bd0cd765f696fcd40fc98909b1f7c73b2ba ]
Referenced commit doesn't always set iifidx when offloading the flow to hardware.
net/sched: act_ct: Always fill offloading tuple iifidx
[ Upstream commit 9bc64bd0cd765f696fcd40fc98909b1f7c73b2ba ]
Referenced commit doesn't always set iifidx when offloading the flow to hardware. Fix the following cases:
- nf_conn_act_ct_ext_fill() is called before extension is created with nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with unspecified iifidx when connection is offloaded after only single original-direction packet has been processed by tc data path. Always fill the new nf_conn_act_ct_ext instance after creating it in nf_conn_act_ct_ext_add().
- Offloading of unidirectional UDP NEW connections is now supported, but ct flow iifidx field is not updated when connection is promoted to bidirectional which can result reply-direction iifidx to be zero when refreshing the connection. Fill in the extension and update flow iifidx before calling flow_offload_refresh().
Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tuple iifidx") Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections") Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.10, v6.6 |
|
#
c17cda15 |
| 26-Oct-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and netfilter.
Most regressions addressed h
Merge tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and netfilter.
Most regressions addressed here come from quite old versions, with the exceptions of the iavf one and the WiFi fixes. No known outstanding reports or investigation.
Fixes to fixes:
- eth: iavf: in iavf_down, disable queues when removing the driver
Previous releases - regressions:
- sched: act_ct: additional checks for outdated flows
- tcp: do not leave an empty skb in write queue
- tcp: fix wrong RTO timeout when received SACK reneging
- wifi: cfg80211: pass correct pointer to rdev_inform_bss()
- eth: i40e: sync next_to_clean and next_to_process for programming status desc
- eth: iavf: initialize waitqueues before starting watchdog_task
Previous releases - always broken:
- eth: r8169: fix data-races
- eth: igb: fix potential memory leak in igb_add_ethtool_nfc_entry
- eth: r8152: avoid writing garbage to the adapter's registers
- eth: gtp: fix fragmentation needed check with gso"
* tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) iavf: in iavf_down, disable queues when removing the driver vsock/virtio: initialize the_virtio_vsock before using VQs net: ipv6: fix typo in comments net: ipv4: fix typo in comments net/sched: act_ct: additional checks for outdated flows netfilter: flowtable: GC pushes back packets to classic path i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR gtp: fix fragmentation needed check with gso gtp: uapi: fix GTPA_MAX Fix NULL pointer dereference in cn_filter() sfc: cleanup and reduce netlink error messages net/handshake: fix file ref count in handshake_nl_accept_doit() wifi: mac80211: don't drop all unprotected public action frames wifi: cfg80211: fix assoc response warning on failed links wifi: cfg80211: pass correct pointer to rdev_inform_bss() isdn: mISDN: hfcsusb: Spelling fix in comment tcp: fix wrong RTO timeout when received SACK reneging r8152: Block future register access if register access fails r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en() ...
show more ...
|
#
5e5d8b94 |
| 25-Oct-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
==================== Netfilter fixes for net
This patch contains two late Netfilter's
Merge tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
==================== Netfilter fixes for net
This patch contains two late Netfilter's flowtable fixes for net:
1) Flowtable GC pushes back packets to classic path in every GC run, ie. every second. This is because NF_FLOW_HW_ESTABLISHED is only used by sched/act_ct (never set) and IPS_SEEN_REPLY might be unset by the time the flow is offloaded (this status bit is only reliable in the sched/act_ct datapath).
2) sched/act_ct logic to push back packets to classic path to reevaluate if UDP flow is unidirectional only applies if IPS_HW_OFFLOAD_BIT is set on and no hardware offload request is pending to be handled. From Vlad Buslov.
These two patches fixes two problems that were introduced in the previous 6.5 development cycle.
* tag 'nf-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: net/sched: act_ct: additional checks for outdated flows netfilter: flowtable: GC pushes back packets to classic path ====================
Link: https://lore.kernel.org/r/20231025100819.2664-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.5.9 |
|
#
a63b6622 |
| 24-Oct-2023 |
Vlad Buslov <vladbu@nvidia.com> |
net/sched: act_ct: additional checks for outdated flows
Current nf_flow_is_outdated() implementation considers any flow table flow which state diverged from its underlying CT connection status for t
net/sched: act_ct: additional checks for outdated flows
Current nf_flow_is_outdated() implementation considers any flow table flow which state diverged from its underlying CT connection status for teardown which can be problematic in the following cases:
- Flow has never been offloaded to hardware in the first place either because flow table has hardware offload disabled (flag NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on 'add' workqueue to be offloaded for the first time. The former is incorrect, the later generates excessive deletions and additions of flows.
- Flow is already pending to be updated on the workqueue. Tearing down such flows will also generate excessive removals from the flow table, especially on highly loaded system where the latency to re-offload a flow via 'add' workqueue can be quite high.
When considering a flow for teardown as outdated verify that it is both offloaded to hardware and doesn't have any pending updates.
Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple") Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
#
735795f6 |
| 24-Oct-2023 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netfilter: flowtable: GC pushes back packets to classic path
Since 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY
netfilter: flowtable: GC pushes back packets to classic path
Since 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY back to classic path in every run, ie. every second. This is because of a new check for NF_FLOW_HW_ESTABLISHED which is specific of sched/act_ct.
In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set on and IPS_SEEN_REPLY is unreliable since users decide when to offload the flow before, such bit might be set on at a later stage.
Fix it by adding a custom .gc handler that sched/act_ct can use to deal with its NF_FLOW_HW_ESTABLISHED bit.
Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple") Reported-by: Vladimir Smelhaus <vl.sm@email.cz> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
show more ...
|
Revision tags: v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3 |
|
#
c900529f |
| 12-Sep-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Forwarding to v6.6-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.5.2, v6.1.51, v6.5.1 |
|
#
1ac731c5 |
| 30-Aug-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.6 merge window.
|
Revision tags: v6.1.50 |
|
#
bd6c11bc |
| 29-Aug-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core:
- Increase size limits for to-be-sent skb frag allocat
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core:
- Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory pressure
- Remove hard coded limitation on IPv6 specific info placement inside the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool
- Remove phylink legacy mode support
- Support offload LED blinking to phy
- Add devlink port function attributes for IPsec
New hardware / drivers:
- Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy
- WiFi: - MediaTek mt7981 support
- Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips
- Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925
Drivers:
- Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution
- Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support
- Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers
- Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support
- Ethernet PHYs: - convert mv88e6xxx to phylink_pcs
- WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support
- Connector: - support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ...
show more ...
|
Revision tags: v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44 |
|
#
2612e3bb |
| 07-Aug-2023 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catching-up with drm-next and drm-intel-gt-next. It will unblock a code refactor around the platform definitions (names vs acronyms).
Signed-off-by: Rodrigo V
Merge drm/drm-next into drm-intel-next
Catching-up with drm-next and drm-intel-gt-next. It will unblock a code refactor around the platform definitions (names vs acronyms).
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
show more ...
|
#
9f771739 |
| 07-Aug-2023 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as a dependency for https://patchwork.freedesktop.org/series/1
Merge drm/drm-next into drm-intel-gt-next
Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as a dependency for https://patchwork.freedesktop.org/series/121735/
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
Revision tags: v6.1.43, v6.1.42, v6.1.41, v6.1.40 |
|
#
2d6d7d6c |
| 20-Jul-2023 |
Paolo Abeni <pabeni@redhat.com> |
Merge branch 'net-handle-the-exp-removal-problem-with-ovs-upcall-properly'
Xin Long says:
==================== net: handle the exp removal problem with ovs upcall properly
With the OVS upcall, the
Merge branch 'net-handle-the-exp-removal-problem-with-ovs-upcall-properly'
Xin Long says:
==================== net: handle the exp removal problem with ovs upcall properly
With the OVS upcall, the original ct in the skb will be dropped, and when the skb comes back from userspace it has to create a new ct again through nf_conntrack_in() in either OVS __ovs_ct_lookup() or TC tcf_ct_act().
However, the new ct will not be able to have the exp as the original ct has taken it away from the hash table in nf_ct_find_expectation(). This will cause some flow never to be matched, like:
'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
if the 2nd flow triggers the OVS upcall, the 3rd flow will never get matched.
OVS conntrack works around this by adding its own exp lookup function to not remove the exp from the hash table and saving the exp and its master info to the flow keys instead of create a real ct. But this way doesn't work for TC act_ct.
The patch 1/3 allows nf_ct_find_expectation() not to remove the exp from the hash table if tmpl is set with IPS_CONFIRMED when doing lookup. This allows both OVS conntrack and TC act_ct to have a simple and clear fix for this problem in the patch 2/3 and 3/3. ====================
Link: https://lore.kernel.org/r/cover.1689541664.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
Revision tags: v6.1.39 |
|
#
76622ced |
| 16-Jul-2023 |
Xin Long <lucien.xin@gmail.com> |
net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct
With the following flows, the packets will be dropped if OVS TC offload is enabled.
'ip,ct_state=-trk,in_port=1 acti
net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ct
With the following flows, the packets will be dropped if OVS TC offload is enabled.
'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal'
In the 1st flow, it finds the exp from the hashtable and removes it then creates the ct with this exp in act_ct. However, in the 2nd flow it goes to the OVS upcall at the 1st time. When the skb comes back from userspace, it has to create the ct again without exp(the exp was removed last time). With no 'rel' set in the ct, the 3rd flow can never get matched.
In OVS conntrack, it works around it by adding its own exp lookup function ovs_ct_expect_find() where it doesn't remove the exp. Instead of creating a real ct, it only updates its keys with the exp and its master info. So when the skb comes back, the exp is still in the hashtable.
However, we can't do this trick in act_ct, as tc flower match is using a real ct, and passing the exp and its master info to flower parsing via tc_skb_cb is also not possible (tc_skb_cb size is not big enough).
The simple and clear fix is to not remove the exp at the 1st flow, namely, not set IPS_CONFIRMED in tmpl when commit is not set in act_ct.
Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
50501936 |
| 17-Jul-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.4' into next
Sync up with mainline to bring in updates to shared infrastructure.
|