History log of /openbmc/linux/net/bridge/br_private.h (Results 1 – 25 of 817)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 3f59ac29 09-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, pa

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, packets
that are directed to the taps follow bridge input hook path. This patch
adds a workaround to reset conntrack for these packets.

Jianbo Liu reports warning splats in their test infrastructure where
cloned packets reach the br_netfilter input hook to confirm the
conntrack object.

Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has
reached the input hook because it is passed up to the bridge device to
reach the taps.

[ 57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_isc si ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core
[ 57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19
[ 57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1
[ 57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202
[ 57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000
[ 57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000
[ 57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003
[ 57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000
[ 57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800
[ 57.582313] FS: 0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000
[ 57.583040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0
[ 57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 57.585440] Call Trace:
[ 57.585721] <IRQ>
[ 57.585976] ? __warn+0x7d/0x130
[ 57.586323] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.586811] ? report_bug+0xf1/0x1c0
[ 57.587177] ? handle_bug+0x3f/0x70
[ 57.587539] ? exc_invalid_op+0x13/0x60
[ 57.587929] ? asm_exc_invalid_op+0x16/0x20
[ 57.588336] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.588825] nf_hook_slow+0x3d/0xd0
[ 57.589188] ? br_handle_vlan+0x4b/0x110
[ 57.589579] br_pass_frame_up+0xfc/0x150
[ 57.589970] ? br_port_flags_change+0x40/0x40
[ 57.590396] br_handle_frame_finish+0x346/0x5e0
[ 57.590837] ? ipt_do_table+0x32e/0x430
[ 57.591221] ? br_handle_local_finish+0x20/0x20
[ 57.591656] br_nf_hook_thresh+0x4b/0xf0 [br_netfilter]
[ 57.592286] ? br_handle_local_finish+0x20/0x20
[ 57.592802] br_nf_pre_routing_finish+0x178/0x480 [br_netfilter]
[ 57.593348] ? br_handle_local_finish+0x20/0x20
[ 57.593782] ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat]
[ 57.594279] br_nf_pre_routing+0x24c/0x550 [br_netfilter]
[ 57.594780] ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter]
[ 57.595280] br_handle_frame+0x1f3/0x3d0
[ 57.595676] ? br_handle_local_finish+0x20/0x20
[ 57.596118] ? br_handle_frame_finish+0x5e0/0x5e0
[ 57.596566] __netif_receive_skb_core+0x25b/0xfc0
[ 57.597017] ? __napi_build_skb+0x37/0x40
[ 57.597418] __netif_receive_skb_list_core+0xfb/0x220

Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack")
Reported-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 3f59ac29 09-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, pa

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, packets
that are directed to the taps follow bridge input hook path. This patch
adds a workaround to reset conntrack for these packets.

Jianbo Liu reports warning splats in their test infrastructure where
cloned packets reach the br_netfilter input hook to confirm the
conntrack object.

Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has
reached the input hook because it is passed up to the bridge device to
reach the taps.

[ 57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_isc si ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core
[ 57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19
[ 57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1
[ 57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202
[ 57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000
[ 57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000
[ 57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003
[ 57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000
[ 57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800
[ 57.582313] FS: 0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000
[ 57.583040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0
[ 57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 57.585440] Call Trace:
[ 57.585721] <IRQ>
[ 57.585976] ? __warn+0x7d/0x130
[ 57.586323] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.586811] ? report_bug+0xf1/0x1c0
[ 57.587177] ? handle_bug+0x3f/0x70
[ 57.587539] ? exc_invalid_op+0x13/0x60
[ 57.587929] ? asm_exc_invalid_op+0x16/0x20
[ 57.588336] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.588825] nf_hook_slow+0x3d/0xd0
[ 57.589188] ? br_handle_vlan+0x4b/0x110
[ 57.589579] br_pass_frame_up+0xfc/0x150
[ 57.589970] ? br_port_flags_change+0x40/0x40
[ 57.590396] br_handle_frame_finish+0x346/0x5e0
[ 57.590837] ? ipt_do_table+0x32e/0x430
[ 57.591221] ? br_handle_local_finish+0x20/0x20
[ 57.591656] br_nf_hook_thresh+0x4b/0xf0 [br_netfilter]
[ 57.592286] ? br_handle_local_finish+0x20/0x20
[ 57.592802] br_nf_pre_routing_finish+0x178/0x480 [br_netfilter]
[ 57.593348] ? br_handle_local_finish+0x20/0x20
[ 57.593782] ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat]
[ 57.594279] br_nf_pre_routing+0x24c/0x550 [br_netfilter]
[ 57.594780] ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter]
[ 57.595280] br_handle_frame+0x1f3/0x3d0
[ 57.595676] ? br_handle_local_finish+0x20/0x20
[ 57.596118] ? br_handle_frame_finish+0x5e0/0x5e0
[ 57.596566] __netif_receive_skb_core+0x25b/0xfc0
[ 57.597017] ? __napi_build_skb+0x37/0x40
[ 57.597418] __netif_receive_skb_list_core+0xfb/0x220

Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack")
Reported-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 3f59ac29 09-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, pa

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, packets
that are directed to the taps follow bridge input hook path. This patch
adds a workaround to reset conntrack for these packets.

Jianbo Liu reports warning splats in their test infrastructure where
cloned packets reach the br_netfilter input hook to confirm the
conntrack object.

Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has
reached the input hook because it is passed up to the bridge device to
reach the taps.

[ 57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_isc si ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core
[ 57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19
[ 57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1
[ 57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202
[ 57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000
[ 57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000
[ 57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003
[ 57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000
[ 57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800
[ 57.582313] FS: 0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000
[ 57.583040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0
[ 57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 57.585440] Call Trace:
[ 57.585721] <IRQ>
[ 57.585976] ? __warn+0x7d/0x130
[ 57.586323] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.586811] ? report_bug+0xf1/0x1c0
[ 57.587177] ? handle_bug+0x3f/0x70
[ 57.587539] ? exc_invalid_op+0x13/0x60
[ 57.587929] ? asm_exc_invalid_op+0x16/0x20
[ 57.588336] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.588825] nf_hook_slow+0x3d/0xd0
[ 57.589188] ? br_handle_vlan+0x4b/0x110
[ 57.589579] br_pass_frame_up+0xfc/0x150
[ 57.589970] ? br_port_flags_change+0x40/0x40
[ 57.590396] br_handle_frame_finish+0x346/0x5e0
[ 57.590837] ? ipt_do_table+0x32e/0x430
[ 57.591221] ? br_handle_local_finish+0x20/0x20
[ 57.591656] br_nf_hook_thresh+0x4b/0xf0 [br_netfilter]
[ 57.592286] ? br_handle_local_finish+0x20/0x20
[ 57.592802] br_nf_pre_routing_finish+0x178/0x480 [br_netfilter]
[ 57.593348] ? br_handle_local_finish+0x20/0x20
[ 57.593782] ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat]
[ 57.594279] br_nf_pre_routing+0x24c/0x550 [br_netfilter]
[ 57.594780] ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter]
[ 57.595280] br_handle_frame+0x1f3/0x3d0
[ 57.595676] ? br_handle_local_finish+0x20/0x20
[ 57.596118] ? br_handle_frame_finish+0x5e0/0x5e0
[ 57.596566] __netif_receive_skb_core+0x25b/0xfc0
[ 57.597017] ? __napi_build_skb+0x37/0x40
[ 57.597418] __netif_receive_skb_list_core+0xfb/0x220

Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack")
Reported-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 3f59ac29 09-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, pa

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, packets
that are directed to the taps follow bridge input hook path. This patch
adds a workaround to reset conntrack for these packets.

Jianbo Liu reports warning splats in their test infrastructure where
cloned packets reach the br_netfilter input hook to confirm the
conntrack object.

Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has
reached the input hook because it is passed up to the bridge device to
reach the taps.

[ 57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_isc si ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core
[ 57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19
[ 57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1
[ 57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202
[ 57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000
[ 57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000
[ 57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003
[ 57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000
[ 57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800
[ 57.582313] FS: 0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000
[ 57.583040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0
[ 57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 57.585440] Call Trace:
[ 57.585721] <IRQ>
[ 57.585976] ? __warn+0x7d/0x130
[ 57.586323] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.586811] ? report_bug+0xf1/0x1c0
[ 57.587177] ? handle_bug+0x3f/0x70
[ 57.587539] ? exc_invalid_op+0x13/0x60
[ 57.587929] ? asm_exc_invalid_op+0x16/0x20
[ 57.588336] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.588825] nf_hook_slow+0x3d/0xd0
[ 57.589188] ? br_handle_vlan+0x4b/0x110
[ 57.589579] br_pass_frame_up+0xfc/0x150
[ 57.589970] ? br_port_flags_change+0x40/0x40
[ 57.590396] br_handle_frame_finish+0x346/0x5e0
[ 57.590837] ? ipt_do_table+0x32e/0x430
[ 57.591221] ? br_handle_local_finish+0x20/0x20
[ 57.591656] br_nf_hook_thresh+0x4b/0xf0 [br_netfilter]
[ 57.592286] ? br_handle_local_finish+0x20/0x20
[ 57.592802] br_nf_pre_routing_finish+0x178/0x480 [br_netfilter]
[ 57.593348] ? br_handle_local_finish+0x20/0x20
[ 57.593782] ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat]
[ 57.594279] br_nf_pre_routing+0x24c/0x550 [br_netfilter]
[ 57.594780] ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter]
[ 57.595280] br_handle_frame+0x1f3/0x3d0
[ 57.595676] ? br_handle_local_finish+0x20/0x20
[ 57.596118] ? br_handle_frame_finish+0x5e0/0x5e0
[ 57.596566] __netif_receive_skb_core+0x25b/0xfc0
[ 57.597017] ? __napi_build_skb+0x37/0x40
[ 57.597418] __netif_receive_skb_list_core+0xfb/0x220

Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack")
Reported-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# 3f59ac29 09-Apr-2024 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, pa

netfilter: br_netfilter: skip conntrack input hook for promisc packets

[ Upstream commit 751de2012eafa4d46d8081056761fa0e9cc8a178 ]

For historical reasons, when bridge device is in promisc mode, packets
that are directed to the taps follow bridge input hook path. This patch
adds a workaround to reset conntrack for these packets.

Jianbo Liu reports warning splats in their test infrastructure where
cloned packets reach the br_netfilter input hook to confirm the
conntrack object.

Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has
reached the input hook because it is passed up to the bridge device to
reach the taps.

[ 57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_isc si ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core
[ 57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19
[ 57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1
[ 57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202
[ 57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000
[ 57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000
[ 57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003
[ 57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000
[ 57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800
[ 57.582313] FS: 0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000
[ 57.583040] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0
[ 57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 57.585440] Call Trace:
[ 57.585721] <IRQ>
[ 57.585976] ? __warn+0x7d/0x130
[ 57.586323] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.586811] ? report_bug+0xf1/0x1c0
[ 57.587177] ? handle_bug+0x3f/0x70
[ 57.587539] ? exc_invalid_op+0x13/0x60
[ 57.587929] ? asm_exc_invalid_op+0x16/0x20
[ 57.588336] ? br_nf_local_in+0x157/0x180 [br_netfilter]
[ 57.588825] nf_hook_slow+0x3d/0xd0
[ 57.589188] ? br_handle_vlan+0x4b/0x110
[ 57.589579] br_pass_frame_up+0xfc/0x150
[ 57.589970] ? br_port_flags_change+0x40/0x40
[ 57.590396] br_handle_frame_finish+0x346/0x5e0
[ 57.590837] ? ipt_do_table+0x32e/0x430
[ 57.591221] ? br_handle_local_finish+0x20/0x20
[ 57.591656] br_nf_hook_thresh+0x4b/0xf0 [br_netfilter]
[ 57.592286] ? br_handle_local_finish+0x20/0x20
[ 57.592802] br_nf_pre_routing_finish+0x178/0x480 [br_netfilter]
[ 57.593348] ? br_handle_local_finish+0x20/0x20
[ 57.593782] ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat]
[ 57.594279] br_nf_pre_routing+0x24c/0x550 [br_netfilter]
[ 57.594780] ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter]
[ 57.595280] br_handle_frame+0x1f3/0x3d0
[ 57.595676] ? br_handle_local_finish+0x20/0x20
[ 57.596118] ? br_handle_frame_finish+0x5e0/0x5e0
[ 57.596566] __netif_receive_skb_core+0x25b/0xfc0
[ 57.597017] ? __napi_build_skb+0x37/0x40
[ 57.597418] __netif_receive_skb_list_core+0xfb/0x220

Fixes: 62e7151ae3eb ("netfilter: bridge: confirm multicast packets before passing them up the stack")
Reported-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15
# d99971ec 27-Jan-2024 Linus Lüssing <linus.luessing@c0d3.blue>

bridge: mcast: fix disabled snooping after long uptime

[ Upstream commit f5c3eb4b7251baba5cd72c9e93920e710ac8194a ]

The original idea of the delay_time check was to not apply multicast
snooping too

bridge: mcast: fix disabled snooping after long uptime

[ Upstream commit f5c3eb4b7251baba5cd72c9e93920e710ac8194a ]

The original idea of the delay_time check was to not apply multicast
snooping too early when an MLD querier appears. And to instead wait at
least for MLD reports to arrive before switching from flooding to group
based, MLD snooped forwarding, to avoid temporary packet loss.

However in a batman-adv mesh network it was noticed that after 248 days of
uptime 32bit MIPS based devices would start to signal that they had
stopped applying multicast snooping due to missing queriers - even though
they were the elected querier and still sending MLD queries themselves.

While time_is_before_jiffies() generally is safe against jiffies
wrap-arounds, like the code comments in jiffies.h explain, it won't
be able to track a difference larger than ULONG_MAX/2. With a 32bit
large jiffies and one jiffies tick every 10ms (CONFIG_HZ=100) on these MIPS
devices running OpenWrt this would result in a difference larger than
ULONG_MAX/2 after 248 (= 2^32/100/60/60/24/2) days and
time_is_before_jiffies() would then start to return false instead of
true. Leading to multicast snooping not being applied to multicast
packets anymore.

Fix this issue by using a proper timer_list object which won't have this
ULONG_MAX/2 difference limitation.

Fixes: b00589af3b04 ("bridge: disable snooping if there is no querier")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240127175033.9640-1-linus.luessing@c0d3.blue
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42
# 4d66f235 26-Jul-2023 YueHaibing <yuehaibing@huawei.com>

bridge: Remove unused declaration br_multicast_set_hash_max()

Since commit 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable")
this is not used, so can remove it.

Signed-off-by: Y

bridge: Remove unused declaration br_multicast_set_hash_max()

Since commit 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable")
this is not used, so can remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230726143141.11704-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v6.1.41, v6.1.40, v6.1.39
# f2e2857b 19-Jul-2023 Petr Machata <petrm@nvidia.com>

net: switchdev: Add a helper to replay objects on a bridge port

When a front panel joins a bridge via another netdevice (typically a LAG),
the driver needs to learn about the objects configured on t

net: switchdev: Add a helper to replay objects on a bridge port

When a front panel joins a bridge via another netdevice (typically a LAG),
the driver needs to learn about the objects configured on the bridge port.
When the bridge port is offloaded by the driver for the first time, this
can be achieved by passing a notifier to switchdev_bridge_port_offload().
The notifier is then invoked for the individual objects (such as VLANs)
configured on the bridge, and can look for the interesting ones.

Calling switchdev_bridge_port_offload() when the second port joins the
bridge lower is unnecessary, but the replay is still needed. To that end,
add a new function, switchdev_bridge_port_replay(), which does only the
replay part of the _offload() function in exactly the same way as that
function.

Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Ivan Vecera <ivecera@redhat.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 29cfb2aa 17-Jul-2023 Ido Schimmel <idosch@nvidia.com>

bridge: Add backup nexthop ID support

Add a new bridge port attribute that allows attaching a nexthop object
ID to an skb that is redirected to a backup bridge port with VLAN
tunneling enabled.

Spe

bridge: Add backup nexthop ID support

Add a new bridge port attribute that allows attaching a nexthop object
ID to an skb that is redirected to a backup bridge port with VLAN
tunneling enabled.

Specifically, when redirecting a known unicast packet, read the backup
nexthop ID from the bridge port that lost its carrier and set it in the
bridge control block of the skb before forwarding it via the backup
port. Note that reading the ID from the bridge port should not result in
a cache miss as the ID is added next to the 'backup_port' field that was
already accessed. After this change, the 'state' field still stays on
the first cache line, together with other data path related fields such
as 'flags and 'vlgrp':

struct net_bridge_port {
struct net_bridge * br; /* 0 8 */
struct net_device * dev; /* 8 8 */
netdevice_tracker dev_tracker; /* 16 0 */
struct list_head list; /* 16 16 */
long unsigned int flags; /* 32 8 */
struct net_bridge_vlan_group * vlgrp; /* 40 8 */
struct net_bridge_port * backup_port; /* 48 8 */
u32 backup_nhid; /* 56 4 */
u8 priority; /* 60 1 */
u8 state; /* 61 1 */
u16 port_no; /* 62 2 */
/* --- cacheline 1 boundary (64 bytes) --- */
[...]
} __attribute__((__aligned__(8)));

When forwarding an skb via a bridge port that has VLAN tunneling
enabled, check if the backup nexthop ID stored in the bridge control
block is valid (i.e., not zero). If so, instead of attaching the
pre-allocated metadata (that only has the tunnel key set), allocate a
new metadata, set both the tunnel key and the nexthop object ID and
attach it to the skb.

By default, do not dump the new attribute to user space as a value of
zero is an invalid nexthop object ID.

The above is useful for EVPN multihoming. When one of the links
composing an Ethernet Segment (ES) fails, traffic needs to be redirected
towards the host via one of the other ES peers. For example, if a host
is multihomed to three different VTEPs, the backup port of each ES link
needs to be set to the VXLAN device and the backup nexthop ID needs to
point to an FDB nexthop group that includes the IP addresses of the
other two VTEPs. The VXLAN driver will extract the ID from the metadata
of the redirected skb, calculate its flow hash and forward it towards
one of the other VTEPs. If the ID does not exist, or represents an
invalid nexthop object, the VXLAN driver will drop the skb. This
relieves the bridge driver from the need to validate the ID.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31
# 7b4858df 29-May-2023 Ido Schimmel <idosch@nvidia.com>

skbuff: bridge: Add layer 2 miss indication

For EVPN non-DF (Designated Forwarder) filtering we need to be able to
prevent decapsulated traffic from being flooded to a multi-homed host.
Filtering of

skbuff: bridge: Add layer 2 miss indication

For EVPN non-DF (Designated Forwarder) filtering we need to be able to
prevent decapsulated traffic from being flooded to a multi-homed host.
Filtering of multicast and broadcast traffic can be achieved using the
following flower filter:

# tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop

Unlike broadcast and multicast traffic, it is not currently possible to
filter unknown unicast traffic. The classification into unknown unicast
is performed by the bridge driver, but is not visible to other layers
such as tc.

Solve this by adding a new 'l2_miss' bit to the tc skb extension. Clear
the bit whenever a packet enters the bridge (received from a bridge port
or transmitted via the bridge) and set it if the packet did not match an
FDB or MDB entry. If there is no skb extension and the bit needs to be
cleared, then do not allocate one as no extension is equivalent to the
bit being cleared. The bit is not set for broadcast packets as they
never perform a lookup and therefore never incur a miss.

A bit that is set for every flooded packet would also work for the
current use case, but it does not allow us to differentiate between
registered and unregistered multicast traffic, which might be useful in
the future.

To keep the performance impact to a minimum, the marking of packets is
guarded by the 'tc_skb_ext_tc' static key. When 'false', the skb is not
touched and an skb extension is not allocated. Instead, only a
5 bytes nop is executed, as demonstrated below for the call site in
br_handle_frame().

Before the patch:

```
memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
c37b09: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12)
c37b10: 00 00

p = br_port_get_rcu(skb->dev);
c37b12: 49 8b 44 24 10 mov 0x10(%r12),%rax
memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
c37b17: 49 c7 44 24 30 00 00 movq $0x0,0x30(%r12)
c37b1e: 00 00
c37b20: 49 c7 44 24 38 00 00 movq $0x0,0x38(%r12)
c37b27: 00 00
```

After the patch (when static key is disabled):

```
memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
c37c29: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12)
c37c30: 00 00
c37c32: 49 8d 44 24 28 lea 0x28(%r12),%rax
c37c37: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax)
c37c3e: 00
c37c3f: 48 c7 40 10 00 00 00 movq $0x0,0x10(%rax)
c37c46: 00

#ifdef CONFIG_HAVE_JUMP_LABEL_HACK

static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
asm_volatile_goto("1:"
c37c47: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
br_tc_skb_miss_set(skb, false);

p = br_port_get_rcu(skb->dev);
c37c4c: 49 8b 44 24 10 mov 0x10(%r12),%rax
```

Subsequent patches will extend the flower classifier to be able to match
on the new 'l2_miss' bit and enable / disable the static key when
filters that match on it are added / deleted.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25
# 3aca683e 19-Apr-2023 Ido Schimmel <idosch@nvidia.com>

bridge: Encapsulate data path neighbor suppression logic

Currently, there are various places in the bridge data path that check
whether neighbor suppression is enabled on a given bridge port.

As a

bridge: Encapsulate data path neighbor suppression logic

Currently, there are various places in the bridge data path that check
whether neighbor suppression is enabled on a given bridge port.

As a preparation for per-{Port, VLAN} neighbor suppression, encapsulate
this logic in a function and pass the VLAN ID of the packet as an
argument.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# a714e3ec 19-Apr-2023 Ido Schimmel <idosch@nvidia.com>

bridge: Add internal flags for per-{Port, VLAN} neighbor suppression

Add two internal flags that will be used to enable / disable per-{Port,
VLAN} neighbor suppression:

1. 'BR_NEIGH_VLAN_SUPPRESS':

bridge: Add internal flags for per-{Port, VLAN} neighbor suppression

Add two internal flags that will be used to enable / disable per-{Port,
VLAN} neighbor suppression:

1. 'BR_NEIGH_VLAN_SUPPRESS': A per-port flag used to indicate that
per-{Port, VLAN} neighbor suppression is enabled on the bridge port.
When set, 'BR_NEIGH_SUPPRESS' has no effect.

2. 'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED': A per-VLAN flag used to indicate
that neighbor suppression is enabled on the given VLAN.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# e408336a 19-Apr-2023 Ido Schimmel <idosch@nvidia.com>

bridge: Pass VLAN ID to br_flood()

Subsequent patches are going to add per-{Port, VLAN} neighbor
suppression, which will require br_flood() to potentially suppress ARP /
NS packets on a per-{Port, V

bridge: Pass VLAN ID to br_flood()

Subsequent patches are going to add per-{Port, VLAN} neighbor
suppression, which will require br_flood() to potentially suppress ARP /
NS packets on a per-{Port, VLAN} basis.

As a preparation, pass the VLAN ID of the packet as another argument to
br_flood().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20
# cc7f5022 15-Mar-2023 Ido Schimmel <idosch@nvidia.com>

rtnetlink: bridge: mcast: Move MDB handlers out of bridge driver

Currently, the bridge driver registers handlers for MDB netlink
messages, making it impossible for other drivers to implement MDB
sup

rtnetlink: bridge: mcast: Move MDB handlers out of bridge driver

Currently, the bridge driver registers handlers for MDB netlink
messages, making it impossible for other drivers to implement MDB
support.

As a preparation for VXLAN MDB support, move the MDB handlers out of the
bridge driver to the core rtnetlink code. The rtnetlink code will call
into individual drivers by invoking their previously added MDB net
device operations.

Note that while the diffstat is large, the change is mechanical. It
moves code out of the bridge driver to rtnetlink code. Also note that a
similar change was made in 2012 with commit 77162022ab26 ("net: add
generic PF_BRIDGE:RTM_ FDB hooks") that moved FDB handlers out of the
bridge driver to the core rtnetlink code.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# c009de10 15-Mar-2023 Ido Schimmel <idosch@nvidia.com>

bridge: mcast: Implement MDB net device operations

Implement the previously added MDB net device operations in the bridge
driver so that they could be invoked by core rtnetlink code in the next
patc

bridge: mcast: Implement MDB net device operations

Implement the previously added MDB net device operations in the bridge
driver so that they could be invoked by core rtnetlink code in the next
patch.

The operations are identical to the existing br_mdb_{dump,add,del}
functions. The '_new' suffix will be removed in the next patch. The
functions are re-implemented in this patch to make the conversion in the
next patch easier to review.

Add dummy implementations when 'CONFIG_BRIDGE_IGMP_SNOOPING' is
disabled, so that an error will be returned to user space when it is
trying to add or delete an MDB entry. This is consistent with existing
behavior where the bridge driver does not even register rtnetlink
handlers for RTM_{NEW,DEL,GET}MDB messages when this Kconfig option is
disabled.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10
# a1aee20d 02-Feb-2023 Petr Machata <petrm@nvidia.com>

net: bridge: Add netlink knobs for number / maximum MDB entries

The previous patch added accounting for number of MDB entries per port and
per port-VLAN, and the logic to verify that these values st

net: bridge: Add netlink knobs for number / maximum MDB entries

The previous patch added accounting for number of MDB entries per port and
per port-VLAN, and the logic to verify that these values stay within
configured bounds. However it didn't provide means to actually configure
those bounds or read the occupancy. This patch does that.

Two new netlink attributes are added for the MDB occupancy:
IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and
BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy.
And another two for the maximum number of MDB entries:
IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and
BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one.

Note that the two new IFLA_BRPORT_ attributes prompt bumping of
RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough.

The new attributes are used like this:

# ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \
mcast_vlan_snooping 1 mcast_querier 1
# ip link set dev v1 master br
# bridge vlan add dev v1 vid 2

# bridge vlan set dev v1 vid 1 mcast_max_groups 1
# bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
# bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1.

# bridge link set dev v1 mcast_max_groups 1
# bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2
Error: bridge: Port is already in 1 groups, and mcast_max_groups=1.

# bridge -d link show
5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...]
[...] mcast_n_groups 1 mcast_max_groups 1

# bridge -d vlan show
port vlan-id
br 1 PVID Egress Untagged
state forwarding mcast_router 1
v1 1 PVID Egress Untagged
[...] mcast_n_groups 1 mcast_max_groups 1
2
[...] mcast_n_groups 0 mcast_max_groups 0

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# b57e8d87 02-Feb-2023 Petr Machata <petrm@nvidia.com>

net: bridge: Maintain number of MDB entries in net_bridge_mcast_port

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client c

net: bridge: Maintain number of MDB entries in net_bridge_mcast_port

The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
similar limit exists in the HW datapath for purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and per-port-VLAN number of MDB entries that the port
is member in.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
per-port-VLAN maximum permitted number of MDB entries, or 0 for
no limit.

The per-port multicast context is used for tracking of MDB entries for the
port as a whole. This is available for all bridges.

The per-port-VLAN multicast context is then only available on
VLAN-filtering bridges on VLANs that have multicast snooping on.

With these changes in place, it will be possible to configure MDB limit for
bridge as a whole, or any one port as a whole, or any single port-VLAN.

Note that unlike the global limit, exhaustion of the per-port and
per-port-VLAN maximums does not cause disablement of multicast snooping.
It is also permitted to configure the local limit larger than hash_max,
even though that is not useful.

In this patch, introduce only the accounting for number of entries, and the
max field itself, but not the means to toggle the max. The next patch
introduces the netlink APIs to toggle and read the values.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 976b3858 02-Feb-2023 Petr Machata <petrm@nvidia.com>

net: bridge: Add br_multicast_del_port_group()

Since cleaning up the effects of br_multicast_new_port_group() just
consists of delisting and freeing the memory, the function
br_mdb_add_group_star_g(

net: bridge: Add br_multicast_del_port_group()

Since cleaning up the effects of br_multicast_new_port_group() just
consists of delisting and freeing the memory, the function
br_mdb_add_group_star_g() inlines the corresponding code. In the following
patches, number of per-port and per-port-VLAN MDB entries is going to be
maintained, and that counter will have to be updated. Because that logic
is going to be hidden in the br_multicast module, introduce a new hook
intended to again remove a newly-created group.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 60977a0c 02-Feb-2023 Petr Machata <petrm@nvidia.com>

net: bridge: Add extack to br_multicast_new_port_group()

Make it possible to set an extack in br_multicast_new_port_group().
Eventually, this function will check for per-port and per-port-vlan
MDB m

net: bridge: Add extack to br_multicast_new_port_group()

Make it possible to set an extack in br_multicast_new_port_group().
Eventually, this function will check for per-port and per-port-vlan
MDB maximums, and will use the extack to communicate the reason for
the bounce.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1
# 61f21835 10-Dec-2022 Ido Schimmel <idosch@nvidia.com>

bridge: mcast: Support replacement of MDB port group entries

Now that user space can specify additional attributes of port group
entries such as filter mode and source list, it makes sense to allow

bridge: mcast: Support replacement of MDB port group entries

Now that user space can specify additional attributes of port group
entries such as filter mode and source list, it makes sense to allow
user space to atomically modify these attributes by replacing entries
instead of forcing user space to delete the entries and add them back.

Replace MDB port group entries when the 'NLM_F_REPLACE' flag is
specified in the netlink message header.

When a (*, G) entry is replaced, update the following attributes: Source
list, state, filter mode, protocol and flags. If the entry is temporary
and in EXCLUDE mode, reset the group timer to the group membership
interval. If the entry is temporary and in INCLUDE mode, reset the
source timers of associated sources to the group membership interval.

Examples:

# bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.2 filter_mode include
# bridge -d -s mdb show
dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.2 permanent filter_mode include proto static 0.00
dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto static 0.00
dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode include source_list 192.0.2.2/0.00,192.0.2.1/0.00 proto static 0.00

# bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra
# bridge -d -s mdb show
dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra blocked 0.00
dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra blocked 0.00
dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra 0.00

# bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 temp source_list 192.0.2.4,192.0.2.3 filter_mode include proto bgp
# bridge -d -s mdb show
dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.4 temp filter_mode include proto bgp 0.00
dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 temp filter_mode include proto bgp 0.00
dev br0 port dummy10 grp 239.1.1.1 temp filter_mode include source_list 192.0.2.4/259.44,192.0.2.3/259.44 proto bgp 0.00

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 1d7b66a7 10-Dec-2022 Ido Schimmel <idosch@nvidia.com>

bridge: mcast: Allow user space to specify MDB entry routing protocol

Add the 'MDBE_ATTR_RTPORT' attribute to allow user space to specify the
routing protocol of the MDB port group entry. Enforce a

bridge: mcast: Allow user space to specify MDB entry routing protocol

Add the 'MDBE_ATTR_RTPORT' attribute to allow user space to specify the
routing protocol of the MDB port group entry. Enforce a minimum value of
'RTPROT_STATIC' to prevent user space from using protocol values that
should only be set by the kernel (e.g., 'RTPROT_KERNEL'). Maintain
backward compatibility by defaulting to 'RTPROT_STATIC'.

The protocol is already visible to user space in RTM_NEWMDB responses
and notifications via the 'MDBA_MDB_EATTR_RTPROT' attribute.

The routing protocol allows a routing daemon to distinguish between
entries configured by it and those configured by the administrator. Once
MDB flush is supported, the protocol can be used as a criterion
according to which the flush is performed.

Examples:

# bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto kernel
Error: integer out of range.

# bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto static

# bridge mdb add dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent proto zebra

# bridge mdb add dev br0 port dummy10 grp 239.1.1.2 permanent source_list 198.51.100.1,198.51.100.2 filter_mode include proto 250

# bridge -d mdb show
dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.2 permanent filter_mode include proto 250
dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.1 permanent filter_mode include proto 250
dev br0 port dummy10 grp 239.1.1.2 permanent filter_mode include source_list 198.51.100.2/0.00,198.51.100.1/0.00 proto 250
dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra
dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude proto static

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# b1c8fec8 10-Dec-2022 Ido Schimmel <idosch@nvidia.com>

bridge: mcast: Add support for (*, G) with a source list and filter mode

In preparation for allowing user space to add (*, G) entries with a
source list and associated filter mode, add the necessary

bridge: mcast: Add support for (*, G) with a source list and filter mode

In preparation for allowing user space to add (*, G) entries with a
source list and associated filter mode, add the necessary plumbing to
handle such requests.

Extend the MDB configuration structure with a currently empty source
array and filter mode that is currently hard coded to EXCLUDE.

Add the source entries and the corresponding (S, G) entries before
making the new (*, G) port group entry visible to the data path.

Handle the creation of each source entry in a similar fashion to how it
is created from the data path in response to received Membership
Reports: Create the source entry, arm the source timer (if needed), add
a corresponding (S, G) forwarding entry and finally mark the source
entry as installed (by user space).

Add the (S, G) entry by populating an MDB configuration structure and
calling br_mdb_add_group_sg() as if a new entry is created by user
space, with the sole difference that the 'src_entry' field is set to
make sure that the group timer of such entries is never armed.

Note that it is not currently possible to add more than 32 source
entries to a port group entry. If this proves to be a problem we can
either increase 'PG_SRC_ENT_LIMIT' or avoid forcing a limit on entries
created by user space.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 079afd66 10-Dec-2022 Ido Schimmel <idosch@nvidia.com>

bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source

User space will soon be able to install a (*, G) with a source list,
prompting the creation of a (S, G) entry for each sou

bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source

User space will soon be able to install a (*, G) with a source list,
prompting the creation of a (S, G) entry for each source.

In this case, the group timer of the (S, G) entry should never be set.

Solve this by adding a new field to the MDB configuration structure that
denotes whether the (S, G) corresponds to a source or not.

The field will be set in a subsequent patch where br_mdb_add_group_sg()
is called in order to create a (S, G) entry for each user provided
source.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# a01ecb17 10-Dec-2022 Ido Schimmel <idosch@nvidia.com>

bridge: mcast: Add a flag for user installed source entries

There are a few places where the bridge driver differentiates between
(S, G) entries installed by the kernel (in response to Membership
Re

bridge: mcast: Add a flag for user installed source entries

There are a few places where the bridge driver differentiates between
(S, G) entries installed by the kernel (in response to Membership
Reports) and those installed by user space. One of them is when deleting
an (S, G) entry corresponding to a source entry that is being deleted.

While user space cannot currently add a source entry to a (*, G), it can
add an (S, G) entry that later corresponds to a source entry created by
the reception of a Membership Report. If this source entry is later
deleted because its source timer expired or because the (*, G) entry is
being deleted, the bridge driver will not delete the corresponding (S,
G) entry if it was added by user space as permanent.

This is going to be a problem when the ability to install a (*, G) with
a source list is exposed to user space. In this case, when user space
installs the (*, G) as permanent, then all the (S, G) entries
corresponding to its source list will also be installed as permanent.
When user space deletes the (*, G), all the source entries will be
deleted and the expectation is that the corresponding (S, G) entries
will be deleted as well.

Solve this by introducing a new source entry flag denoting that the
entry was installed by user space. When the entry is deleted, delete the
corresponding (S, G) entry even if it was installed by user space as
permanent, as the flag tells us that it was installed in response to the
source entry being created.

The flag will be set in a subsequent patch where source entries are
created in response to user requests.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 083e3534 10-Dec-2022 Ido Schimmel <idosch@nvidia.com>

bridge: mcast: Expose __br_multicast_del_group_src()

Expose __br_multicast_del_group_src() which is symmetric to
br_multicast_new_group_src() and does not remove the installed {S, G}
forwarding entr

bridge: mcast: Expose __br_multicast_del_group_src()

Expose __br_multicast_del_group_src() which is symmetric to
br_multicast_new_group_src() and does not remove the installed {S, G}
forwarding entry, unlike br_multicast_del_group_src().

The function will be used in the error path when user space was able to
add a new source entry, but failed to install a corresponding forwarding
entry.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


12345678910>>...33