.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=========
Switchdev
=========

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

.. _mlx5_bridge_offload:

Bridge offload
==============

The mlx5 driver implements support for offloading bridge rules when in switchdev
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
representor is attached to a bridge.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::

    $ ip link set enp8s0f0 master bridge1

VLANs
-----

The following bridge VLAN functions are supported by mlx5:

- VLAN filtering (including multiple VLANs per port)::

    $ ip link set bridge1 type bridge vlan_filtering 1
    $ bridge vlan add dev enp8s0f0 vid 2-3

- VLAN push on bridge ingress::

    $ bridge vlan add dev enp8s0f0 vid 3 pvid

- VLAN pop on bridge egress::

    $ bridge vlan add dev enp8s0f0 vid 3 untagged

Subfunction
===========

mlx5 supports subfunction management using the devlink port interface
(see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`).

A subfunction has its own function capabilities and its own resources. This
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
queues are neither shared with nor stolen from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
resources, neither shared with nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that all
devices (netdev, rdma, vdpa, etc.) of the subfunction access only the assigned
PCI BAR space.

A subfunction supports eswitch representation through which it supports tc
offloads. The user configures the eswitch to send/receive packets from/to
the subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Example mlx5 software, system, and device view::

         _______
        | admin |
        | user  |----------
        |_______|         |
            |             |
        ____|____       __|______            _________________
       |         |     |         |          |                 |
       | devlink |     | tc tool |          |    user         |
       | tool    |     |_________|          | applications    |
       |_________|        |                 |_________________|
            |             |                   |             |
            |             |                   |             |       Userspace
   +--------|-------------|-------------------|-------------|----------------+
            |             |             +----------+   +----------+   Kernel
            |             |             |  netdev  |   | rdma dev |
            |             |             +----------+   +----------+
    (devlink port add/del |                   ^             ^
     port function set)   |                   |             |
            |             |                   +-------------|
        ____|____         |                   |      _______|_______
       |         |        |                   |     | mlx5 class    |
       | devlink |  +------------+            |     |   drivers     |
       | kernel  |  | rep netdev |            |     |(mlx5_core,ib) |
       |_________|  +------------+            |     |_______________|
            |             |                   |             ^
    (devlink ops)         |                   |      (probe/remove)
   _________|________     |                   |         ____|________
  | subfunction      |    |        +---------------+   | subfunction |
  | management driver|-----        | subfunction   |---|  driver     |
  | (mlx5_core)      |             | auxiliary dev |   | (mlx5_core) |
  |__________________|             +---------------+   |_____________|
            |                              ^
   (sf add/del, vhca events)               |
            |                      (device add/del)
        ____|_____                     ____|________
       |          |                   | subfunction |
       |  PCI NIC |--- activate/deactivate events--->| host driver |
       |__________|                   | (mlx5_core) |
                                      |_____________|

A subfunction is created using the devlink port interface.


- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Add a devlink port of subfunction flavour::

    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
    pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Show a devlink port of the subfunction::

    $ devlink port show pci/0000:06:00.0/32768
    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Delete a devlink port of subfunction after use::

    $ devlink port del pci/0000:06:00.0/32768

Function attributes
===================

The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
a unified way for SmartNIC and non-SmartNIC.

This is supported only when the eswitch mode is set to switchdev. Port function
configuration of the PCI VF/SF is supported through the devlink eswitch port.

Port function attributes should be set before the PCI VF/SF is enumerated by the
driver.

MAC address setup
-----------------

The mlx5 driver supports the devlink port function attr mechanism to set up the
MAC address. (refer to Documentation/networking/devlink/devlink-port.rst)

RoCE capability setup
~~~~~~~~~~~~~~~~~~~~~
Not all mlx5 PCI devices/SFs require RoCE capability.

Disabling the RoCE capability saves 1 MB of system memory per PCI device/SF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
RoCE capability.
(refer to Documentation/networking/devlink/devlink-port.rst)

Migratable capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform live migration need to
explicitly enable the VF migratable capability.

The mlx5 driver supports the devlink port function attr mechanism to set up the
migratable capability. (refer to Documentation/networking/devlink/devlink-port.rst)

SF state setup
--------------

To use the SF, the user must activate the SF using the SF function state
attribute.

- Get the state of the SF identified by its unique devlink port index::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state inactive opstate detached

- Activate the function and verify its state is active::

    $ devlink port function set ens2f0npf0sf88 state active

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate detached

Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the
bus, probe it, and instantiate the devlink instance and class-specific auxiliary
devices for it.


- Show the auxiliary device and port of the subfunction::

    $ devlink dev show
    devlink dev show auxiliary/mlx5_core.sf.4

    $ devlink port show auxiliary/mlx5_core.sf.4/1
    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

    $ rdma link show mlx5_0/1
    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

    $ rdma dev show
    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

- Subfunction auxiliary device and class device hierarchy::

                   mlx5_core.sf.4
            (subfunction auxiliary device)
                         /\
                        /  \
                       /    \
                      /      \
                     /        \
      mlx5_core.eth.4          mlx5_core.rdma.4
     (sf eth aux dev)          (sf rdma aux dev)
            |                         |
            |                         |
         p0sf88                     mlx5_0
       (sf netdev)             (sf rdma device)

Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This results in changing the operational
state of the function. This gives the user the visibility to decide when it is
safe to delete the SF port for graceful termination of the subfunction.

- Show the SF port operational state::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate attached
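
The operational state reported by ``devlink port show`` can also be read from a
script, for example before deciding to delete an SF port. The following is only
a minimal sketch and not part of the mlx5 driver or the devlink tool:
``sf_port_field`` and ``sf_port_is_detached`` are hypothetical helper names, and
the parsing assumes the ``key value`` output layout shown in this document.
Treating ``opstate detached`` as the condition to check is likewise an
assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not a devlink command): print the value that follows
# the token "$1" in `devlink port show` output read from stdin.
# Assumes the "key value" layout shown in this document.
sf_port_field() {
    awk -v key="$1" '{
        # Scan every token; the value is the token right after the key.
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}

# Hypothetical helper: succeed only if the port output on stdin reports
# "opstate detached".
sf_port_is_detached() {
    [ "$(sf_port_field opstate)" = "detached" ]
}
```

With these helpers, ``devlink port show ens2f0npf0sf88 | sf_port_field opstate``
would print ``attached`` for the output shown above, and the same pipeline
ending in ``sf_port_is_detached`` would return a non-zero exit status.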