.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=========
Switchdev
=========

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

.. _mlx5_bridge_offload:

Bridge offload
==============

The mlx5 driver implements support for offloading bridge rules when in switchdev
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
representor is attached to a bridge.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::

    $ ip link set enp8s0f0 master bridge1

VLANs
-----

The following bridge VLAN functions are supported by mlx5:

- VLAN filtering (including multiple VLANs per port)::

    $ ip link set bridge1 type bridge vlan_filtering 1
    $ bridge vlan add dev enp8s0f0 vid 2-3

- VLAN push on bridge ingress::

    $ bridge vlan add dev enp8s0f0 vid 3 pvid

- VLAN pop on bridge egress::

    $ bridge vlan add dev enp8s0f0 vid 3 untagged

Subfunction
===========

Subfunctions which are spawned over the E-switch are created only with a devlink
device, and by default all the SF auxiliary devices are disabled.
This allows the user to configure the SF before it has been fully probed,
which saves time.

Usage example:

- Create SF::

    $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
    $ devlink port function set pci/0000:08:00.0/32768 \
          hw_addr 00:00:00:00:00:11 state active

- Enable ETH auxiliary device::

    $ devlink dev param set auxiliary/mlx5_core.sf.1 \
          name enable_eth value true cmode driverinit

- Now, in order to fully probe the SF, use devlink reload::

    $ devlink dev reload auxiliary/mlx5_core.sf.1

mlx5 supports ETH, RDMA and vDPA (vnet) auxiliary devices devlink params (see
Documentation/networking/devlink/devlink-params.rst).

mlx5 supports subfunction management using the devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface.

A subfunction has its own function capabilities and its own resources. This
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
queues are neither shared nor stolen from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
resources, which are neither shared nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that all
devices (netdev, rdma, vdpa, etc.) of the subfunction access only the assigned
PCI BAR space.

A subfunction supports eswitch representation through which it supports tc
offloads. The user configures the eswitch to send/receive packets from/to
the subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Example mlx5 software, system, and device view::

         _______
        | admin |
        | user  |----------
        |_______|         |
            |             |
        ____|____       __|______            _________________
       |         |     |         |          |                 |
       | devlink |     | tc tool |          |    user         |
       |  tool   |     |_________|          |  applications   |
       |_________|         |                |_________________|
             |             |                   |          |
             |             |                   |          |         Userspace
   +---------|-------------|-------------------|----------|--------------------+
             |             |           +----------+   +----------+   Kernel
             |             |           |  netdev  |   | rdma dev |
             |             |           +----------+   +----------+
     (devlink port add/del |              ^               ^
      port function set)   |              |               |
             |             |              +---------------|
        _____|___          |              |        _______|_______
       |         |         |              |       |  mlx5 class   |
       | devlink |   +------------+       |       |   drivers     |
       | kernel  |   | rep netdev |       |       |(mlx5_core,ib) |
       |_________|   +------------+       |       |_______________|
             |             |              |               ^
     (devlink ops)         |              |          (probe/remove)
    _________|________     |              |           ____|________
   | subfunction      |    |     +---------------+   | subfunction |
   | management driver|-----     | subfunction   |---|  driver     |
   | (mlx5_core)      |          | auxiliary dev |   | (mlx5_core) |
   |__________________|          +---------------+   |_____________|
             |                            ^
    (sf add/del, vhca events)             |
             |                      (device add/del)
        _____|____                    ____|________
       |          |                  | subfunction |
       |  PCI NIC |--- activate/deactivate events--->| host driver |
       |__________|                  | (mlx5_core) |
                                     |_____________|

A subfunction is created using the devlink port interface.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Add a devlink port of subfunction flavour::

    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
    pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Show a devlink port of the subfunction::

    $ devlink port show pci/0000:06:00.0/32768
    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Delete a devlink port of subfunction after use::

    $ devlink port del pci/0000:06:00.0/32768

Function attributes
===================

The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
a unified way for SmartNIC and non-SmartNIC.

This is supported only when the eswitch mode is set to switchdev. Port function
configuration of the PCI VF/SF is supported through devlink eswitch port.

Port function attributes should be set before the PCI VF/SF is enumerated by the
driver.

MAC address setup
-----------------

The mlx5 driver supports the devlink port function attr mechanism to set up the
MAC address. (refer to Documentation/networking/devlink/devlink-port.rst)

RoCE capability setup
~~~~~~~~~~~~~~~~~~~~~
Not all mlx5 PCI devices/SFs require RoCE capability.

When RoCE capability is disabled, it saves 1 Mbyte worth of system memory per
PCI device/SF.

The mlx5 driver supports the devlink port function attr mechanism to set up RoCE
capability.
(refer to Documentation/networking/devlink/devlink-port.rst)

migratable capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform live migration need to
explicitly enable the VF migratable capability.

The mlx5 driver supports the devlink port function attr mechanism to set up the
migratable capability. (refer to Documentation/networking/devlink/devlink-port.rst)

SF state setup
--------------

To use the SF, the user must activate the SF using the SF function state
attribute.

- Get the state of the SF identified by its unique devlink port index::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state inactive opstate detached

- Activate the function and verify its state is active::

    $ devlink port function set ens2f0npf0sf88 state active

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate detached

Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the
bus, probe it, and instantiate the devlink instance and class-specific auxiliary
devices for it.
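
Once the SF has been placed on the bus as described above, its auxiliary
device should also be visible in sysfs. The following is a minimal sketch, not
part of the driver tooling: the helper name ``list_sf_aux_devs`` is
hypothetical, and the default path ``/sys/bus/auxiliary/devices`` is an
assumption about where the auxiliary bus is exposed on the running system.

```shell
#!/bin/sh
# Hypothetical helper: print the names of mlx5 subfunction auxiliary devices
# found in an auxiliary-bus device directory.
# $1 (optional): directory to scan; the default is an assumed sysfs location.
list_sf_aux_devs() {
    dir=${1:-/sys/bus/auxiliary/devices}
    for dev in "$dir"/mlx5_core.sf.*; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -e "$dev" ] || continue
        basename "$dev"
    done
}

# Example call on a live system (prints nothing when no SF is active):
#   list_sf_aux_devs
```

The helper only reads directory entries, so it can be run before and after
activation to see whether a new ``mlx5_core.sf.N`` entry appeared.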

- Show the auxiliary device and port of the subfunction::

    $ devlink dev show
    devlink dev show auxiliary/mlx5_core.sf.4

    $ devlink port show auxiliary/mlx5_core.sf.4/1
    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

    $ rdma link show mlx5_0/1
    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

    $ rdma dev show
    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

- Subfunction auxiliary device and class device hierarchy::

                   mlx5_core.sf.4
            (subfunction auxiliary device)
                         /\
                        /  \
                       /    \
                      /      \
                     /        \
        mlx5_core.eth.4     mlx5_core.rdma.4
       (sf eth aux dev)     (sf rdma aux dev)
           |                      |
           |                      |
        p0sf88                  mlx5_0
       (sf netdev)          (sf rdma device)

Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This results in changing the operational
state of the function. This gives the user the visibility needed to decide when
it is safe to delete the SF port for graceful termination of the subfunction.

- Show the SF port operational state::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate attached
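
The operational state shown above can also be read from a script. The
following is a minimal sketch that assumes the plain-text output format shown
in this document; the helper name ``sf_opstate`` is hypothetical, and the
teardown sequence in the comments is an assumption about one reasonable
workflow, not a documented requirement.

```shell
#!/bin/sh
# Hypothetical helper: read `devlink port show` text output on stdin and
# print the word that follows the "opstate" keyword (for example "attached"
# or "detached"). Prints nothing if the keyword is not present.
sf_opstate() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "opstate") { print $(i + 1); exit }
    }'
}

# Possible use on a live system (port name taken from the examples above;
# the sequence itself is an assumption):
#   devlink port function set ens2f0npf0sf88 state inactive
#   until [ "$(devlink port show ens2f0npf0sf88 | sf_opstate)" = "detached" ]; do
#       sleep 1
#   done
#   devlink port del ens2f0npf0sf88
```

Parsing the text output keeps the sketch free of extra dependencies; a script
intended for long-term use may prefer a machine-readable output format if the
installed devlink tool provides one.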