.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=========
Switchdev
=========

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

.. _mlx5_bridge_offload:

Bridge offload
==============

The mlx5 driver implements support for offloading bridge rules when in switchdev
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
representor is attached to a bridge.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::

    $ ip link set enp8s0f0 master bridge1

VLANs
-----

The following bridge VLAN functions are supported by mlx5:

- VLAN filtering (including multiple VLANs per port)::

    $ ip link set bridge1 type bridge vlan_filtering 1
    $ bridge vlan add dev enp8s0f0 vid 2-3

- VLAN push on bridge ingress::

    $ bridge vlan add dev enp8s0f0 vid 3 pvid

- VLAN pop on bridge egress::

    $ bridge vlan add dev enp8s0f0 vid 3 untagged

Subfunction
===========

Subfunctions which are spawned over the E-switch are created only with a devlink
device, and by default all the SF auxiliary devices are disabled.
This allows the user to configure the SF before the SF has been fully probed,
which saves time.

Usage example:

- Create SF::

    $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
    $ devlink port function set pci/0000:08:00.0/32768 hw_addr 00:00:00:00:00:11 state active

- Enable ETH auxiliary device::

    $ devlink dev param set auxiliary/mlx5_core.sf.1 name enable_eth value true cmode driverinit

- Now, in order to fully probe the SF, use devlink reload::

    $ devlink dev reload auxiliary/mlx5_core.sf.1

mlx5 supports devlink params for the ETH, RDMA, and vDPA (vnet) auxiliary devices (see :ref:`Documentation/networking/devlink/devlink-params.rst <devlink_params_generic>`).

mlx5 supports subfunction management using the devlink port interface (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`).

A subfunction has its own function capabilities and its own resources. This
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
queues are neither shared nor stolen from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
resources, neither shared nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that all
devices (netdev, rdma, vdpa, etc.) of the subfunction access only the assigned
PCI BAR space.

A subfunction supports eswitch representation through which it supports tc
offloads. The user configures the eswitch to send/receive packets from/to
the subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Example mlx5 software, system, and device view::

       _______
      | admin |
      | user  |----------
      |_______|         |
          |             |
      ____|____       __|______            _________________
     |         |     |         |          |                 |
     | devlink |     | tc tool |          |    user         |
     | tool    |     |_________|          | applications    |
     |_________|         |                |_________________|
           |             |                   |          |
           |             |                   |          |         Userspace
 +---------|-------------|-------------------|----------|--------------------+
           |             |           +----------+   +----------+   Kernel
           |             |           |  netdev  |   | rdma dev |
           |             |           +----------+   +----------+
   (devlink port add/del |              ^               ^
    port function set)   |              |               |
           |             |              +---------------|
      _____|___          |              |        _______|_______
     |         |         |              |       | mlx5 class    |
     | devlink |   +------------+       |       |   drivers     |
     | kernel  |   | rep netdev |       |       |(mlx5_core,ib) |
     |_________|   +------------+       |       |_______________|
           |             |              |               ^
   (devlink ops)         |              |          (probe/remove)
  _________|________     |              |           ____|________
 | subfunction      |    |     +---------------+   | subfunction |
 | management driver|-----     | subfunction   |---|  driver     |
 | (mlx5_core)      |          | auxiliary dev |   | (mlx5_core) |
 |__________________|          +---------------+   |_____________|
           |                            ^
  (sf add/del, vhca events)             |
           |                      (device add/del)
      _____|____                    ____|________
     |          |                  | subfunction |
     |  PCI NIC |--- activate/deactivate events--->| host driver |
     |__________|                  | (mlx5_core) |
                                   |_____________|

A subfunction is created using the devlink port interface.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Add a devlink port of subfunction flavour::

    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
    pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Show a devlink port of the subfunction::

    $ devlink port show pci/0000:06:00.0/32768
    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Delete a devlink port of subfunction after use::

    $ devlink port del pci/0000:06:00.0/32768

Function attributes
===================

The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
a unified way for SmartNIC and non-SmartNIC.

This is supported only when the eswitch mode is set to switchdev. Port function
configuration of the PCI VF/SF is supported through the devlink eswitch port.

Port function attributes should be set before the PCI VF/SF is enumerated by the
driver.

MAC address setup
-----------------

The mlx5 driver supports the devlink port function attr mechanism to set up the
MAC address. (refer to Documentation/networking/devlink/devlink-port.rst)

RoCE capability setup
~~~~~~~~~~~~~~~~~~~~~
Not all mlx5 PCI devices/SFs require RoCE capability.

Disabling the RoCE capability saves 1 MB of system memory per
PCI device/SF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
RoCE capability.
(refer to Documentation/networking/devlink/devlink-port.rst)

migratable capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform live migration need to
explicitly enable the VF migratable capability.

The mlx5 driver supports the devlink port function attr mechanism to set up the
migratable capability. (refer to Documentation/networking/devlink/devlink-port.rst)

IPsec crypto capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform IPsec crypto offloading need
to explicitly enable the VF ipsec_crypto capability. Enabling IPsec capability
for VFs is supported on ConnectX-6 Dx devices and later. When a VF has
IPsec capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
ipsec_crypto capability. (refer to Documentation/networking/devlink/devlink-port.rst)

IPsec packet capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform IPsec packet offloading need
to explicitly enable the VF ipsec_packet capability. Enabling IPsec capability
for VFs is supported on ConnectX-6 Dx devices and later. When a VF has
IPsec capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
ipsec_packet capability. (refer to Documentation/networking/devlink/devlink-port.rst)

SF state setup
--------------

To use the SF, the user must activate the SF using the SF function state
attribute.

- Get the state of the SF identified by its unique devlink port index::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state inactive opstate detached

- Activate the function and verify its state is active::

    $ devlink port function set ens2f0npf0sf88 state active

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate detached

Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the bus,
probe it, and instantiate the devlink instance and class-specific auxiliary
devices for it.

- Show the auxiliary device and port of the subfunction::

    $ devlink dev show
    devlink dev show auxiliary/mlx5_core.sf.4

    $ devlink port show auxiliary/mlx5_core.sf.4/1
    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

    $ rdma link show mlx5_0/1
    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

    $ rdma dev show
    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

- Subfunction auxiliary device and class device hierarchy::

                   mlx5_core.sf.4
            (subfunction auxiliary device)
                         /\
                        /  \
                       /    \
                      /      \
                     /        \
        mlx5_core.eth.4     mlx5_core.rdma.4
       (sf eth aux dev)     (sf rdma aux dev)
           |                      |
           |                      |
        p0sf88                  mlx5_0
       (sf netdev)          (sf rdma device)

Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This results in changing the operational
state of the function. This gives the user the visibility to decide when it is
safe to delete the SF port for graceful termination of the subfunction.

- Show the SF port operational state::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate attached
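
The operational state shown above can also be checked from a script before an
SF port is removed. The following is a minimal sketch, not part of the mlx5
tooling: the helper names ``sf_opstate`` and ``sf_graceful_del`` are
hypothetical, the 10-second polling limit is an arbitrary assumption, and the
parsing assumes the ``devlink port show`` output format used in the examples
in this document.

```shell
#!/bin/sh
# Hypothetical helper: read `devlink port show` output on stdin and print
# the word that follows "opstate" (for example "attached" or "detached").
sf_opstate() {
    sed -n 's/.*opstate \([a-z]*\).*/\1/p'
}

# Hypothetical helper: deactivate an SF port, poll until its opstate is
# "detached", then delete the port. Returns non-zero on failure or timeout.
sf_graceful_del() {
    port="$1"
    state=""
    tries=10    # assumed limit: roughly 10 seconds of polling

    devlink port function set "$port" state inactive || return 1

    while [ "$tries" -gt 0 ]; do
        state=$(devlink port show "$port" | sf_opstate)
        [ "$state" = "detached" ] && break
        sleep 1
        tries=$((tries - 1))
    done

    [ "$state" = "detached" ] || return 1
    devlink port del "$port"
}

# Example call (requires a real SF port, so it is left commented out):
# sf_graceful_del ens2f0npf0sf88
```

The sketch only defines functions, so sourcing it has no effect on the system
until ``sf_graceful_del`` is called with an existing SF port.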