.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=========
Switchdev
=========

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

.. _mlx5_bridge_offload:

Bridge offload
==============

The mlx5 driver implements support for offloading bridge rules when in switchdev
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
representor is attached to a bridge.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::

    $ ip link set enp8s0f0 master bridge1

VLANs
-----

The following bridge VLAN functions are supported by mlx5:

- VLAN filtering (including multiple VLANs per port)::

    $ ip link set bridge1 type bridge vlan_filtering 1
    $ bridge vlan add dev enp8s0f0 vid 2-3

- VLAN push on bridge ingress::

    $ bridge vlan add dev enp8s0f0 vid 3 pvid

- VLAN pop on bridge egress::

    $ bridge vlan add dev enp8s0f0 vid 3 untagged

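The steps above can be combined into a single script. The following is a minimal
sketch rather than a recommended procedure: the function name
``bridge_offload_setup`` is hypothetical, the bridge is assumed to exist already,
and combining ``pvid`` and ``untagged`` in one command assumes the port should
act as an access port for the given VLAN::

    # Hypothetical helper; the name and argument order are illustrative only.
    # Usage: bridge_offload_setup PCI_DEV REPRESENTOR BRIDGE VID
    bridge_offload_setup() {
        pci_dev=$1 rep=$2 br=$3 vid=$4

        # Representors only exist once the eswitch is in switchdev mode.
        devlink dev eswitch set "$pci_dev" mode switchdev || return 1

        # Attaching the representor is what triggers FDB offload.
        ip link set "$rep" master "$br" || return 1

        # Enable VLAN filtering, then push VID on ingress and pop it on egress.
        ip link set "$br" type bridge vlan_filtering 1 || return 1
        bridge vlan add dev "$rep" vid "$vid" pvid untagged
    }

With the names used in this section, the call would be
``bridge_offload_setup pci/0000:06:00.0 enp8s0f0 bridge1 3``.
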
Subfunction
===========

Subfunctions that are spawned over the E-switch are created only with a devlink
device, and by default all the SF auxiliary devices are disabled.
This allows the user to configure the SF before it has been fully probed,
which saves time.

Usage example:

- Create SF::

    $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
    $ devlink port function set pci/0000:08:00.0/32768 hw_addr 00:00:00:00:00:11 state active

- Enable ETH auxiliary device::

    $ devlink dev param set auxiliary/mlx5_core.sf.1 name enable_eth value true cmode driverinit

- Now, in order to fully probe the SF, use devlink reload::

    $ devlink dev reload auxiliary/mlx5_core.sf.1

mlx5 supports devlink params for the ETH, RDMA, and vDPA (vnet) auxiliary devices
(see :ref:`Documentation/networking/devlink/devlink-params.rst <devlink_params_generic>`).

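The enable-then-reload sequence can be applied to several classes at once. The
following is a minimal sketch, assuming the generic ``enable_eth``,
``enable_rdma`` and ``enable_vnet`` devlink parameters are available on the SF
auxiliary device; the function name ``sf_probe_classes`` is hypothetical::

    # Usage: sf_probe_classes AUX_DEV CLASS...
    # Example classes: eth rdma vnet
    sf_probe_classes() {
        aux_dev=$1
        shift
        for class in "$@"; do
            # driverinit values only take effect on the next reload.
            devlink dev param set "$aux_dev" name "enable_$class" \
                value true cmode driverinit || return 1
        done
        # A single reload probes the SF with every enabled class.
        devlink dev reload "$aux_dev"
    }

For example, ``sf_probe_classes auxiliary/mlx5_core.sf.1 eth rdma`` issues two
``param set`` commands followed by one reload.
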
mlx5 supports subfunction management using the devlink port interface
(see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`).

A subfunction has its own function capabilities and its own resources. This
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
queues are neither shared nor stolen from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
resources neither shared nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that all
devices (netdev, rdma, vdpa, etc.) of the subfunction access only the assigned
PCI BAR space.

A subfunction supports eswitch representation through which it supports tc
offloads. The user configures eswitch to send/receive packets to/from
the subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Example mlx5 software, system, and device view::

       _______
      | admin |
      | user  |----------
      |_______|         |
          |             |
      ____|____       __|______            _________________
     |         |     |         |          |                 |
     | devlink |     | tc tool |          |    user         |
     | tool    |     |_________|          | applications    |
     |_________|         |                |_________________|
           |             |                   |          |
           |             |                   |          |         Userspace
 +---------|-------------|-------------------|----------|--------------------+
           |             |           +----------+   +----------+   Kernel
           |             |           |  netdev  |   | rdma dev |
           |             |           +----------+   +----------+
   (devlink port add/del |              ^               ^
    port function set)   |              |               |
           |             |              +---------------|
      _____|___          |              |        _______|_______
     |         |         |              |       | mlx5 class    |
     | devlink |   +------------+       |       |   drivers     |
     | kernel  |   | rep netdev |       |       |(mlx5_core,ib) |
     |_________|   +------------+       |       |_______________|
           |             |              |               ^
   (devlink ops)         |              |          (probe/remove)
  _________|________     |              |           ____|________
 | subfunction      |    |     +---------------+   | subfunction |
 | management driver|-----     | subfunction   |---|  driver     |
 | (mlx5_core)      |          | auxiliary dev |   | (mlx5_core) |
 |__________________|          +---------------+   |_____________|
           |                                            ^
  (sf add/del, vhca events)                             |
           |                                      (device add/del)
      _____|____                                    ____|________
     |          |                                  | subfunction |
     |  PCI NIC |--- activate/deactivate events--->| host driver |
     |__________|                                  | (mlx5_core) |
                                                   |_____________|

A subfunction is created using the devlink port interface.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Add a devlink port of subfunction flavour::

    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
    pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Show a devlink port of the subfunction::

    $ devlink port show pci/0000:06:00.0/32768
    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Delete a devlink port of subfunction after use::

    $ devlink port del pci/0000:06:00.0/32768

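Scripts that create subfunctions usually need the port handle printed by
``devlink port add`` for the later ``show`` and ``del`` commands. The following
is a minimal sketch, assuming the text output format shown above, where the
handle is the first field of the first line followed by a colon; the function
name ``sf_port_add`` is hypothetical::

    # Usage: sf_port_add PCI_DEV PFNUM SFNUM
    # Prints the devlink port handle, e.g. pci/0000:06:00.0/32768.
    # Prints nothing if devlink produced no output (for example on failure).
    sf_port_add() {
        devlink port add "$1" flavour pcisf pfnum "$2" sfnum "$3" |
            awk 'NR == 1 { sub(/:$/, "", $1); print $1 }'
    }

A caller could then run ``port=$(sf_port_add pci/0000:06:00.0 0 88)`` and later
``devlink port del "$port"``.
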
Function attributes
===================

The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
a unified way for SmartNIC and non-SmartNIC.

This is supported only when the eswitch mode is set to switchdev. Port function
configuration of the PCI VF/SF is supported through devlink eswitch port.

Port function attributes should be set before the PCI VF/SF is enumerated by the
driver.

MAC address setup
-----------------

The mlx5 driver supports the devlink port function attr mechanism to set up the
MAC address (refer to Documentation/networking/devlink/devlink-port.rst).

RoCE capability setup
~~~~~~~~~~~~~~~~~~~~~
Not all mlx5 PCI devices/SFs require RoCE capability.

When RoCE capability is disabled, it saves 1 Mbyte of system memory per
PCI device/SF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
RoCE capability (refer to Documentation/networking/devlink/devlink-port.rst).

migratable capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform live migration need to
explicitly enable the VF migratable capability.

The mlx5 driver supports the devlink port function attr mechanism to set up the
migratable capability (refer to Documentation/networking/devlink/devlink-port.rst).

IPsec crypto capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform IPsec crypto offloading need
to explicitly enable the VF ipsec_crypto capability. Enabling IPsec capability
for VFs is supported on ConnectX-6 Dx devices and above. When a VF has
IPsec capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
ipsec_crypto capability (refer to Documentation/networking/devlink/devlink-port.rst).

IPsec packet capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users who want mlx5 PCI VFs to be able to perform IPsec packet offloading need
to explicitly enable the VF ipsec_packet capability. Enabling IPsec capability
for VFs is supported on ConnectX-6 Dx devices and above. When a VF has
IPsec capability enabled, any IPsec offloading is blocked on the PF.

The mlx5 driver supports the devlink port function attr mechanism to set up the
ipsec_packet capability (refer to Documentation/networking/devlink/devlink-port.rst).

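Because function attributes should be set before the function is enumerated, a
script can apply them while an SF is still inactive and activate it last. The
following is a minimal sketch, assuming the ``hw_addr``, ``roce`` and ``state``
attributes of ``devlink port function set`` as described in devlink-port.rst;
the function name ``sf_configure`` is hypothetical and disabling RoCE is only an
example choice::

    # Usage: sf_configure PORT MAC
    sf_configure() {
        port=$1 mac=$2

        # Attributes first, while the function is still inactive.
        devlink port function set "$port" hw_addr "$mac" || return 1
        devlink port function set "$port" roce disable || return 1

        # Activation last, so the driver enumerates the final configuration.
        devlink port function set "$port" state active
    }

The VF capabilities described above (``migratable``, ``ipsec_crypto``,
``ipsec_packet``) would be set in the same position, before the VF is
enumerated, assuming a devlink tool recent enough to accept them.
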
SF state setup
--------------

To use the SF, the user must activate the SF using the SF function state
attribute.

- Get the state of the SF identified by its unique devlink port index::

   $ devlink port show ens2f0npf0sf88
   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
     function:
       hw_addr 00:00:00:00:88:88 state inactive opstate detached

- Activate the function and verify its state is active::

   $ devlink port function set ens2f0npf0sf88 state active

   $ devlink port show ens2f0npf0sf88
   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
     function:
       hw_addr 00:00:00:00:88:88 state active opstate detached

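The activate-and-verify step can be scripted by reading the ``state`` field back
from ``devlink port show``. The following is a minimal sketch, assuming the text
output format shown above, where each attribute name is followed by its value;
the function names ``port_func_field`` and ``sf_activate`` are hypothetical::

    # Usage: port_func_field PORT FIELD
    # Prints the word following FIELD (e.g. state, opstate, hw_addr).
    port_func_field() {
        devlink port show "$1" |
            awk -v key="$2" '{
                for (i = 1; i < NF; i++)
                    if ($i == key) { print $(i + 1); exit }
            }'
    }

    # Usage: sf_activate PORT
    # Succeeds only if the port reports "state active" afterwards.
    sf_activate() {
        devlink port function set "$1" state active || return 1
        [ "$(port_func_field "$1" state)" = "active" ]
    }

The exact match on the field name keeps ``state`` and ``opstate`` apart, so
``port_func_field ens2f0npf0sf88 opstate`` reports the operational state
independently.
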
Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the
bus, probe it, and instantiate the devlink instance and class specific auxiliary
devices for it.

- Show the auxiliary device and port of the subfunction::

    $ devlink dev show
    devlink dev show auxiliary/mlx5_core.sf.4

    $ devlink port show auxiliary/mlx5_core.sf.4/1
    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

    $ rdma link show mlx5_0/1
    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

    $ rdma dev show
    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

- Subfunction auxiliary device and class device hierarchy::

                 mlx5_core.sf.4
          (subfunction auxiliary device)
                       /\
                      /  \
                     /    \
                    /      \
                   /        \
      mlx5_core.eth.4     mlx5_core.rdma.4
     (sf eth aux dev)     (sf rdma aux dev)
         |                      |
         |                      |
      p0sf88                  mlx5_0
     (sf netdev)          (sf rdma device)

Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This results in changing the operational
state of the function. This gives the user visibility to decide when it is
safe to delete the SF port for graceful termination of the subfunction.

- Show the SF port operational state::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate attached

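The operational state can be used to script a graceful teardown. The following
is a minimal sketch, assuming the sequence is to deactivate the function, poll
until ``opstate detached`` is reported, and only then delete the port; the
function name ``sf_graceful_del``, the default of 10 attempts and the one-second
polling interval are illustrative::

    # Usage: sf_graceful_del PORT [TRIES]
    sf_graceful_del() {
        port=$1
        tries=${2:-10}

        # Ask the device to deactivate the function.
        devlink port function set "$port" state inactive || return 1

        # Wait until the driver has detached from the auxiliary device.
        while [ "$tries" -gt 0 ]; do
            if devlink port show "$port" | grep -q 'opstate detached'; then
                devlink port del "$port"
                return
            fi
            tries=$((tries - 1))
            sleep 1
        done

        echo "timed out waiting for $port to detach" >&2
        return 1
    }

The port is left in place on timeout so that the caller can decide whether to
retry or to delete it anyway.
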