.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=========
Switchdev
=========

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

.. _mlx5_bridge_offload:

Bridge offload
==============

The mlx5 driver implements support for offloading bridge rules when in switchdev
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
representor is attached to a bridge.

- Change the device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::

    $ ip link set enp8s0f0 master bridge1
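
The two steps above can be combined into a small shell helper. This is a minimal sketch, not part of any mlx5 tooling: the function name ``attach_to_offloaded_bridge`` is hypothetical, and it assumes the bridge may not exist yet, so it creates it with ``ip link add ... type bridge`` before attaching the representor::

    #!/bin/sh
    # Hypothetical helper: attach_to_offloaded_bridge PCI_DEV REP_NETDEV BRIDGE
    # The "( ... )" body runs in a subshell, which keeps variables local.
    attach_to_offloaded_bridge() (
        pci_dev=$1 rep=$2 br=$3

        # Representors, and thus bridge offload, exist only in switchdev mode.
        devlink dev eswitch set "$pci_dev" mode switchdev || exit 1

        # Create the bridge only if it does not exist yet.
        if ! ip link show dev "$br" >/dev/null 2>&1; then
            ip link add name "$br" type bridge || exit 1
        fi
        ip link set dev "$br" up || exit 1

        # Attaching the representor is what triggers FDB offload.
        ip link set dev "$rep" master "$br" || exit 1
        ip link set dev "$rep" up
    )

    # Example:
    # attach_to_offloaded_bridge pci/0000:06:00.0 enp8s0f0 bridge1

Once traffic flows, ``bridge fdb show dev enp8s0f0`` lists the learned entries; entries offloaded to the device are expected to carry the ``offload`` flag.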

VLANs
-----

The following bridge VLAN functions are supported by mlx5:

- VLAN filtering (including multiple VLANs per port)::

    $ ip link set bridge1 type bridge vlan_filtering 1
    $ bridge vlan add dev enp8s0f0 vid 2-3

- VLAN push on bridge ingress::

    $ bridge vlan add dev enp8s0f0 vid 3 pvid

- VLAN pop on bridge egress::

    $ bridge vlan add dev enp8s0f0 vid 3 untagged
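
The VLAN functions above are often combined on one port: a set of tagged VLANs plus an access VLAN that is pushed on ingress and popped on egress. A minimal sketch, assuming the ``bridge1`` and ``enp8s0f0`` names from the examples; the helper name ``configure_port_vlans`` is hypothetical. ``pvid`` and ``untagged`` can be given in a single ``bridge vlan add`` command::

    #!/bin/sh
    # Hypothetical helper:
    #   configure_port_vlans BRIDGE PORT ACCESS_VID [TAGGED_VIDS]
    configure_port_vlans() (
        br=$1 port=$2 access_vid=$3 tagged_vids=$4

        # VLAN filtering is a property of the bridge itself.
        ip link set dev "$br" type bridge vlan_filtering 1 || exit 1

        # Tagged VLANs keep their tag (a single VID or a range such as 2-3).
        if [ -n "$tagged_vids" ]; then
            bridge vlan add dev "$port" vid "$tagged_vids" || exit 1
        fi

        # Access VLAN: pushed on ingress (pvid), popped on egress (untagged).
        bridge vlan add dev "$port" vid "$access_vid" pvid untagged
    )

    # Example: VLAN 2 tagged, VLAN 3 as the access VLAN.
    # configure_port_vlans bridge1 enp8s0f0 3 2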

Subfunction
===========

Subfunctions that are spawned over the E-switch are created with only a devlink
device, and by default all of the SF's auxiliary devices are disabled. This
allows the user to configure the SF before it is fully probed, which saves
time.

Usage example:

- Create the SF::

    $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
    $ devlink port function set pci/0000:08:00.0/32768 \
            hw_addr 00:00:00:00:00:11 state active

- Enable the ETH auxiliary device::

    $ devlink dev param set auxiliary/mlx5_core.sf.1 \
            name enable_eth value true cmode driverinit

- Now, in order to fully probe the SF, use devlink reload::

    $ devlink dev reload auxiliary/mlx5_core.sf.1

mlx5 supports devlink params for the ETH, rdma and vdpa (vnet) auxiliary
devices (see Documentation/networking/devlink/devlink-params.rst).
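
Since the auxiliary devices stay disabled until they are enabled through these params, the configure-then-probe sequence from the usage example can be scripted for any combination of them. A sketch, assuming the generic devlink parameter names ``enable_eth``, ``enable_rdma`` and ``enable_vnet``, and an auxiliary device name that is already known (for example from ``devlink dev show`` after activation); the helper name ``sf_probe_with`` is hypothetical::

    #!/bin/sh
    # Hypothetical helper: sf_probe_with AUX_DEV CLASS...
    # CLASS is one of eth, rdma or vnet (vdpa).
    sf_probe_with() (
        aux_dev=$1
        shift

        # Validate all classes first so nothing is changed on bad input.
        for class in "$@"; do
            case $class in
            eth|rdma|vnet) ;;
            *) echo "unsupported class: $class" >&2; exit 1 ;;
            esac
        done

        # driverinit values only take effect on the next reload.
        for class in "$@"; do
            devlink dev param set "$aux_dev" \
                name "enable_$class" value true cmode driverinit || exit 1
        done

        # Probe the SF once, with all requested auxiliary devices enabled.
        devlink dev reload "$aux_dev"
    )

For example, ``sf_probe_with auxiliary/mlx5_core.sf.1 eth rdma`` sets both parameters and then probes the SF with a single reload.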

mlx5 supports subfunction management using the devlink port interface (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`).

A subfunction has its own function capabilities and its own resources. This
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
queues are neither shared nor stolen from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
resources, which are neither shared nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that all
devices (netdev, rdma, vdpa, etc.) of the subfunction access only their
assigned PCI BAR space.

A subfunction supports eswitch representation, through which it supports tc
offloads. The user configures the eswitch to send/receive packets from/to
the subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Example mlx5 software, system, and device view::

       _______
      | admin |
      | user  |----------
      |_______|         |
          |             |
      ____|____       __|______            _________________
     |         |     |         |          |                 |
     | devlink |     | tc tool |          |    user         |
     | tool    |     |_________|          | applications    |
     |_________|         |                |_________________|
           |             |                   |          |
           |             |                   |          |         Userspace
 +---------|-------------|-------------------|----------|--------------------+
           |             |           +----------+   +----------+   Kernel
           |             |           |  netdev  |   | rdma dev |
           |             |           +----------+   +----------+
   (devlink port add/del |              ^               ^
    port function set)   |              |               |
           |             |              +---------------|
      _____|___          |              |        _______|_______
     |         |         |              |       | mlx5 class    |
     | devlink |   +------------+       |       |   drivers     |
     | kernel  |   | rep netdev |       |       |(mlx5_core,ib) |
     |_________|   +------------+       |       |_______________|
           |             |              |               ^
   (devlink ops)         |              |          (probe/remove)
  _________|________     |              |           ____|________
 | subfunction      |    |     +---------------+   | subfunction |
 | management driver|-----     | subfunction   |---|  driver     |
 | (mlx5_core)      |          | auxiliary dev |   | (mlx5_core) |
 |__________________|          +---------------+   |_____________|
           |                                            ^
  (sf add/del, vhca events)                             |
           |                                      (device add/del)
      _____|____                                    ____|________
     |          |                                  | subfunction |
     |  PCI NIC |--- activate/deactivate events--->| host driver |
     |__________|                                  | (mlx5_core) |
                                                   |_____________|

A subfunction is created using the devlink port interface.

- Change the device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Add a devlink port of subfunction flavour::

    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
    pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Show a devlink port of the subfunction::

    $ devlink port show pci/0000:06:00.0/32768
    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Delete the devlink port of the subfunction after use::

    $ devlink port del pci/0000:06:00.0/32768
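
When scripting, the port index assigned to a new SF (32768 above) is not known in advance. A sketch that takes it from the first field of the ``devlink port add`` output, assuming the plain-text output format shown above; the helper name ``sf_port_add`` is hypothetical::

    #!/bin/sh
    # Hypothetical helper: sf_port_add PCI_DEV PFNUM SFNUM
    # Prints the handle of the new port, e.g. pci/0000:06:00.0/32768.
    sf_port_add() (
        dev=$1 pfnum=$2 sfnum=$3

        out=$(devlink port add "$dev" flavour pcisf \
                  pfnum "$pfnum" sfnum "$sfnum") || exit 1

        # The first output line starts with "<dev>/<port_index>:";
        # strip the trailing colon to get the port handle.
        printf '%s\n' "$out" | awk 'NR == 1 { sub(/:$/, "", $1); print $1 }'
    )

    # Example:
    # port=$(sf_port_add pci/0000:06:00.0 0 88)
    # devlink port function set "$port" hw_addr 00:00:00:00:88:88 state active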

Function attributes
===================

The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
a unified way for both SmartNIC and non-SmartNIC devices.

This is supported only when the eswitch mode is set to switchdev. Port function
configuration of the PCI VF/SF is supported through the devlink eswitch port.

Port function attributes should be set before the PCI VF/SF is enumerated by
the driver.

MAC address setup
-----------------

The mlx5 driver supports the devlink port function attr mechanism to set up the
MAC address (refer to Documentation/networking/devlink/devlink-port.rst).
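
For example, the SF created above can be given a MAC address before it is activated. The helper name ``set_fn_mac`` is hypothetical; its only addition over a plain ``devlink port function set ... hw_addr`` call is a format check on the address::

    #!/bin/sh
    # Hypothetical helper: set_fn_mac PORT MAC
    set_fn_mac() (
        port=$1 mac=$2

        # Reject anything that is not six colon-separated hex octets.
        if ! printf '%s\n' "$mac" |
            grep -Eqx '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}'; then
            echo "invalid MAC address: $mac" >&2
            exit 1
        fi

        devlink port function set "$port" hw_addr "$mac"
    )

    # Example:
    # set_fn_mac pci/0000:06:00.0/32768 00:00:00:00:88:88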

RoCE capability setup
~~~~~~~~~~~~~~~~~~~~~

Not all mlx5 PCI devices/SFs require RoCE capability.

When RoCE capability is disabled, it saves 1 MB of system memory per PCI
device/SF.

The mlx5 driver supports the devlink port function attr mechanism to set up
RoCE capability (refer to Documentation/networking/devlink/devlink-port.rst).
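
Because the saving applies per device/SF, it adds up when many SFs are created. A minimal sketch that disables RoCE on several ports in one go; the helper name ``disable_roce`` and the second port index in the example are hypothetical, and the ports are assumed to be still inactive::

    #!/bin/sh
    # Hypothetical helper: disable_roce PORT...
    disable_roce() (
        for port in "$@"; do
            # Expected to run while the function is still inactive, i.e.
            # before the PCI VF/SF is enumerated by the driver.
            devlink port function set "$port" roce disable || exit 1
        done
    )

    # Example (the second port index is hypothetical):
    # disable_roce pci/0000:06:00.0/32768 pci/0000:06:00.0/32769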

Migratable capability setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users who want mlx5 PCI VFs to be able to perform live migration need to
explicitly enable the VF migratable capability.

The mlx5 driver supports the devlink port function attr mechanism to set up the
migratable capability (refer to Documentation/networking/devlink/devlink-port.rst).
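
A sketch that enables the capability on every VF port of one PCI device, assuming the VFs already exist, are not yet bound to a driver, and appear in the plain-text ``devlink port show`` output with ``flavour pcivf``; the helper name ``enable_vf_migration`` is hypothetical::

    #!/bin/sh
    # Hypothetical helper: enable_vf_migration PCI_DEV
    enable_vf_migration() (
        dev=$1

        # Collect the handles of this device's VF ports ("flavour pcivf").
        ports=$(devlink port show | awk -v prefix="$dev/" '
            index($1, prefix) == 1 && / flavour pcivf / {
                sub(/:$/, "", $1)
                print $1
            }') || exit 1

        for port in $ports; do
            devlink port function set "$port" migratable enable || exit 1
        done
    )

    # Example:
    # enable_vf_migration pci/0000:06:00.0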

SF state setup
--------------

To use the SF, the user must activate the SF using the SF function state
attribute.

- Get the state of the SF, identified by its unique devlink port index or, as in
  this example, by its netdev name::

   $ devlink port show ens2f0npf0sf88
   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
     function:
       hw_addr 00:00:00:00:88:88 state inactive opstate detached

- Activate the function and verify its state is active::

   $ devlink port function set ens2f0npf0sf88 state active

   $ devlink port show ens2f0npf0sf88
   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
     function:
       hw_addr 00:00:00:00:88:88 state active opstate detached
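
The verification step can be automated by reading the ``state`` value back from the ``devlink port show`` output. A sketch assuming the plain-text output format shown above; ``port_fn_attr`` and ``sf_activate`` are hypothetical helper names::

    #!/bin/sh
    # Hypothetical helper: port_fn_attr OUTPUT KEY
    # Prints the word that follows KEY in "devlink port show" text output,
    # e.g. KEY=state -> "active", KEY=opstate -> "detached".
    port_fn_attr() {
        printf '%s\n' "$1" | awk -v key="$2" '
            { for (i = 1; i < NF; i++) if ($i == key) { print $(i + 1); exit } }'
    }

    # Hypothetical helper: sf_activate PORT
    # Activates the function; succeeds only if the state reads back as active.
    sf_activate() (
        port=$1
        devlink port function set "$port" state active || exit 1
        state=$(port_fn_attr "$(devlink port show "$port")" state)
        [ "$state" = "active" ]
    )

    # Example:
    # sf_activate ens2f0npf0sf88 && echo "SF is active"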

Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the
bus, probe it, and instantiate the devlink instance and class-specific auxiliary
devices for it.

- Show the auxiliary device and port of the subfunction::

    $ devlink dev show
    auxiliary/mlx5_core.sf.4

    $ devlink port show auxiliary/mlx5_core.sf.4/1
    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

    $ rdma link show mlx5_0/1
    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

    $ rdma dev show
    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

- Subfunction auxiliary device and class device hierarchy::

                 mlx5_core.sf.4
          (subfunction auxiliary device)
                       /\
                      /  \
                     /    \
                    /      \
                   /        \
      mlx5_core.eth.4     mlx5_core.rdma.4
     (sf eth aux dev)     (sf rdma aux dev)
         |                      |
         |                      |
      p0sf88                  mlx5_0
     (sf netdev)          (sf rdma device)

Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This results in changing the operational
state of the function. This provides visibility to the user to decide when it is
safe to delete the SF port for graceful termination of the subfunction.

- Show the SF port operational state::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate attached
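
A graceful teardown can therefore deactivate the SF, wait for ``opstate`` to return to ``detached``, and only then delete the port. A minimal sketch, assuming the plain-text output format shown above and a one-second polling interval; the helper name ``sf_teardown`` is hypothetical::

    #!/bin/sh
    # Hypothetical helper: sf_teardown PORT [TIMEOUT_SECONDS]
    sf_teardown() (
        port=$1 timeout=${2:-10}

        # Deactivate the SF; the driver is then expected to detach from
        # the subfunction's auxiliary device.
        devlink port function set "$port" state inactive || exit 1

        # Poll the operational state until it reads "detached".
        while :; do
            opstate=$(devlink port show "$port" | awk '
                { for (i = 1; i < NF; i++) if ($i == "opstate") { print $(i + 1); exit } }')
            [ "$opstate" = "detached" ] && break
            if [ "$timeout" -le 0 ]; then
                echo "timed out waiting for $port to detach" >&2
                exit 1
            fi
            sleep 1
            timeout=$((timeout - 1))
        done

        # Only now is it safe to delete the SF port.
        devlink port del "$port"
    )

    # Example:
    # sf_teardown pci/0000:06:00.0/32768 10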