.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=========
Switchdev
=========

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

.. _mlx5_bridge_offload:

Bridge offload
==============

The mlx5 driver implements support for offloading bridge rules when in switchdev
mode. Linux bridge FDBs are automatically offloaded when an mlx5 switchdev
representor is attached to a bridge.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1'::

    $ ip link set enp8s0f0 master bridge1
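
The bridge netdev in the example above must already exist. As a minimal sketch,
assuming the same device and netdev names as above, the following commands
confirm the eswitch mode, create the bridge, and list its FDB entries once the
representor is attached. Whether an entry is reported as offloaded depends on
the iproute2 version and on addresses having been learned, so the last command
is only a way to inspect the state, not a guaranteed result::

    # Confirm that the eswitch is in switchdev mode.
    $ devlink dev eswitch show pci/0000:06:00.0

    # Create and bring up the bridge netdev used in the example above.
    $ ip link add name bridge1 type bridge
    $ ip link set bridge1 up

    # List the FDB entries of the bridge after the representor is attached.
    $ bridge fdb show br bridge1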

VLANs
-----

The following bridge VLAN functions are supported by mlx5:

- VLAN filtering (including multiple VLANs per port)::

    $ ip link set bridge1 type bridge vlan_filtering 1
    $ bridge vlan add dev enp8s0f0 vid 2-3

- VLAN push on bridge ingress::

    $ bridge vlan add dev enp8s0f0 vid 3 pvid

- VLAN pop on bridge egress::

    $ bridge vlan add dev enp8s0f0 vid 3 untagged
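
The push and pop flags can also be combined on one VLAN, which is the usual
configuration for an access port. The following is a sketch that reuses the
names from the examples above; the resulting VLAN table can be checked with
``bridge vlan show``::

    # Tag untagged ingress traffic with VLAN 3 and strip the tag on egress.
    $ bridge vlan add dev enp8s0f0 vid 3 pvid untagged

    # Inspect the VLAN configuration of the representor.
    $ bridge vlan show dev enp8s0f0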

Subfunction
===========

mlx5 supports subfunction management using the devlink port interface (see
:ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`).

A subfunction has its own function capabilities and its own resources. This
means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These
queues are neither shared nor stolen from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA
resources neither shared nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that all
devices (netdev, rdma, vdpa, etc.) of the subfunction access only the assigned
PCI BAR space.

A subfunction supports eswitch representation through which it supports tc
offloads. The user configures eswitch to send/receive packets from/to
the subfunction port.

Subfunctions share PCI-level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Example mlx5 software, system, and device view::

       _______
      | admin |
      | user  |----------
      |_______|         |
          |             |
      ____|____       __|______            _________________
     |         |     |         |          |                 |
     | devlink |     | tc tool |          |    user         |
     | tool    |     |_________|          | applications    |
     |_________|         |                |_________________|
           |             |                   |          |
           |             |                   |          |         Userspace
 +---------|-------------|-------------------|----------|--------------------+
           |             |           +----------+   +----------+   Kernel
           |             |           |  netdev  |   | rdma dev |
           |             |           +----------+   +----------+
   (devlink port add/del |              ^               ^
    port function set)   |              |               |
           |             |              +---------------|
      _____|___          |              |        _______|_______
     |         |         |              |       | mlx5 class    |
     | devlink |   +------------+       |       |   drivers     |
     | kernel  |   | rep netdev |       |       |(mlx5_core,ib) |
     |_________|   +------------+       |       |_______________|
           |             |              |               ^
   (devlink ops)         |              |          (probe/remove)
  _________|________     |              |           ____|________
 | subfunction      |    |     +---------------+   | subfunction |
 | management driver|-----     | subfunction   |---|  driver     |
 | (mlx5_core)      |          | auxiliary dev |   | (mlx5_core) |
 |__________________|          +---------------+   |_____________|
           |                                            ^
  (sf add/del, vhca events)                             |
           |                                      (device add/del)
      _____|____                                    ____|________
     |          |                                  | subfunction |
     |  PCI NIC |--- activate/deactivate events--->| host driver |
     |__________|                                  | (mlx5_core) |
                                                   |_____________|

A subfunction is created using the devlink port interface.

- Change device to switchdev mode::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

- Add a devlink port of subfunction flavour::

    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
    pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Show a devlink port of the subfunction::

    $ devlink port show pci/0000:06:00.0/32768
    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
      function:
        hw_addr 00:00:00:00:00:00 state inactive opstate detached

- Delete a devlink port of subfunction after use::

    $ devlink port del pci/0000:06:00.0/32768

Function attributes
===================

The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in
a unified way for SmartNIC and non-SmartNIC.

This is supported only when the eswitch mode is set to switchdev. Port function
configuration of the PCI VF/SF is supported through devlink eswitch port.

Port function attributes should be set before PCI VF/SF is enumerated by the
driver.

MAC address setup
-----------------

The mlx5 driver supports the devlink port function attr mechanism to set up the
MAC address (refer to Documentation/networking/devlink/devlink-port.rst).
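
As a sketch of that mechanism, the MAC address of the SF port from the examples
in this document could be set as follows. The port index ``32768`` and the
address are taken from those examples and are illustrative only; as noted
above, the attribute should be set before the function is enumerated::

    # Set the MAC address of the function behind the devlink port.
    $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88

    # Read the port back to confirm the configured hw_addr.
    $ devlink port show pci/0000:06:00.0/32768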

RoCE capability setup
---------------------

Not all mlx5 PCI devices/SFs require RoCE capability.

Disabling RoCE capability saves 1 MB of system memory per PCI device/SF.

The mlx5 driver supports the devlink port function attr mechanism to set up
RoCE capability (refer to Documentation/networking/devlink/devlink-port.rst).
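
A hedged example of disabling RoCE for a function, assuming a kernel and
iproute2 version that expose the ``roce`` port function attribute. The port
index ``32768`` is the illustrative SF port used elsewhere in this document::

    # Disable RoCE before the function is enumerated by the driver.
    $ devlink port function set pci/0000:06:00.0/32768 roce disable

    # Read the port back to check the function attributes.
    $ devlink port show pci/0000:06:00.0/32768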

Migratable capability setup
---------------------------

Users who want mlx5 PCI VFs to be able to perform live migration need to
explicitly enable the VF migratable capability.

The mlx5 driver supports the devlink port function attr mechanism to set up
migratable capability (refer to
Documentation/networking/devlink/devlink-port.rst).
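
A sketch of enabling the capability, assuming a kernel and iproute2 version
that expose the ``migratable`` port function attribute. The port index ``2`` is
a hypothetical VF representor port and must be replaced with the actual port
of the VF::

    # Enable live migration support for the VF (hypothetical port index 2).
    $ devlink port function set pci/0000:06:00.0/2 migratable enable

    # Read the port back to check the function attributes.
    $ devlink port show pci/0000:06:00.0/2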

SF state setup
--------------

To use the SF, the user must activate the SF using the SF function state
attribute.

- Get the state of the SF identified by its unique devlink port index::

   $ devlink port show ens2f0npf0sf88
   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
     function:
       hw_addr 00:00:00:00:88:88 state inactive opstate detached

- Activate the function and verify its state is active::

   $ devlink port function set ens2f0npf0sf88 state active

   $ devlink port show ens2f0npf0sf88
   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
     function:
       hw_addr 00:00:00:00:88:88 state active opstate detached

Upon function activation, the PF driver instance gets the event from the device
that a particular SF was activated. This is the cue to put the device on the
bus, probe it, and instantiate the devlink instance and class-specific
auxiliary devices for it.

- Show the auxiliary device and port of the subfunction::

    $ devlink dev show
    devlink dev show auxiliary/mlx5_core.sf.4

    $ devlink port show auxiliary/mlx5_core.sf.4/1
    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

    $ rdma link show mlx5_0/1
    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

    $ rdma dev show
    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

- Subfunction auxiliary device and class device hierarchy::

                 mlx5_core.sf.4
          (subfunction auxiliary device)
                       /\
                      /  \
                     /    \
                    /      \
                   /        \
      mlx5_core.eth.4     mlx5_core.rdma.4
     (sf eth aux dev)     (sf rdma aux dev)
         |                      |
         |                      |
      p0sf88                  mlx5_0
     (sf netdev)          (sf rdma device)

Additionally, the SF port also gets the event when the driver attaches to the
auxiliary device of the subfunction. This results in changing the operational
state of the function. This gives the user the visibility to decide when it is
safe to delete the SF port for graceful termination of the subfunction.

- Show the SF port operational state::

    $ devlink port show ens2f0npf0sf88
    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
      function:
        hw_addr 00:00:00:00:88:88 state active opstate attached
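
The graceful termination described above could be sketched as follows, reusing
the illustrative port ``pci/0000:06:00.0/32768`` from this document. The
polling loop and its one-second interval are assumptions for illustration, not
a driver requirement::

    # Deactivate the function so that the driver detaches from the SF.
    $ devlink port function set pci/0000:06:00.0/32768 state inactive

    # Poll until the operational state reports that the driver has detached.
    $ until devlink port show pci/0000:06:00.0/32768 | grep -q 'opstate detached'; do sleep 1; done

    # It is now safe to delete the SF port.
    $ devlink port del pci/0000:06:00.0/32768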
240