.. SPDX-License-Identifier: GPL-2.0+

=================================================================
Linux Base Driver for Intel(R) Ethernet Adaptive Virtual Function
=================================================================

Intel Ethernet Adaptive Virtual Function Linux driver.
Copyright(c) 2013-2018 Intel Corporation.

Contents
========

- Overview
- Identifying Your Adapter
- Additional Features and Configurations
- Known Issues/Troubleshooting
- Support

Overview
========

This file describes the iavf Linux Base Driver. This driver was formerly
called i40evf.

The iavf driver supports the virtual function devices listed below and can
only be activated on kernels running the i40e or newer Physical Function (PF)
driver compiled with CONFIG_PCI_IOV. The iavf driver requires CONFIG_PCI_MSI
to be enabled.

The guest OS loading the iavf driver must support MSI-X interrupts.

Identifying Your Adapter
========================

The driver in this kernel is compatible with devices based on the following:
 * Intel(R) XL710/X710 Virtual Function
 * Intel(R) X722 Virtual Function
 * Intel(R) XXV710 Virtual Function
 * Intel(R) Ethernet Adaptive Virtual Function

For the best performance, make sure the latest NVM/FW is installed on your
device.

For information on how to identify your adapter, and for the latest NVM/FW
images and Intel network drivers, refer to the Intel Support website:
https://www.intel.com/support


Additional Features and Configurations
======================================

Viewing Link Messages
---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. To see network driver link messages on your
console, set the console log level to eight, so that messages of all levels
are printed, by entering the following::

    # dmesg -n 8

NOTE:
  This setting is not saved across reboots.

ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest ethtool
version is required for this functionality. Download it at:
https://www.kernel.org/pub/software/network/ethtool/

Setting VLAN Tag Stripping
--------------------------
If you have applications that require Virtual Functions (VFs) to receive
packets with VLAN tags, you can disable VLAN tag stripping for the VF. The
Physical Function (PF) processes requests issued from the VF to enable or
disable VLAN tag stripping. Note that if the PF has assigned a VLAN to a VF,
then requests from that VF to set VLAN tag stripping will be ignored.

To enable/disable VLAN tag stripping for a VF, issue the following command
from inside the VM in which you are running the VF::

    # ethtool -K <if_name> rxvlan on/off

or alternatively::

    # ethtool --offload <if_name> rxvlan on/off

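After toggling ``rxvlan`` with the commands above, you can check which state
the driver now reports. The following is a minimal sketch, not part of the
driver documentation: the function name ``vlan_strip_state`` and the
``ETHTOOL`` override variable are hypothetical, and it assumes that
``ethtool -k`` prints the feature as a line of the form
``rx-vlan-offload: on``.

```shell
# Hypothetical helper: print the VLAN tag stripping state ("on" or "off")
# that ethtool reports for an interface. ETHTOOL may name another command
# (for example, a stub for testing); it defaults to "ethtool".
vlan_strip_state() {
    # "ethtool -k" lists one feature per line, e.g. "rx-vlan-offload: on".
    # With awk's default field splitting, $2 is the state word and any
    # trailing "[fixed]" marker falls into $3, so it is not printed.
    ${ETHTOOL:-ethtool} -k "$1" | awk '$1 == "rx-vlan-offload:" { print $2 }'
}

# Usage, inside the VM:
#   ethtool -K eth0 rxvlan off
#   vlan_strip_state eth0
```

The reported state is only what the driver exposes through ethtool; it is a
quick check, not a guarantee of how the PF handled the request.
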
Adaptive Virtual Function
-------------------------
Adaptive Virtual Function (AVF) allows the virtual function driver, or VF, to
adapt to changing feature sets of the physical function driver (PF) with which
it is associated. This allows system administrators to update a PF without
having to update all the VFs associated with it. All AVFs have a single common
device ID and branding string.

AVFs have a minimum set of features known as "base mode," but may provide
additional features depending on what features are available in the PF with
which the AVF is associated. The following are base mode features:

- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
  for Tx/Rx
- i40e descriptors and ring format
- Descriptor write-back completion
- 1 control queue, with i40e descriptors, CSRs and ring format
- 5 MSI-X interrupt vectors and corresponding i40e CSRs
- 1 Interrupt Throttle Rate (ITR) index
- 1 Virtual Station Interface (VSI) per VF
- 1 Traffic Class (TC), TC0
- Receive Side Scaling (RSS) with 64-entry indirection table and key,
  configured through the PF
- 1 unicast MAC address reserved per VF
- 16 MAC address filters for each VF
- Stateless offloads - non-tunneled checksums
- AVF device ID
- HW mailbox is used for VF to PF communications (including on Windows)

IEEE 802.1ad (QinQ) Support
---------------------------
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
allow L2 tunneling and the ability to segregate traffic within a particular
VLAN ID, among other uses.

The following are examples of how to configure 802.1ad (QinQ)::

    # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
    # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371

Where "24" and "371" are example VLAN IDs.

NOTES:
  Receive checksum offloads, cloud filters, and VLAN acceleration are not
  supported for 802.1ad (QinQ) packets.

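To reuse the QinQ example with other interfaces or VLAN IDs, the two commands
can be wrapped in small shell functions. This is a hypothetical sketch: the
names ``add_qinq``, ``del_qinq``, ``run`` and the ``DRY_RUN`` variable are
illustrative only. When ``DRY_RUN`` is set, the ``ip`` commands are printed
instead of executed, so they can be reviewed first.

```shell
# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: add_qinq <parent> <outer-id> <inner-id>
# Stacks an 802.1ad (outer, S-tag) VLAN device on <parent> and an 802.1Q
# (inner, C-tag) VLAN device on top of it, named like the example above.
add_qinq() {
    run ip link add link "$1" "$1.$2" type vlan proto 802.1ad id "$2"
    run ip link add link "$1.$2" "$1.$2.$3" type vlan proto 802.1Q id "$3"
}

# Hypothetical helper: del_qinq <parent> <outer-id> <inner-id>
# Removes the stack again, inner device first.
del_qinq() {
    run ip link del "$1.$2.$3"
    run ip link del "$1.$2"
}
```

For example, ``DRY_RUN=1 add_qinq eth0 24 371`` prints the same two commands
as the example above. After creating the devices, ``ip -d link show
eth0.24.371`` shows the VLAN protocol and ID of the inner device.
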
Application Device Queues (ADq)
-------------------------------
Application Device Queues (ADq) allows you to dedicate one or more queues to a
specific application. This can reduce latency for the specified application,
and allow Tx traffic to be rate limited per application. Follow the steps below
to set up ADq.

Requirements:

- The sch_mqprio, act_mirred and cls_flower modules must be loaded
- The latest version of iproute2
- If another driver (for example, DPDK) has set cloud filters, you cannot
  enable ADq
- Depending on the underlying PF device, ADq cannot be enabled when the
  following features are enabled:

  + Data Center Bridging (DCB)
  + Multiple Functions per Port (MFP)
  + Sideband Filters

1. Create traffic classes (TCs). A maximum of 8 TCs can be created per
   interface. The shaper bw_rlimit parameter is optional.

Example: Set up two TCs, tc0 and tc1, with 16 queues each, a maximum Tx rate
of 1Gbit for tc0 and 3Gbit for tc1, and minimum rates of 1Gbit and 2Gbit.

::

    # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1 \
      queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit \
      min_rate 1Gbit 2Gbit max_rate 1Gbit 3Gbit

map: priority mapping for up to 16 priorities to TCs (e.g. map 0 0 0 0 1 1 1 1
sets priorities 0-3 to use tc0 and 4-7 to use tc1)

queues: for each TC, <num queues>@<offset> (e.g. queues 16@0 16@16 assigns
16 queues to tc0 at offset 0 and 16 queues to tc1 at offset 16. The maximum
total number of queues for all TCs is 64 or the number of cores, whichever is
lower.)

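The ``<num queues>@<offset>`` list follows mechanically from the per-TC queue
counts: each TC starts where the previous one ends. The sketch below builds
that list and checks the limit stated above (the lower of 64 and the number
of cores). The function name ``mqprio_queues`` and the ``NCPUS`` override are
hypothetical, and the sketch assumes that ``nproc``, which counts the
available logical CPUs, is an acceptable stand-in for the number of cores.

```shell
# Hypothetical helper: mqprio_queues <count-tc0> [<count-tc1> ...]
# Prints the mqprio "queues" argument, e.g. "16@0 16@16", and fails if the
# total exceeds min(64, CPU count). NCPUS overrides the CPU count.
mqprio_queues() {
    limit=64
    cpus=${NCPUS:-$(nproc)}
    if [ "$cpus" -lt "$limit" ]; then limit=$cpus; fi
    offset=0
    spec=""
    for count in "$@"; do
        # Each TC's queues start right after the previous TC's queues.
        spec="$spec${spec:+ }$count@$offset"
        offset=$((offset + count))
    done
    if [ "$offset" -gt "$limit" ]; then
        echo "error: $offset queues requested, limit is $limit" >&2
        return 1
    fi
    echo "$spec"
}
```

On a system with at least 32 CPUs, ``mqprio_queues 16 16`` prints
``16@0 16@16``, matching the example command, so the output can be
substituted unquoted after the ``queues`` keyword.
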
hw 1 mode channel: ``channel`` mode with ``hw`` set to 1 is a new hardware
offload mode in mqprio that makes full use of the mqprio options, the TCs,
the queue configurations, and the QoS parameters.

shaper bw_rlimit: for each TC, sets the minimum and maximum bandwidth rates.
The totals must be equal to or less than the port speed.

For example, after configuring min_rate 1Gbit 3Gbit, verify the bandwidth
limits using network monitoring tools such as ``ifstat`` or
``sar -n DEV [interval] [number of samples]``.

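Because the per-TC rates are summed and compared with the port speed, a
quick arithmetic check before running ``tc`` can catch a shaper configuration
that exceeds the link. The sketch below is hypothetical (the function names
are illustrative). It handles only whole-number rates with an ``Mbit`` or
``Gbit`` suffix, treats 1 Gbit as 1000 Mbit as ``tc`` does, and expects the
port speed in Mbit/s.

```shell
# Hypothetical helper: convert a rate such as "500Mbit" or "3Gbit" to whole
# Mbit/s. Any other form (fractions, kbit, no number) is rejected.
rate_to_mbit() {
    num=${1%[GgMm]bit}
    case $num in
        ''|*[!0-9]*) echo "unsupported rate: $1" >&2; return 1 ;;
    esac
    case $1 in
        *[Gg]bit) echo $((num * 1000)) ;;
        *) echo "$num" ;;
    esac
}

# Hypothetical helper: check_rate_total <port-mbit> <rate-tc0> [<rate-tc1> ...]
# Succeeds if the per-TC rates add up to no more than the port speed.
check_rate_total() {
    port_mbit=$1
    shift
    total=0
    for r in "$@"; do
        m=$(rate_to_mbit "$r") || return 1
        total=$((total + m))
    done
    [ "$total" -le "$port_mbit" ]
}
```

For the example command on a 10 Gbit/s port, run the check once for the
max_rate values (``check_rate_total 10000 1Gbit 3Gbit``) and once for the
min_rate values. On many systems the link speed in Mbit/s can be read from
``/sys/class/net/<interface>/speed``, although a VF may not always report a
meaningful value there.
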
NOTE:
  Setting up channels via ethtool (ethtool -L) is not supported when the
  TCs are configured using mqprio.

2. Enable HW TC offload on the interface::

    # ethtool -K <interface> hw-tc-offload on

3. Apply TCs to the ingress (RX) flow of the interface::

    # tc qdisc add dev <interface> ingress

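Steps 2 and 3 can be scripted together. The sketch below also adds a flower
filter that steers matching TCP traffic to a TC in hardware. That filter step
is an assumption based on the ``cls_flower`` requirement listed above and is
not one of the numbered steps, so check the filter options against your
iproute2 version. The function name ``adq_ingress_setup``, the ``run``
wrapper, and the ``DRY_RUN`` variable are hypothetical; with ``DRY_RUN`` set,
the commands are printed instead of executed.

```shell
# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: adq_ingress_setup <interface> <dst_ip> <dst_port> <tc>
adq_ingress_setup() {
    # Step 2: enable hardware TC offload on the interface.
    run ethtool -K "$1" hw-tc-offload on
    # Step 3: add the ingress qdisc (handle ffff:) to the interface.
    run tc qdisc add dev "$1" ingress
    # Assumed extra step: match TCP traffic to <dst_ip>:<dst_port> and steer
    # it to traffic class <tc> in hardware only (skip_sw).
    run tc filter add dev "$1" protocol ip parent ffff: prio 1 flower \
        dst_ip "$2" ip_proto tcp dst_port "$3" skip_sw hw_tc "$4"
}
```

For example, ``DRY_RUN=1 adq_ingress_setup eth0 192.168.1.10/32 80 1`` prints
the three commands that would direct TCP traffic for 192.168.1.10, port 80,
to tc1; the interface, address, and port are placeholders.
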
NOTES:
 - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory
 - ADq is not compatible with cloud filters
 - Setting up channels via ethtool (ethtool -L) is not supported when the TCs
   are configured using mqprio
 - The latest version of iproute2 is required
 - NVM version 6.01 or later is required
 - ADq cannot be enabled when any of the following features are enabled: Data
   Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters
 - If another driver (for example, DPDK) has set cloud filters, you cannot
   enable ADq
 - Tunnel filters are not supported in ADq. If encapsulated packets do arrive
   in non-tunnel mode, filtering will be done on the inner headers. For example,
   for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN
   encapsulated packet and the outer headers are ignored; therefore, the inner
   headers are matched.
 - If a TC filter on a PF matches traffic over a VF (on the PF), that traffic
   will be routed to the appropriate queue of the PF and will not be passed on
   to the VF. Such traffic will end up getting dropped higher up in the TCP/IP
   stack as it does not match PF address data.
 - If traffic matches multiple TC filters that point to different TCs, that
   traffic will be duplicated and sent to all matching TC queues. The hardware
   switch mirrors the packet to a VSI list when multiple filters are matched.


Known Issues/Troubleshooting
============================

Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device
---------------------------------------------------------------------------------
If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700
series based device, the VF slaves may fail when they become the active slave.
If the MAC address of the VF is set by the PF (Physical Function) of the
device, then when you add a slave or change the active-backup slave, Linux
bonding tries to sync the backup slave's MAC address to the same MAC address
as the active slave, and bonding fails at that point. This issue does not
occur if the VF's MAC address is not set by the PF.

Traffic Is Not Being Passed Between VM and Client
-------------------------------------------------
You may not be able to pass traffic between a client system and a
Virtual Machine (VM) running on a separate host if the Virtual Function
(VF, or Virtual NIC) is not in trusted mode and spoof checking is enabled
on the VF. Note that this situation can occur in any combination of client,
host, and guest operating system. For information on how to set the VF to
trusted mode, refer to the section "VLAN Tag Packet Steering" in the PF
driver documentation (for example, the i40e documentation). For information
on setting spoof checking, refer to the section "MAC and VLAN anti-spoofing
feature" in the same document.

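On the host, both settings are properties of the VF that are configured
through the PF with the ``vf`` options of ``ip link set``. The sketch below is
hypothetical: ``vf_allow_traffic``, ``run`` and ``DRY_RUN`` are illustrative
names, and the PF interface name and VF index are placeholders. Disabling
spoof checking removes a protection against spoofed frames sent from the VM,
so apply it only to VFs you trust.

```shell
# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper, run on the host: vf_allow_traffic <pf-interface> <vf-index>
# Puts the VF in trusted mode and turns off spoof checking for it.
vf_allow_traffic() {
    run ip link set dev "$1" vf "$2" trust on
    run ip link set dev "$1" vf "$2" spoofchk off
}
```

Afterwards, ``ip link show <PF>`` lists each VF together with its current
spoof checking and trust settings.
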
Do not unload port driver if VF with active VM is bound to it
-------------------------------------------------------------
Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
Once the VM shuts down, or otherwise releases the VF, the command will complete.

Using four traffic classes fails
--------------------------------
Do not try to reserve more than three traffic classes in the iavf driver. Doing
so will fail to set any traffic classes and will cause the driver to write
errors to stdout. Use a maximum of three traffic classes to avoid this issue.

Multiple log error messages on iavf driver removal
--------------------------------------------------
If you have several VFs and you remove the iavf driver, several instances of
the following log errors are written to the log::

    Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
    Unable to send the message to VF 2 aq_err 12
    ARQ Overflow Error detected

Virtual machine does not get link
---------------------------------
If the virtual machine has more than one virtual port assigned to it, and those
virtual ports are bound to different physical ports, you may not get link on
all of the virtual ports. The following command may work around the issue::

    # ethtool -r <PF>

Where <PF> is the PF interface in the host, for example: p5p1. You may need to
run the command more than once to get link on all virtual ports.

MAC address of Virtual Function changes unexpectedly
----------------------------------------------------
If a Virtual Function's MAC address is not assigned in the host, then the VF
driver will use a random MAC address. This random MAC address may change each
time the VF driver is reloaded. You can assign a static MAC address in the host
machine. This static MAC address will survive a VF driver reload.

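A static MAC address is assigned on the host through the PF with
``ip link set dev <PF> vf <N> mac <address>``. The address should be a
non-zero unicast address, that is, the least significant bit of the first
octet (the multicast bit) must be 0. The sketch below checks that before
building the command; ``set_vf_mac``, ``is_unicast_mac``, ``run`` and
``DRY_RUN`` are hypothetical names.

```shell
# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Succeeds if $1 has the form xx:xx:xx:xx:xx:xx (hex digits), is not all
# zeros, and has the multicast bit of the first octet cleared.
is_unicast_mac() {
    h='[0-9A-Fa-f]'
    case $1 in
        $h$h:$h$h:$h$h:$h$h:$h$h:$h$h) ;;
        *) return 1 ;;
    esac
    [ "$1" != 00:00:00:00:00:00 ] || return 1
    first=${1%%:*}
    [ $((0x$first & 1)) -eq 0 ]
}

# Hypothetical helper, run on the host: set_vf_mac <pf-interface> <vf-index> <mac>
set_vf_mac() {
    if ! is_unicast_mac "$3"; then
        echo "error: $3 is not a valid unicast MAC address" >&2
        return 1
    fi
    run ip link set dev "$1" vf "$2" mac "$3"
}
```

For example, ``set_vf_mac p5p1 0 00:11:22:33:44:55`` assigns that address to
VF 0 of the PF ``p5p1``; both names and the address are placeholders.
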
Driver Buffer Overflow Fix
--------------------------
The fix to resolve CVE-2016-8105, referenced in Intel SA-00069
https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00069.html
is included in this and future versions of the driver.

Multiple Interfaces on Same Ethernet Broadcast Network
------------------------------------------------------
Due to the default ARP behavior on Linux, it is not possible to have one system
on two IP networks in the same Ethernet broadcast domain (non-partitioned
switch) behave as expected. All Ethernet interfaces will respond to IP traffic
for any IP address assigned to the system. This results in unbalanced receive
traffic.

If you have multiple interfaces in a server, turn on ARP filtering by
entering::

    # echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

NOTE:
  This setting is not saved across reboots. The configuration change can be
  made permanent by adding the following line to the file /etc/sysctl.conf::

    net.ipv4.conf.all.arp_filter = 1

Alternatively, install the interfaces in separate broadcast domains (either
in different switches or in a switch partitioned to VLANs).

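When the ARP filtering setting is applied from a provisioning script, it helps
to add the ``/etc/sysctl.conf`` line only once and to apply the value
immediately. The sketch below is hypothetical (``ensure_line`` is an
illustrative name) and takes the configuration file as a parameter so it can
be pointed at a scratch file. ``sysctl -w`` applies the value to the running
system, in the same way as the ``echo`` command in this section.

```shell
# Hypothetical helper: ensure_line <file> <line>
# Appends <line> to <file> unless the file already contains it verbatim.
ensure_line() {
    touch "$1" || return 1
    if grep -qxF -- "$2" "$1"; then
        return 0
    fi
    # If the file does not end with a newline, add one before appending.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
    printf '%s\n' "$2" >> "$1"
}

# Usage on the host, as root:
#   ensure_line /etc/sysctl.conf 'net.ipv4.conf.all.arp_filter = 1'
#   sysctl -w net.ipv4.conf.all.arp_filter=1
```
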
Rx Page Allocation Errors
-------------------------
'Page allocation failure. order:0' errors may occur under stress.
This is caused by the way the Linux kernel reports this stressed condition.


Support
=======
For general information, go to the Intel support website at:
https://support.intel.com

If an issue is identified with the released source code on the supported kernel
with a supported adapter, email the specific information related to the issue
to intel-wired-lan@lists.osuosl.org.