.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

====================================
Marvell OcteonTx2 RVU Kernel Drivers
====================================

Copyright (c) 2020 Marvell International Ltd.

Contents
========

- `Overview`_
- `Drivers`_
- `Basic packet flow`_
- `Devlink health reporters`_
- `Quality of service`_

Overview
========

Resource virtualization unit (RVU) on Marvell's OcteonTX2 SOC maps HW
resources from the network, crypto and other functional blocks into
PCI-compatible physical and virtual functions. Each functional block
in turn has multiple local functions (LFs) for provisioning to PCI devices.
RVU supports multiple PCIe SRIOV physical functions (PFs) and virtual
functions (VFs). PF0 is called the administrative / admin function (AF)
and has privileges to provision the RVU functional blocks' LFs to each
PF/VF.

RVU managed networking functional blocks
 - Network pool or buffer allocator (NPA)
 - Network interface controller (NIX)
 - Network parser CAM (NPC)
 - Schedule/Synchronize/Order unit (SSO)
 - Loopback interface (LBK)

RVU managed non-networking functional blocks
 - Crypto accelerator (CPT)
 - Scheduled timers unit (TIM)
 - Schedule/Synchronize/Order unit (SSO)
   Used for both networking and non-networking use cases

Resource provisioning examples
 - A PF/VF with NIX-LF & NPA-LF resources works as a pure network device.
 - A PF/VF with CPT-LF resource works as a pure crypto offload device.
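
The provisioning examples above can be pictured with a small model. The
following Python is only an illustrative sketch: the ``RvuFunction`` class and
the ``device_role`` helper are hypothetical names that do not exist in any
driver, and only the block names (NIX, NPA, CPT) are taken from the text.

```python
from dataclasses import dataclass, field


# Hypothetical model: the block names come from the lists above,
# everything else is an illustrative assumption.
@dataclass
class RvuFunction:
    name: str                              # e.g. "PF1" or "PF2-VF0"
    lfs: set = field(default_factory=set)  # LF types attached by the AF


def device_role(func: RvuFunction) -> str:
    """Classify a PF/VF by the set of LFs provisioned to it."""
    if func.lfs == {"NIX", "NPA"}:
        return "network"   # pure network device
    if func.lfs == {"CPT"}:
        return "crypto"    # pure crypto offload device
    return "other"         # any other combination is not covered here


nic = RvuFunction("PF1", {"NIX", "NPA"})
crypto = RvuFunction("PF2-VF0", {"CPT"})
print(device_role(nic), device_role(crypto))
```

The point of the sketch is that the role of a PF/VF is not fixed by the PCI
function itself but by which LFs the AF has attached to it.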

RVU functional blocks are highly configurable as per software requirements.

Firmware sets up the following before the kernel boots
 - Enables the required number of RVU PFs based on the number of physical links.
 - The number of VFs per PF is either static or configurable at compile time.
   Based on the config, firmware assigns VFs to each of the PFs.
 - Also assigns MSIX vectors to each of the PFs and VFs.
 - These are not changed after kernel boot.

Drivers
=======

The Linux kernel has multiple drivers registering to different PFs and VFs
of RVU. With respect to networking there are 3 flavours of drivers.

Admin Function driver
---------------------

As mentioned above, RVU PF0 is called the admin function (AF). This driver
supports resource provisioning and configuration of functional blocks.
It doesn't handle any I/O. It sets up a few basic things, but most of the
functionality is achieved via configuration requests from PFs and VFs.

PFs and VFs communicate with AF via a shared memory region (mailbox). Upon
receiving requests, AF does resource provisioning and other HW configuration.
AF is always attached to the host kernel, but PFs and their VFs may be used by
the host kernel itself, or attached to VMs or to userspace applications like
DPDK etc. So AF has to handle provisioning/configuration requests sent
by any device from any domain.

AF driver also interacts with underlying firmware to
 - Manage physical ethernet links, i.e. CGX LMACs.
 - Retrieve information like speed, duplex, autoneg etc.
 - Retrieve PHY EEPROM and stats.
 - Configure FEC, PAM modes.
 - etc.

From the pure networking side, AF driver supports the following functionality.
 - Map a physical link to an RVU PF to which a netdev is registered.
 - Attach NIX and NPA block LFs to RVU PF/VF which provide buffer pools, RQs, SQs
   for regular networking functionality.
 - Flow control (pause frames) enable/disable/config.
 - HW PTP timestamping related config.
 - NPC parser profile config, basically how to parse pkt and what info to extract.
 - NPC extract profile config, what to extract from the pkt to match data in MCAM entries.
 - Manage NPC MCAM entries, upon request can frame and install requested packet forwarding rules.
 - Defines receive side scaling (RSS) algorithms.
 - Defines segmentation offload algorithms (e.g. TSO).
 - VLAN stripping, capture and insertion config.
 - SSO and TIM blocks config which provide packet scheduling support.
 - Debugfs support, to check current resource provisioning, current status of
   NPA pools, NIX RQ, SQ and CQs, various stats etc which helps in debugging issues.
 - And many more.

Physical Function driver
------------------------

This RVU PF handles IO and is mapped to a physical ethernet link, and this
driver registers a netdev. It supports SR-IOV. As said above, this driver
communicates with AF via a mailbox. To retrieve information about physical
links, this driver talks to AF, and AF gets that info from firmware and responds
back, i.e. the PF driver cannot talk to firmware directly.

Supports ethtool for configuring links, RSS, queue count, queue size,
flow control, ntuple filters, dump PHY EEPROM, config FEC etc.

Virtual Function driver
-----------------------

There are two types of VFs: VFs that share the physical link with their parent
SR-IOV PF, and VFs which work in pairs using internal HW loopback channels (LBK).

Type1:
 - These VFs and their parent PF share a physical link and are used for outside communication.
 - VFs cannot communicate with AF directly; they send an mbox message to PF and PF
   forwards that to AF. AF, after processing, responds back to PF and PF forwards
   the reply to VF.
 - From a functionality point of view there is no difference between PF and VF, as the same type
   of HW resources are attached to both. But the user would be able to configure a few things only
   from PF, as PF is treated as owner/admin of the link.

Type2:
 - RVU PF0, i.e. the admin function, creates these VFs and maps them to the loopback block's channels.
 - A set of two VFs (VF0 & VF1, VF2 & VF3 .. so on) works as a pair, i.e. pkts sent out of
   VF0 will be received by VF1 and vice versa.
 - These VFs can be used by applications or virtual machines to communicate between them
   without sending traffic outside. There is no switch present in HW, hence the support
   for loopback VFs.
 - These communicate directly with AF (PF0) via mbox.

Except for the IO channels or links used for packet reception and transmission there is
no other difference between these VF types. AF driver takes care of IO channel mapping,
hence the same VF driver works for both types of devices.
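
The mailbox routing and loopback pairing described above can be summarised in
a short sketch. This is a toy model only: ``mbox_path`` and ``lbk_peer`` are
hypothetical helper names, and the ``"type1"`` / ``"type2"`` strings simply
mirror the labels used in this section.

```python
# Hypothetical model of mailbox routing; names are illustrative only.
def mbox_path(sender: str, vf_type: str = "") -> list:
    """Return the hops a request takes to reach the AF."""
    if sender == "PF":
        return ["PF", "AF"]
    if sender == "VF" and vf_type == "type1":
        # Type1 VFs cannot reach the AF directly; the parent PF forwards.
        return ["VF", "PF", "AF"]
    if sender == "VF" and vf_type == "type2":
        # Type2 (loopback) VFs talk to the AF (PF0) directly.
        return ["VF", "AF"]
    raise ValueError("unknown sender or VF type")


def lbk_peer(vf_index: int) -> int:
    """Loopback VFs pair up as (0, 1), (2, 3), ...: flip the lowest bit."""
    return vf_index ^ 1


print(mbox_path("VF", "type1"), lbk_peer(0), lbk_peer(3))
```

In this model the reply travels the same hops in reverse order, which matches
the description that the PF forwards the AF's response back to a Type1 VF.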

Basic packet flow
=================

Ingress
-------

1. CGX LMAC receives packet.
2. Forwards the packet to the NIX block.
3. Then submitted to NPC block for parsing and then MCAM lookup to get the destination RVU device.
4. NIX LF attached to the destination RVU device allocates a buffer from RQ mapped buffer pool of NPA block LF.
5. RQ may be selected by RSS or by configuring an MCAM rule with an RQ number.
6. Packet is DMA'ed and driver is notified.
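
Step 5 of the ingress flow, choosing the RQ, can be sketched as follows. This
is a simplified illustration under stated assumptions: ``select_rq`` is a
hypothetical function, and CRC32 is used only as a stand-in hash, since the
actual RSS algorithm is defined by the AF and is not described here.

```python
import zlib


def select_rq(flow_key: bytes, num_rqs: int, mcam_rq=None) -> int:
    """Pick a receive queue for a packet.

    An RQ number supplied by a matching MCAM rule is used as is;
    otherwise the flow key is hashed so that packets of one flow
    always land on the same RQ.
    """
    if mcam_rq is not None:
        return mcam_rq
    # Stand-in for RSS: any deterministic hash of the flow key works here.
    return zlib.crc32(flow_key) % num_rqs


flow = b"10.0.0.1:1234->10.0.0.2:80/tcp"
print(select_rq(flow, num_rqs=8))             # hashed, same RQ for every call
print(select_rq(flow, num_rqs=8, mcam_rq=3))  # steered by an MCAM rule
```

The design point is that both paths produce one RQ index, so the rest of the
receive path does not need to know how the queue was chosen.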

Egress
------

1. Driver prepares a send descriptor and submits to SQ for transmission.
2. The SQ is already configured (by AF) to transmit on a specific link/channel.
3. The SQ descriptor ring is maintained in buffers allocated from SQ mapped pool of NPA block LF.
4. NIX block transmits the pkt on the designated channel.
5. NPC MCAM entries can be installed to divert pkt onto a different channel.

Devlink health reporters
========================

NPA Reporters
-------------
The NPA reporters are responsible for reporting and recovering the following groups of errors:

1. GENERAL events

   - Error due to operation of unmapped PF.
   - Error due to disabled alloc/free for other HW blocks (NIX, SSO, TIM, DPI and AURA).

2. ERROR events

   - Fault due to NPA_AQ_INST_S read or NPA_AQ_RES_S write.
   - AQ Doorbell Error.

3. RAS events

   - RAS Error Reporting for NPA_AQ_INST_S/NPA_AQ_RES_S.

4. RVU events

   - Error due to unmapped slot.

Sample Output::

        ~# devlink health
        pci/0002:01:00.0:
          reporter hw_npa_intr
              state healthy error 2872 recover 2872 last_dump_date 2020-12-10 last_dump_time 09:39:09 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_gen
              state healthy error 2872 recover 2872 last_dump_date 2020-12-11 last_dump_time 04:43:04 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_err
              state healthy error 2871 recover 2871 last_dump_date 2020-12-10 last_dump_time 09:39:17 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_ras
              state healthy error 0 recover 0 last_dump_date 2020-12-10 last_dump_time 09:32:40 grace_period 0 auto_recover true auto_dump true
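
For monitoring scripts it can be handy to turn the text above into a
dictionary. The sketch below assumes the plain-text layout shown in the sample
(a ``reporter <name>`` line followed by a line of alternating key/value
tokens); ``parse_devlink_health`` is a hypothetical helper, not part of
devlink, and the layout may differ between iproute2 versions.

```python
def parse_devlink_health(text: str) -> dict:
    """Map each reporter name to its fields, all kept as strings."""
    reporters = {}
    current = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "reporter" and len(tokens) >= 2:
            current = tokens[1]
            reporters[current] = {}
        elif tokens[0] == "state" and current is not None:
            # Fields are printed as alternating "key value" tokens.
            reporters[current] = dict(zip(tokens[0::2], tokens[1::2]))
    return reporters


# Two reporters copied from the sample output above.
sample = """\
pci/0002:01:00.0:
  reporter hw_npa_intr
      state healthy error 2872 recover 2872 last_dump_date 2020-12-10 last_dump_time 09:39:09 grace_period 0 auto_recover true auto_dump true
  reporter hw_npa_ras
      state healthy error 0 recover 0 last_dump_date 2020-12-10 last_dump_time 09:32:40 grace_period 0 auto_recover true auto_dump true
"""
stats = parse_devlink_health(sample)
print(stats["hw_npa_intr"]["error"], stats["hw_npa_ras"]["state"])
```

Comparing the ``error`` and ``recover`` counters of a reporter is a quick way
to spot errors that were reported but not recovered.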

Each reporter dumps the

 - Error Type
 - Error Register value
 - Reason in words

For example::

        ~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_gen
         NPA_AF_GENERAL:
                 NPA General Interrupt Reg : 1
                 NIX0: free disabled RX
        ~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_intr
         NPA_AF_RVU:
                 NPA RVU Interrupt Reg : 1
                 Unmap Slot Error
        ~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_err
         NPA_AF_ERR:
                 NPA Error Interrupt Reg : 4096
                 AQ Doorbell Error


NIX Reporters
-------------
The NIX reporters are responsible for reporting and recovering the following groups of errors:

1. GENERAL events

   - Receive mirror/multicast packet drop due to insufficient buffer.
   - SMQ Flush operation.

2. ERROR events

   - Memory Fault due to WQE read/write from multicast/mirror buffer.
   - Receive multicast/mirror replication list error.
   - Receive packet on an unmapped PF.
   - Fault due to NIX_AQ_INST_S read or NIX_AQ_RES_S write.
   - AQ Doorbell Error.

3. RAS events

   - RAS Error Reporting for NIX Receive Multicast/Mirror Entry Structure.
   - RAS Error Reporting for WQE/Packet Data read from Multicast/Mirror Buffer.
   - RAS Error Reporting for NIX_AQ_INST_S/NIX_AQ_RES_S.

4. RVU events

   - Error due to unmapped slot.

Sample Output::

        ~# ./devlink health
        pci/0002:01:00.0:
          reporter hw_npa_intr
            state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_gen
            state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_err
            state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
          reporter hw_npa_ras
            state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
          reporter hw_nix_intr
            state healthy error 1121 recover 1121 last_dump_date 2021-01-19 last_dump_time 05:42:26 grace_period 0 auto_recover true auto_dump true
          reporter hw_nix_gen
            state healthy error 949 recover 949 last_dump_date 2021-01-19 last_dump_time 05:42:43 grace_period 0 auto_recover true auto_dump true
          reporter hw_nix_err
            state healthy error 1147 recover 1147 last_dump_date 2021-01-19 last_dump_time 05:42:59 grace_period 0 auto_recover true auto_dump true
          reporter hw_nix_ras
            state healthy error 409 recover 409 last_dump_date 2021-01-19 last_dump_time 05:43:16 grace_period 0 auto_recover true auto_dump true

Each reporter dumps the

 - Error Type
 - Error Register value
 - Reason in words

For example::

        ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_intr
         NIX_AF_RVU:
                NIX RVU Interrupt Reg : 1
                Unmap Slot Error
        ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_gen
         NIX_AF_GENERAL:
                NIX General Interrupt Reg : 1
                Rx multicast pkt drop
        ~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_err
         NIX_AF_ERR:
                NIX Error Interrupt Reg : 64
                Rx on unmapped PF_FUNC
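
The register values in these dumps appear to be printed in decimal (an
assumption based on the samples, e.g. 64 and 4096), so it can help to convert
a value into the bit positions that are set. The helper below is a generic
sketch; ``set_bits`` is a hypothetical name, and the mapping from a bit
position to an error name is defined by the driver and is not reproduced here.

```python
def set_bits(value: int) -> list:
    """Return the positions of the bits set in an interrupt register value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Values taken from the NPA and NIX dump examples in this document.
for reg_value in (1, 64, 4096):
    print(reg_value, set_bits(reg_value))
```

A value with several bits set means several conditions were latched in the
same interrupt register, and each bit has to be looked up separately.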


Quality of service
==================


Hardware algorithms used in scheduling
--------------------------------------

The OcteonTx2 silicon and CN10K transmit interface consists of five transmit levels
starting from SMQ/MDQ, then TL4 to TL1. Each packet traverses the MDQ and TL4 to TL1
levels. Each level contains an array of queues to support scheduling and shaping.
The hardware uses the algorithms below depending on the priority of the scheduler queues.
Once the user creates tc classes with different priorities, the driver configures the
schedulers allocated to each class with the specified priority along with the rate-limiting
configuration.

1. Strict Priority

      -  Once packets are submitted to MDQ, hardware chooses among active MDQs that have
         different priorities using strict priority.

2. Round Robin

      - Active MDQs having the same priority level are chosen using round robin.
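
The interplay of the two algorithms can be shown with a toy software model.
This is a sketch and not the hardware implementation: the ``Scheduler`` class
is hypothetical, and it assumes that a lower numeric priority value is served
first, as with the tc HTB ``prio`` parameter.

```python
from collections import deque


class Scheduler:
    """Toy strict-priority scheduler with round robin among equal priorities."""

    def __init__(self):
        self.levels = {}  # priority value -> deque of queue names

    def add_queue(self, name, prio):
        self.levels.setdefault(prio, deque()).append(name)

    def pick(self, active):
        """Return the next queue to serve among `active` names, or None."""
        for prio in sorted(self.levels):   # assumption: lower value wins
            ring = self.levels[prio]
            for _ in range(len(ring)):
                name = ring[0]
                ring.rotate(-1)            # move the head to the tail
                if name in active:
                    return name
        return None


sched = Scheduler()
sched.add_queue("mdq_a", prio=1)
sched.add_queue("mdq_b", prio=2)
sched.add_queue("mdq_c", prio=2)

# While mdq_a has packets it always wins (strict priority).
first = sched.pick({"mdq_a", "mdq_b", "mdq_c"})
# Once mdq_a is idle, mdq_b and mdq_c alternate (round robin).
rest = [sched.pick({"mdq_b", "mdq_c"}) for _ in range(3)]
print(first, rest)
```

Keeping one rotating ring per priority value is what keeps round robin state
separate for each priority level in this model.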


Setup HTB offload
-----------------

1. Enable HW TC offload on the interface::

        # ethtool -K <interface> hw-tc-offload on

2. Create htb root::

        # tc qdisc add dev <interface> clsact
        # tc qdisc replace dev <interface> root handle 1: htb offload

3. Create tc classes with different priorities::

        # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 1

        # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 7

4. Create tc classes with same priorities and different quantum::

        # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 2 quantum 409600

        # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 2 quantum 188416

        # tc class add dev <interface> parent 1: classid 1:3 htb rate 10Gbit prio 2 quantum 32768
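
To get a feel for what the quantum values in step 4 mean, the sketch below
computes the relative share of each class. It rests on an assumption that this
document does not state: that classes of equal priority are served by a
weighted round robin in which, while all classes are backlogged and none hits
its rate limit, bandwidth is split in proportion to the quantum. The
``quantum_shares`` helper is hypothetical.

```python
def quantum_shares(quanta: dict) -> dict:
    """Return each class's fraction of the total quantum."""
    total = sum(quanta.values())
    return {cls: q / total for cls, q in quanta.items()}


# Quantum values from step 4 above.
shares = quantum_shares({"1:1": 409600, "1:2": 188416, "1:3": 32768})
for cls, share in shares.items():
    print(f"{cls}: {share:.1%}")
```

Under that assumption class 1:1 would receive 12.5 times the bandwidth of
class 1:3, since 409600 / 32768 = 12.5.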