.. SPDX-License-Identifier: GPL-2.0-only

=============
 QAIC driver
=============

The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
accelerator products.

Interrupts
==========

While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
mechanism, it is still possible for an IRQ storm to occur. A storm can happen
if the workload is particularly quick, and the host is responsive. If the host
can drain the response FIFO as quickly as the device can insert elements into
it, then the device will frequently transition the response FIFO from empty to
non-empty and generate MSIs at a rate equivalent to the speed of the
workload's ability to process inputs. The lprnet (license plate reader network)
workload is known to trigger this condition, and can generate in excess of 100k
MSIs per second. It has been observed that most systems cannot tolerate this
for long, and will crash when some form of watchdog fires, due to the overhead
of the interrupt controller interrupting the host CPU.

To mitigate this issue, the QAIC driver implements specific IRQ handling. When
QAIC receives an IRQ, it disables that line. This prevents the interrupt
controller from interrupting the CPU. Then QAIC drains the FIFO. Once the FIFO
is drained, QAIC implements a "last chance" polling algorithm where QAIC will
sleep for a time to see if the workload will generate more activity. The IRQ
line remains disabled during this time. If no activity is detected, QAIC exits
polling mode and reenables the IRQ line.
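
The handling described above can be illustrated with a small user-space model.
This is only a sketch, not driver code: the function names, the abstract time
units, and the ``poll_interval`` argument are assumptions made for the example.
It counts how many interrupts the host takes for a list of response arrival
times, once with no mitigation and once with the disable, drain, poll, reenable
sequence.

```python
def count_irqs_naive(arrivals):
    """Host drains each response at once, so every arrival is an
    empty -> non-empty transition and raises one MSI."""
    return len(arrivals)


def count_irqs_mitigated(arrivals, poll_interval):
    """Toy model of: disable line, drain FIFO, sleep, poll again,
    reenable the line only after a quiet sleep period.

    `arrivals` is a sorted list of response arrival times.
    """
    irqs = 0
    i = 0                       # index of the next undrained response
    n = len(arrivals)
    while i < n:
        # The IRQ fires for the first pending response; the handler
        # disables the line, so nothing below raises further IRQs.
        irqs += 1
        now = arrivals[i]
        while True:
            drained = 0
            # Drain everything that has arrived up to `now`.
            while i < n and arrivals[i] <= now:
                i += 1
                drained += 1
            if drained == 0:
                break           # quiet period: exit polling, reenable IRQ
            now += poll_interval  # "last chance" sleep, line still disabled
    return irqs
```

With one response every 10 time units and a poll interval of 100, the model
takes a single interrupt for 1000 responses, where the unmitigated count is
1000. Responses spaced further apart than the poll interval each cost one
interrupt again, which is the intended behavior for an idle device.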

This mitigation in QAIC is very effective. The same lprnet use case that
generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
IRQs over 5 minutes while keeping the host system stable, and having the same
workload throughput performance (within run to run noise variation).


Neural Network Control (NNC) Protocol
=====================================

The implementation of NNC is split between the KMD (QAIC) and the User Mode
Driver (UMD). In general, QAIC understands how to encode/decode the NNC wire
protocol, and the elements of the protocol which require kernel space knowledge
to process (for example, mapping host memory to device IOVAs). QAIC understands
the structure of a message, and all of the transactions. QAIC does not
understand commands (the payload of a passthrough transaction).

QAIC handles and enforces the required little endianness and 64-bit alignment,
to the degree that it can. Since QAIC does not know the contents of a
passthrough transaction, it relies on the UMD to satisfy the requirements.
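
As an illustration of the two requirements, the sketch below packs a
transaction as little-endian data and pads it to a multiple of 8 bytes. The
layout used here (a 32-bit type followed by a 32-bit total length, then the
payload) and the function names are assumptions for the example, not a
description of the real NNC wire format.

```python
import struct


def pad_to_8(payload: bytes) -> bytes:
    """Zero-pad `payload` so its length is a multiple of 8 bytes (64 bits)."""
    return payload + b"\x00" * (-len(payload) % 8)


def encode_transaction(trans_type: int, payload: bytes) -> bytes:
    """Encode a hypothetical transaction: header plus padded payload.

    "<II" means two little-endian unsigned 32-bit fields, so the header
    itself is 8 bytes and the whole transaction stays 64-bit aligned.
    """
    body = pad_to_8(payload)
    header = struct.pack("<II", trans_type, 8 + len(body))
    return header + body
```

A 3-byte payload is padded to 8 bytes, giving a 16-byte transaction. In the
passthrough case only the sender of the payload (the UMD) can know how to lay
out the bytes inside it, which is why the kernel side can enforce the rules
only for the parts it constructs itself.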

The terminate transaction is of particular use to QAIC. QAIC is not aware of
the resources that are loaded onto a device since the majority of that activity
occurs within NNC commands. As a result, QAIC does not have the means to
roll back userspace activity. To ensure that a userspace client's resources
are fully released in the case of a process crash, or a bug, QAIC uses the
terminate transaction to let QSM know when a user has gone away, and the
resources can be released.

QSM can report a version number of the NNC protocol it supports. This is in the
form of a Major number and a Minor number.

Major number updates indicate changes to the NNC protocol which impact the
message format, or transactions (impacts QAIC).

Minor number updates indicate changes to the NNC protocol which impact the
commands (does not impact QAIC).
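
The practical consequence of the Major/Minor split is that a kernel-side check
only needs to look at the Major number. The sketch below shows that idea; the
names are hypothetical, and the exact-match policy is an assumption, since the
text above does not say how the driver compares versions.

```python
from typing import NamedTuple


class NncVersion(NamedTuple):
    major: int
    minor: int


def kmd_understands(kmd_major: int, reported: NncVersion) -> bool:
    """Return True if the message format and transactions are the ones the
    KMD was written for. The Minor number is deliberately ignored, because
    it only covers commands, which the KMD passes through opaquely."""
    return reported.major == kmd_major
```

Under this policy, a Minor bump on the device side needs a matching UMD update
at most, while a Major bump needs a KMD update.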

uAPI
====

QAIC defines a number of driver specific IOCTLs as part of the userspace API.
This section describes those APIs.

DRM_IOCTL_QAIC_MANAGE
  This IOCTL allows userspace to send an NNC request to the QSM. The call will
  block until a response is received, or the request has timed out.

DRM_IOCTL_QAIC_CREATE_BO
  This IOCTL allows userspace to allocate a buffer object (BO) which can send
  or receive data from a workload. The call will return a GEM handle that
  represents the allocated buffer. The BO is not usable until it has been
  sliced (see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).

DRM_IOCTL_QAIC_MMAP_BO
  This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
  userspace process.

DRM_IOCTL_QAIC_ATTACH_SLICE_BO
  This IOCTL allows userspace to slice a BO in preparation for sending the BO
  to the device. Slicing is the operation of describing what portions of a BO
  get sent where to a workload. This requires a set of DMA transfers for the
  DMA Bridge, and as such, locks the BO to a specific DBC.

DRM_IOCTL_QAIC_EXECUTE_BO
  This IOCTL allows userspace to submit a set of sliced BOs to the device. The
  call is non-blocking. Success only indicates that the BOs have been queued
  to the device, but does not guarantee they have been executed.

DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO
  This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace
  to shrink the BOs sent to the device for this specific call. If a BO
  typically has N inputs, but only a subset of those is available, this IOCTL
  allows userspace to indicate that only the first M bytes of the BO should be
  sent to the device to minimize data transfer overhead. This IOCTL dynamically
  recomputes the slicing, and therefore has some processing overhead before the
  BOs can be queued to the device.

DRM_IOCTL_QAIC_WAIT_BO
  This IOCTL allows userspace to determine when a particular BO has been
  processed by the device. The call will block until either the BO has been
  processed and can be re-queued to the device, or a timeout occurs.

DRM_IOCTL_QAIC_PERF_STATS_BO
  This IOCTL allows userspace to collect performance statistics on the most
  recent execution of a BO. This allows userspace to construct an end to end
  timeline of the BO processing for a performance analysis.

DRM_IOCTL_QAIC_PART_DEV
  This IOCTL allows userspace to request a duplicate "shadow device". This extra
  accelN device is associated with a specific partition of resources on the
  AIC100 device and can be used for limiting a process to some subset of
  resources.
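
The ordering constraints between the BO IOCTLs above (allocate, slice, execute,
wait, re-queue) can be summarized as a small state model. The class below is a
toy written for this document, not the driver's data structure or the real
ioctl argument layout. In particular, it assumes that a BO cannot be queued
again until it has been waited on, and it treats completion as instantaneous.

```python
class ToyBO:
    """Toy model of a buffer object's lifecycle (hypothetical names)."""

    def __init__(self, size: int):
        # CREATE_BO: allocated, but not usable until sliced.
        self.size = size
        self.dbc = None          # set by attach_slice (ATTACH_SLICE_BO)
        self.in_flight = False   # True between execute and wait
        self.bytes_sent = 0

    def attach_slice(self, dbc_id: int) -> None:
        # ATTACH_SLICE_BO: slicing ties the BO to one DBC.
        self.dbc = dbc_id

    def execute(self, first_bytes=None) -> None:
        # EXECUTE_BO, or PARTIAL_EXECUTE_BO when first_bytes is given.
        if self.dbc is None:
            raise RuntimeError("BO must be sliced before it is executed")
        if self.in_flight:
            raise RuntimeError("BO is already queued; wait on it first")
        n = self.size if first_bytes is None else first_bytes
        if not 0 < n <= self.size:
            raise ValueError("partial length must lie within the BO")
        self.bytes_sent = n
        self.in_flight = True    # queued only; not necessarily executed

    def wait(self) -> None:
        # WAIT_BO: in this toy the device completes immediately.
        self.in_flight = False
```

A typical sequence in this model is ``attach_slice()`` once, followed by
repeated ``execute()`` and ``wait()`` pairs, with ``execute(first_bytes=M)``
standing in for the partial variant.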

Userspace Client Isolation
==========================

AIC100 supports multiple clients. Multiple DBCs can be consumed by a single
client, and multiple clients can each consume one or more DBCs. Workloads
may contain sensitive information; therefore, only the client that owns the
workload should be allowed to interface with the DBC.

Clients are identified by the instance associated with their open(). A client
may only use memory it has allocated, and DBCs that are assigned to its
workloads. Attempts to access resources assigned to other clients will be
rejected.
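
The ownership rule can be pictured with the following toy model. It is a sketch
under stated assumptions: the class and method names are invented, and the
single global handle space exists only so that a cross-client access can be
attempted and visibly rejected.

```python
import itertools


class ToyDevice:
    """Toy model of per-open() client isolation (hypothetical names)."""

    def __init__(self):
        self._owner_of = {}                  # BO handle -> owning client
        self._handles = itertools.count(1)

    def open(self):
        # Each open() yields a distinct identity, even within one process.
        return object()

    def create_bo(self, client) -> int:
        handle = next(self._handles)
        self._owner_of[handle] = client
        return handle

    def use_bo(self, client, handle: int) -> None:
        # Reject unknown handles and handles owned by another client.
        if self._owner_of.get(handle) is not client:
            raise PermissionError("resource belongs to another client")
```

Because identity follows the open() instance rather than the process, two
opens of the same device by the same process are treated as two separate
clients in this model.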

Module parameters
=================

QAIC supports the following module parameters:

**datapath_polling (bool)**

Configures QAIC to use a polling thread for datapath events instead of relying
on the device interrupts. Useful for platforms with broken multiMSI. Must be
set at QAIC driver initialization. Default is 0 (off).

**mhi_timeout_ms (unsigned int)**

Sets the timeout value for MHI operations in milliseconds (ms). Must be set
at the time the driver detects a device. Default is 2000 (2 seconds).

**control_resp_timeout_s (unsigned int)**

Sets the timeout value for QSM responses to NNC messages in seconds (s). Must
be set at the time the driver is sending a request to QSM. Default is 60 (one
minute).

**wait_exec_default_timeout_ms (unsigned int)**

Sets the default timeout for the wait_exec ioctl in milliseconds (ms). Must be
set prior to the wait_exec ioctl call. A value specified in the ioctl call
overrides this for that call. Default is 5000 (5 seconds).

**datapath_poll_interval_us (unsigned int)**

Sets the polling interval in microseconds (us) when datapath polling is active.
Takes effect at the next polling interval. Default is 100 (100 us).
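
To inspect the values in effect on a running system, something like the
following sketch may be useful. It assumes that the module is named ``qaic``
and that its parameters are exposed in the standard sysfs location,
``/sys/module/qaic/parameters/``. Whether each parameter is readable there
depends on the permissions the driver registers, so missing or unreadable
files are skipped rather than treated as errors.

```python
from pathlib import Path

# Parameter names as listed in this document.
QAIC_PARAMS = (
    "datapath_polling",
    "mhi_timeout_ms",
    "control_resp_timeout_s",
    "wait_exec_default_timeout_ms",
    "datapath_poll_interval_us",
)


def read_qaic_params(base="/sys/module/qaic/parameters"):
    """Return {name: raw string value} for every readable parameter."""
    values = {}
    for name in QAIC_PARAMS:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Not exposed, not readable, or the module is not loaded.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_qaic_params().items():
        print(f"{name} = {value}")
```

The ``base`` argument is there so the function can be pointed at a different
directory, for example when testing on a machine without the hardware.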