.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP block inside an SoC (e.g., a laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single/multi-user devices in a large server. This
  type of device can be stand-alone or an IP block inside an SoC or a GPU. It
  will have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device. In
  addition, these devices will usually have some tools, such as a profiler and
  a debugger.

- Training data-center - similar to inference data-center cards, but typically
  with more computational power and memory bandwidth (e.g., HBM). They will
  likely have a method of scaling up/out, i.e., connecting to other training
  cards inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made to their hardware. In addition, they will probably also
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because these devices can be IP blocks inside GPUs or have characteristics
similar to those of GPUs, the accel subsystem will use the DRM subsystem's
code and functionality, i.e., the accel core code will be part of the DRM
subsystem and an accel device will be a new type of DRM device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers who have experience with this type of
device. In addition, new features added for accelerator drivers can be
useful to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree: drivers/accel/.

The accelerator devices will be exposed to user space with the dedicated
major number 261 and will follow this naming convention:

- device char files - /dev/accel/accel*
- sysfs             - /sys/class/accel/accel*/
- debugfs           - /sys/kernel/debug/accel/accel*/
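
To make this convention concrete, the following user-space C sketch checks
whether a device node belongs to the accel subsystem by comparing its major
number with 261, and builds or parses /dev/accel/accel<N> names. The helper
names (accel_dev_path(), accel_index_from_name(), is_accel_devnum(),
is_accel_node()) are hypothetical and exist only for this illustration; the
sketch assumes a glibc-style <sys/sysmacros.h> that provides major() and
makedev().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h> /* major(), makedev() on glibc */
#include <sys/types.h>

/* Dedicated char-device major number for accel devices. */
#define ACCEL_MAJOR 261

/* Hypothetical helper: write "/dev/accel/accel<index>" into buf.
 * Returns the length snprintf() would produce, as snprintf() does. */
static int accel_dev_path(char *buf, size_t len, unsigned int index)
{
    return snprintf(buf, len, "/dev/accel/accel%u", index);
}

/* Hypothetical helper: parse a directory entry name of the form
 * "accel<N>" into N. Returns -1 if the name does not match. */
static int accel_index_from_name(const char *name)
{
    static const char prefix[] = "accel";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    long n;

    if (strncmp(name, prefix, plen) != 0)
        return -1;
    /* Require a decimal digit right after the prefix (rejects signs and spaces). */
    if (name[plen] < '0' || name[plen] > '9')
        return -1;
    n = strtol(name + plen, &end, 10);
    /* Reject trailing characters and values that do not fit in an int. */
    if (*end != '\0' || n > INT_MAX)
        return -1;
    return (int)n;
}

/* True if the device number carries the accel major number. */
static bool is_accel_devnum(dev_t dev)
{
    return major(dev) == ACCEL_MAJOR;
}

/* Hypothetical helper: true if path exists, is a character device and
 * has the accel major number. */
static bool is_accel_node(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;
    return S_ISCHR(st.st_mode) && is_accel_devnum(st.st_rdev);
}
```

A tool could, for example, iterate over /dev/accel/ with readdir(), pass each
entry name to accel_index_from_name() and confirm the node with
is_accel_node() before opening it. Checking the major number, rather than
only the path, also distinguishes accel nodes from DRM GPU nodes, which use
major number 226.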

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only will it explain how to write a new DRM driver, but it also
contains all the information on how to contribute, the Code of Conduct and
the coding style and documentation guidelines. All of that applies equally to
the accel subsystem.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.

To expose your device as an accelerator, two changes are needed in your
driver (as opposed to a standard DRM driver):

- Add the DRIVER_COMPUTE_ACCEL feature flag to your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver's fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
  define a file operations structure with the correct function pointers.
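
The mutual-exclusion rule for DRIVER_COMPUTE_ACCEL can be modelled in plain
user-space C, which makes it easy to see which flag combinations are
rejected. This is only a sketch: the MODEL_DRIVER_* names and bit values are
stand-ins invented for the illustration (the real DRIVER_* flags are defined
in include/drm/drm_drv.h), and model_features_valid() is a hypothetical
function that mirrors the rule stated above rather than the DRM core's own
check. The leading comment shows roughly how the two driver changes would
look in kernel code; my_accel_fops and my_accel_driver are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * In an actual accel driver (kernel code, not compiled here), the two
 * changes would look roughly like this; the names are hypothetical:
 *
 *   DEFINE_DRM_ACCEL_FOPS(my_accel_fops);
 *
 *   static const struct drm_driver my_accel_driver = {
 *           .driver_features = DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
 *           .fops            = &my_accel_fops,
 *           ...
 *   };
 */

/* Stand-in flag values for this model only; NOT the kernel's values. */
enum model_driver_feature {
    MODEL_DRIVER_GEM           = 1u << 0,
    MODEL_DRIVER_MODESET       = 1u << 1,
    MODEL_DRIVER_RENDER        = 1u << 2,
    MODEL_DRIVER_COMPUTE_ACCEL = 1u << 3,
};

/* Returns false when COMPUTE_ACCEL is combined with RENDER or MODESET,
 * mirroring the mutual-exclusion rule described in the text. */
static bool model_features_valid(unsigned int features)
{
    bool accel = (features & MODEL_DRIVER_COMPUTE_ACCEL) != 0;
    bool graphics = (features & (MODEL_DRIVER_RENDER | MODEL_DRIVER_MODESET)) != 0;

    return !(accel && graphics);
}
```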

External References
===================

Email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)