.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP inside an SoC (e.g. laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single/multi user devices in a large server. This
  type of device can be stand-alone or an IP inside an SoC or a GPU. It will
  have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device. In
  addition, these devices will usually have some tools, such as a profiler
  and a debugger.

- Training data-center - Similar to Inference data-center cards, but typically
  have more computational power and memory bandwidth (e.g. HBM) and will
  likely have a method of scaling-up/out, i.e. connecting to other training
  cards inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made to their hardware. In addition, they will also probably
include a compiler to generate programs for their custom-made computational
engines.
Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because this type of device can be an IP inside GPUs or have characteristics
similar to those of GPUs, the accel subsystem will use the
DRM subsystem's code and functionality, i.e. the accel core code will
be part of the DRM subsystem and an accel device will be a new type of DRM
device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers that have experience with this type of
device. In addition, new features that will be added for the accelerator
drivers can be of use to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphic software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to the user space with the dedicated
261 major number and will have the following convention:

- device char files - /dev/accel/accel*
- sysfs             - /sys/class/accel/accel*/
- debugfs           - /sys/kernel/debug/accel/accel*/

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only will it explain how to write a new DRM driver, but it will also
contain all the information on how to contribute, the Code Of Conduct and
the coding style and documentation conventions. All of that is the same for
the accel subsystem.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.

To expose your device as an accelerator, two changes need to be made
in your driver (as opposed to a standard DRM driver):

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
  set up the correct file operations structure.

External References
===================

Email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)