.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP inside an SoC (e.g. laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single/multi user devices in a large server. This
  type of device can be stand-alone or an IP inside an SoC or a GPU. It will
  have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device.
  In addition, these devices will usually have some tools, such as a profiler
  and a debugger.

- Training data-center - similar to inference data-center cards, but these
  typically have more computational power and memory bandwidth (e.g. HBM) and
  will likely have a method of scaling up/out, i.e. connecting to other
  training cards inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made for their hardware. In addition, they will probably also
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because this type of device can be an IP inside a GPU or have
characteristics similar to those of GPUs, the accel subsystem will use the
DRM subsystem's code and functionality, i.e. the accel core code will
be part of the DRM subsystem and an accel device will be a new type of DRM
device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers who have experience with this type of
device. In addition, new features that will be added for the accelerator
drivers can be of use to GPU drivers as well.


Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to user space with the dedicated
261 major number and will follow this convention:

- device char files - /dev/accel/accel*
- sysfs - /sys/class/accel/accel*/
- debugfs - /sys/kernel/debug/accel/accel*/

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only does it explain how to write a new DRM driver, it also
contains all the information on how to contribute, the Code of Conduct and
the coding style and documentation conventions. All of that is the same for
the accel subsystem.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.


To expose your device as an accelerator, two changes are needed
in your driver (as opposed to a standard DRM driver):

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
  set up the correct file operations structure.

External References
===================

email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)