.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP inside a SoC (e.g. laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single/multi user devices in a large server. This
  type of device can be stand-alone or an IP inside a SoC or a GPU. It will
  have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device. In
  addition, these devices will usually have some tools, such as profiler and
  debugger.

- Training data-center - Similar to Inference data-center cards, but typically
  have more computational power and memory bandwidth (e.g. HBM) and will
  likely have a method of scaling-up/out, i.e. connecting to other training
  cards inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made for their hardware. In addition, they will also probably
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because these types of devices can be an IP inside GPUs or have
characteristics similar to those of GPUs, the accel subsystem will use the
DRM subsystem's code and functionality, i.e. the accel core code will
be part of the DRM subsystem and an accel device will be a new type of DRM
device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers who have experience with these types of
devices. In addition, new features that will be added for the accelerator
drivers can be of use to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to user space with the dedicated
261 major number and will have the following convention:

- device char files - /dev/accel/accel\*
- sysfs - /sys/class/accel/accel\*/
- debugfs - /sys/kernel/debug/accel/\*/

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only does it explain how to write a new DRM driver, it also contains
all the information on how to contribute, the Code of Conduct, and the
coding style and documentation conventions. All of that is the same for the
accel subsystem.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.

To expose your device as an accelerator, two changes are needed in your
driver (as opposed to a standard DRM driver):

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
  set up the correct file operations structure.

External References
===================

Email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)