.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP block inside an SoC (e.g. a laptop web camera). These devices
  are typically configured using registers and can work with or without DMA.

- Inference data-center - single/multi user devices in a large server. This
  type of device can be stand-alone or an IP block inside an SoC or a GPU. It
  will have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device. In
  addition, these devices will usually have some tools, such as a profiler
  and a debugger.

- Training data-center - similar to inference data-center cards, but these
  typically have more computational power and memory bandwidth (e.g. HBM) and
  will likely have a method of scaling-up/out, i.e. connecting to other
  training cards inside the server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made for their hardware. In addition, they will also probably
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because this type of device can be an IP block inside a GPU or have
characteristics similar to those of GPUs, the accel subsystem will use the
DRM subsystem's code and functionality, i.e. the accel core code will
be part of the DRM subsystem and an accel device will be a new type of DRM
device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers who have experience with this type of
device. In addition, new features that are added for the accelerator
drivers can be of use to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphics software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to user space with the dedicated
major number 261 and will follow this naming convention:

- device char files - /dev/accel/accel\*
- sysfs             - /sys/class/accel/accel\*/
- debugfs           - /sys/kernel/debug/accel/\*/
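
The convention above can be checked from user space. The following is a
minimal sketch, not part of the subsystem: the helper names accel_node_path()
and accel_open_node() are hypothetical, and it assumes the suffix after
"accel" is a decimal device index.

.. code-block:: c

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>
    #include <unistd.h>

    #define ACCEL_MAJOR 261 /* dedicated major number stated above */

    /*
     * Hypothetical helper: build "/dev/accel/accel<index>".
     * Returns 0 on success, -1 if the buffer is too small.
     */
    static int accel_node_path(char *buf, size_t len, unsigned int index)
    {
        int n = snprintf(buf, len, "/dev/accel/accel%u", index);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /*
     * Hypothetical helper: open the node and verify that it is a char
     * device with the accel major. Returns an open fd, or -1 on failure.
     */
    static int accel_open_node(unsigned int index)
    {
        char path[64];
        struct stat st;
        int fd;

        if (accel_node_path(path, sizeof(path), index))
            return -1;

        fd = open(path, O_RDWR | O_CLOEXEC);
        if (fd < 0)
            return -1;

        if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
            major(st.st_rdev) != ACCEL_MAJOR) {
            close(fd);
            return -1;
        }

        return fd;
    }

A caller would use accel_open_node(0) and treat a negative return value as
"no accelerator with that index". Checking the major number guards against
opening an unrelated node that happens to sit at the same path.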

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only does it explain how to write a new DRM driver, it also contains
all the information on how to contribute, the Code of Conduct, and
the coding and documentation style. All of that is the same for the
accel subsystem.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.

To expose your device as an accelerator, two changes are needed in your
driver (compared with a standard DRM driver):

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to
  define a file operations structure with the correct callbacks.
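
The two changes above might look as follows in a driver. This is an
illustrative kernel-side fragment rather than a complete driver: the names
my_accel_fops and my_accel_drm_driver and the name/desc strings are
hypothetical, and probing, device allocation and registration are omitted.

.. code-block:: c

    #include <linux/module.h>
    #include <drm/drm_accel.h>
    #include <drm/drm_drv.h>
    #include <drm/drm_gem.h>

    /*
     * Defines a file_operations structure named my_accel_fops (a
     * hypothetical name) whose open callback is accel_open().
     */
    DEFINE_DRM_ACCEL_FOPS(my_accel_fops);

    static const struct drm_driver my_accel_drm_driver = {
        /* Must not be combined with DRIVER_RENDER or DRIVER_MODESET. */
        .driver_features = DRIVER_COMPUTE_ACCEL,
        .fops = &my_accel_fops,
        .name = "my_accel",                         /* hypothetical */
        .desc = "Hypothetical compute accelerator", /* hypothetical */
    };

A real driver will typically set further drm_driver fields (for example
ioctls and additional feature flags) according to what the hardware supports.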

External References
===================

Email threads
-------------

* `Initial discussion on the new subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
* `Patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)