==========================================
I915 VM_BIND feature design and use cases
==========================================

VM_BIND feature
================
DRM_I915_GEM_VM_BIND/UNBIND ioctls allow UMD to bind/unbind GEM buffer
objects (BOs) or sections of a BO at specified GPU virtual addresses on a
specified address space (VM). These mappings (also referred to as persistent
mappings) will be persistent across multiple GPU submissions (execbuf calls)
issued by the UMD, without the user having to provide a list of all required
mappings during each submission (as required by the older execbuf mode).

The VM_BIND/UNBIND calls allow UMDs to request a timeline out fence for
signaling the completion of the bind/unbind operation.

The VM_BIND feature is advertised to the user via I915_PARAM_VM_BIND_VERSION.
The user has to opt in to the VM_BIND mode of binding for an address space (VM)
during VM creation time via the I915_VM_CREATE_FLAGS_USE_VM_BIND extension.

VM_BIND/UNBIND ioctl calls executed on different CPU threads concurrently are
not ordered. Furthermore, parts of the VM_BIND/UNBIND operations can be done
asynchronously, when a valid out fence is specified.

VM_BIND features include:

* Multiple Virtual Address (VA) mappings can map to the same physical pages
  of an object (aliasing).
* VA mapping can map to a partial section of the BO (partial binding).
* Support capture of persistent mappings in the dump upon GPU error.
* Support for userptr gem objects (no special uapi is required for this).

TLB flush consideration
------------------------
The i915 driver flushes the TLB for each submission and when an object's
pages are released. The VM_BIND/UNBIND operation will not do any additional
TLB flush.
Any VM_BIND mapping added will be in the working set for subsequent
submissions on that VM and will not be in the working set for currently running
batches (which would require additional TLB flushes, which is not supported).

Execbuf ioctl in VM_BIND mode
-------------------------------
A VM in VM_BIND mode will not support the older execbuf mode of binding.
The execbuf ioctl handling in VM_BIND mode differs significantly from the
older execbuf2 ioctl (See struct drm_i915_gem_execbuffer2).
Hence, a new execbuf3 ioctl has been added to support VM_BIND mode (See
struct drm_i915_gem_execbuffer3). The execbuf3 ioctl will not accept any
execlist and hence does not support implicit sync. It is expected that the
below work will be able to support requirements of object dependency setting
in all use cases:

"dma-buf: Add an API for exporting sync files"
(https://lwn.net/Articles/859290/)

The new execbuf3 ioctl only works in VM_BIND mode and the VM_BIND mode only
works with the execbuf3 ioctl for submission. All BOs mapped on that VM (through
VM_BIND call) at the time of the execbuf3 call are deemed required for that
submission.

The execbuf3 ioctl directly specifies the batch addresses, instead of object
handles as in the execbuf2 ioctl. The execbuf3 ioctl will also not
support many of the older features like in/out/submit fences, fence array,
default gem context and many more (See struct drm_i915_gem_execbuffer3).

In VM_BIND mode, VA allocation is completely managed by the user instead of
the i915 driver. Hence VA assignment and eviction are not applicable in
VM_BIND mode. Also, for determining object activeness, VM_BIND mode will not
be using the i915_vma active reference tracking. It will instead use the
dma-resv object for that (See `VM_BIND dma_resv usage`_).
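
As a rough illustration of the persistent mapping model described in this
section, the following user-space sketch records mappings in a small table and
treats every mapping present at submission time as required. It is a toy
model, not i915 code or uapi: all ``toy_*`` names are hypothetical, and the
rule that a new VA range must not overlap an existing one is an assumption
made here to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy user-space model of persistent mappings; not i915 code or uapi. */
#define TOY_MAX_MAPPINGS 16

struct toy_mapping {
	uint32_t bo;     /* BO identifier (hypothetical) */
	uint64_t va;     /* GPU virtual address chosen by the user */
	uint64_t offset; /* start of the bound section within the BO */
	uint64_t length; /* length of the bound section */
	bool used;
};

struct toy_vm {
	struct toy_mapping map[TOY_MAX_MAPPINGS];
};

/* Record a mapping. Returns 0, or -1 if the table is full, the length is
 * zero, or the VA range overlaps an existing mapping (toy assumption). */
int toy_vm_bind(struct toy_vm *vm, uint32_t bo, uint64_t va,
		uint64_t offset, uint64_t length)
{
	struct toy_mapping *slot = NULL;

	assert(vm != NULL);
	if (length == 0)
		return -1;

	for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
		struct toy_mapping *m = &vm->map[i];

		if (!m->used) {
			if (slot == NULL)
				slot = m; /* remember the first free entry */
			continue;
		}
		/* Reject VA overlap; the same BO pages may still alias. */
		if (va < m->va + m->length && m->va < va + length)
			return -1;
	}
	if (slot == NULL)
		return -1;

	*slot = (struct toy_mapping){
		.bo = bo, .va = va, .offset = offset, .length = length,
		.used = true,
	};
	return 0;
}

/* Remove the mapping that exactly matches the given VA range. */
int toy_vm_unbind(struct toy_vm *vm, uint64_t va, uint64_t length)
{
	assert(vm != NULL);
	for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
		struct toy_mapping *m = &vm->map[i];

		if (m->used && m->va == va && m->length == length) {
			m->used = false;
			return 0;
		}
	}
	return -1;
}

/* Count the mappings a submission on this VM would treat as required. */
size_t toy_vm_working_set(const struct toy_vm *vm)
{
	size_t count = 0;

	assert(vm != NULL);
	for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++)
		count += vm->map[i].used;
	return count;
}
```

Binding the same BO at two addresses (aliasing), or with a non-zero offset and
a shorter length (partial binding), simply adds another table entry. Nothing
is passed per submission; the working set is whatever is bound at that time.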

As a result of these differences, much of the existing code supporting the
execbuf2 ioctl, like relocations, VA evictions, vma lookup table, implicit
sync, vma active reference tracking etc., is not applicable to the execbuf3
ioctl. Hence, all execbuf3 specific handling should be in a separate file and
only functionality common to these ioctls should be shared code where possible.

VM_PRIVATE objects
-------------------
By default, BOs can be mapped on multiple VMs and can also be dma-buf
exported. Hence these BOs are referred to as Shared BOs.
During each execbuf submission, the request fence must be added to the
dma-resv fence list of all shared BOs mapped on the VM.

The VM_BIND feature introduces an optimization where the user can create a BO
which is private to a specified VM via the I915_GEM_CREATE_EXT_VM_PRIVATE flag
during BO creation. Unlike Shared BOs, these VM private BOs can only be mapped
on the VM they are private to and can't be dma-buf exported.
All private BOs of a VM share the dma-resv object. Hence during each execbuf
submission, they need only one dma-resv fence list updated. Thus, the fast
path (where required mappings are already bound) submission latency is O(1)
w.r.t. the number of VM private BOs.

VM_BIND locking hierarchy
-------------------------
The locking design here supports the older (execlist based) execbuf mode, the
newer VM_BIND mode, the VM_BIND mode with GPU page faults and possible future
system allocator support (See `Shared Virtual Memory (SVM) support`_).
The older execbuf mode and the newer VM_BIND mode without page faults manage
residency of backing storage using dma_fence. The VM_BIND mode with page faults
and the system allocator support do not use any dma_fence at all.

VM_BIND locking order is as below.

1) Lock-A: A vm_bind mutex will protect vm_bind lists. This lock is taken in
   vm_bind/vm_unbind ioctl calls, in the execbuf path and while releasing the
   mapping.

   In future, when GPU page faults are supported, we can potentially use a
   rwsem instead, so that multiple page fault handlers can take the read side
   lock to look up the mapping and hence can run in parallel.
   The older execbuf mode of binding does not need this lock.

2) Lock-B: The object's dma-resv lock will protect i915_vma state and needs to
   be held while binding/unbinding a vma in the async worker and while updating
   the dma-resv fence list of an object. Note that private BOs of a VM will all
   share a dma-resv object.

   The future system allocator support will use the HMM prescribed locking
   instead.

3) Lock-C: Spinlock(s) to protect some of the VM's lists like the list of
   invalidated vmas (due to eviction and userptr invalidation) etc.

When GPU page faults are supported, the execbuf path does not take any of these
locks. There we will simply smash the new batch buffer address into the ring and
then tell the scheduler to run that. The lock taking only happens from the page
fault handler, where we take lock-A in read mode, whichever lock-B we need to
find the backing storage (dma_resv lock for gem objects, and hmm/core mm for
system allocator) and some additional locks (lock-D) for taking care of page
table races. Page fault mode should not need to ever manipulate the vm lists,
so won't ever need lock-C.

VM_BIND LRU handling
---------------------
We need to ensure VM_BIND mapped objects are properly LRU tagged to avoid
performance degradation. We will also need support for bulk LRU movement of
VM_BIND objects to avoid additional latencies in the execbuf path.

The page table pages are similar to VM_BIND mapped objects (See
`Evictable page table allocations`_), are maintained per VM and need to
be pinned in memory when the VM is made active (i.e., upon an execbuf call with
that VM). So, bulk LRU movement of page table pages is also needed.
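
The idea behind bulk LRU movement can be sketched with a plain doubly linked
list. In the toy model below, the LRU entries of one VM are kept contiguous, so
the whole run can be moved to the most recently used end with a constant number
of pointer updates, independent of how many objects the VM has bound. The
``lru_*`` names and the list layout are hypothetical and are not taken from
i915 or TTM.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list node; the list head is a sentinel node. */
struct lru_node {
	struct lru_node *prev, *next;
};

/* Contiguous run of LRU entries that belong to one VM (hypothetical). */
struct lru_bulk {
	struct lru_node *first, *last;
};

void lru_init(struct lru_node *head)
{
	head->prev = head->next = head;
}

/* Add a node for this VM. It is placed right after the VM's current last
 * entry (or at the LRU tail if the run is empty) so the run stays contiguous. */
void lru_bulk_add(struct lru_node *head, struct lru_bulk *bulk,
		  struct lru_node *node)
{
	struct lru_node *pos;

	assert(head && bulk && node);
	pos = bulk->last ? bulk->last : head->prev;

	node->prev = pos;
	node->next = pos->next;
	pos->next->prev = node;
	pos->next = node;

	if (!bulk->first)
		bulk->first = node;
	bulk->last = node;
}

/* Move the whole run to the tail (most recently used end) in O(1). */
void lru_bulk_move_tail(struct lru_node *head, struct lru_bulk *bulk)
{
	struct lru_node *first, *last;

	assert(head && bulk);
	first = bulk->first;
	last = bulk->last;
	if (!first || last->next == head)
		return; /* empty run, or already at the tail */

	/* Unlink [first, last] from its current position. */
	first->prev->next = last->next;
	last->next->prev = first->prev;

	/* Splice the run back in just before the sentinel head. */
	first->prev = head->prev;
	last->next = head;
	head->prev->next = first;
	head->prev = last;
}
```

In this sketch a submission path would call ``lru_bulk_move_tail()`` once per
VM instead of touching each bound object individually, which is the latency
saving the bulk move is meant to provide.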

VM_BIND dma_resv usage
-----------------------
Fences need to be added to all VM_BIND mapped objects. During each execbuf
submission, they are added with DMA_RESV_USAGE_BOOKKEEP usage to prevent
over sync (See enum dma_resv_usage). One can override it with either
DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE usage during explicit object
dependency setting.

Note that DRM_I915_GEM_WAIT and DRM_I915_GEM_BUSY ioctls do not check for
DMA_RESV_USAGE_BOOKKEEP usage and hence should not be used for end of batch
check. Instead, the execbuf3 out fence should be used for end of batch check
(See struct drm_i915_gem_execbuffer3).

Also, in VM_BIND mode, use dma-resv apis for determining object activeness
(See dma_resv_test_signaled() and dma_resv_wait_timeout()) and do not use the
older i915_vma active reference tracking which is deprecated. This should make
it easier to get it working with the current TTM backend.

Mesa use case
--------------
VM_BIND can potentially reduce the CPU overhead in Mesa (both Vulkan and Iris),
hence improving performance of CPU-bound applications. It also allows us to
implement Vulkan's Sparse Resources. With increasing GPU hardware performance,
reducing CPU overhead becomes more impactful.


Other VM_BIND use cases
========================

Long running Compute contexts
------------------------------
Usage of dma-fence expects that they complete in a reasonable amount of time.
Compute on the other hand can be long running. Hence it is appropriate for
compute to use user/memory fence (See `User/Memory Fence`_) and dma-fence usage
must be limited to in-kernel consumption only.

Where GPU page faults are not available, the kernel driver upon buffer
invalidation will initiate a suspend (preemption) of the long running context,
finish the invalidation, revalidate the BO and then resume the compute context.
This is done by having a per-context preempt fence which is enabled when
someone tries to wait on it and triggers the context preemption.

User/Memory Fence
~~~~~~~~~~~~~~~~~~
User/Memory fence is a <address, value> pair. To signal the user fence, the
specified value is written at the specified virtual address and the waiting
process is woken up. A user fence can be signaled either by the GPU or by a
kernel async worker (like upon bind completion). The user can wait on a user
fence with a new user fence wait ioctl.

Here is some prior work on this:
https://patchwork.freedesktop.org/patch/349417/

Low Latency Submission
~~~~~~~~~~~~~~~~~~~~~~~
Allows compute UMD to directly submit GPU jobs instead of through the execbuf
ioctl. This is made possible by VM_BIND not being synchronized against
execbuf. VM_BIND allows bind/unbind of mappings required for the directly
submitted jobs.

Debugger
---------
With the debug event interface, a user space process (debugger) is able to keep
track of and act upon resources created by another process (debugged) and
attached to the GPU via the vm_bind interface.

GPU page faults
----------------
GPU page faults, when supported (in future), will only be supported in the
VM_BIND mode. While both the older execbuf mode and the newer VM_BIND mode of
binding will require using dma-fence to ensure residency, the GPU page faults
mode, when supported, will not use any dma-fence as residency is purely managed
by installing and removing/invalidating page table entries.

Page level hints settings
--------------------------
VM_BIND allows any hints setting per mapping instead of per BO. Possible hints
include placement and atomicity. Sub-BO level placement hint will be even more
relevant with upcoming GPU on-demand page fault support.

Page level Cache/CLOS settings
-------------------------------
VM_BIND allows cache/CLOS settings per mapping instead of per BO.

Evictable page table allocations
---------------------------------
Make pagetable allocations evictable and manage them similar to VM_BIND
mapped objects. Page table pages are similar to persistent mappings of a
VM (the differences here are that the page table pages will not have an
i915_vma structure and that, after swapping pages back in, the parent page
link needs to be updated).

Shared Virtual Memory (SVM) support
------------------------------------
VM_BIND interface can be used to map system memory directly (without gem BO
abstraction) using the HMM interface. SVM is only supported with GPU page
faults enabled.

VM_BIND UAPI
=============

.. kernel-doc:: Documentation/gpu/rfc/i915_vm_bind.h