Revision tags: v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57 |
|
#
da1ea128 |
| 06-Aug-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Free the fence after a fence-chain lookup failure
If dma_fence_chain_find_seqno() reports an error, it does so in its preamble before it disposes of the input fence. On handling the er
drm/i915/gem: Free the fence after a fence-chain lookup failure
If dma_fence_chain_find_seqno() reports an error, it does so in its preamble before it disposes of the input fence. On handling the error, we need to drop the reference to the fence.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2292 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 13149e8bafc4 ("drm/i915: add syncobj timeline support") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200806161056.17593-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
Revision tags: v5.4.56, v5.8, v5.7.12, v5.4.55 |
|
#
5d934137 |
| 31-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Export a preallocate variant of i915_active_acquire()
Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915
drm/i915: Export a preallocate variant of i915_active_acquire()
Sometimes we have to be very careful not to allocate underneath a mutex (or spinlock) and yet still want to track activity. Enter i915_active_acquire_for_context(). This raises the activity counter on i915_active prior to use and ensures that the fence-tree contains a slot for the context.
v2: Refactor active_lookup() so it can be called again before/after locking to resolve contention. Since we protect the rbtree until we idle, we can do a lockfree lookup, with the caveat that if another thread performs a concurrent insertion, the rotations from the insert may cause us to not find our target. A second pass holding the treelock will find the target if it exists, or the place to perform our insertion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200731085015.32368-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
Revision tags: v5.7.11, v5.4.54 |
|
#
27a5dcfe |
| 28-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Remove disordered per-file request list for throttling
I915_GEM_THROTTLE dates back to the time before contexts where there was just a single engine, and therefore a single timeline an
drm/i915/gem: Remove disordered per-file request list for throttling
I915_GEM_THROTTLE dates back to the time before contexts where there was just a single engine, and therefore a single timeline and request list globally. That request list was in execution/retirement order, and so walking it to find a particular aged request made sense and could be split per file.
That is no more. We now have many timelines with a file, as many as the user wants to construct (essentially per-engine, per-context). Each of those run independently and so make the single list futile. Remove the disordered list, and iterate over all the timelines to find a request to wait on in each to satisfy the criteria that the CPU is no more than 20ms ahead of its oldest request.
It should go without saying that the I915_GEM_THROTTLE ioctl is no longer used as the primary means of throttling, so it makes sense to push the complication into the ioctl where it only impacts upon its few irregular users, rather than the execbuf/retire where everybody has to pay the cost. Fortunately, the few users do not create vast amount of contexts, so the loops over contexts/engines should be concise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
#
13149e8b |
| 04-Aug-2020 |
Lionel Landwerlin <lionel.g.landwerlin@intel.com> |
drm/i915: add syncobj timeline support
Introduces a new parameters to execbuf so that we can specify syncobj handles as well as timeline points.
v2: Reuse i915_user_extension_fn
v3: Check that the
drm/i915: add syncobj timeline support
Introduces a new parameters to execbuf so that we can specify syncobj handles as well as timeline points.
v2: Reuse i915_user_extension_fn
v3: Check that the chained extension is only present once (Chris)
v4: Check that dma_fence_chain_find_seqno returns a non NULL fence (Lionel)
v5: Use BIT_ULL (Chris)
v6: Fix issue with already signaled timeline points, dma_fence_chain_find_seqno() setting fence to NULL (Chris)
v7: Report ENOENT with invalid syncobj handle (Lionel)
v8: Check for out of order timeline point insertion (Chris)
v9: After explanations on https://lists.freedesktop.org/archives/dri-devel/2019-August/229287.html drop the ordering check from v8 (Lionel)
v10: Set first extension enum item to 1 (Jason)
v11: Rebase
v12: Allow multiple extension nodes of timeline syncobj (Chris)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Co-authored-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v11) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200804085954.350343-3-lionel.g.landwerlin@intel.com Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2901 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
show more ...
|
#
cda9edd0 |
| 04-Aug-2020 |
Lionel Landwerlin <lionel.g.landwerlin@intel.com> |
drm/i915: introduce a mechanism to extend execbuf2
We're planning to use this for a couple of new feature where we need to provide additional parameters to execbuf.
v2: Check for invalid flags in e
drm/i915: introduce a mechanism to extend execbuf2
We're planning to use this for a couple of new feature where we need to provide additional parameters to execbuf.
v2: Check for invalid flags in execbuffer2 (Lionel)
v3: Rename I915_EXEC_EXT -> I915_EXEC_USE_EXTENSIONS (Chris)
v4: Rebase Move array fence parsing in i915_gem_do_execbuffer()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200804085954.350343-2-lionel.g.landwerlin@intel.com Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2901 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
show more ...
|
Revision tags: v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51 |
|
#
792592e7 |
| 07-Jul-2020 |
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> |
drm/i915: Move the engine mask to intel_gt_info
Since the engines belong to the GT, move the runtime-updated list of available engines to the intel_gt struct. The original mask has been renamed to i
drm/i915: Move the engine mask to intel_gt_info
Since the engines belong to the GT, move the runtime-updated list of available engines to the intel_gt struct. The original mask has been renamed to indicate it contains the maximum engine list that can be found on a matching device.
In preparation for other info being moved to the gt in follow up patches (sseu), introduce an intel_gt_info structure to group all gt-related runtime info.
v2: s/max_engine_mask/platform_engine_mask (tvrtko), fix selftest
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-5-daniele.ceraolospurio@intel.com
show more ...
|
#
f7ce8639 |
| 02-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Split the context's obj:vma lut into its own mutex
Rather than reuse the common ctx->mutex for locking the execbuffer LUT, split it into its own lock to avoid being taken [as part of c
drm/i915/gem: Split the context's obj:vma lut into its own mutex
Rather than reuse the common ctx->mutex for locking the execbuffer LUT, split it into its own lock to avoid being taken [as part of ctx->mutex] at inappropriate times. In particular to avoid the inversion from taking the timeline->mutex for the whole execbuf submission in the next patch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200703004306.11117-1-chris@chris-wilson.co.uk
show more ...
|
#
096a42dd |
| 01-Jul-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Move obj->lut_list under its own lock
The obj->lut_list is traversed when the object is closed as the file table is destroyed during process termination. As this occurs before we kill
drm/i915/gem: Move obj->lut_list under its own lock
The obj->lut_list is traversed when the object is closed as the file table is destroyed during process termination. As this occurs before we kill any outstanding context if, due to some bug or another, the closure is blocked, then we fail to shootdown any inflight operations potentially leaving the GPU spinning forever. As we only need to guard the list against concurrent closures and insertions, the hold is short and merits being treated as a simple spinlock.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200701084439.17025-1-chris@chris-wilson.co.uk
show more ...
|
Revision tags: v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1 |
|
#
d7466a5a |
| 04-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Mark the buffer pool as active for the cmdparser
If the execbuf is interrupted after building the cmdparser pipeline, and before we commit to submitting the request to HW, we would att
drm/i915/gem: Mark the buffer pool as active for the cmdparser
If the execbuf is interrupted after building the cmdparser pipeline, and before we commit to submitting the request to HW, we would attempt to clean up the cmdparser early. While we held active references to the vma being parsed and constructed, we did not hold an active reference for the buffer pool itself. The result was that an interrupted execbuf could still have run the cmdparser pipeline, but since the buffer pool was idle, its target vma could have been recycled.
Note this problem only occurs if the cmdparser is running async due to pipelined waits on busy fences, and the execbuf is interrupted.
Fixes: 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser") Fixes: 16e87459673a ("drm/i915/gt: Move the batch buffer pool from the engine to the gt") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200604103751.18816-1-chris@chris-wilson.co.uk (cherry picked from commit 57a78ca4eceab1ecb0299fba8a10211289329889) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
#
7ac2d253 |
| 05-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Delete unused code
Unused as of commit 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only"), but left behind.
>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:933:21: error: unu
drm/i915/gem: Delete unused code
Unused as of commit 9e0f9464e2ab ("drm/i915/gem: Async GPU relocations only"), but left behind.
>> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:933:21: error: unused function 'unmask_page' [-Werror,-Wunused-function] static inline void *unmask_page(unsigned long p) ^ >> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:938:28: error: unused function 'unmask_flags' [-Werror,-Wunused-function] static inline unsigned int unmask_flags(unsigned long p) ^ >> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:945:33: error: unused function 'cache_to_ggtt' [-Werror,-Wunused-function] static inline struct i915_ggtt *cache_to_ggtt(struct reloc_cache *cache)
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200605200357.13069-1-chris@chris-wilson.co.uk
show more ...
|
#
9e0f9464 |
| 04-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Async GPU relocations only
Reduce the 3 relocation paths down to the single path that accommodates all. The primary motivation for this is to guard the relocations with a natural fence
drm/i915/gem: Async GPU relocations only
Reduce the 3 relocation paths down to the single path that accommodates all. The primary motivation for this is to guard the relocations with a natural fence (derived from the i915_request used to write the relocation from the GPU).
The tradeoff in using async gpu relocations is that it increases latency over using direct CPU relocations, for the cases where the target is idle and accessible by the CPU. The benefit is greatly reduced lock contention and improved concurrency by pipelining.
Note that forcing the async gpu relocations does reveal a few issues they have. Firstly, is that they are visible as writes to gem_busy, causing to mark some buffers are being to written to by the GPU even though userspace only reads. Secondly is that, in combination with the cmdparser, they can cause priority inversions. This should be the case where the work is being put into a common workqueue losing our priority information and so being executed in FIFO from the worker, denying us the opportunity to reorder the requests afterwards.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200604211457.19696-1-chris@chris-wilson.co.uk
show more ...
|
#
57a78ca4 |
| 04-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Mark the buffer pool as active for the cmdparser
If the execbuf is interrupted after building the cmdparser pipeline, and before we commit to submitting the request to HW, we would att
drm/i915/gem: Mark the buffer pool as active for the cmdparser
If the execbuf is interrupted after building the cmdparser pipeline, and before we commit to submitting the request to HW, we would attempt to clean up the cmdparser early. While we held active references to the vma being parsed and constructed, we did not hold an active reference for the buffer pool itself. The result was that an interrupted execbuf could still have run the cmdparser pipeline, but since the buffer pool was idle, its target vma could have been recycled.
Note this problem only occurs if the cmdparser is running async due to pipelined waits on busy fences, and the execbuf is interrupted.
Fixes: 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser") Fixes: 16e87459673a ("drm/i915/gt: Move the batch buffer pool from the engine to the gt") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200604103751.18816-1-chris@chris-wilson.co.uk
show more ...
|
Revision tags: v5.4.44 |
|
#
5a833995 |
| 02-Jun-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop i915_request.i915 backpointer
We infrequently use the direct i915 backpointer from the i915_request, so do we really need to waste the space in the struct for it? 8 bytes from the mos
drm/i915: Drop i915_request.i915 backpointer
We infrequently use the direct i915 backpointer from the i915_request, so do we really need to waste the space in the struct for it? 8 bytes from the most frequently allocated struct vs an 3 bytes and pointer chasing in using rq->engine->i915?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200602220953.21178-1-chris@chris-wilson.co.uk
show more ...
|
Revision tags: v5.7, v5.4.43 |
|
#
ea97c4ca |
| 25-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Suppress some random warnings
Leave the error propagation in place, but limit the warnings to only show up in CI if the unlikely errors are hit.
Signed-off-by: Chris Wilson <chris@chr
drm/i915/gem: Suppress some random warnings
Leave the error propagation in place, but limit the warnings to only show up in CI if the unlikely errors are hit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200525141957.3061-2-chris@chris-wilson.co.uk
show more ...
|
Revision tags: v5.4.42, v5.4.41, v5.4.40, v5.4.39 |
|
#
6db20e27 |
| 04-May-2020 |
Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> |
drm/i915/gem: Prefer drm_WARN* over WARN*
struct drm_device specific drm_WARN* macros include device information in the backtrace, so we know what device the warnings originate from.
Prefer drm_WAR
drm/i915/gem: Prefer drm_WARN* over WARN*
struct drm_device specific drm_WARN* macros include device information in the backtrace, so we know what device the warnings originate from.
Prefer drm_WARN* over WARN* at places where struct drm_device pointer can be extracted.
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200504181600.18503-6-pankaj.laxminarayan.bharadiya@intel.com
show more ...
|
#
18e4af04 |
| 13-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Drop no-semaphore boosting
Now that we have fast timeslicing on semaphores, we no longer need to prioritise none-semaphore work as we will yield any work blocked on a semaphore to the next
drm/i915: Drop no-semaphore boosting
Now that we have fast timeslicing on semaphores, we no longer need to prioritise none-semaphore work as we will yield any work blocked on a semaphore to the next in the queue. Previously with no timeslicing, blocking on the semaphore caused extremely bad scheduling with multiple clients utilising multiple rings. Now, there is no impact and we can remove the complication.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk
show more ...
|
#
889333c7 |
| 13-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Remove redundant exec_fence
Since there can only be one of in_fence/exec_fence, just use the single in_fence local.
v2: Consolidate lookup
Signed-off-by: Chris Wilson <chris@chris-wi
drm/i915/gem: Remove redundant exec_fence
Since there can only be one of in_fence/exec_fence, just use the single in_fence local.
v2: Consolidate lookup
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513180937.28992-1-chris@chris-wilson.co.uk
show more ...
|
#
eec39e44 |
| 07-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915: Remove wait priority boosting
Upon waiting a request (when asked), we gave that request a small priority boost, not enough for it to cause preemption, but enough for it to be scheduled nex
drm/i915: Remove wait priority boosting
Upon waiting a request (when asked), we gave that request a small priority boost, not enough for it to cause preemption, but enough for it to be scheduled next before all equals. We also used that bit to give new clients a small priority boost, similar to FQ_CODEL, such that we favoured short interactive tasks ahead of long running streams.
However, this is causing lots of complications with timeslicing where we both want to honour the boost and yet ignore it. Those complications cause unexpected user behaviour (tasks not being timesliced and run concurrently as epxected), and the easiest way to resolve that is to remove the boost. Hopefully, we can find a compromise again if we need to, but in theory timeslicing itself and future more advanced schedulers should give us the interactivity boost we seek.
Testcase: igt/gem_exec_schedule/lateslice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk
show more ...
|
#
e3d29130 |
| 04-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Implement legacy MI_STORE_DATA_IMM
The older arches did not convert MI_STORE_DATA_IMM to using the GTT, but left them writing to a physical address. The notes suggest that the primary
drm/i915/gem: Implement legacy MI_STORE_DATA_IMM
The older arches did not convert MI_STORE_DATA_IMM to using the GTT, but left them writing to a physical address. The notes suggest that the primary reason would be so that the writes were cache coherent, as the CPU cache uses physical tagging. As such we did not implement the legacy variant of MI_STORE_DATA_IMM and so left all the relocations synchronous -- but with a small function to convert from the vma address into the physical address, we can implement asynchronous relocs on these older arches, fixing up a few tests that require them.
In order to be able to test the legacy paths, refactor the gpu relocations so that we can hook them up to a selftest.
v2: Use an array of offsets not enum labels for the selftest v3: Refactor the common igt_hexdump()
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/757 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200504140629.28240-1-chris@chris-wilson.co.uk
show more ...
|
#
f5b62bdb |
| 04-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Specify address type for chained reloc batches
It is required that a chained batch be in the same address domain as its parent, and also that must be specified in the command for earli
drm/i915/gem: Specify address type for chained reloc batches
It is required that a chained batch be in the same address domain as its parent, and also that must be specified in the command for earlier gen as it is not inferred from the chaining until gen6.
Fixes: 964a9b0f611e ("drm/i915/gem: Use chained reloc batches") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200504125149.4396-1-chris@chris-wilson.co.uk
show more ...
|
Revision tags: v5.4.38, v5.4.37 |
|
#
6f576d62 |
| 01-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Try an alternate engine for relocations
If at first we don't succeed, try try again.
Not all engines may support the MI ops we need to perform asynchronous relocation patching, and so
drm/i915/gem: Try an alternate engine for relocations
If at first we don't succeed, try try again.
Not all engines may support the MI ops we need to perform asynchronous relocation patching, and so we end up falling back to a synchronous operation that has a liability of blocking. However, Tvrtko pointed out we don't need to use the same engine to perform the relocations as we are planning to execute the execbuf on, and so if we switch over to a working engine, we can perform the relocation asynchronously. The user execbuf will be queued after the relocations by virtue of fencing.
This patch creates a new context per execbuf requiring asynchronous relocations on an unusable engines. This is perhaps a bit excessive and can be ameliorated by a small context cache, but for the moment we only need it for working around a little used engine on Sandybridge, and only if relocations are actually required to an active batch buffer.
Now we just need to teach the relocation code to handle physical addressing for gen2/3, and we should then have universal support!
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Testcase: igt/gem_exec_reloc/basic-spin # snb Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-3-chris@chris-wilson.co.uk
show more ...
|
#
0e97fbb0 |
| 01-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Use a single chained reloc batches for a single execbuf
As we can now keep chaining together a relocation batch to process any number of relocations, we can keep building that relocati
drm/i915/gem: Use a single chained reloc batches for a single execbuf
As we can now keep chaining together a relocation batch to process any number of relocations, we can keep building that relocation batch for all of the target vma. This avoiding emitting a new request into the ring for each target, consuming precious ring space and a potential stall.
v2: Propagate the failure from submitting the relocation batch.
Testcase: igt/gem_exec_reloc/basic-wide-active Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-2-chris@chris-wilson.co.uk
show more ...
|
#
964a9b0f |
| 01-May-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gem: Use chained reloc batches
The ring is a precious resource: we anticipate to only use a few hundred bytes for a request, and only try to reserve that before we start. If we go beyond ou
drm/i915/gem: Use chained reloc batches
The ring is a precious resource: we anticipate to only use a few hundred bytes for a request, and only try to reserve that before we start. If we go beyond our guess in building the request, then instead of waiting at the start of execbuf before we hold any locks or other resources, we may trigger a wait inside a critical region. One example is in using gpu relocations, where currently we emit a new MI_BB_START from the ring every time we overflow a page of relocation entries. However, instead of insert the command into the precious ring, we can chain the next page of relocation entries as MI_BB_START from the end of the previous.
v2: Delay the emit_bb_start until after all the chained vma synchronisation is complete. Since the buffer pool batches are idle, this _should_ be a no-op, but one day we may some fancy async GPU bindings for new vma!
v3: Use pool/batch consitently, once we start thinking in terms of the batch vma, use batch->obj. v4: Explain the magic number 4.
Tvrtko spotted that we lose propagation of the error for failing to submit the relocation request; that's easier to fix up in the next patch.
Testcase: igt/gem_exec_reloc/basic-many-active Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-1-chris@chris-wilson.co.uk
show more ...
|
Revision tags: v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31 |
|
#
b44f6873 |
| 03-Apr-2020 |
Christophe Leroy <christophe.leroy@c-s.fr> |
drm/i915/gem: Replace user_access_begin by user_write_access_begin
When i915_gem_execbuffer2_ioctl() is using user_access_begin(), that's only to perform unsafe_put_user() so use user_write_access_b
drm/i915/gem: Replace user_access_begin by user_write_access_begin
When i915_gem_execbuffer2_ioctl() is using user_access_begin(), that's only to perform unsafe_put_user() so use user_write_access_begin() in order to only open write access.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ebf1250b6d4f351469fb339e5399d8b92aa8a1c1.1585898438.git.christophe.leroy@c-s.fr
show more ...
|
#
16e87459 |
| 30-Apr-2020 |
Chris Wilson <chris@chris-wilson.co.uk> |
drm/i915/gt: Move the batch buffer pool from the engine to the gt
Since the introduction of 'soft-rc6', we aim to park the device quickly and that results in frequent idling of the whole device. Cur
drm/i915/gt: Move the batch buffer pool from the engine to the gt
Since the introduction of 'soft-rc6', we aim to park the device quickly and that results in frequent idling of the whole device. Currently upon idling we free the batch buffer pool, and so this renders the cache ineffective for many workloads. If we want to have an effective cache of recently allocated buffers available for reuse, we need to decouple that cache from the engine powermanagement and make it timer based. As there is no reason then to keep it within the engine (where it once made retirement order easier to track), we can move it up the hierarchy to the owner of the memory allocations.
v2: Hook up to debugfs/drop_caches to clear the cache on demand.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200430111819.10262-2-chris@chris-wilson.co.uk
show more ...
|