History log of /openbmc/linux/drivers/gpu/drm/i915/i915_gpu_error.h (Results 1 – 25 of 118)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4
# cd378c37 01-Dec-2023 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Use internal class when counting engine resets

[ Upstream commit 1f721a93a528268fa97875cff515d1fcb69f4f44 ]

Commit 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi

drm/i915: Use internal class when counting engine resets

[ Upstream commit 1f721a93a528268fa97875cff515d1fcb69f4f44 ]

Commit 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class")
made the GSC0 engine not have a valid uabi class and so broke the engine
reset counting, which in turn was made class based in cb823ed9915b ("drm/i915/gt: Use intel_gt as the primary object for handling resets").

Despite the title and commit text of the latter is not mentioning it (and
has left the storage array incorrectly sized), tracking by class, despite
it adding aliasing in hypthotetical multi-tile systems, is handy for
virtual engines which for instance do not have a valid engine->id.

Therefore we keep that but just change it to use the internal class which
is always valid. We also add a helper to increment the count, which
aligns with the existing getter.

What was broken without this fix were out of bounds reads every time a
reset would happen on the GSC0 engine, or during selftests when storing
and cross-checking the counts in igt_live_test_begin and
igt_live_test_end.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 503579448db9 ("drm/i915/gsc: Mark internal GSC engine with reserved uabi class")
[tursulin: fixed Fixes tag]
Reported-by: Alan Previn Teres Alexis <alan.previn.teres.alexis@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231201122109.729006-2-tvrtko.ursulin@linux.intel.com
(cherry picked from commit cf9cb028ac56696ff879af1154c4b2f0b12701fd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36
# 4ae7eb92 27-Jun-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: separate display info printing from the rest

Add new function intel_display_device_info_print() and print the display
device info there instead of intel_device_info_print(). This also fixe

drm/i915: separate display info printing from the rest

Add new function intel_display_device_info_print() and print the display
device info there instead of intel_device_info_print(). This also fixes
the display runtime info printing to use the actual runtime info instead
of the static defaults.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/30d4f93c58839bc9312b43423cd43bc0ef655a35.1687878757.git.jani.nikula@intel.com

show more ...


Revision tags: v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25
# 6197cff3 18-Apr-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Dump error capture to kernel log

This is useful for getting debug information out in certain
situations, such as failing kernel selftests and CI runs that don't
log error captures. It is e

drm/i915: Dump error capture to kernel log

This is useful for getting debug information out in certain
situations, such as failing kernel selftests and CI runs that don't
log error captures. It is especially useful for things like retrieving
GuC logs as GuC operation can't be tracked by adding printk or ftrace
entries.

v2: Add CONFIG_DRM_I915_DEBUG_GEM wrapper (review feedback by Rodrigo).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230418181744.3251240-2-John.C.Harrison@Intel.com

show more ...


Revision tags: v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17
# c8a76df6 11-Mar-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Include timeline seqno in error capture

The seqno value actually written out to memory is no longer in the
regular HWSP. Instead, it is now in its own private timeline buffer.
Thus, it is

drm/i915: Include timeline seqno in error capture

The seqno value actually written out to memory is no longer in the
regular HWSP. Instead, it is now in its own private timeline buffer.
Thus, it is no longer visible in an error capture. So, explicitly read
the value and include that in the capture.

v2: %d -> %u (Alan)

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230311063714.570389-4-John.C.Harrison@Intel.com

show more ...


Revision tags: v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9
# 583ebae7 26-Jan-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915/guc: Rename GuC register state capture node to be more obvious

The GuC specific register state entry in the error capture object was
just called 'capture'. Although the companion 'node' ent

drm/i915/guc: Rename GuC register state capture node to be more obvious

The GuC specific register state entry in the error capture object was
just called 'capture'. Although the companion 'node' entry was called
'guc_capture_node'. Rename the base entry to be 'guc_capture' instead
so that it is a) more consistent and b) more obvious what it is.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-9-John.C.Harrison@Intel.com

show more ...


Revision tags: v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58
# c5de70f6 27-Jul-2022 John Harrison <John.C.Harrison@Intel.com>

drm/i915/guc: Record CTB info in error logs

When debugging GuC communication issues, it is useful to have the CTB
info available. So add the state and buffer contents to the error
capture log.

Also

drm/i915/guc: Record CTB info in error logs

When debugging GuC communication issues, it is useful to have the CTB
info available. So add the state and buffer contents to the error
capture log.

Also, add a sub-structure for the GuC specific error capture info as
it is now becoming numerous.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-5-John.C.Harrison@Intel.com

show more ...


# 368d179a 27-Jul-2022 John Harrison <John.C.Harrison@Intel.com>

drm/i915/guc: Add GuC <-> kernel time stamp translation information

It is useful to be able to match GuC events to kernel events when
looking at the GuC log. That requires being able to convert GuC

drm/i915/guc: Add GuC <-> kernel time stamp translation information

It is useful to be able to match GuC events to kernel events when
looking at the GuC log. That requires being able to convert GuC
timestamps to kernel time. So, when dumping error captures and/or GuC
logs, include a stamp in both time zones plus the clock frequency.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-4-John.C.Harrison@Intel.com

show more ...


Revision tags: v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45
# b729cfee 01-Jun-2022 Stuart Summers <stuart.summers@intel.com>

drm/i915: Add extra registers to GPU error dump

Our internal teams have identified a few additional engine registers
that are worth inspecting in error state dumps during development &
debug. Let's

drm/i915: Add extra registers to GPU error dump

Our internal teams have identified a few additional engine registers
that are worth inspecting in error state dumps during development &
debug. Let's capture and print them as part of our error dump.

For simplicity we'll just dump these registers on gen11 and beyond.
Most of these registers have existed since earlier platforms (e.g., gen6
or gen7) but were initially introduced only for a subset of the
platforms' engines; gen11 seems to be where they became available on all
engines.

Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220601210646.615946-1-matthew.d.roper@intel.com

show more ...


Revision tags: v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33
# bb6287cb 01-Apr-2022 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Track context current active time

Track context active (on hardware) status together with the start
timestamp.

This will be used to provide better granularity of context
runtime reporting

drm/i915: Track context current active time

Track context active (on hardware) status together with the start
timestamp.

This will be used to provide better granularity of context
runtime reporting in conjunction with already tracked pphwsp accumulated
runtime.

The latter is only updated on context save so does not give us visibility
to any currently executing work.

As part of the patch the existing runtime tracking data is moved under the
new ce->stats member and updated under the seqlock. This provides the
ability to atomically read out accumulated plus active runtime.

v2:
* Rename and make __intel_context_get_active_time unlocked.

v3:
* Use GRAPHICS_VER.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220401142205.3123159-6-tvrtko.ursulin@linux.intel.com

show more ...


# 5efde05f 30-Mar-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915/dmc: abstract GPU error state dump

Only intel_dmc.c should be accessing dmc details directly.

Need to add an i915_error_printf() stub for
CONFIG_DRM_I915_CAPTURE_ERROR=n.

v2: Add the stub

drm/i915/dmc: abstract GPU error state dump

Only intel_dmc.c should be accessing dmc details directly.

Need to add an i915_error_printf() stub for
CONFIG_DRM_I915_CAPTURE_ERROR=n.

v2: Add the stub (kernel test robot <lkp@intel.com>)

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> # v1
Link: https://patchwork.freedesktop.org/patch/msgid/20220330113417.220964-1-jani.nikula@intel.com

show more ...


Revision tags: v5.15.32, v5.15.31
# a0f1f7b4 21-Mar-2022 Alan Previn <alan.previn.teres.alexis@intel.com>

drm/i915/guc: Print the GuC error capture output register list.

Print the GuC captured error state register list (string names
and values) when gpu_coredump_state printout is invoked via
the i915 de

drm/i915/guc: Print the GuC error capture output register list.

Print the GuC captured error state register list (string names
and values) when gpu_coredump_state printout is invoked via
the i915 debugfs for flushing the gpu error-state that was
captured prior.

Since GuC could have reported multiple engine register dumps
in a single notification event, parse the captured data
(appearing as a stream of structures) to identify each dump as
a different 'engine-capture-group-output'.

Finally, for each 'engine-capture-group-output' that is found,
verify if the engine register dump corresponds to the
engine_coredump content that was previously populated by the
i915_gpu_coredump function. That function would have copied
the context's vma's including the bacth buffer during the
G2H-context-reset notification that occurred earlier. Perform
this verification check by comparing guc_id, lrca and engine-
instance obtained from the 'engine-capture-group-output' vs a
copy of that same info taken during i915_gpu_coredump. If
they match, then print those vma's as well (such as the batch
buffers).

NOTE: the output format was verified using the gem_exec_capture
IGT test.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-14-alan.previn.teres.alexis@intel.com

show more ...


# a6f0f9cf 21-Mar-2022 Alan Previn <alan.previn.teres.alexis@intel.com>

drm/i915/guc: Plumb GuC-capture into gpu_coredump

Add a flags parameter through all of the coredump creation
functions. Add a bitmask flag to indicate if the top
level gpu_coredump event is triggere

drm/i915/guc: Plumb GuC-capture into gpu_coredump

Add a flags parameter through all of the coredump creation
functions. Add a bitmask flag to indicate if the top
level gpu_coredump event is triggered in response to
a GuC context reset notification.

Using that flag, ensure all coredump functions that
read or print mmio-register values related to work submission
or command-streamer engines are skipped and replaced with
a calls guc-capture module equivalent functions to retrieve
or print the register dump.

While here, split out display related register reading
and printing into its own function that is called agnostic
to whether GuC had triggered the reset.

For now, introduce an empty printing function that can
filled in on a subsequent patch just to handle formatting.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220321164527.2500062-13-alan.previn.teres.alexis@intel.com

show more ...


Revision tags: v5.17, v5.15.30, v5.15.29, v5.15.28
# 239bbb2f 11-Mar-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/gt: Remove GEN12_SFC_DONE_MAX from register defs header

We shouldn't really be keeping track of how many SFC_DONE registers
our platforms can have, but rather how many SFC hardware units th

drm/i915/gt: Remove GEN12_SFC_DONE_MAX from register defs header

We shouldn't really be keeping track of how many SFC_DONE registers
our platforms can have, but rather how many SFC hardware units there can
be (each SFC unit will have one corresponding SFC_DONE register). So
drop the stray GEN12_SFC_DONE_MAX definition we had in the register
definition file and replace it with an I915_MAX_SFC that follows the
pattern we use for other hardware units. Note that our hardware has a
2:1:1 ratio of VD:VE:SFC, and as far as we know that pattern should
carry forward to future platforms, so we'll define it as #VCS/2.

Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220311062835.163744-1-matthew.d.roper@intel.com

show more ...


Revision tags: v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23
# f9bf77df 10-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: move i915_reset_count()/i915_reset_engine_count() out of i915_drv.h

It doesn't help much, as i915_drv.h includes i915_gpu_error.h, but it's
a step in the right direction.

Cc: Tvrtko Ursul

drm/i915: move i915_reset_count()/i915_reset_engine_count() out of i915_drv.h

It doesn't help much, as i915_drv.h includes i915_gpu_error.h, but it's
a step in the right direction.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/7af2f698a320c1efb0563f56a432c6d122d40b94.1644507885.git.jani.nikula@intel.com

show more ...


Revision tags: v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2
# e45b98ba 08-Nov-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Avoid allocating a page array for the gpu coredump

The gpu coredump typically takes place in a dma_fence signalling
critical path, and hence can't use GFP_KERNEL allocations, as that
means

drm/i915: Avoid allocating a page array for the gpu coredump

The gpu coredump typically takes place in a dma_fence signalling
critical path, and hence can't use GFP_KERNEL allocations, as that
means we might hit deadlocks under memory pressure. However
changing to __GFP_KSWAPD_RECLAIM which will be done in an upcoming
patch will instead mean a lower chance of the allocation succeeding.
In particular large contigous allocations like the coredump page
vector.
Remove the page vector in favor of a linked list of single pages.
Use the page lru list head as the list link, as the page owner is
allowed to do that.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211108174547.979714-2-thomas.hellstrom@linux.intel.com

show more ...


Revision tags: v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39
# e52e4a31 19-May-2021 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

gpu: drm: replace occurrences of invalid character

There are some places at drm that ended receiving a
REPLACEMENT CHARACTER U+fffd ('�'), probably because of
some bad charset conversion.

Fix them

gpu: drm: replace occurrences of invalid character

There are some places at drm that ended receiving a
REPLACEMENT CHARACTER U+fffd ('�'), probably because of
some bad charset conversion.

Fix them by using what it seems to be the proper
character.

Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e606930c73029f16673849c57acac061dd923866.1621412009.git.mchehab+huawei@kernel.org

show more ...


Revision tags: v5.4.119, v5.10.36, v5.10.35
# e7c46e43 05-May-2021 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Nuke display error state

I doubt anyone has used the display error state since CS flips
went the way of the dodo. Just nuke it.

It might be semi interesting to have something like this fo

drm/i915: Nuke display error state

I doubt anyone has used the display error state since CS flips
went the way of the dodo. Just nuke it.

It might be semi interesting to have something like this for
FIFO underruns and the like, but as it stands this wouldn't
provide a sufficient amount of information. So would need
an extensive rewrite anyway.

The lockless power well handling is also racy, so this could
just be contributing noise to test results if we end up
accessing something with the relevant power well already
disabled.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210505191140.14215-1-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>

show more ...


Revision tags: v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10
# bda30024 04-Nov-2020 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Improve record of hung engines in error state

Between events which trigger engine and GPU resets and capturing the error
state we lose information on which engine triggered the reset. Impr

drm/i915: Improve record of hung engines in error state

Between events which trigger engine and GPU resets and capturing the error
state we lose information on which engine triggered the reset. Improve
this by passing in the hung engine mask down to error capture.

Result is that the list of engines in user visible "GPU HANG: ecode
<gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just
active engines. Most importantly the displayed process is now the one
which was actually hung.

v2:
* Stub prototype. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com

show more ...


Revision tags: v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51
# 792592e7 07-Jul-2020 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: Move the engine mask to intel_gt_info

Since the engines belong to the GT, move the runtime-updated list of
available engines to the intel_gt struct. The original mask has been
renamed to i

drm/i915: Move the engine mask to intel_gt_info

Since the engines belong to the GT, move the runtime-updated list of
available engines to the intel_gt struct. The original mask has been
renamed to indicate it contains the maximum engine list that can be
found on a matching device.

In preparation for other info being moved to the gt in follow up patches
(sseu), introduce an intel_gt_info structure to group all gt-related
runtime info.

v2: s/max_engine_mask/platform_engine_mask (tvrtko), fix selftest

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200708003952.21831-5-daniele.ceraolospurio@intel.com

show more ...


Revision tags: v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40
# f1e79c7e 07-May-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

drm/i915: Replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variabl

drm/i915: Replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507185408.GA14561@embeddedor

show more ...


Revision tags: v5.4.39, v5.4.38, v5.4.37, v5.4.36
# 9669a507 24-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop rq->ring->vma peeking from error capture

We only hold the active spinlock while dumping the error state, and this
does not prevent another thread from retiring the request -- as it is

drm/i915: Drop rq->ring->vma peeking from error capture

We only hold the active spinlock while dumping the error state, and this
does not prevent another thread from retiring the request -- as it is
quite possible that despite us capturing the current state, the GPU has
completed the request. As such, it is dangerous to dereference state
below the request as it may already be freed, and the simplest way to
avoid the danger is not include it in the error state.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1788
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424191410.27570-1-chris@chris-wilson.co.uk

show more ...


Revision tags: v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21
# 1883a0a4 16-Feb-2020 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Track hw reported context runtime

GPU saves accumulated context runtime (in CS timestamp units) in PPHWSP
which will be useful for us in cases when we are not able to track context
busynes

drm/i915: Track hw reported context runtime

GPU saves accumulated context runtime (in CS timestamp units) in PPHWSP
which will be useful for us in cases when we are not able to track context
busyness ourselves (like with GuC). Keep a copy of this in struct
intel_context from where it can be easily read even if the context is not
pinned.

v2:
(Chris)
* Do not store pphwsp address in intel_context.
* Log CS wrap-around.
* Simplify calculation by relying on integer wraparound.
v3:
* Include total/avg in traces and error state for debugging

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200216133620.394962-1-chris@chris-wilson.co.uk

show more ...


Revision tags: v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16
# 70a76a9b 28-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Hook up CS_MASTER_ERROR_INTERRUPT

Now that we have offline error capture and can reset an engine from
inside an atomic context while also preserving the GPU state for
post-mortem analys

drm/i915/gt: Hook up CS_MASTER_ERROR_INTERRUPT

Now that we have offline error capture and can reset an engine from
inside an atomic context while also preserving the GPU state for
post-mortem analysis, it is time to handle error interrupts thrown by
the command parser.

This provides a much, much faster mechanism for us to detect known
problems than using heartbeats/hangchecks, and also provides a mechanism
for when those are disabled. However, it is limited to problems the HW
can detect in the CS and so not a complete solution for detecting lockups.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200128204318.4182039-2-chris@chris-wilson.co.uk

show more ...


Revision tags: v5.5, v5.4.15
# 7e36505d 24-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stub out i915_gpu_coredump_put

i915_gpu_coreddump_put is currently only defined if
CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise.

Reported-by: Mike Lothian <mike@fire

drm/i915: Stub out i915_gpu_coredump_put

i915_gpu_coreddump_put is currently only defined if
CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise.

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 742379c0c400 ("drm/i915: Start chopping up the GPU error capture")
Fixes: 748317386afb ("drm/i915/execlists: Offline error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mike Lothian <mike@fireburn.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124192255.541355-1-chris@chris-wilson.co.uk

show more ...


Revision tags: v5.4.14, v5.4.13
# 04062c58 17-Jan-2020 Zhang Xiaoxu <zhangxiaoxu5@huawei.com>

drm/i915: Fix i915_error_state_store error defination

Since commit 742379c0c4001 ("drm/i915: Start chopping up the GPU error
capture"), function 'i915_error_state_store' was defined and used with
on

drm/i915: Fix i915_error_state_store error defination

Since commit 742379c0c4001 ("drm/i915: Start chopping up the GPU error
capture"), function 'i915_error_state_store' was defined and used with
only one parameter.

But if no 'CONFIG_DRM_I915_CAPTURE_ERROR', this function was defined
with two parameter.

This may lead compile error. This patch fix it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200117073436.6507-1-zhangxiaoxu5@huawei.com

show more ...


12345