Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46 |
|
#
f1740b1a |
| 14-Aug-2023 |
Tim Huang <Tim.Huang@amd.com> |
drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
GFX v11.0.1 reported fence fallback timer expired issue on SDMA and GFX rings after S0ix resume. This is generated by EOP interrupts are
drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
GFX v11.0.1 reported fence fallback timer expired issue on SDMA and GFX rings after S0ix resume. This is generated by EOP interrupts are disabled when S0ix suspend but fails to re-enable when resume because of the GFX is in GFXOFF.
[ 203.349571] [drm] Fence fallback timer expired on ring sdma0 [ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0 [ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0
For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers to configure the fence driver interrupts for rings that belong to GFX. The interrupts configuration will be restored by GFXOFF exit.
Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
show more ...
|
#
603b9a57 |
| 14-Aug-2023 |
Tim Huang <Tim.Huang@amd.com> |
drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
GFX v11.0.1 reported fence fallback timer expired issue on SDMA and GFX rings after S0ix resume. This is generated by EOP interrupts are
drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
GFX v11.0.1 reported fence fallback timer expired issue on SDMA and GFX rings after S0ix resume. This is generated by EOP interrupts are disabled when S0ix suspend but fails to re-enable when resume because of the GFX is in GFXOFF.
[ 203.349571] [drm] Fence fallback timer expired on ring sdma0 [ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0 [ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0
For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers to configure the fence driver interrupts for rings that belong to GFX. The interrupts configuration will be restored by GFXOFF exit.
Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25 |
|
#
0a33b11d |
| 17-Apr-2023 |
Christian König <christian.koenig@amd.com> |
drm/amdgpu: mark force completed fences with -ECANCELED
When we force complete fences we should mark them as canceled.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben T
drm/amdgpu: mark force completed fences with -ECANCELED
When we force complete fences we should mark them as canceled.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
b13eb02b |
| 19-Apr-2023 |
Christian König <christian.koenig@amd.com> |
drm/amdgpu: add amdgpu_error_* debugfs file
This allows us to insert some error codes into the bottom of the pipeline on an engine.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewe
drm/amdgpu: add amdgpu_error_* debugfs file
This allows us to insert some error codes into the bottom of the pipeline on an engine.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
109b4d8c |
| 14-May-2023 |
Su Hui <suhui@nfschina.com> |
drm/amdgpu: remove unnecessary (void*) conversions
No need cast (void*) to (struct amdgpu_device *).
Signed-off-by: Su Hui <suhui@nfschina.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.co
drm/amdgpu: remove unnecessary (void*) conversions
No need cast (void*) to (struct amdgpu_device *).
Signed-off-by: Su Hui <suhui@nfschina.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
3083b100 |
| 09-May-2023 |
Guchun Chen <guchun.chen@amd.com> |
drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged
When performing device unbind or halt, we have disabled all irqs at the very begining like amdgpu_pci_remove or amdgpu_devic
drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged
When performing device unbind or halt, we have disabled all irqs at the very begining like amdgpu_pci_remove or amdgpu_device_halt. So amdgpu_irq_put for irqs stored in fence driver should not be called any more, otherwise, below calltrace will arrive.
[ 139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu] [ 139.114655] Call Trace: [ 139.114655] <TASK> [ 139.114657] amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu] [ 139.114836] amdgpu_device_fini_hw+0xb6/0x350 [amdgpu] [ 139.114955] amdgpu_driver_unload_kms+0x51/0x70 [amdgpu] [ 139.115075] amdgpu_pci_remove+0x63/0x160 [amdgpu] [ 139.115193] ? __pm_runtime_resume+0x64/0x90 [ 139.115195] pci_device_remove+0x3a/0xb0 [ 139.115197] device_remove+0x43/0x70 [ 139.115198] device_release_driver_internal+0xbd/0x140
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
6e87c422 |
| 24-Apr-2023 |
Alex Sierra <alex.sierra@amd.com> |
drm/amdgpu: improve wait logic at fence polling
Accomplish this by reading the seq number right away instead of sleep for 5us. There are certain cases where the fence is ready almost immediately. Sl
drm/amdgpu: improve wait logic at fence polling
Accomplish this by reading the seq number right away instead of sleep for 5us. There are certain cases where the fence is ready almost immediately. Sleep number granularity was also reduced as the majority of the kiq tlb flush takes between 2us to 6us.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
9e690184 |
| 03-May-2023 |
Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> |
drm/amd/amdgpu: Fix errors & warnings in amdgpu _bios, _cs, _dma_buf, _fence.c
The following checkpatch errors & warning is removed.
ERROR: else should follow close brace '}' ERROR: trailing statem
drm/amd/amdgpu: Fix errors & warnings in amdgpu _bios, _cs, _dma_buf, _fence.c
The following checkpatch errors & warning is removed.
ERROR: else should follow close brace '}' ERROR: trailing statements should be on next line WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: Possible repeated word: 'Fences' WARNING: Missing a blank line after declarations WARNING: braces {} are not necessary for single statement blocks WARNING: Comparisons should place the constant on the right side of the test WARNING: printk() should include KERN_<LEVEL> facility level
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
c1a322a7 |
| 09-May-2023 |
Guchun Chen <guchun.chen@amd.com> |
drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged
When performing device unbind or halt, we have disabled all irqs at the very begining like amdgpu_pci_remove or amdgpu_devic
drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged
When performing device unbind or halt, we have disabled all irqs at the very begining like amdgpu_pci_remove or amdgpu_device_halt. So amdgpu_irq_put for irqs stored in fence driver should not be called any more, otherwise, below calltrace will arrive.
[ 139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu] [ 139.114655] Call Trace: [ 139.114655] <TASK> [ 139.114657] amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu] [ 139.114836] amdgpu_device_fini_hw+0xb6/0x350 [amdgpu] [ 139.114955] amdgpu_driver_unload_kms+0x51/0x70 [amdgpu] [ 139.115075] amdgpu_pci_remove+0x63/0x160 [amdgpu] [ 139.115193] ? __pm_runtime_resume+0x64/0x90 [ 139.115195] pci_device_remove+0x3a/0xb0 [ 139.115197] device_remove+0x43/0x70 [ 139.115198] device_release_driver_internal+0xbd/0x140
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20 |
|
#
033c5647 |
| 15-Mar-2023 |
YuBiao Wang <YuBiao.Wang@amd.com> |
drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
[Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. Th
drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
[Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. The fences in this case are never signaled and next time when we try to free the sa_bo, kernel will hang.
[How] During pre_asic_reset, driver will clear job fences and afterwards the fences' refcount will be reduced to 1. For drm_sched_jobs it will be released in job_free_cb, and for non-sched jobs like ib_test, it's meant to be released in sa_bo_free but only when the fences are signaled. So we have to force signal the non_sched bad job's fence during pre_asic_reset or the clear is not complete.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
f466b111 |
| 15-Mar-2023 |
YuBiao Wang <YuBiao.Wang@amd.com> |
drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
[Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. Th
drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
[Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. The fences in this case are never signaled and next time when we try to free the sa_bo, kernel will hang.
[How] During pre_asic_reset, driver will clear job fences and afterwards the fences' refcount will be reduced to 1. For drm_sched_jobs it will be released in job_free_cb, and for non-sched jobs like ib_test, it's meant to be released in sa_bo_free but only when the fences are signaled. So we have to force signal the non_sched bad job's fence during pre_asic_reset or the clear is not complete.
Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10 |
|
#
5ad7bbf3 |
| 02-Feb-2023 |
Guilherme G. Piccoli <gpiccoli@igalia.com> |
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after t
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully.
Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even without its counter-part had been previously called, causing the following oops:
amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: <TASK> amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...]
To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counter-part.
Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended-up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1].
[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/ [1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/
Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)") Suggested-by: Christian König <christian.koenig@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
show more ...
|
#
70f1872e |
| 02-Feb-2023 |
Guilherme G. Piccoli <gpiccoli@igalia.com> |
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after t
drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully.
Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even without its counter-part had been previously called, causing the following oops:
amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: <TASK> amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...]
To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counter-part.
Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended-up using sched.ops as per Christian's suggestion [0], and also removed the no_scheduler check [1].
[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/ [1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/
Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)") Suggested-by: Christian König <christian.koenig@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66 |
|
#
3f4c175d |
| 06-Sep-2022 |
Jiadong.Zhu <Jiadong.Zhu@amd.com> |
drm/amdgpu: MCBP based on DRM scheduler (v9)
Trigger Mid-Command Buffer Preemption according to the priority of the software rings and the hw fence signalling condition.
The muxer saves the locatio
drm/amdgpu: MCBP based on DRM scheduler (v9)
Trigger Mid-Command Buffer Preemption according to the priority of the software rings and the hw fence signalling condition.
The muxer saves the locations of the indirect buffer frames from the software ring together with the fence sequence number in its fifo queue, and pops out those records when the fences are signalled. The locations are used to resubmit packages in preemption scenarios by coping the chunks from the software ring.
v2: Update comment style. v3: Fix conflict caused by previous modifications. v4: Remove unnecessary prints. v5: Fix corner cases for resubmission cases. v6: Refactor functions for resubmission, calling fence_process in irq handler. v7: Solve conflict for removing amdgpu_sw_ring.c. v8: Add time threshold to judge if preemption request is needed. v9: Correct comment spelling. Set fence emit timestamp before rsu assignment.
Cc: Christian Koenig <Christian.Koenig@amd.com> Cc: Luben Tuikov <Luben.Tuikov@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Michel Dänzer <michel@daenzer.net> Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
e7b8e90a |
| 23-Sep-2022 |
Jiadong.Zhu <Jiadong.Zhu@amd.com> |
drm/amdgpu: Remove fence_process in count_emitted
The function amdgpu_fence_count_emitted used in work_hander should not call amdgpu_fence_process which must be used in irq handler.
Reviewed-by: Ch
drm/amdgpu: Remove fence_process in count_emitted
The function amdgpu_fence_count_emitted used in work_hander should not call amdgpu_fence_process which must be used in irq handler.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
11e38360 |
| 23-Sep-2022 |
Jiadong.Zhu <Jiadong.Zhu@amd.com> |
drm/amdgpu: Remove fence_process in count_emitted
The function amdgpu_fence_count_emitted used in work_hander should not call amdgpu_fence_process which must be used in irq handler.
Reviewed-by: Ch
drm/amdgpu: Remove fence_process in count_emitted
The function amdgpu_fence_count_emitted used in work_hander should not call amdgpu_fence_process which must be used in irq handler.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56 |
|
#
9dd4545f |
| 21-Jul-2022 |
Slark Xiao <slark_xiao@163.com> |
drm/amd: Fix typo 'the the' in comment
Replace 'the the' with 'the' in the comment.
Signed-off-by: Slark Xiao <slark_xiao@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
Revision tags: v5.15.55, v5.15.54 |
|
#
f1549c09 |
| 07-Jul-2022 |
Likun Gao <Likun.Gao@amd.com> |
drm/amdgpu: support reset flag set for gpu reset
Move reset_context out of gpu recover function to make it configurable for different reset purpose. For the reset way of call gpu_recovery sysfs, for
drm/amdgpu: support reset flag set for gpu reset
Move reset_context out of gpu recover function to make it configurable for different reset purpose. For the reset way of call gpu_recovery sysfs, force to use full reset method. Otherwise, try soft reset by default if the related ASIC supportted, if soft reset failed, will use full reset.
Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49 |
|
#
9ae55f03 |
| 20-Jun-2022 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Follow up change to previous drm scheduler change.
Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is
drm/amdgpu: Follow up change to previous drm scheduler change.
Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is called so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence refcount balanced.
Also since in the previous patch we resumed setting s_fence->parent to NULL in drm_sched_stop switch to directly checking if job->hw_fence is signaled to short circuit reset if already signed.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Tested-by: Yiqing Yao <yiqing.yao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
9e225fb9 |
| 17-Jun-2022 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Prevent race between late signaled fences and GPU reset.
Problem: After we start handling timed out jobs we assume there fences won't be signaled but we cannot be sure and sometimes they
drm/amdgpu: Prevent race between late signaled fences and GPU reset.
Problem: After we start handling timed out jobs we assume there fences won't be signaled but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to fence array from amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process from a late EOP interrupt.
Fix: Before accessing fence array in GPU disable EOP interrupt and flush all pending interrupt handlers for amdgpu device's interrupt line.
v2: Switch from irq_get/put to full enable/disable_irq for amdgpu
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
dd70748e |
| 20-Jun-2022 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
This function should drop the fence refcount when it extracts the fence from the fence array, just as it's done in amdgpu_fence_proc
drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
This function should drop the fence refcount when it extracts the fence from the fence array, just as it's done in amdgpu_fence_process.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41 |
|
#
cf727044 |
| 17-May-2022 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover
We removed the wrapper that was queueing the recover function into reset domain queue who was using this name.
Sig
drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover
We removed the wrapper that was queueing the recover function into reset domain queue who was using this name.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
2f83658f |
| 17-May-2022 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Add work_struct for GPU reset from debugfs
We need to have a work_struct to cancel this reset if another already in progress.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com
drm/amdgpu: Add work_struct for GPU reset from debugfs
We need to have a work_struct to cancel this reset if another already in progress.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27 |
|
#
ae9fd76f |
| 20-Mar-2020 |
Jack Xiao <Jack.Xiao@amd.com> |
drm/amdgpu: assign the cpu/gpu address of fence from ring
assign the cpu/gpu address of fence for the normal or mes ring from ring structure.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by:
drm/amdgpu: assign the cpu/gpu address of fence from ring
assign the cpu/gpu address of fence for the normal or mes ring from ring structure.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
5fd8518d |
| 06-Dec-2021 |
Andrey Grodzovsky <andrey.grodzovsky@amd.com> |
drm/amdgpu: Move scheduler init to after XGMI is ready
Before we initialize schedulers we must know which reset domain are we in - for single device there iis a single domain per device and so singl
drm/amdgpu: Move scheduler init to after XGMI is ready
Before we initialize schedulers we must know which reset domain are we in - for single device there iis a single domain per device and so single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74112.html
show more ...
|