Revision tags: v5.15.33 |
|
#
c3eb12df |
| 07-Apr-2022 |
Felix Kuehling <Felix.Kuehling@amd.com> |
drm/amdkfd: Ignore bogus signals from MEC efficiently
MEC firmware sometimes sends signal interrupts without a valid context ID on end of pipe events that don't intend to signal any HSA signals. Thi
drm/amdkfd: Ignore bogus signals from MEC efficiently
MEC firmware sometimes sends signal interrupts without a valid context ID on end of pipe events that don't intend to signal any HSA signals. This triggers the slow path in kfd_signal_event_interrupt that scans the entire event page for signaled events. Detect these signals in the top half interrupt handler to stop processing them as early as possible.
Because we now always treat event ID 0 as invalid, reserve that ID during process initialization.
v2: Update firmware version checks to support more GPUs
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.32, v5.15.31 |
|
#
ed94aca6 |
| 21-Mar-2022 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: print unmap queue status for RAS poison consumption (v3)
Print the status out when it passes, and also tell user gpu reset is triggered when we fall back to legacy way.
v2: make the mes
drm/amdkfd: print unmap queue status for RAS poison consumption (v3)
Print the status out when it passes, and also tell user gpu reset is triggered when we fall back to legacy way.
v2: make the message more explicit. v3: change succeeds to succeeded. replace pr_warn with dev_warn.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.17, v5.15.30, v5.15.29 |
|
#
1990e29b |
| 16-Mar-2022 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: add RAS poison consumption handling for UTCL2 (v2)
Do RAS page retirement and use gpu reset as fallback in UTCL2 fault handler.
v2: replace vm fault event with posion consumed event in
drm/amdkfd: add RAS poison consumption handling for UTCL2 (v2)
Do RAS page retirement and use gpu reset as fallback in UTCL2 fault handler.
v2: replace vm fault event with posion consumed event in UTCL2 poison consumption.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
9d8a8d78 |
| 15-Mar-2022 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: replace source_id with client_id for RAS poison consumption
Client ID is more accruate here and we can deal with more different cases with client ID.
Signed-off-by: Tao Zhou <tao.zhou1@
drm/amdkfd: replace source_id with client_id for RAS poison consumption
Client ID is more accruate here and we can deal with more different cases with client ID.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
eed41975 |
| 15-Mar-2022 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: refine event_interrupt_poison_consumption
Combine reading and setting poison flag as one atomic operation and add print message for the function.
Signed-off-by: Tao Zhou <tao.zhou1@amd.
drm/amdkfd: refine event_interrupt_poison_consumption
Combine reading and setting poison flag as one atomic operation and add print message for the function.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24 |
|
#
29b440d2 |
| 16-Feb-2022 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: add return value check for queue eviction
Otherwise gpu reset will be triggered unconditionally in poison consumption.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Z
drm/amdkfd: add return value check for queue eviction
Otherwise gpu reset will be triggered unconditionally in poison consumption.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.23 |
|
#
2243f493 |
| 10-Feb-2022 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: Fix leftover errors and warnings
A bunch of errors and warnings are leftover KFD over the years, attempt to fix the errors and most warnings reported by checkpatch tool. Still a few warn
drm/amdkfd: Fix leftover errors and warnings
A bunch of errors and warnings are leftover KFD over the years, attempt to fix the errors and most warnings reported by checkpatch tool. Still a few warnings remain which may be false positives so ignore them for now.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
d87f36a0 |
| 10-Feb-2022 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: update SPDX license header
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj
drm/amdkfd: update SPDX license header
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.22 |
|
#
b1c87b08 |
| 07-Feb-2022 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: use unmap all queues for poison consumption
Replace reset queue for specific PASID with unmap all queues, reset queue could break CP scheduler.
Signed-off-by: Tao Zhou <tao.zhou1@amd.co
drm/amdkfd: use unmap all queues for poison consumption
Replace reset queue for specific PASID with unmap all queues, reset queue could break CP scheduler.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
03e5b167 |
| 07-Feb-2022 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid
As the function is used in more different cases, use a more general name.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Fel
drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid
As the function is used in more different cases, use a more general name.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16 |
|
#
5b0ce2d4 |
| 29-Dec-2021 |
yipechai <YiPeng.Chai@amd.com> |
drm/amdkfd: enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9
Enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9.
Signed-off-by: yipechai <YiPeng.Chai@amd
drm/amdkfd: enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9
Enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9.
Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.10, v5.15.9, v5.15.8, v5.15.7 |
|
#
b6485bed |
| 06-Dec-2021 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: reset queue which consumes RAS poison (v2)
CP supports unmap queue with reset mode which only destroys specific queue without affecting others. Replacing whole gpu reset with reset queue
drm/amdkfd: reset queue which consumes RAS poison (v2)
CP supports unmap queue with reset mode which only destroys specific queue without affecting others. Replacing whole gpu reset with reset queue mode for RAS poison consumption saves much time, and we can also fallback to gpu reset solution if reset queue fails.
v2: Return directly if process is NULL; Reset queue solution is not applicable to SDMA, fallback to legacy way; Call kfd_unref_process after lookup process.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.6, v5.15.5, v5.15.4, v5.15.3 |
|
#
f0dc99a6 |
| 17-Nov-2021 |
Graham Sider <Graham.Sider@amd.com> |
drm/amdkfd: add kfd_device_info_init function
Initializes kfd->device_info given either asic_type (enum) if GFX version is less than GFX9, or GC IP version if greater. Also takes in vf and the targe
drm/amdkfd: add kfd_device_info_init function
Initializes kfd->device_info given either asic_type (enum) if GFX version is less than GFX9, or GC IP version if greater. Also takes in vf and the target compiler gfx version. Uses SDMA version to determine num_sdma_queues_per_engine.
Convert device_info to a non-pointer member of kfd, change references accordingly.
Change unsupported asic condition to only probe f2g, move device_info initialization post-switch.
Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.15.2, v5.15.1, v5.15, v5.14.14 |
|
#
6bfc7c7e |
| 19-Oct-2021 |
Graham Sider <Graham.Sider@amd.com> |
drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcs
Modified definitions:
- amdgpu_amdkfd_submit_ib - amdgpu_amdkfd_set_compute_idle - amdgpu_amdkfd_have_atomics_support - amdgpu_amdkfd_flush
drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcs
Modified definitions:
- amdgpu_amdkfd_submit_ib - amdgpu_amdkfd_set_compute_idle - amdgpu_amdkfd_have_atomics_support - amdgpu_amdkfd_flush_gpu_tlb_pasid - amdgpu_amdkfd_flush_gpu_tlb_pasid - amdgpu_amdkfd_gpu_reset - amdgpu_amdkfd_alloc_gtt_mem - amdgpu_amdkfd_free_gtt_mem - amdgpu_amdkfd_alloc_gws - amdgpu_amdkfd_free_gws - amdgpu_amdkfd_ras_poison_consumption_handler
Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8 |
|
#
c7490949 |
| 23-Sep-2021 |
Tao Zhou <tao.zhou1@amd.com> |
amd/amdkfd: add ras page retirement handling for sq/sdma (v3)
In ras poison mode, page retirement will be handled by the irq handler of the module which consumes corrupted data.
v2: rename ras_proc
amd/amdkfd: add ras page retirement handling for sq/sdma (v3)
In ras poison mode, page retirement will be handled by the irq handler of the module which consumes corrupted data.
v2: rename ras_process_cb to ras_poison_consumption_handler. move the handler's implementation from ASIC specific file to common file.
v3: call gpu reset for xGMI connected mode.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43 |
|
#
4a1d4b6d |
| 03-Jun-2021 |
Hawking Zhang <Hawking.Zhang@amd.com> |
drm/amdkfd: add sdma poison consumption handling
Follow the same apporach as GFX to handle SDMA poison consumption. Send SIGBUS to application when receives SDMA_ECC interrupt and issue gpu reset ei
drm/amdkfd: add sdma poison consumption handling
Follow the same apporach as GFX to handle SDMA poison consumption. Send SIGBUS to application when receives SDMA_ECC interrupt and issue gpu reset either mode 2 or mode 1 to get the engine back
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li<dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36 |
|
#
e2b1f9f5 |
| 11-May-2021 |
Dennis Li <Dennis.Li@amd.com> |
drm/amdkfd: refine the poison data consumption handling
The user applications maybe register the KFD_EVENT_TYPE_HW_EXCEPTION and KFD_EVENT_TYPE_MEMORY events, driver could notify them when poison da
drm/amdkfd: refine the poison data consumption handling
The user applications maybe register the KFD_EVENT_TYPE_HW_EXCEPTION and KFD_EVENT_TYPE_MEMORY events, driver could notify them when poison data consumed. Beside that, some applications maybe register SIGBUS signal hander. These applications will handle poison data by themselves, exit or re-create context to re-dispatch works.
Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12 |
|
#
be9064b7 |
| 25-Apr-2021 |
Hawking Zhang <Hawking.Zhang@amd.com> |
drm/amdgpu: remove unnecessary header include
amdgpu.h is included in kfd_priv.h
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-b
drm/amdgpu: remove unnecessary header include
amdgpu.h is included in kfd_priv.h
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.10.32, v5.10.31 |
|
#
20161e51 |
| 14-Apr-2021 |
Dennis Li <Dennis.Li@amd.com> |
drm/amdkfd: add edc error interrupt handle for poison propogate mode
In poison progogate mode, when driver receive the edc error interrupt from SQ, driver should kill the process by pasid which is u
drm/amdkfd: add edc error interrupt handle for poison propogate mode
In poison progogate mode, when driver receive the edc error interrupt from SQ, driver should kill the process by pasid which is using the poison data, and then trigger GPU reset.
Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6 |
|
#
6d909c5d |
| 22-Jun-2020 |
Oak Zeng <Oak.Zeng@amd.com> |
drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault
This is to keep wavefront context for debug purpose
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix
drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault
This is to keep wavefront context for debug purpose
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
7af103ea |
| 05-Jan-2021 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: check more client ids in interrupt handler
Add check for SExSH clients in kfd interrupt handler.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Alex Deucher <alexander.deucher
drm/amdkfd: check more client ids in interrupt handler
Add check for SExSH clients in kfd interrupt handler.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
ae279f69 |
| 18-Dec-2020 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdkfd: check both client id and src id in interrupt handlers
We can have the same src ids for different client ids so make sure to check both the client id and the source id when handling inter
drm/amdkfd: check both client id and src id in interrupt handlers
We can have the same src ids for different client ids so make sure to check both the client id and the source id when handling interrupts.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41 |
|
#
938a0650 |
| 13-May-2020 |
Amber Lin <Amber.Lin@amd.com> |
drm/amdkfd: Provide SMI events watch
When the compute is malfunctioning or performance drops, the system admin will use SMI (System Management Interface) tool to monitor/diagnostic what went wrong.
drm/amdkfd: Provide SMI events watch
When the compute is malfunctioning or performance drops, the system admin will use SMI (System Management Interface) tool to monitor/diagnostic what went wrong. This patch provides an event watch interface for the user space to register devices and subscribe events they are interested. After registered, the user can use annoymous file descriptor's poll function with wait-time specified and wait for events to happen. Once an event happens, the user can use read() to retrieve information related to the event.
VM fault event is done in this patch.
v2: - remove UNREGISTER and add event ENABLE/DISABLE - correct kfifo usage - move event message API to kfd_ioctl.h v3: send the event msg in text than in binary v4: support multiple clients v5: move events enablement from ioctl to fd write v6: sparse fix
Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
8c8e1f69 |
| 18-May-2020 |
Aishwarya Ramakrishnan <aishwaryarj100@gmail.com> |
drm/amdkfd: Fix boolreturn.cocci warnings
Return statements in functions returning bool should use true/false instead of 1/0.
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c:40:9-10: WARNING: retur
drm/amdkfd: Fix boolreturn.cocci warnings
Return statements in functions returning bool should use true/false instead of 1/0.
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c:40:9-10: WARNING: return of 0/1 in function 'event_interrupt_isr_v9' with return type bool
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Signed-off-by: Aishwarya Ramakrishnan <aishwaryarj100@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2 |
|
#
3fe023d4 |
| 25-Sep-2019 |
Yong Zhao <Yong.Zhao@amd.com> |
drm/amdkfd: Query vmid pasid mapping through stored info for non HWS
Because we record the mapping under non HWS mode in the software, we can query pasid through vmid using the stored mapping instea
drm/amdkfd: Query vmid pasid mapping through stored info for non HWS
Because we record the mapping under non HWS mode in the software, we can query pasid through vmid using the stored mapping instead of reading from ATC registers.
This also prepares for the defeatured ATC block in future ASICs.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|