History log of /openbmc/linux/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c (Results 26 – 50 of 67)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v5.15.33
# c3eb12df 07-Apr-2022 Felix Kuehling <Felix.Kuehling@amd.com>

drm/amdkfd: Ignore bogus signals from MEC efficiently

MEC firmware sometimes sends signal interrupts without a valid context ID
on end of pipe events that don't intend to signal any HSA signals.
Thi

drm/amdkfd: Ignore bogus signals from MEC efficiently

MEC firmware sometimes sends signal interrupts without a valid context ID
on end of pipe events that don't intend to signal any HSA signals.
This triggers the slow path in kfd_signal_event_interrupt that scans the
entire event page for signaled events. Detect these signals in the top
half interrupt handler to stop processing them as early as possible.

Because we now always treat event ID 0 as invalid, reserve that ID during
process initialization.

v2: Update firmware version checks to support more GPUs

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.15.32, v5.15.31
# ed94aca6 21-Mar-2022 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: print unmap queue status for RAS poison consumption (v3)

Print the status out when it passes, and also tell user gpu reset
is triggered when we fall back to legacy way.

v2: make the mes

drm/amdkfd: print unmap queue status for RAS poison consumption (v3)

Print the status out when it passes, and also tell user gpu reset
is triggered when we fall back to legacy way.

v2: make the message more explicit.
v3: change succeeds to succeeded.
replace pr_warn with dev_warn.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.17, v5.15.30, v5.15.29
# 1990e29b 16-Mar-2022 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: add RAS poison consumption handling for UTCL2 (v2)

Do RAS page retirement and use gpu reset as fallback in UTCL2 fault
handler.

v2: replace vm fault event with posion consumed event in

drm/amdkfd: add RAS poison consumption handling for UTCL2 (v2)

Do RAS page retirement and use gpu reset as fallback in UTCL2 fault
handler.

v2: replace vm fault event with posion consumed event in UTCL2
poison consumption.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 9d8a8d78 15-Mar-2022 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: replace source_id with client_id for RAS poison consumption

Client ID is more accruate here and we can deal with more different
cases with client ID.

Signed-off-by: Tao Zhou <tao.zhou1@

drm/amdkfd: replace source_id with client_id for RAS poison consumption

Client ID is more accruate here and we can deal with more different
cases with client ID.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# eed41975 15-Mar-2022 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: refine event_interrupt_poison_consumption

Combine reading and setting poison flag as one atomic operation
and add print message for the function.

Signed-off-by: Tao Zhou <tao.zhou1@amd.

drm/amdkfd: refine event_interrupt_poison_consumption

Combine reading and setting poison flag as one atomic operation
and add print message for the function.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24
# 29b440d2 16-Feb-2022 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: add return value check for queue eviction

Otherwise gpu reset will be triggered unconditionally in poison
consumption.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Z

drm/amdkfd: add return value check for queue eviction

Otherwise gpu reset will be triggered unconditionally in poison
consumption.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.15.23
# 2243f493 10-Feb-2022 Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

drm/amdkfd: Fix leftover errors and warnings

A bunch of errors and warnings are leftover KFD over the years, attempt
to fix the errors and most warnings reported by checkpatch tool. Still a
few warn

drm/amdkfd: Fix leftover errors and warnings

A bunch of errors and warnings are leftover KFD over the years, attempt
to fix the errors and most warnings reported by checkpatch tool. Still a
few warnings remain which may be false positives so ignore them for now.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# d87f36a0 10-Feb-2022 Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

drm/amdkfd: update SPDX license header

Update the SPDX License header for all the KFD files.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj

drm/amdkfd: update SPDX license header

Update the SPDX License header for all the KFD files.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.15.22
# b1c87b08 07-Feb-2022 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: use unmap all queues for poison consumption

Replace reset queue for specific PASID with unmap all queues, reset
queue could break CP scheduler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.co

drm/amdkfd: use unmap all queues for poison consumption

Replace reset queue for specific PASID with unmap all queues, reset
queue could break CP scheduler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 03e5b167 07-Feb-2022 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid

As the function is used in more different cases, use a more general
name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Fel

drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid

As the function is used in more different cases, use a more general
name.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16
# 5b0ce2d4 29-Dec-2021 yipechai <YiPeng.Chai@amd.com>

drm/amdkfd: enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9

Enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9.

Signed-off-by: yipechai <YiPeng.Chai@amd

drm/amdkfd: enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9

Enable sdma ecc interrupt event can be handled by event_interrupt_wq_v9.

Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.15.10, v5.15.9, v5.15.8, v5.15.7
# b6485bed 06-Dec-2021 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: reset queue which consumes RAS poison (v2)

CP supports unmap queue with reset mode which only destroys specific queue without affecting others.
Replacing whole gpu reset with reset queue

drm/amdkfd: reset queue which consumes RAS poison (v2)

CP supports unmap queue with reset mode which only destroys specific queue without affecting others.
Replacing whole gpu reset with reset queue mode for RAS poison consumption
saves much time, and we can also fallback to gpu reset solution if reset
queue fails.

v2: Return directly if process is NULL;
Reset queue solution is not applicable to SDMA, fallback to legacy
way;
Call kfd_unref_process after lookup process.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.15.6, v5.15.5, v5.15.4, v5.15.3
# f0dc99a6 17-Nov-2021 Graham Sider <Graham.Sider@amd.com>

drm/amdkfd: add kfd_device_info_init function

Initializes kfd->device_info given either asic_type (enum) if GFX
version is less than GFX9, or GC IP version if greater. Also takes in vf
and the targe

drm/amdkfd: add kfd_device_info_init function

Initializes kfd->device_info given either asic_type (enum) if GFX
version is less than GFX9, or GC IP version if greater. Also takes in vf
and the target compiler gfx version. Uses SDMA version to determine
num_sdma_queues_per_engine.

Convert device_info to a non-pointer member of kfd, change references
accordingly.

Change unsupported asic condition to only probe f2g, move device_info
initialization post-switch.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.15.2, v5.15.1, v5.15, v5.14.14
# 6bfc7c7e 19-Oct-2021 Graham Sider <Graham.Sider@amd.com>

drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcs

Modified definitions:

- amdgpu_amdkfd_submit_ib
- amdgpu_amdkfd_set_compute_idle
- amdgpu_amdkfd_have_atomics_support
- amdgpu_amdkfd_flush

drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcs

Modified definitions:

- amdgpu_amdkfd_submit_ib
- amdgpu_amdkfd_set_compute_idle
- amdgpu_amdkfd_have_atomics_support
- amdgpu_amdkfd_flush_gpu_tlb_pasid
- amdgpu_amdkfd_flush_gpu_tlb_pasid
- amdgpu_amdkfd_gpu_reset
- amdgpu_amdkfd_alloc_gtt_mem
- amdgpu_amdkfd_free_gtt_mem
- amdgpu_amdkfd_alloc_gws
- amdgpu_amdkfd_free_gws
- amdgpu_amdkfd_ras_poison_consumption_handler

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8
# c7490949 23-Sep-2021 Tao Zhou <tao.zhou1@amd.com>

amd/amdkfd: add ras page retirement handling for sq/sdma (v3)

In ras poison mode, page retirement will be handled by the irq handler of the
module which consumes corrupted data.

v2: rename ras_proc

amd/amdkfd: add ras page retirement handling for sq/sdma (v3)

In ras poison mode, page retirement will be handled by the irq handler of the
module which consumes corrupted data.

v2: rename ras_process_cb to ras_poison_consumption_handler.
move the handler's implementation from ASIC specific file to common
file.

v3: call gpu reset for xGMI connected mode.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43
# 4a1d4b6d 03-Jun-2021 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdkfd: add sdma poison consumption handling

Follow the same apporach as GFX to handle SDMA
poison consumption. Send SIGBUS to application
when receives SDMA_ECC interrupt and issue gpu
reset ei

drm/amdkfd: add sdma poison consumption handling

Follow the same apporach as GFX to handle SDMA
poison consumption. Send SIGBUS to application
when receives SDMA_ECC interrupt and issue gpu
reset either mode 2 or mode 1 to get the engine
back

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li<dennis.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36
# e2b1f9f5 11-May-2021 Dennis Li <Dennis.Li@amd.com>

drm/amdkfd: refine the poison data consumption handling

The user applications maybe register the KFD_EVENT_TYPE_HW_EXCEPTION and
KFD_EVENT_TYPE_MEMORY events, driver could notify them when poison da

drm/amdkfd: refine the poison data consumption handling

The user applications maybe register the KFD_EVENT_TYPE_HW_EXCEPTION and
KFD_EVENT_TYPE_MEMORY events, driver could notify them when poison data
consumed. Beside that, some applications maybe register SIGBUS signal
hander. These applications will handle poison data by themselves, exit
or re-create context to re-dispatch works.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12
# be9064b7 25-Apr-2021 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: remove unnecessary header include

amdgpu.h is included in kfd_priv.h

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-b

drm/amdgpu: remove unnecessary header include

amdgpu.h is included in kfd_priv.h

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.10.32, v5.10.31
# 20161e51 14-Apr-2021 Dennis Li <Dennis.Li@amd.com>

drm/amdkfd: add edc error interrupt handle for poison propogate mode

In poison progogate mode, when driver receive the edc error interrupt
from SQ, driver should kill the process by pasid which is u

drm/amdkfd: add edc error interrupt handle for poison propogate mode

In poison progogate mode, when driver receive the edc error interrupt
from SQ, driver should kill the process by pasid which is using the
poison data, and then trigger GPU reset.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6
# 6d909c5d 22-Jun-2020 Oak Zeng <Oak.Zeng@amd.com>

drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault

This is to keep wavefront context for debug purpose

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix

drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault

This is to keep wavefront context for debug purpose

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 7af103ea 05-Jan-2021 Tao Zhou <tao.zhou1@amd.com>

drm/amdkfd: check more client ids in interrupt handler

Add check for SExSH clients in kfd interrupt handler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher

drm/amdkfd: check more client ids in interrupt handler

Add check for SExSH clients in kfd interrupt handler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# ae279f69 18-Dec-2020 Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: check both client id and src id in interrupt handlers

We can have the same src ids for different client ids so make sure to
check both the client id and the source id when handling inter

drm/amdkfd: check both client id and src id in interrupt handlers

We can have the same src ids for different client ids so make sure to
check both the client id and the source id when handling interrupts.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41
# 938a0650 13-May-2020 Amber Lin <Amber.Lin@amd.com>

drm/amdkfd: Provide SMI events watch

When the compute is malfunctioning or performance drops, the system admin
will use SMI (System Management Interface) tool to monitor/diagnostic what
went wrong.

drm/amdkfd: Provide SMI events watch

When the compute is malfunctioning or performance drops, the system admin
will use SMI (System Management Interface) tool to monitor/diagnostic what
went wrong. This patch provides an event watch interface for the user
space to register devices and subscribe events they are interested. After
registered, the user can use annoymous file descriptor's poll function
with wait-time specified and wait for events to happen. Once an event
happens, the user can use read() to retrieve information related to the
event.

VM fault event is done in this patch.

v2: - remove UNREGISTER and add event ENABLE/DISABLE
- correct kfifo usage
- move event message API to kfd_ioctl.h
v3: send the event msg in text than in binary
v4: support multiple clients
v5: move events enablement from ioctl to fd write
v6: sparse fix

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 8c8e1f69 18-May-2020 Aishwarya Ramakrishnan <aishwaryarj100@gmail.com>

drm/amdkfd: Fix boolreturn.cocci warnings

Return statements in functions returning bool should use
true/false instead of 1/0.

drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c:40:9-10:
WARNING: retur

drm/amdkfd: Fix boolreturn.cocci warnings

Return statements in functions returning bool should use
true/false instead of 1/0.

drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c:40:9-10:
WARNING: return of 0/1 in function 'event_interrupt_isr_v9' with return type bool

Generated by: scripts/coccinelle/misc/boolreturn.cocci

Signed-off-by: Aishwarya Ramakrishnan <aishwaryarj100@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


Revision tags: v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2
# 3fe023d4 25-Sep-2019 Yong Zhao <Yong.Zhao@amd.com>

drm/amdkfd: Query vmid pasid mapping through stored info for non HWS

Because we record the mapping under non HWS mode in the software,
we can query pasid through vmid using the stored mapping instea

drm/amdkfd: Query vmid pasid mapping through stored info for non HWS

Because we record the mapping under non HWS mode in the software,
we can query pasid through vmid using the stored mapping instead of
reading from ATC registers.

This also prepares for the defeatured ATC block in future ASICs.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


123