History log of /openbmc/linux/drivers/gpu/drm/amd/amdkfd/kfd_device.c (Results 1 – 25 of 257)
Revision Date Author Comments
# 30ceb873 14-Jul-2024 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer

[ Upstream commit c86ad39140bbcb9dc75a10046c2221f657e8083b ]

Pass pointer reference to amdgpu_bo_unref to clear the correct pointer,
otherw

drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer

[ Upstream commit c86ad39140bbcb9dc75a10046c2221f657e8083b ]

Pass pointer reference to amdgpu_bo_unref to clear the correct pointer,
otherwise amdgpu_bo_unref clear the local variable, the original pointer
not set to NULL, this could cause use-after-free bug.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# bb430ea4 20-May-2024 Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"

commit dd2b75fd9a79bf418e088656822af06fc253dbe3 upstream.

This reverts commit 28ebbb4981cb1fad12e0b1227dbecc88810b1ee8.

Rever

Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"

commit dd2b75fd9a79bf418e088656822af06fc253dbe3 upstream.

This reverts commit 28ebbb4981cb1fad12e0b1227dbecc88810b1ee8.

Revert this commit as apparently the LLVM code to take advantage of
this never landed.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Feifei Xu <feifei.xu@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# b6f66265 18-Mar-2024 Zhigang Luo <Zhigang.Luo@amd.com>

amd/amdkfd: sync all devices to wait all processes being evicted

[ Upstream commit d06af584be5a769d124b7302b32a033e9559761d ]

If there are more than one device doing reset in parallel, the first
de

amd/amdkfd: sync all devices to wait all processes being evicted

[ Upstream commit d06af584be5a769d124b7302b32a033e9559761d ]

If there are more than one device doing reset in parallel, the first
device will call kfd_suspend_all_processes() to evict all processes
on all devices, this call takes time to finish. other device will
start reset and recover without waiting. if the process has not been
evicted before doing recover, it will be restored, then caused page
fault.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# c99a2e7a 28-Jul-2023 Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: drop IOMMUv2 support

Now that we use the dGPU path for all APUs, drop the
IOMMUv2 support.

v2: drop the now unused queue manager functions for gfx7/8 APUs

Reviewed-by: Felix Kuehling <

drm/amdkfd: drop IOMMUv2 support

Now that we use the dGPU path for all APUs, drop the
IOMMUv2 support.

v2: drop the now unused queue manager functions for gfx7/8 APUs

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 2b4adeb3 28-Jul-2023 Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: disable IOMMUv2 support for Raven

Use the dGPU path instead. There were a lot of platform
issues with IOMMU in general on these chips due to windows
not enabling IOMMU at the time. The

drm/amdkfd: disable IOMMUv2 support for Raven

Use the dGPU path instead. There were a lot of platform
issues with IOMMU in general on these chips due to windows
not enabling IOMMU at the time. The dGPU path has been
used for a long time with newer APUs and works fine. This
also paves the way to simplify the driver significantly.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 99c15019 28-Jul-2023 Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: disable IOMMUv2 support for KV/CZ

Use the dGPU path instead. There were a lot of platform
issues with IOMMU in general on these chips due to windows
not enabling IOMMU at the time. The

drm/amdkfd: disable IOMMUv2 support for KV/CZ

Use the dGPU path instead. There were a lot of platform
issues with IOMMU in general on these chips due to windows
not enabling IOMMU at the time. The dGPU path has been
used for a long time with newer APUs and works fine. This
also paves the way to simplify the driver significantly.

v2: use the dGPU queue manager functions

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# c3186665 14-Jul-2023 Shashank Sharma <shashank.sharma@amd.com>

drm/amdgpu: use doorbell mgr for kfd kernel doorbells

This patch:
- adds a doorbell bo in kfd device structure.
- creates doorbell page for kfd kernel usages.
- updates the get_kernel_doorbell and f

drm/amdgpu: use doorbell mgr for kfd kernel doorbells

This patch:
- adds a doorbell bo in kfd device structure.
- creates doorbell page for kfd kernel usages.
- updates the get_kernel_doorbell and free_kernel_doorbell functions
accordingly

V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex)
V3:
- Move single variable declaration below (Christian)
- Add a to-do item to reuse the KGD kernel level doorbells for
KFD for non-MES cases, instead of reserving one page (Felix)

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 7a1c5c67 12-Jul-2023 Jonathan Kim <jonathan.kim@amd.com>

drm/amdkfd: enable cooperative groups for gfx11

MES can concurrently schedule queues on the device that require
exclusive device access if marked exclusively_scheduled without the
requirement of GWS

drm/amdkfd: enable cooperative groups for gfx11

MES can concurrently schedule queues on the device that require
exclusive device access if marked exclusively_scheduled without the
requirement of GWS. Similar to the F32 HWS, MES will manage
quality of service for these queues.
Use this for cooperative groups since cooperative groups are device
occupancy limited.

Since some GFX11 devices can only be debugged with partial CUs, do not
allow the debugging of cooperative groups on these devices as the CU
occupancy limit will change on attach.

In addition, zero initialize the MES add queue submission vector for MES
initialization tests as we do not want these to be cooperative
dispatches.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# d4300362 22-Jun-2023 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Update interrupt handling for GFX 9.4.3

For GFX 9.4.3, interrupt handling needs to be updated for:
- Interrupt cookie will have a NodeId field. Each KFD
node needs to check the NodeId

drm/amdkfd: Update interrupt handling for GFX 9.4.3

For GFX 9.4.3, interrupt handling needs to be updated for:
- Interrupt cookie will have a NodeId field. Each KFD
node needs to check the NodeId before processing the
interrupt.
- For CPX mode, there are additional checks of client ID
needed to process the interrupt.
- Add NodeId to the process drain interrupt.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# fc133acc 15-Jun-2023 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Enable GWS on GFX9.4.3

Enable GWS capable queue creation for forward
progress gaurantee on GFX 9.4.3.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix

drm/amdkfd: Enable GWS on GFX9.4.3

Enable GWS capable queue creation for forward
progress gaurantee on GFX 9.4.3.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 597364ad 31-May-2023 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Fix reserved SDMA queues handling

This patch fixes a regression caused by a bad merge where
the handling of reserved SDMA queues was accidentally removed.
With the fix, the reserved SDMA

drm/amdkfd: Fix reserved SDMA queues handling

This patch fixes a regression caused by a bad merge where
the handling of reserved SDMA queues was accidentally removed.
With the fix, the reserved SDMA queues are again correctly
marked as unavailable for allocation.

Fixes: a805889a1531 ("drm/amdkfd: Update SDMA queue management for GFX9.4.3")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# e0f85f46 06-May-2022 Jonathan Kim <jonathan.kim@amd.com>

drm/amdkfd: add debug set and clear address watch points operation

Shader read, write and atomic memory operations can be alerted to the
debugger as an address watch exception.

Allow the debugger t

drm/amdkfd: add debug set and clear address watch points operation

Shader read, write and atomic memory operations can be alerted to the
debugger as an address watch exception.

Allow the debugger to pass in a watch point to a particular memory
address per device.

Note that there exists only 4 watch points per devices to date, so have
the KFD keep track of what watch points are allocated or not.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 12fb1ad7 22-Apr-2022 Jonathan Kim <jonathan.kim@amd.com>

drm/amdkfd: update process interrupt handling for debug events

The debugger must be notified by any debugger subscribed exception
that comes from hardware interrupts.

If a debugger session exits, a

drm/amdkfd: update process interrupt handling for debug events

The debugger must be notified by any debugger subscribed exception
that comes from hardware interrupts.

If a debugger session exits, any exceptions it subscribed to may still
have interrupts in the interrupt ring buffer or KGD/KFD pipeline.
To prevent a new session from inheriting stale interrupts, when a new
queue is created, open an interrupt drain and allow the IH ring to drain
from a timestamped checkpoint. Then inject a custom IV so that once
the custom IV is picked up by the KFD, it's safe to close the drain
and proceed with queue creation.

The drain must also be on debug disable as SW interrupts may still
be processed. Drain at this time and clear all the exception status.

The debugger may also not be attached nor subscibed to certain
exceptions so forward them directly to the runtime.

GFX10 also requires its own IV processing, hence the creation of
kfd_int_process_v10.c. This is because the IV from SQ interrupts are
packed into a new continguous format unlike GFX9. To make this clear,
a separate interrupting handling code file was created.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# d3116d9f 30-May-2023 Yang Li <yang.lee@linux.alibaba.com>

drm/amdkfd: clean up one inconsistent indenting

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:1036 kgd2kfd_interrupt() warn: inconsistent indenting

Signed-off-by: Yang Li <yang.lee@linux.alibab

drm/amdkfd: clean up one inconsistent indenting

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:1036 kgd2kfd_interrupt() warn: inconsistent indenting

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 07a14752 03-Apr-2023 Graham Sider <Graham.Sider@amd.com>

drm/amdkfd: Add new gfx_target_versions for GC 9.4.3

For GC 9.4.3, set gfx_target_version to 90402 for rev 1 and later (APU
or dGPU), 90401 for rev 0 dGPU, and 90400 for rev 0 APU.

Signed-off-by: G

drm/amdkfd: Add new gfx_target_versions for GC 9.4.3

For GC 9.4.3, set gfx_target_version to 90402 for rev 1 and later (APU
or dGPU), 90401 for rev 0 dGPU, and 90400 for rev 0 APU.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 28ebbb49 24-May-2023 Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices

Certain boards with GC IP 11.0.3 need slightly different handling
in the shader compiler due to board specific bounding box
optimization

drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices

Certain boards with GC IP 11.0.3 need slightly different handling
in the shader compiler due to board specific bounding box
optimizations.

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 55a6dc60 23-May-2023 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Set event interrupt class for GFX 9.4.3

Fix the warning during driver load because the event
interrupt class is not set for GFX9.4.3.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Ac

drm/amdkfd: Set event interrupt class for GFX 9.4.3

Fix the warning during driver load because the event
interrupt class is not set for GFX9.4.3.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 2ad00e75 19-May-2023 Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>

drm/amdgpu: Fix uninitalized variable in kgd2kfd_device_init

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:613:4: error: variable 'num_xcd' is uninitialized when used here [-Werror,-Wuninitializ

drm/amdgpu: Fix uninitalized variable in kgd2kfd_device_init

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:613:4: error: variable 'num_xcd' is uninitialized when used here [-Werror,-Wuninitialized]
num_xcd, kfd->adev->gfx.num_xcc_per_xcp);
^~~~~~~
include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err'
dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~
include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap'
_p_func(dev, fmt, ##__VA_ARGS__); \
^~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:597:13: note: initialize the variable 'num_xcd' to silence this warning
int num_xcd, partition_mode;
^
= 0
1 error generated.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 9a3ce1a7 12-May-2023 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdgpu: Do not access members of xcp w/o check (v2)

Not all the asic needs xcp. ensure check xcp availabity
before accessing its member.

v2: add missing change in kfd_topology.c

Signed-off-by:

drm/amdgpu: Do not access members of xcp w/o check (v2)

Not all the asic needs xcp. ensure check xcp availabity
before accessing its member.

v2: add missing change in kfd_topology.c

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 0409022c 11-May-2023 Hawking Zhang <Hawking.Zhang@amd.com>

drm/amdkfd: Fix null ptr access

Avoid access null xcp_mgr pointer.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <ale

drm/amdkfd: Fix null ptr access

Avoid access null xcp_mgr pointer.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 84b4dd3f 31-Mar-2023 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Refactor migrate init to support partition switch

Rename smv_migrate_init to a better name kgd2kfd_init_zone_device
because it setup zone devive pgmap for page migration and keep it in
k

drm/amdkfd: Refactor migrate init to support partition switch

Rename smv_migrate_init to a better name kgd2kfd_init_zone_device
because it setup zone devive pgmap for page migration and keep it in
kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it
only once in amdgpu_device_ip_init after adev ip blocks are initialized,
but before amdgpu_amdkfd_device_init initialize kfd nodes which enable
SVM support based on pgmap.

svm_range_set_max_pages is called by kgd2kfd_device_init everytime after
switching compute partition mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 25f50704 23-Mar-2023 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: APU mode set max svm range pages

svm_migrate_init set the max svm range pages based on the KFD nodes
partition size. APU mode don't init pgmap because there is no migration.

kgd2kfd_dev

drm/amdkfd: APU mode set max svm range pages

svm_migrate_init set the max svm range pages based on the KFD nodes
partition size. APU mode don't init pgmap because there is no migration.

kgd2kfd_device_init calls svm_migrate_init after KFD nodes allocation
and initialization.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 315e29ec 20-Mar-2023 Mukul Joshi <mukul.joshi@amd.com>

drm/amdkfd: Move local_mem_info to kfd_node

We need to track memory usage on a per partition basis. To do
that, store the local memory information in KFD node instead
of kfd device.

v2: squash in f

drm/amdkfd: Move local_mem_info to kfd_node

We need to track memory usage on a per partition basis. To do
that, store the local memory information in KFD node instead
of kfd device.

v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info")

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# 4c6ce75f 26-Jan-2023 Philip Yang <Philip.Yang@amd.com>

drm/amdkfd: Show KFD node memory partition info

Show KFD node memory partition id and size, add helper function
KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used
later to support memory

drm/amdkfd: Show KFD node memory partition info

Show KFD node memory partition id and size, add helper function
KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used
later to support memory accounting per partition.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


# c4050ff1 09-Feb-2023 Lijo Lazar <lijo.lazar@amd.com>

drm/amdkfd: Use xcc mask for identifying xcc

Instead of start xcc id and number of xcc per node, use the xcc mask
which is the mask of logical ids of xccs belonging to a parition.

Signed-off-by: Li

drm/amdkfd: Use xcc mask for identifying xcc

Instead of start xcc id and number of xcc per node, use the xcc mask
which is the mask of logical ids of xccs belonging to a parition.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

show more ...


1234567891011