#
30ceb873 |
| 14-Jul-2024 |
Philip Yang <Philip.Yang@amd.com> |
drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
[ Upstream commit c86ad39140bbcb9dc75a10046c2221f657e8083b ]
Pass pointer reference to amdgpu_bo_unref to clear the correct pointer, otherw
drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
[ Upstream commit c86ad39140bbcb9dc75a10046c2221f657e8083b ]
Pass pointer reference to amdgpu_bo_unref to clear the correct pointer, otherwise amdgpu_bo_unref clear the local variable, the original pointer not set to NULL, this could cause use-after-free bug.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
bb430ea4 |
| 20-May-2024 |
Alex Deucher <alexander.deucher@amd.com> |
Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"
commit dd2b75fd9a79bf418e088656822af06fc253dbe3 upstream.
This reverts commit 28ebbb4981cb1fad12e0b1227dbecc88810b1ee8.
Rever
Revert "drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices"
commit dd2b75fd9a79bf418e088656822af06fc253dbe3 upstream.
This reverts commit 28ebbb4981cb1fad12e0b1227dbecc88810b1ee8.
Revert this commit as apparently the LLVM code to take advantage of this never landed.
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Feifei Xu <feifei.xu@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
b6f66265 |
| 18-Mar-2024 |
Zhigang Luo <Zhigang.Luo@amd.com> |
amd/amdkfd: sync all devices to wait all processes being evicted
[ Upstream commit d06af584be5a769d124b7302b32a033e9559761d ]
If there are more than one device doing reset in parallel, the first de
amd/amdkfd: sync all devices to wait all processes being evicted
[ Upstream commit d06af584be5a769d124b7302b32a033e9559761d ]
If there are more than one device doing reset in parallel, the first device will call kfd_suspend_all_processes() to evict all processes on all devices, this call takes time to finish. other device will start reset and recover without waiting. if the process has not been evicted before doing recover, it will be restored, then caused page fault.
Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c99a2e7a |
| 28-Jul-2023 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdkfd: drop IOMMUv2 support
Now that we use the dGPU path for all APUs, drop the IOMMUv2 support.
v2: drop the now unused queue manager functions for gfx7/8 APUs
Reviewed-by: Felix Kuehling <
drm/amdkfd: drop IOMMUv2 support
Now that we use the dGPU path for all APUs, drop the IOMMUv2 support.
v2: drop the now unused queue manager functions for gfx7/8 APUs
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
2b4adeb3 |
| 28-Jul-2023 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdkfd: disable IOMMUv2 support for Raven
Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The
drm/amdkfd: disable IOMMUv2 support for Raven
Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The dGPU path has been used for a long time with newer APUs and works fine. This also paves the way to simplify the driver significantly.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
99c15019 |
| 28-Jul-2023 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdkfd: disable IOMMUv2 support for KV/CZ
Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The
drm/amdkfd: disable IOMMUv2 support for KV/CZ
Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The dGPU path has been used for a long time with newer APUs and works fine. This also paves the way to simplify the driver significantly.
v2: use the dGPU queue manager functions
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
c3186665 |
| 14-Jul-2023 |
Shashank Sharma <shashank.sharma@amd.com> |
drm/amdgpu: use doorbell mgr for kfd kernel doorbells
This patch: - adds a doorbell bo in kfd device structure. - creates doorbell page for kfd kernel usages. - updates the get_kernel_doorbell and f
drm/amdgpu: use doorbell mgr for kfd kernel doorbells
This patch: - adds a doorbell bo in kfd device structure. - creates doorbell page for kfd kernel usages. - updates the get_kernel_doorbell and free_kernel_doorbell functions accordingly
V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex) V3: - Move single variable declaration below (Christian) - Add a to-do item to reuse the KGD kernel level doorbells for KFD for non-MES cases, instead of reserving one page (Felix)
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
7a1c5c67 |
| 12-Jul-2023 |
Jonathan Kim <jonathan.kim@amd.com> |
drm/amdkfd: enable cooperative groups for gfx11
MES can concurrently schedule queues on the device that require exclusive device access if marked exclusively_scheduled without the requirement of GWS
drm/amdkfd: enable cooperative groups for gfx11
MES can concurrently schedule queues on the device that require exclusive device access if marked exclusively_scheduled without the requirement of GWS. Similar to the F32 HWS, MES will manage quality of service for these queues. Use this for cooperative groups since cooperative groups are device occupancy limited.
Since some GFX11 devices can only be debugged with partial CUs, do not allow the debugging of cooperative groups on these devices as the CU occupancy limit will change on attach.
In addition, zero initialize the MES add queue submission vector for MES initialization tests as we do not want these to be cooperative dispatches.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
d4300362 |
| 22-Jun-2023 |
Mukul Joshi <mukul.joshi@amd.com> |
drm/amdkfd: Update interrupt handling for GFX 9.4.3
For GFX 9.4.3, interrupt handling needs to be updated for: - Interrupt cookie will have a NodeId field. Each KFD node needs to check the NodeId
drm/amdkfd: Update interrupt handling for GFX 9.4.3
For GFX 9.4.3, interrupt handling needs to be updated for: - Interrupt cookie will have a NodeId field. Each KFD node needs to check the NodeId before processing the interrupt. - For CPX mode, there are additional checks of client ID needed to process the interrupt. - Add NodeId to the process drain interrupt.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
fc133acc |
| 15-Jun-2023 |
Mukul Joshi <mukul.joshi@amd.com> |
drm/amdkfd: Enable GWS on GFX9.4.3
Enable GWS capable queue creation for forward progress gaurantee on GFX 9.4.3.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix
drm/amdkfd: Enable GWS on GFX9.4.3
Enable GWS capable queue creation for forward progress gaurantee on GFX 9.4.3.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
597364ad |
| 31-May-2023 |
Mukul Joshi <mukul.joshi@amd.com> |
drm/amdkfd: Fix reserved SDMA queues handling
This patch fixes a regression caused by a bad merge where the handling of reserved SDMA queues was accidentally removed. With the fix, the reserved SDMA
drm/amdkfd: Fix reserved SDMA queues handling
This patch fixes a regression caused by a bad merge where the handling of reserved SDMA queues was accidentally removed. With the fix, the reserved SDMA queues are again correctly marked as unavailable for allocation.
Fixes: a805889a1531 ("drm/amdkfd: Update SDMA queue management for GFX9.4.3") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
e0f85f46 |
| 06-May-2022 |
Jonathan Kim <jonathan.kim@amd.com> |
drm/amdkfd: add debug set and clear address watch points operation
Shader read, write and atomic memory operations can be alerted to the debugger as an address watch exception.
Allow the debugger t
drm/amdkfd: add debug set and clear address watch points operation
Shader read, write and atomic memory operations can be alerted to the debugger as an address watch exception.
Allow the debugger to pass in a watch point to a particular memory address per device.
Note that there exists only 4 watch points per devices to date, so have the KFD keep track of what watch points are allocated or not.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
12fb1ad7 |
| 22-Apr-2022 |
Jonathan Kim <jonathan.kim@amd.com> |
drm/amdkfd: update process interrupt handling for debug events
The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts.
If a debugger session exits, a
drm/amdkfd: update process interrupt handling for debug events
The debugger must be notified by any debugger subscribed exception that comes from hardware interrupts.
If a debugger session exits, any exceptions it subscribed to may still have interrupts in the interrupt ring buffer or KGD/KFD pipeline. To prevent a new session from inheriting stale interrupts, when a new queue is created, open an interrupt drain and allow the IH ring to drain from a timestamped checkpoint. Then inject a custom IV so that once the custom IV is picked up by the KFD, it's safe to close the drain and proceed with queue creation.
The drain must also be on debug disable as SW interrupts may still be processed. Drain at this time and clear all the exception status.
The debugger may also not be attached nor subscibed to certain exceptions so forward them directly to the runtime.
GFX10 also requires its own IV processing, hence the creation of kfd_int_process_v10.c. This is because the IV from SQ interrupts are packed into a new continguous format unlike GFX9. To make this clear, a separate interrupting handling code file was created.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
d3116d9f |
| 30-May-2023 |
Yang Li <yang.lee@linux.alibaba.com> |
drm/amdkfd: clean up one inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:1036 kgd2kfd_interrupt() warn: inconsistent indenting
Signed-off-by: Yang Li <yang.lee@linux.alibab
drm/amdkfd: clean up one inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:1036 kgd2kfd_interrupt() warn: inconsistent indenting
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
07a14752 |
| 03-Apr-2023 |
Graham Sider <Graham.Sider@amd.com> |
drm/amdkfd: Add new gfx_target_versions for GC 9.4.3
For GC 9.4.3, set gfx_target_version to 90402 for rev 1 and later (APU or dGPU), 90401 for rev 0 dGPU, and 90400 for rev 0 APU.
Signed-off-by: G
drm/amdkfd: Add new gfx_target_versions for GC 9.4.3
For GC 9.4.3, set gfx_target_version to 90402 for rev 1 and later (APU or dGPU), 90401 for rev 0 dGPU, and 90400 for rev 0 APU.
Signed-off-by: Graham Sider <Graham.Sider@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
28ebbb49 |
| 24-May-2023 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices
Certain boards with GC IP 11.0.3 need slightly different handling in the shader compiler due to board specific bounding box optimization
drm/amdkfd: fix gfx_target_version for certain 11.0.3 devices
Certain boards with GC IP 11.0.3 need slightly different handling in the shader compiler due to board specific bounding box optimizations.
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
55a6dc60 |
| 23-May-2023 |
Mukul Joshi <mukul.joshi@amd.com> |
drm/amdkfd: Set event interrupt class for GFX 9.4.3
Fix the warning during driver load because the event interrupt class is not set for GFX9.4.3.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Ac
drm/amdkfd: Set event interrupt class for GFX 9.4.3
Fix the warning during driver load because the event interrupt class is not set for GFX9.4.3.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
2ad00e75 |
| 19-May-2023 |
Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> |
drm/amdgpu: Fix uninitalized variable in kgd2kfd_device_init
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:613:4: error: variable 'num_xcd' is uninitialized when used here [-Werror,-Wuninitializ
drm/amdgpu: Fix uninitalized variable in kgd2kfd_device_init
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:613:4: error: variable 'num_xcd' is uninitialized when used here [-Werror,-Wuninitialized] num_xcd, kfd->adev->gfx.num_xcc_per_xcp); ^~~~~~~ include/linux/dev_printk.h:144:65: note: expanded from macro 'dev_err' dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), ##__VA_ARGS__) ^~~~~~~~~~~ include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap' _p_func(dev, fmt, ##__VA_ARGS__); \ ^~~~~~~~~~~ drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:597:13: note: initialize the variable 'num_xcd' to silence this warning int num_xcd, partition_mode; ^ = 0 1 error generated.
Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
9a3ce1a7 |
| 12-May-2023 |
Hawking Zhang <Hawking.Zhang@amd.com> |
drm/amdgpu: Do not access members of xcp w/o check (v2)
Not all the asic needs xcp. ensure check xcp availabity before accessing its member.
v2: add missing change in kfd_topology.c
Signed-off-by:
drm/amdgpu: Do not access members of xcp w/o check (v2)
Not all the asic needs xcp. ensure check xcp availabity before accessing its member.
v2: add missing change in kfd_topology.c
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
0409022c |
| 11-May-2023 |
Hawking Zhang <Hawking.Zhang@amd.com> |
drm/amdkfd: Fix null ptr access
Avoid access null xcp_mgr pointer.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <ale
drm/amdkfd: Fix null ptr access
Avoid access null xcp_mgr pointer.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
84b4dd3f |
| 31-Mar-2023 |
Philip Yang <Philip.Yang@amd.com> |
drm/amdkfd: Refactor migrate init to support partition switch
Rename smv_migrate_init to a better name kgd2kfd_init_zone_device because it setup zone devive pgmap for page migration and keep it in k
drm/amdkfd: Refactor migrate init to support partition switch
Rename smv_migrate_init to a better name kgd2kfd_init_zone_device because it setup zone devive pgmap for page migration and keep it in kfd_migrate.c to access static functions svm_migrate_pgmap_ops. Call it only once in amdgpu_device_ip_init after adev ip blocks are initialized, but before amdgpu_amdkfd_device_init initialize kfd nodes which enable SVM support based on pgmap.
svm_range_set_max_pages is called by kgd2kfd_device_init everytime after switching compute partition mode.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
25f50704 |
| 23-Mar-2023 |
Philip Yang <Philip.Yang@amd.com> |
drm/amdkfd: APU mode set max svm range pages
svm_migrate_init set the max svm range pages based on the KFD nodes partition size. APU mode don't init pgmap because there is no migration.
kgd2kfd_dev
drm/amdkfd: APU mode set max svm range pages
svm_migrate_init set the max svm range pages based on the KFD nodes partition size. APU mode don't init pgmap because there is no migration.
kgd2kfd_device_init calls svm_migrate_init after KFD nodes allocation and initialization.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
315e29ec |
| 20-Mar-2023 |
Mukul Joshi <mukul.joshi@amd.com> |
drm/amdkfd: Move local_mem_info to kfd_node
We need to track memory usage on a per partition basis. To do that, store the local memory information in KFD node instead of kfd device.
v2: squash in f
drm/amdkfd: Move local_mem_info to kfd_node
We need to track memory usage on a per partition basis. To do that, store the local memory information in KFD node instead of kfd device.
v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
4c6ce75f |
| 26-Jan-2023 |
Philip Yang <Philip.Yang@amd.com> |
drm/amdkfd: Show KFD node memory partition info
Show KFD node memory partition id and size, add helper function KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used later to support memory
drm/amdkfd: Show KFD node memory partition info
Show KFD node memory partition id and size, add helper function KFD_XCP_MEMORY_SIZE to get kfd node memory size, will be used later to support memory accounting per partition.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
c4050ff1 |
| 09-Feb-2023 |
Lijo Lazar <lijo.lazar@amd.com> |
drm/amdkfd: Use xcc mask for identifying xcc
Instead of start xcc id and number of xcc per node, use the xcc mask which is the mask of logical ids of xccs belonging to a parition.
Signed-off-by: Li
drm/amdkfd: Use xcc mask for identifying xcc
Instead of start xcc id and number of xcc per node, use the xcc mask which is the mask of logical ids of xccs belonging to a parition.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|