#
459ccca5 |
| 14-Apr-2022 |
Lang Yu <Lang.Yu@amd.com> |
drm/amdkfd: move kfd_flush_tlb_after_unmap into kfd_priv.h
To make kfd_flush_tlb_after_unmap visible in kfd_svm.c, move it into kfd_priv.h. And change it to an inline function.
Signed-off-by: Lang
drm/amdkfd: move kfd_flush_tlb_after_unmap into kfd_priv.h
To make kfd_flush_tlb_after_unmap visible in kfd_svm.c, move it into kfd_priv.h. And change it to an inline function.
Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
46d18d51 |
| 06-Apr-2022 |
Mukul Joshi <mukul.joshi@amd.com> |
drm/amdkfd: Cleanup IO links during KFD device removal
Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD
drm/amdkfd: Cleanup IO links during KFD device removal
Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD topology. This patch aims to fix the following: 1. Cleanup all IO links to the device being removed. 2. Ensure that node numbering in sysfs and nodes proximity domain values are consistent after the device is removed: a. Adding a device and removing a GPU device are made mutually exclusive. b. The global proximity domain counter is no longer required to be an atomic counter. A normal 32-bit counter can be used instead. 3. Update generation_count to let user-mode know that topology has changed due to device removal.
CC: Shuotao Xu <shuotaoxu@microsoft.com> Reviewed-by: Shuotao Xu <shuotaoxu@microsoft.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
8fde0248 |
| 25-Mar-2022 |
Philip Yang <Philip.Yang@amd.com> |
drm/amdkfd: Use atomic64_t type for pdd->tlb_seq
To support multi-thread update page table.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
drm/amdkfd: Use atomic64_t type for pdd->tlb_seq
To support multi-thread update page table.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
bffa91da |
| 17-Mar-2022 |
Christian König <christian.koenig@amd.com> |
drm/amdkfd: start using tlb_seq from the VM subsystem
Instead of trying to figure out if a TLB flush is necessary or not use the information provided by the VM subsystem now.
Signed-off-by: Christi
drm/amdkfd: start using tlb_seq from the VM subsystem
Instead of trying to figure out if a TLB flush is necessary or not use the information provided by the VM subsystem now.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philip Yang<Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
dc90f084 |
| 15-Feb-2022 |
Christoph Hellwig <hch@lst.de> |
mm: don't include <linux/memremap.h> in <linux/mm.h>
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pul
mm: don't include <linux/memremap.h> in <linux/mm.h>
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h.
Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
show more ...
|
#
a0c5fd46 |
| 18-Feb-2022 |
Felix Kuehling <Felix.Kuehling@amd.com> |
drm/amdkfd: Use real device for messages
kfd_chardev() doesn't provide much useful information in dev_... messages on multi-GPU systems because there is only one KFD device, which doesn't correspond
drm/amdkfd: Use real device for messages
kfd_chardev() doesn't provide much useful information in dev_... messages on multi-GPU systems because there is only one KFD device, which doesn't correspond to any particular GPU. Use the actual GPU device to indicate the GPU that caused a message.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
d5c83156 |
| 15-Feb-2022 |
Changcheng Deng <deng.changcheng@zte.com.cn> |
drm/amdkfd: Replace zero-length array with flexible-array member
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure.
drm/amdkfd: Replace zero-length array with flexible-array member
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members" for these cases. The older style of one-element or zero-length arrays should no longer be used. Reference: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
d2cb0b21 |
| 10-Feb-2022 |
Jonathan Kim <jonathan.kim@amd.com> |
drm/amdkfd: remove unneeded unmap single queue option
The KFD only unmaps all queues, all dynamics queues or all process queues since RUN_LIST is mapped with all KFD queues.
There's no need to prov
drm/amdkfd: remove unneeded unmap single queue option
The KFD only unmaps all queues, all dynamics queues or all process queues since RUN_LIST is mapped with all KFD queues.
There's no need to provide a single type unmap so remove this option.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
2243f493 |
| 10-Feb-2022 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: Fix leftover errors and warnings
A bunch of errors and warnings are leftover KFD over the years, attempt to fix the errors and most warnings reported by checkpatch tool. Still a few warn
drm/amdkfd: Fix leftover errors and warnings
A bunch of errors and warnings are leftover KFD over the years, attempt to fix the errors and most warnings reported by checkpatch tool. Still a few warnings remain which may be false positives so ignore them for now.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
d87f36a0 |
| 10-Feb-2022 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: update SPDX license header
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj
drm/amdkfd: update SPDX license header
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
5bdd3eb2 |
| 04-Feb-2022 |
Mukul Joshi <mukul.joshi@amd.com> |
drm/amdkfd: Remove unused old debugger implementation
Cleanup the kfd code by removing the unused old debugger implementation. The address watch was only ever implemented in the upstream driver for
drm/amdkfd: Remove unused old debugger implementation
Cleanup the kfd code by removing the unused old debugger implementation. The address watch was only ever implemented in the upstream driver for GFXv7 (Kaveri). The user mode tools runtime using this API was never open-sourced. Work on the old debugger prototype that used this API has been discontinued years ago. Only a small piece of resetting wavefronts is kept and is moved to kfd_device_queue_manager.c.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
03e5b167 |
| 07-Feb-2022 |
Tao Zhou <tao.zhou1@amd.com> |
drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid
As the function is used in more different cases, use a more general name.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Fel
drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid
As the function is used in more different cases, use a more general name.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
c2db32ce |
| 08-Nov-2021 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: CRIU prepare for svm resume
During CRIU restore phase, the VMAs for the virtual address ranges are not at their final location yet so in this stage, only cache the data required to succe
drm/amdkfd: CRIU prepare for svm resume
During CRIU restore phase, the VMAs for the virtual address ranges are not at their final location yet so in this stage, only cache the data required to successfully resume the svm ranges during an imminent CRIU resume phase.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
08a987a8 |
| 02-Nov-2021 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: CRIU Discover svm ranges
A KFD process may contain a number of virtual address ranges for shared virtual memory management and each such range can have many SVM attributes spanning acros
drm/amdkfd: CRIU Discover svm ranges
A KFD process may contain a number of virtual address ranges for shared virtual memory management and each such range can have many SVM attributes spanning across various nodes within the process boundary. This change reports the total number of such SVM ranges and their total private data size by extending the PROCESS_INFO op of the the CRIU IOCTL to discover the svm ranges in the target process and a future patches brings in the required support for checkpoint and restore for SVM ranges.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
4717fe3d |
| 19-Nov-2021 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: CRIU checkpoint and restore xnack mode
Recoverable page faults are represented by the xnack mode setting inside a kfd process and are used to represent the device page faults. For CR, we
drm/amdkfd: CRIU checkpoint and restore xnack mode
Recoverable page faults are represented by the xnack mode setting inside a kfd process and are used to represent the device page faults. For CR, we don't consider negative values which are typically used for querying the current xnack mode without modifying it.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
bef153b7 |
| 09-Apr-2021 |
David Yat Sin <david.yatsin@amd.com> |
drm/amdkfd: CRIU implement gpu_id remapping
When doing a restore on a different node, the gpu_id's on the restore node may be different. But the user space application will still refer use the origi
drm/amdkfd: CRIU implement gpu_id remapping
When doing a restore on a different node, the gpu_id's on the restore node may be different. But the user space application will still refer use the original gpu_id's in the ioctl calls. Adding code to create a gpu id mapping so that kfd can determine actual gpu_id during the user ioctl's.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
40e8a766 |
| 05-Mar-2021 |
David Yat Sin <david.yatsin@amd.com> |
drm/amdkfd: CRIU checkpoint and restore events
Add support to existing CRIU ioctl's to save and restore events during criu checkpoint and restore.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.co
drm/amdkfd: CRIU checkpoint and restore events
Add support to existing CRIU ioctl's to save and restore events during criu checkpoint and restore.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14 |
|
#
3a9822d7 |
| 25-Jan-2021 |
David Yat Sin <david.yatsin@amd.com> |
drm/amdkfd: CRIU checkpoint and restore queue control stack
Checkpoint contents of queue control stacks on CRIU dump and restore them during CRIU restore.
Reviewed-by: Felix Kuehling <Felix.Kuehlin
drm/amdkfd: CRIU checkpoint and restore queue control stack
Checkpoint contents of queue control stacks on CRIU dump and restore them during CRIU restore.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
42c6c482 |
| 25-Jan-2021 |
David Yat Sin <david.yatsin@amd.com> |
drm/amdkfd: CRIU checkpoint and restore queue mqds
Checkpoint contents of queue MQD's on CRIU dump and restore them during CRIU restore.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-
drm/amdkfd: CRIU checkpoint and restore queue mqds
Checkpoint contents of queue MQD's on CRIU dump and restore them during CRIU restore.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
8668dfc3 |
| 25-Jan-2021 |
David Yat Sin <david.yatsin@amd.com> |
drm/amdkfd: CRIU restore queue ids
When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd
drm/amdkfd: CRIU restore queue ids
When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
626f7b31 |
| 25-Jan-2021 |
David Yat Sin <david.yatsin@amd.com> |
drm/amdkfd: CRIU add queues support
Add support to existing CRIU ioctl's to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore.
Reviewed-by:
drm/amdkfd: CRIU add queues support
Add support to existing CRIU ioctl's to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
cd9f7910 |
| 16-Aug-2021 |
David Yat Sin <david.yatsin@amd.com> |
drm/amdkfd: CRIU Implement KFD unpause operation
Introducing UNPAUSE op. After CRIU amdgpu plugin performs a PROCESS_INFO op the queues will be stay in an evicted state. Once the plugin is done drai
drm/amdkfd: CRIU Implement KFD unpause operation
Introducing UNPAUSE op. After CRIU amdgpu plugin performs a PROCESS_INFO op the queues will be stay in an evicted state. Once the plugin is done draining BO contents, it is safe to perform an UNPAUSE op for the queues to resume.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
011bbb03 |
| 11-Jan-2021 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: CRIU Implement KFD resume ioctl
This adds support to create userptr BOs on restore and introduces a new ioctl op to restart memory notifiers for the restored userptr BOs. When doing CRIU
drm/amdkfd: CRIU Implement KFD resume ioctl
This adds support to create userptr BOs on restore and introduces a new ioctl op to restart memory notifiers for the restored userptr BOs. When doing CRIU restore MMU notifications can happen anytime after we call amdgpu_mn_register. Prevent MMU notifications until we reach stage-4 of the restore process i.e. criu_resume ioctl op is received, and the process is ready to be resumed. This ioctl is different from other KFD CRIU ioctls since its called by CRIU master restore process for all the target processes being resumed by CRIU.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v5.10 |
|
#
5ccbb057 |
| 30-Nov-2020 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: CRIU Implement KFD checkpoint ioctl
This adds support to discover the buffer objects that belong to a process being checkpointed. The data corresponding to these buffer objects is retur
drm/amdkfd: CRIU Implement KFD checkpoint ioctl
This adds support to discover the buffer objects that belong to a process being checkpointed. The data corresponding to these buffer objects is returned to user space plugin running under criu master context which then stores this info to recreate these buffer objects during a restore operation.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
36988070 |
| 24-Aug-2021 |
Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> |
drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
Checkpoint-Restore in userspace (CRIU) is a powerful tool that can snapshot a running process and later restore it on same or a remote machine but
drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
Checkpoint-Restore in userspace (CRIU) is a powerful tool that can snapshot a running process and later restore it on same or a remote machine but expects the processes that have a device file (e.g. GPU) associated with them, provide necessary driver support to assist CRIU and its extensible plugin interface. Thus, In order to support the Checkpoint-Restore of any ROCm process, the AMD Radeon Open Compute Kernel driver, needs to provide a set of new APIs that provide necessary VRAM metadata and its contents to a userspace component (CRIU plugin) that can store it in form of image files.
This introduces some new ioctls which will be used to checkpoint-Restore any KFD bound user process. KFD only allows ioctl calls from the same process that opened the KFD file descriptor. Since these ioctls are expected to be called from a KFD criu plugin which has elevated ptrace attached privileges and CAP_CHECKPOINT_RESTORE capabilities attached with the file descriptors so modify KFD to allow such calls.
(API redesigned by David Yat Sin) Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: David Yat Sin <david.yatsin@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|