# 36770c04 | 29-Oct-2021 | Alex Sierra <alex.sierra@amd.com>

drm/amdkfd: avoid recursive lock in migrations back to RAM

[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]

[Why]: When we call hmm_range_fault to map memory after a migration, we don't expect memory to be migrated again as a result of hmm_range_fault. The driver ensures that all memory is in GPU-accessible locations so that no migration should be needed. However, there is one corner case where hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE back to system memory due to a write fault when a system memory page in the same range was mapped read-only (e.g. COW). Ranges with individual pages in different locations are usually the result of failed page migrations (e.g. page lock contention). The unexpected migration back to system memory causes a deadlock from recursive locking in our driver.

[How]: Add a new task reference member to the svm_range_list struct and set it to "current" right before hmm_range_fault is called. The svm_migrate_to_ram callback checks this member against "current"; if they match, the migration is ignored.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
36770c04 |
| 29-Oct-2021 |
Alex Sierra <alex.sierra@amd.com> |
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't expect memory to be migrated again as a result of hmm_range_fault. The driver ensures that all memory is in GPU-accessible locations so that no migration should be needed. However, there is one corner case where hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE back to system memory due to a write-fault when a system memory page in the same range was mapped read-only (e.g. COW). Ranges with individual pages in different locations are usually the result of failed page migrations (e.g. page lock contention). The unexpected migration back to system memory causes a deadlock from recursive locking in our driver.
[How]: Creating a task reference new member under svm_range_list struct. Setting this with "current" reference, right before the hmm_range_fault is called. This member is checked against "current" reference at svm_migrate_to_ram callback function. If equal, the migration will be ignored.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
36770c04 |
| 29-Oct-2021 |
Alex Sierra <alex.sierra@amd.com> |
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't expect memory to be migrated again as a result of hmm_range_fault. The driver ensures that all memory is in GPU-accessible locations so that no migration should be needed. However, there is one corner case where hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE back to system memory due to a write-fault when a system memory page in the same range was mapped read-only (e.g. COW). Ranges with individual pages in different locations are usually the result of failed page migrations (e.g. page lock contention). The unexpected migration back to system memory causes a deadlock from recursive locking in our driver.
[How]: Creating a task reference new member under svm_range_list struct. Setting this with "current" reference, right before the hmm_range_fault is called. This member is checked against "current" reference at svm_migrate_to_ram callback function. If equal, the migration will be ignored.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
36770c04 |
| 29-Oct-2021 |
Alex Sierra <alex.sierra@amd.com> |
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't expect memory to be migrated again as a result of hmm_range_fault. The driver ensures that all memory is in GPU-accessible locations so that no migration should be needed. However, there is one corner case where hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE back to system memory due to a write-fault when a system memory page in the same range was mapped read-only (e.g. COW). Ranges with individual pages in different locations are usually the result of failed page migrations (e.g. page lock contention). The unexpected migration back to system memory causes a deadlock from recursive locking in our driver.
[How]: Creating a task reference new member under svm_range_list struct. Setting this with "current" reference, right before the hmm_range_fault is called. This member is checked against "current" reference at svm_migrate_to_ram callback function. If equal, the migration will be ignored.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
36770c04 |
| 29-Oct-2021 |
Alex Sierra <alex.sierra@amd.com> |
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't expect memory to be migrated again as a result of hmm_range_fault. The driver ensures that all memory is in GPU-accessible locations so that no migration should be needed. However, there is one corner case where hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE back to system memory due to a write-fault when a system memory page in the same range was mapped read-only (e.g. COW). Ranges with individual pages in different locations are usually the result of failed page migrations (e.g. page lock contention). The unexpected migration back to system memory causes a deadlock from recursive locking in our driver.
[How]: Creating a task reference new member under svm_range_list struct. Setting this with "current" reference, right before the hmm_range_fault is called. This member is checked against "current" reference at svm_migrate_to_ram callback function. If equal, the migration will be ignored.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
36770c04 |
| 29-Oct-2021 |
Alex Sierra <alex.sierra@amd.com> |
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't
drm/amdkfd: avoid recursive lock in migrations back to RAM
[ Upstream commit a6283010e2907a5576f96b839e1a1c82659f137c ]
[Why]: When we call hmm_range_fault to map memory after a migration, we don't expect memory to be migrated again as a result of hmm_range_fault. The driver ensures that all memory is in GPU-accessible locations so that no migration should be needed. However, there is one corner case where hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE back to system memory due to a write-fault when a system memory page in the same range was mapped read-only (e.g. COW). Ranges with individual pages in different locations are usually the result of failed page migrations (e.g. page lock contention). The unexpected migration back to system memory causes a deadlock from recursive locking in our driver.
[How]: Creating a task reference new member under svm_range_list struct. Setting this with "current" reference, right before the hmm_range_fault is called. This member is checked against "current" reference at svm_migrate_to_ram callback function. If equal, the migration will be ignored.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Stable-dep-of: 5b994354af3c ("drm/amdkfd: Fix NULL pointer dereference in svm_migrate_to_ram()") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|