eb501c2d | 16-Aug-2023 |
Yang Yingliang <yangyingliang@huawei.com> |
iommufd/selftest: Don't leak the platform device memory when unloading the module
It should call platform_device_unregister() instead of platform_device_del() to unregister and free the device.
Fix
iommufd/selftest: Don't leak the platform device memory when unloading the module
It should call platform_device_unregister() instead of platform_device_del() to unregister and free the device.
Fixes: 23a1b46f15d5 ("iommufd/selftest: Make the mock iommu driver into a real driver") Link: https://lore.kernel.org/r/20230816081318.1232865-1-yangyingliang@huawei.com Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
55dd4023 | 18-Aug-2023 |
Yi Liu <yi.l.liu@intel.com> |
iommufd: Add IOMMU_GET_HW_INFO
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.).
iommufd: Add IOMMU_GET_HW_INFO
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and need to be compatible with the underlying IOMMU hardware. Hence, userspace should know the IOMMU hardware capability before creating and configuring the stage-1 translation table to kernel.
This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information (a.k.a capability) for a given device. The returned data is vendor specific, userspace needs to decode it with the structure by the output @out_data_type field.
As only physical devices have IOMMU hardware, so this will return error if the given device is not a physical device.
Link: https://lore.kernel.org/r/20230818101033.4100-4-yi.l.liu@intel.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
a881b496 | 09-Aug-2023 |
Stefan Hajnoczi <stefanha@redhat.com> |
vfio: align capability structures
The VFIO_DEVICE_GET_INFO, VFIO_DEVICE_GET_REGION_INFO, and VFIO_IOMMU_GET_INFO ioctls fill in an info struct followed by capability structs:
+------+---------+--
vfio: align capability structures
The VFIO_DEVICE_GET_INFO, VFIO_DEVICE_GET_REGION_INFO, and VFIO_IOMMU_GET_INFO ioctls fill in an info struct followed by capability structs:
+------+---------+---------+-----+ | info | caps[0] | caps[1] | ... | +------+---------+---------+-----+
Both the info and capability struct sizes are not always multiples of sizeof(u64), leaving u64 fields in later capability structs misaligned.
Userspace applications currently need to handle misalignment manually in order to support CPU architectures and programming languages with strict alignment requirements.
Make life easier for userspace by ensuring alignment in the kernel. This is done by padding info struct definitions and by copying out zeroes after capability structs that are not aligned.
The new layout is as follows:
+------+---------+---+---------+-----+ | info | caps[0] | 0 | caps[1] | ... | +------+---------+---+---------+-----+
In this example caps[0] has a size that is not multiples of sizeof(u64), so zero padding is added to align the subsequent structure.
Adding zero padding between structs does not break the uapi. The memory layout is specified by the info.cap_offset and caps[i].next fields filled in by the kernel. Applications use these field values to locate structs and are therefore unaffected by the addition of zero padding.
Note that code that copies out info structs with padding is updated to always zero the struct and copy out as many bytes as userspace requested. This makes the code shorter and avoids potential information leaks by ensuring padding is initialized.
Originally-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230809203144.2880050-1-stefanha@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
show more ...
|
65aaca11 | 14-Aug-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Remove iommufd_ref_to_users()
This no longer has any callers, remove the function
Kevin noticed that after commit 99f98a7c0d69 ("iommufd: IOMMUFD_DESTROY should not increase the refcount")
iommufd: Remove iommufd_ref_to_users()
This no longer has any callers, remove the function
Kevin noticed that after commit 99f98a7c0d69 ("iommufd: IOMMUFD_DESTROY should not increase the refcount") there was only one other user and it turns out the rework in commit 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers") got rid of the last one.
Link: https://lore.kernel.org/r/0-v1-abb31bedd888+c1-iommufd_ref_to_users_jgg@nvidia.com Suggested-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
23a1b46f | 02-Aug-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd/selftest: Make the mock iommu driver into a real driver
I've avoided doing this because there is no way to make this happen without an intrusion into the core code. Up till now this has avoi
iommufd/selftest: Make the mock iommu driver into a real driver
I've avoided doing this because there is no way to make this happen without an intrusion into the core code. Up till now this has avoided needing the core code's probe path with some hackery - but now that default domains are becoming mandatory it is unavoidable.
This became a serious problem when the core code stopped allowing partially registered iommu drivers in commit 14891af3799e ("iommu: Move the iommu driver sysfs setup into iommu_init/deinit_device()") which breaks the selftest. That series was developed along with a second series that contained this patch so it was not noticed.
Make it so that iommufd selftest can create a real iommu driver and bind it only to is own private bus. Add iommu_device_register_bus() as a core code helper to make this possible. It simply sets the right pointers and registers the notifier block. The mock driver then works like any normal driver should, with probe triggered by the bus ops
When the bus->iommu_ops stuff is fully unwound we can probably do better here and remove this special case.
Link: https://lore.kernel.org/r/15-v6-e8114faedade+425-iommu_all_defdom_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
c154660b | 28-Jul-2023 |
Nicolin Chen <nicolinc@nvidia.com> |
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
Add a new IOMMU_TEST_OP_ACCESS_REPLACE_IOAS to allow replacing the access->ioas, corresponding to the iommufd_access_replace() helper
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
Add a new IOMMU_TEST_OP_ACCESS_REPLACE_IOAS to allow replacing the access->ioas, corresponding to the iommufd_access_replace() helper.
Then add replace coverage as a part of user_copy test case, which basically repeats the copy test after replacing the old ioas with a new one.
Link: https://lore.kernel.org/r/a4897f93d41c34b972213243b8dbf4c3832842e4.1690523699.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
70c16123 | 28-Jul-2023 |
Nicolin Chen <nicolinc@nvidia.com> |
iommufd: Add iommufd_access_replace() API
Taking advantage of the new iommufd_access_change_ioas_id helper, add an iommufd_access_replace() API for the VFIO emulated pathway to use.
Link: https://l
iommufd: Add iommufd_access_replace() API
Taking advantage of the new iommufd_access_change_ioas_id helper, add an iommufd_access_replace() API for the VFIO emulated pathway to use.
Link: https://lore.kernel.org/r/a3267b924fd5f45e0d3a1dd13a9237e923563862.1690523699.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
6129b59f | 28-Jul-2023 |
Nicolin Chen <nicolinc@nvidia.com> |
iommufd: Use iommufd_access_change_ioas in iommufd_access_destroy_object
Update iommufd_access_destroy_object() to call the new iommufd_access_change_ioas() helper.
It is impossible to legitimately
iommufd: Use iommufd_access_change_ioas in iommufd_access_destroy_object
Update iommufd_access_destroy_object() to call the new iommufd_access_change_ioas() helper.
It is impossible to legitimately race iommufd_access_destroy_object() with iommufd_access_change_ioas() as iommufd_access_destroy_object() is only called once the refcount reache zero, so any concurrent iommufd_access_change_ioas() is already UAFing the memory.
Link: https://lore.kernel.org/r/f9fbeca2cde7f8515da18d689b3e02a6a40a5e14.1690523699.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
9227da78 | 28-Jul-2023 |
Nicolin Chen <nicolinc@nvidia.com> |
iommufd: Add iommufd_access_change_ioas(_id) helpers
The complication of the mutex and refcount will be amplified after we introduce the replace support for accesses. So, add a preparatory change of
iommufd: Add iommufd_access_change_ioas(_id) helpers
The complication of the mutex and refcount will be amplified after we introduce the replace support for accesses. So, add a preparatory change of a constitutive helper iommufd_access_change_ioas() and its wrapper iommufd_access_change_ioas_id(). They can simply take care of existing iommufd_access_attach() and iommufd_access_detach(), properly sequencing the refcount puts so that they are truely at the end of the sequence after we know the IOAS pointer is not required any more.
Link: https://lore.kernel.org/r/da0c462532193b447329c4eb975a596f47e49b70.1690523699.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
5d5c85ff | 28-Jul-2023 |
Nicolin Chen <nicolinc@nvidia.com> |
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
This is a preparatory change for ioas replacement support for accesses. The replacement routine does an iopt_add_access() for a
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
This is a preparatory change for ioas replacement support for accesses. The replacement routine does an iopt_add_access() for a new IOAS first and then iopt_remove_access() for the old IOAS upon the success of the first call. However, the first call overrides the iopt_access_list_id in the access struct, resulting in iopt_remove_access() being unable to work on the old IOAS.
Add an iopt_access_list_id as a parameter to iopt_remove_access, so the replacement routine can save the id before it gets overwritten. Pass the id in iopt_remove_access() for a proper cleanup.
The existing callers should just pass in access->iopt_access_list_id.
Link: https://lore.kernel.org/r/7bb939b9e0102da0c099572bb3de78ab7622221e.1690523699.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
b7c822fa | 25-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Set end correctly when doing batch carry
Even though the test suite covers this it somehow became obscured that this wasn't working.
The test iommufd_ioas.mock_domain.access_domain_destory
iommufd: Set end correctly when doing batch carry
Even though the test suite covers this it somehow became obscured that this wasn't working.
The test iommufd_ioas.mock_domain.access_domain_destory would blow up rarely.
end should be set to 1 because this just pushed an item, the carry, to the pfns list.
Sometimes the test would blow up with:
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 5 PID: 584 Comm: iommufd Not tainted 6.5.0-rc1-dirty #1236 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:batch_unpin+0xa2/0x100 [iommufd] Code: 17 48 81 fe ff ff 07 00 77 70 48 8b 15 b7 be 97 e2 48 85 d2 74 14 48 8b 14 fa 48 85 d2 74 0b 40 0f b6 f6 48 c1 e6 04 48 01 f2 <48> 8b 3a 48 c1 e0 06 89 ca 48 89 de 48 83 e7 f0 48 01 c7 e8 96 dc RSP: 0018:ffffc90001677a58 EFLAGS: 00010246 RAX: 00007f7e2646f000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 00000000fefc4c8d RDI: 0000000000fefc4c RBP: ffffc90001677a80 R08: 0000000000000048 R09: 0000000000000200 R10: 0000000000030b98 R11: ffffffff81f3bb40 R12: 0000000000000001 R13: ffff888101f75800 R14: ffffc90001677ad0 R15: 00000000000001fe FS: 00007f9323679740(0000) GS:ffff8881ba540000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000105ede003 CR4: 00000000003706a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? show_regs+0x5c/0x70 ? __die+0x1f/0x60 ? page_fault_oops+0x15d/0x440 ? lock_release+0xbc/0x240 ? exc_page_fault+0x4a4/0x970 ? asm_exc_page_fault+0x27/0x30 ? batch_unpin+0xa2/0x100 [iommufd] ? batch_unpin+0xba/0x100 [iommufd] __iopt_area_unfill_domain+0x198/0x430 [iommufd] ? __mutex_lock+0x8c/0xb80 ? __mutex_lock+0x6aa/0xb80 ? xa_erase+0x28/0x30 ? iopt_table_remove_domain+0x162/0x320 [iommufd] ? lock_release+0xbc/0x240 iopt_area_unfill_domain+0xd/0x10 [iommufd] iopt_table_remove_domain+0x195/0x320 [iommufd] iommufd_hw_pagetable_destroy+0xb3/0x110 [iommufd] iommufd_object_destroy_user+0x8e/0xf0 [iommufd] iommufd_device_detach+0xc5/0x140 [iommufd] iommufd_selftest_destroy+0x1f/0x70 [iommufd] iommufd_object_destroy_user+0x8e/0xf0 [iommufd] iommufd_destroy+0x3a/0x50 [iommufd] iommufd_fops_ioctl+0xfb/0x170 [iommufd] __x64_sys_ioctl+0x40d/0x9a0 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
Link: https://lore.kernel.org/r/3-v1-85aacb2af554+bc-iommufd_syz3_jgg@nvidia.com Cc: <stable@vger.kernel.org> Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reported-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
7074d7bd | 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add IOMMU_HWPT_ALLOC
This allows userspace to manually create HWPTs on IOAS's and then use those HWPTs as inputs to iommufd_device_attach/replace().
Following series will extend this to al
iommufd: Add IOMMU_HWPT_ALLOC
This allows userspace to manually create HWPTs on IOAS's and then use those HWPTs as inputs to iommufd_device_attach/replace().
Following series will extend this to allow creating iommu_domains with driver specific parameters.
Link: https://lore.kernel.org/r/17-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
fa1ffdb9 | 17-Jul-2023 |
Nicolin Chen <nicolinc@nvidia.com> |
iommufd/selftest: Test iommufd_device_replace()
Allow the selftest to call the function on the mock idev, add some tests to exercise it.
Link: https://lore.kernel.org/r/16-v8-6659224517ea+532-iommu
iommufd/selftest: Test iommufd_device_replace()
Allow the selftest to call the function on the mock idev, add some tests to exercise it.
Link: https://lore.kernel.org/r/16-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
83f7bc6f | 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Make destroy_rwsem use a lock class per object type
The selftest invokes things like replace under the object lock of its idev which protects the idev in a similar way to a real user. Unfor
iommufd: Make destroy_rwsem use a lock class per object type
The selftest invokes things like replace under the object lock of its idev which protects the idev in a similar way to a real user. Unfortunately this triggers lockdep. A lock class per type will solve the problem.
Link: https://lore.kernel.org/r/15-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
e88d4ec1 | 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add iommufd_device_replace()
Replace allows all the devices in a group to move in one step to a new HWPT. Further, the HWPT move is done without going through a blocking domain so that the
iommufd: Add iommufd_device_replace()
Replace allows all the devices in a group to move in one step to a new HWPT. Further, the HWPT move is done without going through a blocking domain so that the IOMMU driver can implement some level of non-distruption to ongoing DMA if that has meaning for it (eg for future special driver domains)
Replace uses a lot of the same logic as normal attach, except the actual domain change over has different restrictions, and we are careful to sequence things so that failure is going to leave everything the way it was, and not get trapped in a blocking domain or something if there is ENOMEM.
Link: https://lore.kernel.org/r/14-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|