Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
55dd4023 |
| 18-Aug-2023 |
Yi Liu <yi.l.liu@intel.com> |
iommufd: Add IOMMU_GET_HW_INFO
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.).
iommufd: Add IOMMU_GET_HW_INFO
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and need to be compatible with the underlying IOMMU hardware. Hence, userspace should know the IOMMU hardware capability before creating and configuring the stage-1 translation table to kernel.
This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information (a.k.a capability) for a given device. The returned data is vendor specific, userspace needs to decode it with the structure by the output @out_data_type field.
As only physical devices have IOMMU hardware, so this will return error if the given device is not a physical device.
Link: https://lore.kernel.org/r/20230818101033.4100-4-yi.l.liu@intel.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v6.1.46 |
|
#
65aaca11 |
| 14-Aug-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Remove iommufd_ref_to_users()
This no longer has any callers, remove the function
Kevin noticed that after commit 99f98a7c0d69 ("iommufd: IOMMUFD_DESTROY should not increase the refcount")
iommufd: Remove iommufd_ref_to_users()
This no longer has any callers, remove the function
Kevin noticed that after commit 99f98a7c0d69 ("iommufd: IOMMUFD_DESTROY should not increase the refcount") there was only one other user and it turns out the rework in commit 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers") got rid of the last one.
Link: https://lore.kernel.org/r/0-v1-abb31bedd888+c1-iommufd_ref_to_users_jgg@nvidia.com Suggested-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v6.1.45, v6.1.44, v6.1.43 |
|
#
23a1b46f |
| 02-Aug-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd/selftest: Make the mock iommu driver into a real driver
I've avoided doing this because there is no way to make this happen without an intrusion into the core code. Up till now this has avoi
iommufd/selftest: Make the mock iommu driver into a real driver
I've avoided doing this because there is no way to make this happen without an intrusion into the core code. Up till now this has avoided needing the core code's probe path with some hackery - but now that default domains are becoming mandatory it is unavoidable.
This became a serious problem when the core code stopped allowing partially registered iommu drivers in commit 14891af3799e ("iommu: Move the iommu driver sysfs setup into iommu_init/deinit_device()") which breaks the selftest. That series was developed along with a second series that contained this patch so it was not noticed.
Make it so that iommufd selftest can create a real iommu driver and bind it only to is own private bus. Add iommu_device_register_bus() as a core code helper to make this possible. It simply sets the right pointers and registers the notifier block. The mock driver then works like any normal driver should, with probe triggered by the bus ops
When the bus->iommu_ops stuff is fully unwound we can probably do better here and remove this special case.
Link: https://lore.kernel.org/r/15-v6-e8114faedade+425-iommu_all_defdom_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
5d5c85ff |
| 28-Jul-2023 |
Nicolin Chen <nicolinc@nvidia.com> |
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
This is a preparatory change for ioas replacement support for accesses. The replacement routine does an iopt_add_access() for a
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
This is a preparatory change for ioas replacement support for accesses. The replacement routine does an iopt_add_access() for a new IOAS first and then iopt_remove_access() for the old IOAS upon the success of the first call. However, the first call overrides the iopt_access_list_id in the access struct, resulting in iopt_remove_access() being unable to work on the old IOAS.
Add an iopt_access_list_id as a parameter to iopt_remove_access, so the replacement routine can save the id before it gets overwritten. Pass the id in iopt_remove_access() for a proper cleanup.
The existing callers should just pass in access->iopt_access_list_id.
Link: https://lore.kernel.org/r/7bb939b9e0102da0c099572bb3de78ab7622221e.1690523699.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v6.1.42 |
|
#
99f98a7c |
| 25-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: IOMMUFD_DESTROY should not increase the refcount
syzkaller found a race where IOMMUFD_DESTROY increments the refcount:
obj = iommufd_get_object(ucmd->ictx, cmd->id, IOMMUFD_OBJ_ANY)
iommufd: IOMMUFD_DESTROY should not increase the refcount
syzkaller found a race where IOMMUFD_DESTROY increments the refcount:
obj = iommufd_get_object(ucmd->ictx, cmd->id, IOMMUFD_OBJ_ANY); if (IS_ERR(obj)) return PTR_ERR(obj); iommufd_ref_to_users(obj); /* See iommufd_ref_to_users() */ if (!iommufd_object_destroy_user(ucmd->ictx, obj))
As part of the sequence to join the two existing primitives together.
Allowing the refcount the be elevated without holding the destroy_rwsem violates the assumption that all temporary refcount elevations are protected by destroy_rwsem. Racing IOMMUFD_DESTROY with iommufd_object_destroy_user() will cause spurious failures:
WARNING: CPU: 0 PID: 3076 at drivers/iommu/iommufd/device.c:477 iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:478 Modules linked in: CPU: 0 PID: 3076 Comm: syz-executor.0 Not tainted 6.3.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023 RIP: 0010:iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:477 Code: e8 3d 4e 00 00 84 c0 74 01 c3 0f 0b c3 0f 1f 44 00 00 f3 0f 1e fa 48 89 fe 48 8b bf a8 00 00 00 e8 1d 4e 00 00 84 c0 74 01 c3 <0f> 0b c3 0f 1f 44 00 00 41 57 41 56 41 55 4c 8d ae d0 00 00 00 41 RSP: 0018:ffffc90003067e08 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888109ea0300 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff RBP: 0000000000000004 R08: 0000000000000000 R09: ffff88810bbb3500 R10: ffff88810bbb3e48 R11: 0000000000000000 R12: ffffc90003067e88 R13: ffffc90003067ea8 R14: ffff888101249800 R15: 00000000fffffffe FS: 00007ff7254fe6c0(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555557262da8 CR3: 000000010a6fd000 CR4: 0000000000350ef0 Call Trace: <TASK> iommufd_test_create_access drivers/iommu/iommufd/selftest.c:596 [inline] iommufd_test+0x71c/0xcf0 drivers/iommu/iommufd/selftest.c:813 iommufd_fops_ioctl+0x10f/0x1b0 drivers/iommu/iommufd/main.c:337 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x84/0xc0 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
The solution is to not increment the refcount on the IOMMUFD_DESTROY path at all. Instead use the xa_lock to serialize everything. The refcount check == 1 and xa_erase can be done under a single critical region. This avoids the need for any refcount incrementing.
It has the downside that if userspace races destroy with other operations it will get an EBUSY instead of waiting, but this is kind of racing is already dangerous.
Fixes: 2ff4bed7fee7 ("iommufd: File descriptor, context, kconfig and makefiles") Link: https://lore.kernel.org/r/2-v1-85aacb2af554+bc-iommufd_syz3_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: syzbot+7574ebfe589049630608@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v6.1.41, v6.1.40, v6.1.39 |
|
#
7074d7bd |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add IOMMU_HWPT_ALLOC
This allows userspace to manually create HWPTs on IOAS's and then use those HWPTs as inputs to iommufd_device_attach/replace().
Following series will extend this to al
iommufd: Add IOMMU_HWPT_ALLOC
This allows userspace to manually create HWPTs on IOAS's and then use those HWPTs as inputs to iommufd_device_attach/replace().
Following series will extend this to allow creating iommu_domains with driver specific parameters.
Link: https://lore.kernel.org/r/17-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
83f7bc6f |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Make destroy_rwsem use a lock class per object type
The selftest invokes things like replace under the object lock of its idev which protects the idev in a similar way to a real user. Unfor
iommufd: Make destroy_rwsem use a lock class per object type
The selftest invokes things like replace under the object lock of its idev which protects the idev in a similar way to a real user. Unfortunately this triggers lockdep. A lock class per type will solve the problem.
Link: https://lore.kernel.org/r/15-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
70eadc7f |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Allow a hwpt to be aborted after allocation
During creation the hwpt must have the ioas->mutex held until the object is finalized. This means we need to be able to call iommufd_object_abort
iommufd: Allow a hwpt to be aborted after allocation
During creation the hwpt must have the ioas->mutex held until the object is finalized. This means we need to be able to call iommufd_object_abort_and_destroy() while holding the mutex.
Since iommufd_hw_pagetable_destroy() also needs the mutex this is problematic.
Fix it by creating a special abort op for the object that can assume the caller is holding the lock, as required by the contract.
The next patch will add another iommufd_object_abort_and_destroy() for a hwpt.
Fixes: e8d57210035b ("iommufd: Add kAPI toward external drivers for physical devices") Link: https://lore.kernel.org/r/10-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
17bad527 |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
Logically the HWPT should have the coherency set properly for the device that it is being created for when it is created.
This
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
Logically the HWPT should have the coherency set properly for the device that it is being created for when it is created.
This was happening implicitly if the immediate_attach was set because iommufd_hw_pagetable_attach() does it as the first thing.
Do it unconditionally so !immediate_attach works properly.
Link: https://lore.kernel.org/r/9-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
d03f1336 |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Move putting a hwpt to a helper function
Next patch will need to call this from two places.
Link: https://lore.kernel.org/r/8-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by:
iommufd: Move putting a hwpt to a helper function
Next patch will need to call this from two places.
Link: https://lore.kernel.org/r/8-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
1d149ab2 |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Make sw_msi_start a group global
The sw_msi_start is only set by the ARM drivers and it is always constant. Due to the way vfio/iommufd allow domains to be re-used between devices we have a
iommufd: Make sw_msi_start a group global
The sw_msi_start is only set by the ARM drivers and it is always constant. Due to the way vfio/iommufd allow domains to be re-used between devices we have a built in assumption that there is only one value for sw_msi_start and it is global to the system.
To make replace simpler where we may not reparse the iommu_get_resv_regions() move the sw_msi_start to the iommufd_group so it is always available once any HWPT has been attached.
Link: https://lore.kernel.org/r/7-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
34f327a9 |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Keep track of each device's reserved regions instead of groups
The driver facing API in the iommu core makes the reserved regions per-device. An algorithm in the core code consolidates the
iommufd: Keep track of each device's reserved regions instead of groups
The driver facing API in the iommu core makes the reserved regions per-device. An algorithm in the core code consolidates the regions of all the devices in a group to return the group view.
To allow for devices to be hotplugged into the group iommufd would re-load the entire group's reserved regions for each device, just in case they changed.
Further iommufd already has to deal with duplicated/overlapping reserved regions as it must union all the groups together.
Thus simplify all of this to just use the device reserved regions interface directly from the iommu driver.
Link: https://lore.kernel.org/r/5-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Suggested-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
91a2e17e |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Replace the hwpt->devices list with iommufd_group
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-
iommufd: Replace the hwpt->devices list with iommufd_group
The devices list was used as a simple way to avoid having per-group information. Now that this seems to be unavoidable, just commit to per-group information fully and remove the devices list from the HWPT.
The iommufd_group stores the currently assigned HWPT for the entire group and we can manage the per-device attach/detach with a list in the iommufd_group.
For destruction the flow is organized to make the following patches easier, the actual call to iommufd_object_destroy_user() is done at the top of the call chain without holding any locks. The HWPT to be destroyed is returned out from the locked region to make this possible. Later patches create locking that requires this.
Link: https://lore.kernel.org/r/3-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
3a3329a7 |
| 17-Jul-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add iommufd_group
When the hwpt to device attachment is fairly static we could get away with the simple approach of keeping track of the groups via a device list. But with replace this is i
iommufd: Add iommufd_group
When the hwpt to device attachment is fairly static we could get away with the simple approach of keeping track of the groups via a device list. But with replace this is infeasible.
Add an automatically managed struct that is 1:1 with the iommu_group per-ictx so we can store the necessary tracking information there.
Link: https://lore.kernel.org/r/2-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
e23a6217 |
| 18-Jul-2023 |
Nicolin Chen <nicolinc@nvidia.com> |
iommufd/device: Add iommufd_access_detach() API
Previously, the detach routine is only done by the destroy(). And it was called by vfio_iommufd_emulated_unbind() when the device runs close(), so all
iommufd/device: Add iommufd_access_detach() API
Previously, the detach routine is only done by the destroy(). And it was called by vfio_iommufd_emulated_unbind() when the device runs close(), so all the mappings in iopt were cleaned in that setup, when the call trace reaches this detach() routine.
Now, there's a need of a detach uAPI, meaning that it does not only need a new iommufd_access_detach() API, but also requires access->ops->unmap() call as a cleanup. So add one.
However, leaving that unprotected can introduce some potential of a race condition during the pin_/unpin_pages() call, where access->ioas->iopt is getting referenced. So, add an ioas_lock to protect the context of iopt referencings.
Also, to allow the iommufd_access_unpin_pages() callback to happen via this unmap() call, add an ioas_unpin pointer, so the unpin routine won't be affected by the "access->ioas = NULL" trick.
Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-15-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
show more ...
|
Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22 |
|
#
325de950 |
| 27-Mar-2023 |
Yi Liu <yi.l.liu@intel.com> |
iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas()
No need to pass the iommufd_ucmd pointer.
Link: https://lore.kernel.org/r/20230327093351.44505-2-yi.l.liu@intel.com Signed-off-by: Yi L
iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas()
No need to pass the iommufd_ucmd pointer.
Link: https://lore.kernel.org/r/20230327093351.44505-2-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15 |
|
#
65c619ae |
| 01-Mar-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd/selftest: Make selftest create a more complete mock device
iommufd wants to use more infrastructure, like the iommu_group, that the mock device does not support. Create a more complete mock
iommufd/selftest: Make selftest create a more complete mock device
iommufd wants to use more infrastructure, like the iommu_group, that the mock device does not support. Create a more complete mock device that can go through the whole cycle of ownership, blocking domain, and has an iommu_group.
This requires creating a real struct device on a real bus to be able to connect it to a iommu_group. Unfortunately we cannot formally attach the mock iommu driver as an actual driver as the iommu core does not allow more than one driver or provide a general way for busses to link to iommus. This can be solved with a little hack to open code the dev_iommus struct.
With this infrastructure things work exactly the same as the normal domain path, including the auto domains mechanism and direct attach of hwpts. As the created hwpt is now an autodomain it is no longer required to destroy it and trying to do so will trigger a failure.
Link: https://lore.kernel.org/r/11-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
339fbf3a |
| 01-Mar-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Make iommufd_hw_pagetable_alloc() do iopt_table_add_domain()
The HWPT is always linked to an IOAS and once a HWPT exists its domain should be fully mapped. This ended up being split up into
iommufd: Make iommufd_hw_pagetable_alloc() do iopt_table_add_domain()
The HWPT is always linked to an IOAS and once a HWPT exists its domain should be fully mapped. This ended up being split up into device.c during a two phase creation that was a bit confusing.
Move the iopt_table_add_domain() into iommufd_hw_pagetable_alloc() by having it call back to device.c to complete the domain attach in the required order.
Calling iommufd_hw_pagetable_alloc() with immediate_attach = false will work on most drivers, but notably the SMMU drivers will fail because they can't decide what kind of domain to create until they are attached. This will be fixed when the domain_alloc function can take in a struct device.
Link: https://lore.kernel.org/r/6-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
7e7ec8a5 |
| 01-Mar-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Move iommufd_device to iommufd_private.h
hw_pagetable.c will need this in the next patches.
Link: https://lore.kernel.org/r/5-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com Reviewed-by:
iommufd: Move iommufd_device to iommufd_private.h
hw_pagetable.c will need this in the next patches.
Link: https://lore.kernel.org/r/5-v3-ae9c2975a131+2e1e8-iommufd_hwpt_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8 |
|
#
c9a397ce |
| 18-Jan-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
vfio: Support VFIO_NOIOMMU with iommufd
Add a small amount of emulation to vfio_compat to accept the SET_IOMMU to VFIO_NOIOMMU_IOMMU and have vfio just ignore iommufd if it is working on a no-iommu
vfio: Support VFIO_NOIOMMU with iommufd
Add a small amount of emulation to vfio_compat to accept the SET_IOMMU to VFIO_NOIOMMU_IOMMU and have vfio just ignore iommufd if it is working on a no-iommu enabled device.
Move the enable_unsafe_noiommu_mode module out of container.c into vfio_main.c so that it is always available even if VFIO_CONTAINER=n.
This passes Alex's mini-test:
https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c
Link: https://lore.kernel.org/r/0-v3-480cd64a16f7+1ad0-iommufd_noiommu_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
Revision tags: v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11 |
|
#
f4b20bb3 |
| 29-Nov-2022 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add kernel support for testing iommufd
Provide a mock kernel module for the iommu_domain that allows it to run without any HW and the mocking provides a way to directly validate that the PF
iommufd: Add kernel support for testing iommufd
Provide a mock kernel module for the iommu_domain that allows it to run without any HW and the mocking provides a way to directly validate that the PFNs loaded into the iommu_domain are correct. This exposes the access kAPI toward userspace to allow userspace to explore the functionality of pages.c and io_pagetable.c
The mock also simulates the rare case of PAGE_SIZE > iommu page size as the mock will operate at a 2K iommu page size. This allows exercising all of the calculations to support this mismatch.
This is also intended to support syzkaller exploring the same space.
However, it is an unusually invasive config option to enable all of this. The config option should not be enabled in a production kernel.
Link: https://lore.kernel.org/r/16-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390 Tested-by: Eric Auger <eric.auger@redhat.com> # aarch64 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
d624d665 |
| 29-Nov-2022 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: vfio container FD ioctl compatibility
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by mapping them into io_pagetable operations.
A userspace application can test agai
iommufd: vfio container FD ioctl compatibility
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by mapping them into io_pagetable operations.
A userspace application can test against iommufd and confirm compatibility then simply make a small change to open /dev/iommu instead of /dev/vfio/vfio.
For testing purposes /dev/vfio/vfio can be symlinked to /dev/iommu and then all applications will use the compatibility path with no code changes. A later series allows /dev/vfio/vfio to be directly provided by iommufd, which allows the rlimit mode to work the same as well.
This series just provides the iommufd side of compatibility. Actually linking this to VFIO_SET_CONTAINER is a followup series, with a link in the cover letter.
Internally the compatibility API uses a normal IOAS object that, like vfio, is automatically allocated when the first device is attached.
Userspace can also query or set this IOAS object directly using the IOMMU_VFIO_IOAS ioctl. This allows mixing and matching new iommufd only features while still using the VFIO style map/unmap ioctls.
While this is enough to operate qemu, it has a few differences:
- Resource limits rely on memory cgroups to bound what userspace can do instead of the module parameter dma_entry_limit.
- VFIO P2P is not implemented. The DMABUF patches for vfio are a start at a solution where iommufd would import a special DMABUF. This is to avoid further propogating the follow_pfn() security problem.
- A full audit for pedantic compatibility details (eg errnos, etc) has not yet been done
- powerpc SPAPR is left out, as it is not connected to the iommu_domain framework. It seems interest in SPAPR is minimal as it is currently non-working in v6.1-rc1. They will have to convert to the iommu subsystem framework to enjoy iommfd.
The following are not going to be implemented and we expect to remove them from VFIO type1:
- SW access 'dirty tracking'. As discussed in the cover letter this will be done in VFIO.
- VFIO_TYPE1_NESTING_IOMMU https://lore.kernel.org/all/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/
- VFIO_DMA_MAP_FLAG_VADDR https://lore.kernel.org/all/Yz777bJZjTyLrHEQ@nvidia.com/
Link: https://lore.kernel.org/r/15-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
8d40205f |
| 29-Nov-2022 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add kAPI toward external drivers for kernel access
Kernel access is the mode that VFIO "mdevs" use. In this case there is no struct device and no IOMMU connection. iommufd acts as a record
iommufd: Add kAPI toward external drivers for kernel access
Kernel access is the mode that VFIO "mdevs" use. In this case there is no struct device and no IOMMU connection. iommufd acts as a record keeper for accesses and returns the actual struct pages back to the caller to use however they need. eg with kmap or the DMA API.
Each caller must create a struct iommufd_access with iommufd_access_create(), similar to how iommufd_device_bind() works. Using this struct the caller can access blocks of IOVA using iommufd_access_pin_pages() or iommufd_access_rw().
Callers must provide a callback that immediately unpins any IOVA being used within a range. This happens if userspace unmaps the IOVA under the pin.
The implementation forwards the access requests directly to the iopt infrastructure that manages the iopt_pages_access.
Link: https://lore.kernel.org/r/14-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
e8d57210 |
| 29-Nov-2022 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add kAPI toward external drivers for physical devices
Add the four functions external drivers need to connect physical DMA to the IOMMUFD:
iommufd_device_bind() / iommufd_device_unbind()
iommufd: Add kAPI toward external drivers for physical devices
Add the four functions external drivers need to connect physical DMA to the IOMMUFD:
iommufd_device_bind() / iommufd_device_unbind() Register the device with iommufd and establish security isolation.
iommufd_device_attach() / iommufd_device_detach() Connect a bound device to a page table
Binding a device creates a device object ID in the uAPI, however the generic API does not yet provide any IOCTLs to manipulate them.
Link: https://lore.kernel.org/r/13-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
ea4acfac |
| 29-Nov-2022 |
Jason Gunthorpe <jgg@nvidia.com> |
iommufd: Add a HW pagetable object
The hw_pagetable object exposes the internal struct iommu_domain's to userspace. An iommu_domain is required when any DMA device attaches to an IOAS to control the
iommufd: Add a HW pagetable object
The hw_pagetable object exposes the internal struct iommu_domain's to userspace. An iommu_domain is required when any DMA device attaches to an IOAS to control the io page table through the iommu driver.
For compatibility with VFIO the hw_pagetable is automatically created when a DMA device is attached to the IOAS. If a compatible iommu_domain already exists then the hw_pagetable associated with it is used for the attachment.
In the initial series there is no iommufd uAPI for the hw_pagetable object. The next patch provides driver facing APIs for IO page table attachment that allows drivers to accept either an IOAS or a hw_pagetable ID and for the driver to return the hw_pagetable ID that was auto-selected from an IOAS. The expectation is the driver will provide uAPI through its own FD for attaching its device to iommufd. This allows userspace to learn the mapping of devices to iommu_domains and to override the automatic attachment.
The future HW specific interface will allow userspace to create hw_pagetable objects using iommu_domains with IOMMU driver specific parameters. This infrastructure will allow linking those domains to IOAS's and devices.
Link: https://lore.kernel.org/r/12-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|