.. SPDX-License-Identifier: GPL-2.0+

=======
IOMMUFD
=======

:Author: Jason Gunthorpe
:Author: Kevin Tian

Overview
========

IOMMUFD is the user API to control the IOMMU subsystem as it relates to managing
IO page tables from userspace using file descriptors. It is intended to be
general and consumable by any driver that wants to expose DMA to userspace.
These drivers are eventually expected to deprecate any internal IOMMU logic
they may already/historically implement (e.g. vfio_iommu_type1.c).

At a minimum, iommufd provides universal support for managing I/O address spaces
and I/O page tables for all IOMMUs, with room in the design to add non-generic
features to cater to specific hardware functionality.

In this context the capital letter (IOMMUFD) refers to the subsystem while the
small letter (iommufd) refers to the file descriptors created via /dev/iommu for
use by userspace.

Key Concepts
============

User Visible Objects
--------------------

The following IOMMUFD objects are exposed to userspace:

- IOMMUFD_OBJ_IOAS, representing an I/O address space (IOAS), allowing map/unmap
  of user space memory into ranges of I/O Virtual Address (IOVA).

  The IOAS is a functional replacement for the VFIO container, and like the VFIO
  container it copies an IOVA map to a list of iommu_domains held within it.

- IOMMUFD_OBJ_DEVICE, representing a device that is bound to iommufd by an
  external driver.

- IOMMUFD_OBJ_HW_PAGETABLE, representing an actual hardware I/O page table
  (i.e. a single struct iommu_domain) managed by the iommu driver.

  The IOAS has a list of HW_PAGETABLEs that share the same IOVA mapping and
  it will synchronize its mapping with each member HW_PAGETABLE.

All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
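
The way an IOAS keeps each member HW_PAGETABLE in sync with its own IOVA map
can be illustrated with a small userspace toy model. This is only a sketch
under stated assumptions: every ``toy_*`` name, the fixed table sizes and the
one-page-per-entry map are hypothetical and do not correspond to kernel
structures or to the iommufd uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS  8	/* illustrative limits, not kernel values */
#define TOY_MAX_HWPTS 4

/* One IOVA -> PFN translation (a single page, for brevity). */
struct toy_map {
	uint64_t iova;
	uint64_t pfn;
	int used;
};

/* Hypothetical stand-in for a HW_PAGETABLE, i.e. one iommu_domain. */
struct toy_hwpt {
	struct toy_map maps[TOY_MAX_MAPS];
};

/* Hypothetical stand-in for an IOAS: master map plus member page tables. */
struct toy_ioas {
	struct toy_map maps[TOY_MAX_MAPS];
	struct toy_hwpt *hwpts[TOY_MAX_HWPTS];
	size_t nr_hwpts;
};

/* Return the slot holding @iova, or NULL if it is not mapped. */
struct toy_map *toy_find(struct toy_map *maps, uint64_t iova)
{
	for (size_t i = 0; i < TOY_MAX_MAPS; i++)
		if (maps[i].used && maps[i].iova == iova)
			return &maps[i];
	return NULL;
}

/* Store @iova -> @pfn in the first free slot; -1 if the table is full. */
int toy_set(struct toy_map *maps, uint64_t iova, uint64_t pfn)
{
	for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
		if (!maps[i].used) {
			maps[i] = (struct toy_map){ iova, pfn, 1 };
			return 0;
		}
	}
	return -1;
}

/* Add a page table to the IOAS and replay the existing map into it. */
int toy_ioas_add_hwpt(struct toy_ioas *ioas, struct toy_hwpt *hwpt)
{
	if (ioas->nr_hwpts == TOY_MAX_HWPTS)
		return -1;
	for (size_t i = 0; i < TOY_MAX_MAPS; i++)
		hwpt->maps[i] = ioas->maps[i];
	ioas->hwpts[ioas->nr_hwpts++] = hwpt;
	return 0;
}

/* Map in the IOAS, then mirror the entry into every member page table. */
int toy_ioas_map(struct toy_ioas *ioas, uint64_t iova, uint64_t pfn)
{
	if (toy_find(ioas->maps, iova) || toy_set(ioas->maps, iova, pfn))
		return -1;
	for (size_t i = 0; i < ioas->nr_hwpts; i++) {
		int rc = toy_set(ioas->hwpts[i]->maps, iova, pfn);

		assert(rc == 0);	/* members never hold more than the IOAS */
		(void)rc;
	}
	return 0;
}

/* Unmap from the IOAS and from every member page table. */
int toy_ioas_unmap(struct toy_ioas *ioas, uint64_t iova)
{
	struct toy_map *m = toy_find(ioas->maps, iova);

	if (!m)
		return -1;
	m->used = 0;
	for (size_t i = 0; i < ioas->nr_hwpts; i++)
		toy_find(ioas->hwpts[i]->maps, iova)->used = 0;
	return 0;
}
```

In this model a page table added after a mapping already exists still ends up
with the same translations, because ``toy_ioas_add_hwpt()`` replays the master
map before the table joins the list.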

The diagram below shows the relationships between user-visible objects and
kernel datastructures (external to iommufd), with numbers referring to the
operations that create the objects and links::

  _________________________________________________________
 |                         iommufd                         |
 |       [1]                                               |
 |  _________________                                      |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |                                     |
 | |                 |        [3]                 [2]      |
 | |                 |    ____________         __________  |
 | |      IOAS       |<--|            |<------|          | |
 | |                 |   |HW_PAGETABLE|       |  DEVICE  | |
 | |                 |   |____________|       |__________| |
 | |                 |         |                   |       |
 | |                 |         |                   |       |
 | |                 |         |                   |       |
 | |                 |         |                   |       |
 | |                 |         |                   |       |
 | |_________________|         |                   |       |
 |         |                   |                   |       |
 |_________|___________________|___________________|_______|
           |                   |                   |
           |              _____v______      _______v_____
           | PFN storage |            |    |             |
           |------------>|iommu_domain|    |struct device|
                         |____________|    |_____________|

1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can
   hold multiple IOAS objects. IOAS is the most generic object and does not
   expose interfaces that are specific to single IOMMU drivers. All operations
   on the IOAS must operate equally on each of the iommu_domains inside of it.

2. IOMMUFD_OBJ_DEVICE is created when an external driver calls the IOMMUFD kAPI
   to bind a device to an iommufd. The driver is expected to implement a set of
   ioctls to allow userspace to initiate the binding operation. Successful
   completion of this operation establishes the desired DMA ownership over the
   device. The driver must also set the driver_managed_dma flag and must not
   touch the device until this operation succeeds.

3. IOMMUFD_OBJ_HW_PAGETABLE is created when an external driver calls the IOMMUFD
   kAPI to attach a bound device to an IOAS. Similarly, the external driver uAPI
   allows userspace to initiate the attaching operation. If a compatible
   pagetable already exists then it is reused for the attachment. Otherwise a
   new pagetable object and iommu_domain are created. Successful completion of
   this operation sets up the linkages among IOAS, device and iommu_domain. Once
   this completes the device can do DMA.

   Every iommu_domain inside the IOAS is also represented to userspace as a
   HW_PAGETABLE object.

   .. note::

      Future IOMMUFD updates will provide an API to create and manipulate the
      HW_PAGETABLE directly.
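
The bind and attach steps above, including the reuse of a compatible
pagetable, can be sketched as a self-contained toy model. Everything here is
an assumption made for illustration: the ``demo_*`` names are hypothetical,
and "compatible" is reduced to two devices sharing the same ``iommu_id``,
whereas the real compatibility check is up to the kernel and the iommu driver.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HWPTS 4	/* illustrative limit */

struct demo_hwpt {
	int in_use;
	int iommu_id;		/* hypothetical compatibility key */
	int nr_devices;		/* devices attached through this page table */
};

struct demo_ioas {
	int owner;		/* id of the (toy) iommufd holding this IOAS */
	struct demo_hwpt hwpts[DEMO_MAX_HWPTS];
};

struct demo_device {
	int iommu_id;		/* which (toy) IOMMU the device sits behind */
	int owner;		/* iommufd the device is bound to, 0 if none */
	struct demo_hwpt *hwpt;	/* non-NULL once attached to an IOAS */
};

/* Step [2]: bind; a device already owned by another iommufd is refused. */
int demo_bind(struct demo_device *dev, int iommufd_id)
{
	if (dev->owner && dev->owner != iommufd_id)
		return -1;
	dev->owner = iommufd_id;
	return 0;
}

/* Step [3]: attach a bound device, reusing a compatible page table if any. */
struct demo_hwpt *demo_attach(struct demo_ioas *ioas, struct demo_device *dev)
{
	struct demo_hwpt *free_slot = NULL;

	/* Must be bound to the same iommufd and not attached anywhere yet. */
	if (dev->owner != ioas->owner || dev->hwpt)
		return NULL;

	for (size_t i = 0; i < DEMO_MAX_HWPTS; i++) {
		struct demo_hwpt *h = &ioas->hwpts[i];

		if (h->in_use && h->iommu_id == dev->iommu_id) {
			assert(h->nr_devices > 0);
			h->nr_devices++;	/* compatible: reuse it */
			dev->hwpt = h;
			return h;
		}
		if (!h->in_use && !free_slot)
			free_slot = h;
	}
	if (!free_slot)
		return NULL;

	/* No compatible page table: create a new one for this device. */
	free_slot->in_use = 1;
	free_slot->iommu_id = dev->iommu_id;
	free_slot->nr_devices = 1;
	dev->hwpt = free_slot;
	return free_slot;
}
```

With this model, two bound devices that share an ``iommu_id`` end up on the
same ``demo_hwpt``, a device behind a different toy IOMMU gets a new one, and
an unbound or already attached device is rejected.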

Because of the DMA ownership claim, a device can bind to only one iommufd, and
it can attach to at most one IOAS object (PASID is not supported yet).

Kernel Datastructure
--------------------

User-visible objects are backed by the following datastructures:

- iommufd_ioas for IOMMUFD_OBJ_IOAS.
- iommufd_device for IOMMUFD_OBJ_DEVICE.
- iommufd_hw_pagetable for IOMMUFD_OBJ_HW_PAGETABLE.

Several terms are useful when looking at these datastructures:

- Automatic domain - refers to an iommu domain created automatically when
  attaching a device to an IOAS object. This is compatible with the semantics of
  VFIO type1.

- Manual domain - refers to an iommu domain designated by the user as the
  target pagetable to be attached to by a device. Though currently there are
  no uAPIs to directly create such a domain, the datastructures and algorithms
  are ready for handling that use case.

- In-kernel user - refers to something like a VFIO mdev that is using the
  IOMMUFD access interface to access the IOAS. This starts by creating an
  iommufd_access object that is similar to the domain binding a physical device
  would do. The access object will then allow converting IOVA ranges into struct
  page * lists, or doing direct read/write to an IOVA.

iommufd_ioas serves as the metadata datastructure to manage how IOVA ranges are
mapped to memory pages, composed of:

- struct io_pagetable holding the IOVA map
- struct iopt_areas representing populated portions of IOVA
- struct iopt_pages representing the storage of PFNs
- struct iommu_domain representing the IO page table in the IOMMU
- struct iopt_pages_access representing in-kernel users of PFNs
- struct xarray pinned_pfns holding a list of pages pinned by in-kernel users

Each iopt_pages represents a logical linear array of full PFNs. The PFNs are
ultimately derived from userspace VAs via an mm_struct. Once they have been
pinned the PFNs are stored in IOPTEs of an iommu_domain or inside the pinned_pfns
xarray if they have been pinned through an iommufd_access.

PFNs have to be copied between all combinations of storage locations, depending
on what domains are present and what kinds of in-kernel "software access" users
exist. The mechanism ensures that a page is pinned only once.

An io_pagetable is composed of iopt_areas pointing at iopt_pages, along with a
list of iommu_domains that mirror the IOVA to PFN map.

Multiple io_pagetables, through their iopt_areas, can share a single
iopt_pages, which avoids multi-pinning and double accounting of page
consumption.
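
The pin-once, account-once idea behind sharing an iopt_pages can be pictured
with a reference-counted toy model. This is a deliberately simplified sketch:
the ``pin_*`` names and fields are hypothetical, and pinning is assumed to
happen when the first area references the storage, which glosses over when the
kernel actually pins pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for iopt_pages: PFN storage shared by many areas. */
struct pin_pages {
	unsigned int npages;	/* size of the backing range, in pages */
	unsigned int users;	/* areas currently referencing the storage */
	unsigned int pin_ops;	/* how often the pages were actually pinned */
	unsigned int accounted;	/* pages charged against the (toy) pin limit */
};

/* Hypothetical stand-in for iopt_area: a populated IOVA range. */
struct pin_area {
	uint64_t iova;
	struct pin_pages *pages;
};

/* Populate an area; only the first user pins and accounts the pages. */
void pin_area_map(struct pin_area *area, uint64_t iova, struct pin_pages *pages)
{
	if (pages->users++ == 0) {
		pages->pin_ops++;
		pages->accounted = pages->npages;
	}
	area->iova = iova;
	area->pages = pages;
}

/* Tear down an area; the last user releases the pin and the accounting. */
void pin_area_unmap(struct pin_area *area)
{
	struct pin_pages *pages = area->pages;

	assert(pages && pages->users > 0);
	if (--pages->users == 0)
		pages->accounted = 0;
	area->pages = NULL;
}
```

Two areas belonging to different toy page tables can map the same
``pin_pages`` at different IOVAs; ``pin_ops`` stays at one and ``accounted``
is charged once until the last area is unmapped.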

iommufd_ioas is shareable between subsystems, e.g. VFIO and VDPA, as long as
devices managed by different subsystems are bound to the same iommufd.

IOMMUFD User API
================

.. kernel-doc:: include/uapi/linux/iommufd.h

IOMMUFD Kernel API
==================

The IOMMUFD kAPI is device-centric with group-related tricks managed behind the
scenes. This allows the external drivers calling such kAPI to implement a simple
device-centric uAPI for connecting their devices to an iommufd, instead of
explicitly imposing the group semantics in their uAPI as VFIO does.

.. kernel-doc:: drivers/iommu/iommufd/device.c
   :export:

.. kernel-doc:: drivers/iommu/iommufd/main.c
   :export:

VFIO and IOMMUFD
----------------

Connecting a VFIO device to iommufd can be done in two ways.

The first is a VFIO-compatible way that directly implements the /dev/vfio/vfio
container IOCTLs by mapping them into io_pagetable operations. Doing so allows
the use of iommufd in legacy VFIO applications by symlinking /dev/vfio/vfio to
/dev/iommu or extending VFIO to SET_CONTAINER using an iommufd instead of a
container fd.

The second approach directly extends VFIO to support a new set of device-centric
user APIs based on the aforementioned IOMMUFD kernel API. It requires userspace
changes but better matches the IOMMUFD API semantics and makes it easier to
support new iommufd features than the first approach.

Currently both approaches are still work-in-progress.

There are still a few gaps to be resolved to catch up with VFIO type1, as
documented in iommufd_vfio_check_extension().

Future TODOs
============

Currently IOMMUFD supports only kernel-managed I/O page tables, similar to VFIO
type1. New features on the radar include:

 - Binding iommu_domains to PASID/SSID
 - Userspace page tables, for ARM, x86 and S390
 - Kernel-bypassed invalidation of user page tables
 - Re-use of the KVM page table in the IOMMU
 - Dirty page tracking in the IOMMU
 - Runtime Increase/Decrease of IOPTE size
 - PRI support with faults resolved in userspace