===============================
IOMMUFD BACKEND usage with VFIO
===============================

(In this document, "backend", "container" and "BE" are used interchangeably.)

With the introduction of iommufd, the Linux kernel provides a generic
interface for user space drivers to propagate the DMA mappings of assigned
devices to the kernel. While the legacy kernel interface is group-centric,
the new iommufd interface is device-centric, relying on the device fd and
the iommufd.

To support both interfaces in the QEMU VFIO device, a base container
abstracts the parts common to the VFIO legacy and iommufd containers, so
that the generic VFIO code can use either container.

The base container implements generic functionality such as the
memory_listener and address space management, whereas each derived
container implements the callbacks specific to either the legacy or the
iommufd interface. Each container has its own way of setting up the secure
context and the DMA management interface. The diagram below shows how both
containers fit together.

::

                      VFIO                           AddressSpace/Memory
      +-------+  +----------+  +-----+  +-----+
      |  pci  |  | platform |  |  ap |  | ccw |
      +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
          |           |           |        |        |   AddressSpace       |
          |           |           |        |        +------------+---------+
      +---V-----------V-----------V--------V----+               /
      |           VFIOAddressSpace              | <------------+
      |                  |                      |  MemoryListener
      |        VFIOContainerBase list           |
      +-------+----------------------------+----+
              |                            |
              |                            |
      +-------V------+            +--------V----------+
      |   iommufd    |            |    vfio legacy    |
      |  container   |            |     container     |
      +-------+------+            +--------+----------+
              |                            |
              | /dev/iommu                 | /dev/vfio/vfio
              | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
  Userspace   |                            |
  ============+============================+===========================
  Kernel      |  device fd                 |
              +---------------+            | group/container fd
              | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
              |  ATTACH_IOAS) |            | device fd
              |               |            |
              |       +-------V------------V-----------------+
      iommufd |       |                vfio                  |
  (map/unmap  |       +---------+--------------------+-------+
  ioas_copy)  |                 |                    | map/unmap
              |                 |                    |
       +------V-------+   +-----V------+      +------V--------+
       | iommufd core |   |  device    |      |  vfio iommu   |
       +--------------+   +------------+      +---------------+

* Secure context setup

  - iommufd BE: uses the device fd and the iommufd to set up the secure
    context (bind_iommufd, attach_ioas)
  - vfio legacy BE: uses the group fd and the container fd to set up the
    secure context (set_container, set_iommu)

* Device access

  - iommufd BE: the device fd is opened through ``/dev/vfio/devices/vfioX``
  - vfio legacy BE: the device fd is retrieved from the group fd with the
    ``VFIO_GROUP_GET_DEVICE_FD`` ioctl

* DMA mapping flow

  1. VFIOAddressSpace receives MemoryRegion add/del events via the
     MemoryListener
  2. VFIO issues the DMA map/unmap requests through the container BEs:

     * iommufd BE: uses the iommufd
     * vfio legacy BE: uses the container fd
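
The device access paths above can be illustrated with the following sketch.
It reads sysfs to show which device nodes each backend would open for a
given PCI device. ``vfio_backend_nodes`` is a hypothetical helper and is not
part of QEMU. It assumes the device is already bound to ``vfio-pci`` and that
the kernel has VFIO device cdev support, so the cdev is listed under
``vfio-dev/``. ``SYSFS_ROOT`` is an illustrative override that exists only to
make the function testable.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper (not part of QEMU): print the device nodes that
    # each VFIO backend would use for a PCI device, based on sysfs.
    # SYSFS_ROOT is an assumed override so the function can be tested.
    SYSFS_ROOT=${SYSFS_ROOT:-/sys}

    vfio_backend_nodes() {
        local bdf=$1
        local dev="$SYSFS_ROOT/bus/pci/devices/$bdf"
        local group cdev

        # Legacy BE: the container fd comes from /dev/vfio/vfio and the group
        # fd from /dev/vfio/$group_id. The iommu_group link ends in the id.
        if [ -e "$dev/iommu_group" ]; then
            group=$(basename "$(readlink "$dev/iommu_group")")
            echo "legacy: /dev/vfio/vfio /dev/vfio/$group"
        fi

        # iommufd BE: /dev/iommu plus the per-device cdev, which the kernel
        # lists under vfio-dev/ when VFIO cdev support is available.
        for cdev in "$dev"/vfio-dev/vfio*; do
            [ -e "$cdev" ] || continue   # glob matched nothing: no cdev
            echo "iommufd: /dev/iommu /dev/vfio/devices/$(basename "$cdev")"
        done
    }

For example, ``vfio_backend_nodes 0000:02:00.0`` prints one line for each
backend whose device nodes are present for that device.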

Example configuration
=====================

Step 1: configure the host device
---------------------------------

This step is exactly the same as for a VFIO device used with the legacy VFIO
container.
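
For reference, one common way to prepare the host device for either backend
is to hand it to the ``vfio-pci`` driver through the generic PCI sysfs
interfaces. The sketch below does this and is not taken from QEMU.
``bind_to_vfio_pci`` is a hypothetical name. The function must run as root
and assumes the ``vfio-pci`` module is already loaded (for example with
``modprobe vfio-pci``). ``SYSFS_ROOT`` is an illustrative override that is
used only for testing.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: bind a PCI device to vfio-pci via sysfs.
    # Assumes root privileges and an already loaded vfio-pci module.
    SYSFS_ROOT=${SYSFS_ROOT:-/sys}

    bind_to_vfio_pci() {
        local bdf=$1
        local dev="$SYSFS_ROOT/bus/pci/devices/$bdf"

        if [ ! -e "$dev" ]; then
            echo "no such PCI device: $bdf" >&2
            return 1
        fi

        # Only allow vfio-pci to claim this device from now on.
        echo vfio-pci > "$dev/driver_override"

        # Detach the driver that currently owns the device, if any.
        if [ -e "$dev/driver" ]; then
            echo "$bdf" > "$dev/driver/unbind"
        fi

        # Ask the PCI core to probe the device again; driver_override makes
        # it pick vfio-pci.
        echo "$bdf" > "$SYSFS_ROOT/bus/pci/drivers_probe"
    }

Running ``bind_to_vfio_pci 0000:02:00.0`` prepares the host device that is
used in the QEMU examples of this document.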

Step 2: configure QEMU
----------------------

Interactions with ``/dev/iommu`` are abstracted by a new iommufd object
(compiled in with the ``CONFIG_IOMMUFD`` option).

Any QEMU device (e.g. a VFIO device) that wants to use ``/dev/iommu`` must
be linked with an iommufd object. Such a device gets a new optional
property named ``iommufd``, which is used to pass an iommufd object. Take
the ``vfio-pci`` device as an example:

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note that ``/dev/iommu`` and the VFIO cdev can also be opened externally by
a management layer. In that case, the fd is passed to QEMU through the
``fd`` property, which accepts either a number or a string naming the fd,
for example:

.. code-block:: bash

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the ``fd`` property is not passed, QEMU opens the fd itself.
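
A management layer written in shell could open ``/dev/iommu`` and the VFIO
cdev itself and then pass the resulting fd numbers to QEMU. The sketch below
shows one way to do this. The function name and the default cdev path
``/dev/vfio/devices/vfio0`` are assumptions for illustration. The
``{var}<>`` redirection requires bash 4.1 or later and lets bash pick unused
fd numbers.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical sketch: open /dev/iommu and a VFIO cdev in this shell so
    # that QEMU, started later from it, inherits the fds.
    open_iommufd_fds() {
        local iommu_dev=${1:-/dev/iommu}
        local vfio_cdev=${2:-/dev/vfio/devices/vfio0}

        # Both paths must be existing character devices; "<>" would create
        # a regular file if a node were missing.
        [ -c "$iommu_dev" ] && [ -c "$vfio_cdev" ] || return 1

        # Open each node read/write; bash picks a free fd number and stores
        # it in the named (global) variable. The fds stay open in this shell
        # and are inherited by the commands it starts later.
        exec {IOMMUFD_FD}<>"$iommu_dev" || return 1
        if ! exec {VFIO_FD}<>"$vfio_cdev"; then
            exec {IOMMUFD_FD}>&-    # do not leak the first fd on failure
            return 1
        fi
    }

    # Usage (not executed here):
    #   open_iommufd_fds /dev/iommu /dev/vfio/devices/vfio0
    #   qemu-system-x86_64 ... \
    #       -object iommufd,id=iommufd0,fd="$IOMMUFD_FD" \
    #       -device vfio-pci,iommufd=iommufd0,fd="$VFIO_FD"

Do not call the function inside ``$(...)``. A command substitution runs in a
subshell, so the fds would be closed again before QEMU starts.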

If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd is
not used and the device falls back to the legacy VFIO container:

.. code-block:: bash

    -device vfio-pci,host=0000:02:00.0
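
The choice between the two backends depends only on whether the ``iommufd``
property is set. The following sketch prints the matching options, one per
line, for a given host device. ``vfio_pci_args`` is a hypothetical helper
and is not shipped with QEMU. Its object id ``iommufd0`` matches the
examples in this section.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper: print the QEMU options for assigning one host PCI
    # device with either the iommufd or the legacy VFIO backend.
    vfio_pci_args() {
        local bdf=$1 backend=${2:-legacy}

        case "$backend" in
        iommufd)
            # Create the iommufd object and link the device to it.
            printf '%s\n' -object iommufd,id=iommufd0 \
                          -device "vfio-pci,host=$bdf,iommufd=iommufd0"
            ;;
        legacy)
            # No iommufd property: QEMU uses the legacy VFIO container.
            printf '%s\n' -device "vfio-pci,host=$bdf"
            ;;
        *)
            echo "unknown backend: $backend" >&2
            return 1
            ;;
        esac
    }

A caller can collect the options into an array, for example with
``mapfile -t args < <(vfio_pci_args 0000:02:00.0 iommufd)``, and then append
``"${args[@]}"`` to its ``qemu-system-x86_64`` command line.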

Supported platforms
===================

Currently supported on x86, ARM and s390x.

Caveats
=======

Dirty page sync
---------------

Dirty page sync is not yet supported with the iommufd backend, so live
migration is disabled by default. It can be force-enabled as shown below,
although migration is then less efficient:

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on

P2P DMA
-------

PCI P2P DMA is unsupported because IOMMUFD does not yet support mapping
hardware PCI BAR regions. As a result, warnings like the following are
printed for an assigned PCI device. They are expected and do not indicate a
bug.

.. code-block:: none

    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)

FD passing with mdev
--------------------

The ``vfio-pci`` device checks the ``sysfsdev`` property to decide whether
the backend is an mdev. When FD passing is used, QEMU has no way to know
this, so the mdev is treated like a real PCI device. As a result, the
following error is reported if the user tries to enable RAM discarding for
the mdev:

.. code-block:: none

    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices

The ``vfio-ap`` and ``vfio-ccw`` devices do not have this issue, because
their backend devices are always mdevs and RAM discarding is force-enabled
for them.