===============================
IOMMUFD BACKEND usage with VFIO
===============================

(The terms "backend", "container" and "BE" are used interchangeably below.)

With the introduction of iommufd, the Linux kernel provides a generic
interface for user space drivers to propagate their DMA mappings to the
kernel for assigned devices. While the legacy kernel interface is
group-centric, the new iommufd interface is device-centric, relying on a
device fd and an iommufd.

To support both interfaces in the QEMU VFIO device, a base container
abstracts the common part of the VFIO legacy and iommufd containers, so that
the generic VFIO code can use either container.

The base container implements generic functions such as the memory_listener
and address space management, whereas the derived container implements
callbacks specific to either legacy or iommufd. Each container has its own way
to set up the secure context and the DMA management interface. The diagram
below shows how this looks with both containers.

::

                      VFIO                           AddressSpace/Memory
      +-------+  +----------+  +-----+  +-----+
      |  pci  |  | platform |  |  ap |  | ccw |
      +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
          |           |           |        |        |   AddressSpace       |
          |           |           |        |        +------------+---------+
      +---V-----------V-----------V--------V----+               /
      |           VFIOAddressSpace              | <------------+
      |                  |                      |  MemoryListener
      |        VFIOContainerBase list           |
      +-------+----------------------------+----+
              |                            |
              |                            |
      +-------V------+            +--------V----------+
      |   iommufd    |            |    vfio legacy    |
      |  container   |            |     container     |
      +-------+------+            +--------+----------+
              |                            |
              | /dev/iommu                 | /dev/vfio/vfio
              | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
  Userspace   |                            |
  ============+============================+===========================
  Kernel      |  device fd                 |
              +---------------+            | group/container fd
              | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
              |  ATTACH_IOAS) |            | device fd
              |               |            |
              |       +-------V------------V-----------------+
      iommufd |       |                vfio                  |
  (map/unmap  |       +---------+--------------------+-------+
  ioas_copy)  |                 |                    | map/unmap
              |                 |                    |
       +------V------+    +-----V------+      +------V--------+
       |iommufd core |    |  device    |      |  vfio iommu   |
       +-------------+    +------------+      +---------------+

* Secure Context setup

  - iommufd BE: uses device fd and iommufd to set up the secure context
    (bind_iommufd, attach_ioas)
  - vfio legacy BE: uses group fd and container fd to set up the secure
    context (set_container, set_iommu)

* Device access

  - iommufd BE: device fd is opened through ``/dev/vfio/devices/vfioX``
  - vfio legacy BE: device fd is retrieved from group fd ioctl

* DMA Mapping flow

  1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
  2. VFIO populates DMA map/unmap via the container BEs

     * iommufd BE: uses iommufd
     * vfio legacy BE: uses container fd

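As a rough illustration of the device-access difference above, the following
sketch prints the device nodes each BE would open for one PCI device. It
assumes the device is already bound to a VFIO driver and that the host kernel
exposes the VFIO cdev name under ``vfio-dev/`` in sysfs. The function name
``vfio_nodes_for`` and the ``SYSFS_ROOT`` override are hypothetical and exist
only for this example.

.. code-block:: bash

    # Print the device nodes each backend would open for one PCI device.
    # SYSFS_ROOT is a hypothetical override (default /sys) so the function
    # can be pointed at a fake tree for a dry run.
    vfio_nodes_for() {
        local bdf=$1 sysfs=${SYSFS_ROOT:-/sys}
        local dev=$sysfs/bus/pci/devices/$bdf
        local entry group

        # iommufd BE: the cdev name is the entry found under vfio-dev/.
        for entry in "$dev"/vfio-dev/vfio*; do
            [ -e "$entry" ] &&
                echo "iommufd: /dev/iommu /dev/vfio/devices/${entry##*/}"
        done

        # Legacy BE: the group id is the last component of the iommu_group
        # symlink target.
        if [ -L "$dev/iommu_group" ]; then
            group=$(basename "$(readlink "$dev/iommu_group")")
            echo "legacy: /dev/vfio/vfio /dev/vfio/$group"
        fi
    }

    # Usage: vfio_nodes_for 0000:02:00.0
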
Example configuration
=====================

Step 1: configure the host device
---------------------------------

The host device is configured exactly as for a VFIO device with the legacy
VFIO container.

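A minimal sketch of one common way to do this, using the kernel's
``driver_override`` interface, is shown below. The function name
``bind_to_vfio_pci`` and the ``SYSFS_ROOT`` override are hypothetical, and the
PCI address matches the examples in this document. Other tools, or a different
VFIO driver, may be more appropriate for a given setup.

.. code-block:: bash

    # Rebind one PCI device to vfio-pci via driver_override.
    # SYSFS_ROOT is a hypothetical override (default /sys) for dry runs.
    bind_to_vfio_pci() {
        local bdf=$1 sysfs=${SYSFS_ROOT:-/sys}
        local dev=$sysfs/bus/pci/devices/$bdf

        # Detach the device from its current host driver, if it has one.
        if [ -e "$dev/driver/unbind" ]; then
            echo "$bdf" > "$dev/driver/unbind"
        fi
        # Restrict driver matching to vfio-pci, then trigger a re-probe.
        echo vfio-pci > "$dev/driver_override"
        echo "$bdf" > "$sysfs/bus/pci/drivers_probe"
    }

    # Usage (as root):
    #   modprobe vfio-pci
    #   bind_to_vfio_pci 0000:02:00.0
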
Step 2: configure QEMU
----------------------

Interactions with ``/dev/iommu`` are abstracted by a new iommufd
object (compiled in with the ``CONFIG_IOMMUFD`` option).

Any QEMU device (e.g. a VFIO device) wishing to use ``/dev/iommu`` must
be linked with an iommufd object. Such a device gets a new optional property
named ``iommufd``, which is used to pass an iommufd object. Take the
``vfio-pci`` device for example:

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note that ``/dev/iommu`` and the VFIO cdev can be opened externally by a
management layer. In that case the already-open fds are passed to QEMU through
the ``fd`` property, which accepts either a number or a string naming the fd,
for example:

.. code-block:: bash

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the ``fd`` property is not passed, the fd is opened by QEMU.

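The sketch below imitates what a management layer might do, using a bash
wrapper that opens both nodes and lets the child process inherit them as
fds 22 and 23. The function name ``run_with_vfio_fds`` and the ``vfio0`` cdev
name in the usage comment are placeholders, not part of QEMU, and a real
management layer would typically pass fds by other means.

.. code-block:: bash

    # Open the iommufd node and the VFIO cdev, then run a command that
    # inherits them as fd 22 and fd 23.
    run_with_vfio_fds() {
        local iommu_node=$1 cdev_node=$2
        shift 2
        (
            # Open both nodes read/write in the subshell; give up on failure.
            exec 22<>"$iommu_node" 23<>"$cdev_node" || exit 1
            # Replace the subshell with the command; the fds stay open.
            exec "$@"
        )
    }

    # Usage sketch (other QEMU options elided):
    #   run_with_vfio_fds /dev/iommu /dev/vfio/devices/vfio0 \
    #       qemu-system-x86_64 ... \
    #       -object iommufd,id=iommufd0,fd=22 \
    #       -device vfio-pci,iommufd=iommufd0,fd=23
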
If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
is not used and the user gets the legacy VFIO container behavior:

.. code-block:: bash

    -device vfio-pci,host=0000:02:00.0

Supported platform
==================

x86, ARM and s390x are currently supported.

Caveats
=======

Dirty page sync
---------------

Dirty page sync with the iommufd backend is not supported yet, so live
migration is disabled by default. It can be force-enabled as shown below,
although it is inefficient.

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on

P2P DMA
-------

PCI P2P DMA is unsupported because IOMMUFD does not yet support mapping
hardware PCI BAR regions. The warning below is shown for an assigned PCI
device; it is not a bug.

.. code-block:: none

    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)

FD passing with mdev
--------------------

The ``vfio-pci`` device checks the ``sysfsdev`` property to decide whether the
backend is an mdev. If FD passing is used, there is no way to know that, and
the mdev is treated like a real PCI device. The error below is reported if the
user tries to enable RAM discarding for the mdev.

.. code-block:: none

    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices

``vfio-ap`` and ``vfio-ccw`` devices do not have this issue, as their backend
devices are always mdevs and RAM discarding is force-enabled.