===============================
IOMMUFD BACKEND usage with VFIO
===============================

(In this document, backend, container and BE have the same meaning.)

With the introduction of iommufd, the Linux kernel provides a generic
interface for user space drivers to propagate their DMA mappings to the
kernel for assigned devices. While the legacy kernel interface is
group-centric, the new iommufd interface is device-centric, relying on the
device fd and iommufd.

To support both interfaces in the QEMU VFIO device, QEMU introduces a base
container that abstracts the common part of the VFIO legacy and iommufd
containers, so that the generic VFIO code can use either container.

The base container implements generic functions such as memory_listener and
address space management, whereas the derived container implements callbacks
specific to either legacy or iommufd. Each container has its own way to set
up the secure context and its own DMA management interface. The diagram below
shows how this looks with both containers.

::

                        VFIO                           AddressSpace/Memory
        +-------+  +----------+  +-----+  +-----+
        |  pci  |  | platform |  |  ap |  | ccw |
        +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
            |           |           |        |        |   AddressSpace       |
            |           |           |        |        +------------+---------+
        +---V-----------V-----------V--------V----+               /
        |            VFIOAddressSpace             | <------------+
        |                    |                    |  MemoryListener
        |         VFIOContainerBase list          |
        +-------+----------------------------+----+
                |                            |
                |                            |
        +-------V------+            +--------V----------+
        |   iommufd    |            |    vfio legacy    |
        |  container   |            |     container     |
        +-------+------+            +--------+----------+
                |                            |
                | /dev/iommu                 | /dev/vfio/vfio
                | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
    Userspace   |                            |
    ============+============================+===========================
    Kernel      |  device fd                 |
                +---------------+            | group/container fd
                | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
                |  ATTACH_IOAS) |            | device fd
                |               |            |
                |       +-------V------------V-----------------+
        iommufd |       |                 vfio                 |
    (map/unmap  |       +---------+--------------------+-------+
    ioas_copy)  |                 |                    | map/unmap
                |                 |                    |
        +-------V------+    +-----V------+      +------V--------+
        | iommufd core |    |  device    |      |  vfio iommu   |
        +--------------+    +------------+      +---------------+

* Secure Context setup

  - iommufd BE: uses device fd and iommufd to set up the secure context
    (bind_iommufd, attach_ioas)
  - vfio legacy BE: uses group fd and container fd to set up the secure
    context (set_container, set_iommu)

* Device access

  - iommufd BE: device fd is opened through ``/dev/vfio/devices/vfioX``
  - vfio legacy BE: device fd is retrieved from group fd ioctl

* DMA Mapping flow

  1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
  2. VFIO populates DMA map/unmap via the container BEs

     * iommufd BE: uses iommufd
     * vfio legacy BE: uses container fd

Example configuration
=====================

Step 1: configure the host device
---------------------------------

This is exactly the same as for a VFIO device using the legacy VFIO
container.

Step 2: configure QEMU
----------------------

Interactions with ``/dev/iommu`` are abstracted by a new iommufd
object (compiled in with the ``CONFIG_IOMMUFD`` option).

Any QEMU device (e.g. a VFIO device) wishing to use ``/dev/iommu`` must
be linked with an iommufd object. Such a device gets a new optional property
named ``iommufd``, which allows an iommufd object to be passed. Taking the
``vfio-pci`` device as an example:

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note that ``/dev/iommu`` and the VFIO cdev can be opened externally by a
management layer. In that case the fd is passed to QEMU; the ``fd``
property accepts either a number or a string naming the fd, for example:

.. code-block:: bash

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the ``fd`` property is not passed, the fd is opened by QEMU.

If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
is not used and the user gets the behavior based on the legacy VFIO
container:

.. code-block:: bash

    -device vfio-pci,host=0000:02:00.0

Supported platform
==================

x86, ARM and s390x are currently supported.

Caveats
=======

Dirty page sync
---------------

Dirty page sync is not yet supported with the iommufd backend, so live
migration is disabled by default. It can be force enabled as shown below,
although it is inefficient.

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on

P2P DMA
-------

PCI p2p DMA is unsupported because IOMMUFD does not yet support mapping
hardware PCI BAR regions. The warning below is shown for an assigned PCI
device; it is not a bug.

.. code-block:: none

    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)

FD passing with mdev
--------------------

The ``vfio-pci`` device checks the ``sysfsdev`` property to decide whether
the backend is an mdev. If FD passing is used, there is no way to know that,
and the mdev is treated like a real PCI device. The error below is reported
if the user tries to enable RAM discarding for an mdev.

.. code-block:: none

    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices

The ``vfio-ap`` and ``vfio-ccw`` devices do not have this issue, as their
backend devices are always mdev and RAM discarding is force enabled.
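
For reference, the FD passing case discussed above can be tried from a plain
shell instead of a full management layer. The following is only a minimal
sketch under stated assumptions: the host device is already bound to
``vfio-pci`` and is exposed as ``/dev/vfio/devices/vfio0`` (the index ``0``
is hypothetical and differs per host), the fd numbers 22 and 23 are arbitrary,
and QEMU is expected to inherit the descriptors opened by the shell.

.. code-block:: bash

    # Open /dev/iommu and the VFIO cdev read-write in the current shell.
    # The cdev path is an assumption; look up the real vfioX node on the host.
    exec 22<>/dev/iommu
    exec 23<>/dev/vfio/devices/vfio0

    # Child processes inherit fds 22 and 23, so QEMU can refer to them
    # by number. All other machine options are omitted here.
    qemu-system-x86_64 \
        -object iommufd,id=iommufd0,fd=22 \
        -device vfio-pci,iommufd=iommufd0,fd=23

Because no ``host`` or ``sysfsdev`` property is given in this form, QEMU
cannot tell whether fd 23 refers to an mdev, which is the situation the
caveat above describes.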