===============================
IOMMUFD BACKEND usage with VFIO
===============================

(The terms backend, container and BE are used interchangeably.)

With the introduction of iommufd, the Linux kernel provides a generic
interface for user space drivers to propagate their DMA mappings to the kernel
for assigned devices. While the legacy kernel interface is group-centric,
the new iommufd interface is device-centric, relying on device fd and iommufd.

To support both interfaces in the QEMU VFIO device, a base container is
introduced to abstract the common part of the VFIO legacy and iommufd
containers, so that the generic VFIO code can use either container.

The base container implements generic functions such as memory_listener and
address space management, whereas the derived container implements callbacks
specific to either legacy or iommufd. Each container has its own way to set up
the secure context and the DMA management interface. The diagram below shows
how the two containers fit together.

::

                        VFIO                           AddressSpace/Memory
        +-------+  +----------+  +-----+  +-----+
        |  pci  |  | platform |  |  ap |  | ccw |
        +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
            |           |           |        |        |   AddressSpace       |
            |           |           |        |        +------------+---------+
        +---V-----------V-----------V--------V----+               /
        |           VFIOAddressSpace              | <------------+
        |                  |                      |  MemoryListener
        |        VFIOContainerBase list           |
        +-------+----------------------------+----+
                |                            |
                |                            |
        +-------V------+            +--------V----------+
        |   iommufd    |            |    vfio legacy    |
        |  container   |            |     container     |
        +-------+------+            +--------+----------+
                |                            |
                | /dev/iommu                 | /dev/vfio/vfio
                | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
    Userspace   |                            |
    ============+============================+===========================
    Kernel      |  device fd                 |
                +---------------+            | group/container fd
                | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
                |  ATTACH_IOAS) |            | device fd
                |               |            |
                |       +-------V------------V-----------------+
        iommufd |       |                vfio                  |
    (map/unmap  |       +---------+--------------------+-------+
    ioas_copy)  |                 |                    | map/unmap
                |                 |                    |
         +------V-------+   +-----V------+      +------V--------+
         | iommufd core |   |  device    |      |  vfio iommu   |
         +--------------+   +------------+      +---------------+

* Secure Context setup

  - iommufd BE: uses device fd and iommufd to set up the secure context
    (bind_iommufd, attach_ioas)
  - vfio legacy BE: uses group fd and container fd to set up the secure
    context (set_container, set_iommu)

* Device access

  - iommufd BE: device fd is opened through ``/dev/vfio/devices/vfioX``
  - vfio legacy BE: device fd is retrieved from group fd ioctl

* DMA Mapping flow

  1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
  2. VFIO populates DMA map/unmap via the container BEs

     * iommufd BE: uses iommufd
     * vfio legacy BE: uses container fd

Example configuration
=====================

Step 1: configure the host device
---------------------------------

This step is exactly the same as for a VFIO device with the legacy VFIO
container.

Step 2: configure QEMU
----------------------

Interactions with ``/dev/iommu`` are abstracted by a new iommufd
object (compiled in with the ``CONFIG_IOMMUFD`` option).


Any QEMU device (e.g. a VFIO device) wishing to use ``/dev/iommu`` must
be linked with an iommufd object. Such a device gets a new optional property
named ``iommufd``, which is used to pass an iommufd object. Take the
``vfio-pci`` device for example:

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note that ``/dev/iommu`` and the VFIO cdev can be opened externally by a
management layer. In that case the fd is passed to QEMU through the ``fd``
property, which accepts either a string naming the fd or a number, for
example:

.. code-block:: bash

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the ``fd`` property is not passed, the fd is opened by QEMU.

If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
is not used and the user gets the behavior based on the legacy VFIO
container:

.. code-block:: bash

    -device vfio-pci,host=0000:02:00.0

Supported platform
==================

x86, ARM and s390x are currently supported.


Caveats
=======

Dirty page sync
---------------

Dirty page sync is not yet supported with the iommufd backend, so live
migration is disabled by default. It can be force-enabled as shown below,
although it is inefficient.

.. code-block:: bash

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on

P2P DMA
-------

PCI p2p DMA is unsupported because IOMMUFD does not yet support mapping
hardware PCI BAR regions. The warning below is shown for an assigned PCI
device; it is not a bug.

.. code-block:: none

    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)

FD passing with mdev
--------------------

The ``vfio-pci`` device checks the ``sysfsdev`` property to decide whether the
backend is an mdev. If FD passing is used, there is no way to know that, and
the mdev is treated like a real PCI device. The error below is reported if the
user tries to enable RAM discarding for the mdev.

.. code-block:: none

    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices

The ``vfio-ap`` and ``vfio-ccw`` devices do not have this issue, as their
backend devices are always mdevs and RAM discarding is force-enabled.
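
For reference, the FD passing that this caveat refers to can be sketched from
the management layer's side. The following is a minimal illustration only,
not how any particular management stack does it: the helper names
``build_iommufd_args`` and ``launch_with_passed_fds`` are hypothetical, the
QEMU binary and cdev path are supplied by the caller, and the only QEMU
options used are the ``-object iommufd,...,fd=`` and
``-device vfio-pci,iommufd=...,fd=`` forms shown earlier in this document.

.. code-block:: python

    import os
    import subprocess


    def build_iommufd_args(iommufd_fd, device_fd, obj_id="iommufd0"):
        """Build QEMU arguments for a vfio-pci device using pre-opened fds.

        Hypothetical helper: it only formats the two options documented
        above with the given fd numbers.
        """
        return [
            "-object", f"iommufd,id={obj_id},fd={iommufd_fd}",
            "-device", f"vfio-pci,iommufd={obj_id},fd={device_fd}",
        ]


    def launch_with_passed_fds(qemu_binary, cdev_path, extra_args=()):
        """Open /dev/iommu and a VFIO cdev, then start QEMU with both fds.

        cdev_path is expected to look like /dev/vfio/devices/vfioX; which
        vfioX belongs to which device is left to the caller (assumption).
        """
        iommufd_fd = os.open("/dev/iommu", os.O_RDWR)
        device_fd = os.open(cdev_path, os.O_RDWR)
        try:
            args = [qemu_binary, *extra_args,
                    *build_iommufd_args(iommufd_fd, device_fd)]
            # pass_fds keeps the listed fds open in the child under the
            # same numbers, so the numbers on the command line stay valid.
            return subprocess.Popen(args, pass_fds=(iommufd_fd, device_fd))
        finally:
            # The child holds its own copies; the parent can drop its fds.
            os.close(iommufd_fd)
            os.close(device_fd)

Calling ``build_iommufd_args(22, 23)`` yields the same two options as the
``fd=22`` / ``fd=23`` example in Step 2. A device started this way is subject
to the caveat above: QEMU cannot tell from the fd alone whether the backend
is an mdev.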