Intel Graphics Device (IGD) assignment with vfio-pci
====================================================

Using vfio-pci, we can pass an Intel Graphics Device (IGD) through to a guest,
either as the primary and exclusive graphics adapter or in combination with an
emulated primary graphics device, depending on the configuration and guest
driver support. However, IGD devices are not "clean" PCI devices: they use
extra memory regions other than BARs. Special handling is required to make
them work properly, including:

* OpRegion for accessing the Virtual BIOS Table (VBT), which contains display
  output information.
* Data Stolen Memory (DSM) region, used as VRAM at an early stage (BIOS/UEFI).

Certain guest software also depends on the following conditions to work
(* = required by):

| Condition                                   | Linux | Windows | VBIOS | EFI GOP |
|---------------------------------------------|-------|---------|-------|---------|
| #1 IGD has a valid OpRegion containing VBT  |  * ^1 |    *    |   *   |    *    |
| #2 VID/DID of LPC bridge at 00:1f.0 matches |       |         |   *   |    *    |
| #3 IGD is assigned to BDF 00:02.0           |       |         |   *   |    *    |
| #4 IGD has VGA controller device class      |       |         |   *   |    *    |
| #5 Host's VGA ranges are mapped to IGD      |       |         |   *   |         |
| #6 Guest has valid VBIOS or UEFI Option ROM |       |         |   *   |    *    |

^1 Though the i915 driver is able to mock an OpRegion, it is still recommended
   to use the VBT copied from the host OpRegion to prevent incorrect
   configuration.

For #1, the "x-igd-opregion=on" option exposes a copy of the host IGD OpRegion
to the guest via fw_cfg, which guest firmware can use to set up the guest
OpRegion.

For #2, the "x-igd-lpc=on" option copies the IDs of the host LPC bridge and
host bridge to the guest. Currently this is only supported on i440fx machines:
q35 machines already have an ICH9 LPC bridge, and overwriting its IDs may lead
to unexpected behavior.

For #3, "addr=2.0" assigns IGD to 00:02.0.

For #4, the primary display must be set to IGD in the host BIOS.

For #5, "x-vga=on" enables guest access to the standard VGA IO/MMIO ranges.

For #6, a ROM provided via either the ROM BAR or the romfile= option is
needed. This Intel document [1] shows how to dump the VBIOS to a file. For a
UEFI Option ROM, see the "Guest firmware" section.
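
As a rough illustration of how the table and the per-condition notes combine,
the sketch below maps each kind of guest software to the conditions it
requires and collects the vfio-pci options named above. REQUIRED_BY,
OPTION_FOR, MANUAL_STEP and options_for() are hypothetical names used only for
this illustration; they are not part of QEMU. Conditions #4 and #6 cannot be
met by a device option alone, so they are reported as manual steps.

```python
# Conditions required by each kind of guest software, copied from the table.
REQUIRED_BY = {
    "Linux": {1},
    "Windows": {1},
    "VBIOS": {1, 2, 3, 4, 5, 6},
    "EFI GOP": {1, 2, 3, 4, 6},
}

# vfio-pci device options that satisfy a condition, as described in the text.
OPTION_FOR = {
    1: "x-igd-opregion=on",
    2: "x-igd-lpc=on",
    3: "addr=2.0",
    5: "x-vga=on",
}

# Conditions that need something other than a device option.
MANUAL_STEP = {
    4: "set the primary display to IGD in the host BIOS",
    6: "provide a ROM via the ROM BAR or romfile=",
}


def options_for(software):
    """Return (device options, manual steps) for one column of the table."""
    conditions = sorted(REQUIRED_BY[software])
    options = [OPTION_FOR[c] for c in conditions if c in OPTION_FOR]
    manual = [MANUAL_STEP[c] for c in conditions if c in MANUAL_STEP]
    return options, manual


options, manual = options_for("EFI GOP")
print(",".join(options))
for step in manual:
    print("also:", step)
```

Note that this only restates the table; whether a given guest works still
depends on the host platform and guest driver.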

QEMU also provides a "Legacy" mode that implicitly enables full functionality
on IGD. It is automatically enabled when:
* IGD generation is 6 to 9 (Sandy Bridge to Comet Lake)
* IGD claims VGA cycles on host (IGD is VGA controller on host)
* Machine type is i440fx
* IGD is assigned to guest BDF 00:02.0
* ROM BAR or romfile is present

In "Legacy" mode, QEMU automatically sets up the OpRegion, LPC bridge IDs and
VGA range access, which is equivalent to:
  x-igd-opregion=on,x-igd-lpc=on,x-vga=on

By default, a failure to set up "Legacy" mode is not fatal: QEMU continues on
error. Users can set "x-igd-legacy-mode=on" to force legacy mode; QEMU then
checks whether the conditions above are met and fails immediately if any error
occurs. Users can also set "x-igd-legacy-mode=off" to disable legacy mode.
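
The interaction between the five conditions and the three values of
x-igd-legacy-mode can be modelled roughly as follows. This is a simplified
sketch based only on the description above, not QEMU's actual code: IgdSetup,
legacy_conditions_met() and legacy_mode_enabled() are hypothetical names, and
errors that occur later during setup are not modelled.

```python
from dataclasses import dataclass


@dataclass
class IgdSetup:
    generation: int           # IGD generation, e.g. 9
    claims_vga_on_host: bool  # IGD is the VGA controller on the host
    machine: str              # "i440fx" or "q35"
    guest_bdf: str            # guest address of the device, e.g. "00:02.0"
    has_rom: bool             # ROM BAR or romfile is present


def legacy_conditions_met(s):
    """Check the five conditions listed for automatic legacy mode."""
    return (6 <= s.generation <= 9
            and s.claims_vga_on_host
            and s.machine == "i440fx"
            and s.guest_bdf == "00:02.0"
            and s.has_rom)


def legacy_mode_enabled(s, setting="auto"):
    """Model x-igd-legacy-mode=on|off|auto (assumed semantics)."""
    if setting == "off":
        return False
    met = legacy_conditions_met(s)
    if setting == "on" and not met:
        # With "on", an unmet condition is fatal instead of being ignored.
        raise ValueError("legacy mode forced on but conditions are not met")
    return met


setup = IgdSetup(9, True, "i440fx", "00:02.0", True)
print(legacy_mode_enabled(setup))
```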

In legacy mode, as the guest VGA ranges are assigned to the IGD device, all
other graphics devices should be removed. This can be done using "-nographic",
"-vga none" or "-nodefaults", along with adding the device using vfio-pci.

In either mode, depending on the host kernel, the i915 driver in the host
may generate faults and errors upon re-binding to an IGD device after it
has been assigned to a VM.  It's therefore generally recommended to prevent
such driver binding unless the host driver is known to work well for this.
There are numerous ways to do this: i915 can be blacklisted on the host,
the driver_override option can be used to ensure that only vfio-pci can bind
to the device on the host [2], virsh nodedev-detach can be used to bind the
device to vfio drivers with managed='no' then set in the VM XML to prevent
re-binding to i915, etc.  Also note that IGD is typically the primary
graphics in the host, and special options may be required beyond simply
blacklisting i915 or using pci-stub/vfio-pci to take ownership of IGD as a
PCI class device.  Lower level drivers exist that may still claim the device.
It may therefore be necessary to use the kernel boot options video=vesafb:off
or video=efifb:off (depending on host BIOS/UEFI), or these can be combined
into a catch-all, video=vesafb:off,efifb:off.  Error messages such as:

    Failed to mmap 0000:00:02.0 BAR <>. Performance may be slow

are a good indicator that such a problem exists.  The host files /proc/iomem
and /proc/ioports are often useful for identifying drivers that consume ranges
of the device and cause such conflicts.
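
One way to look for such conflicts is to list the entries nested under the
device's resources in /proc/iomem. The sketch below assumes the usual layout
of that file ("start-end : name", with children indented under their parent);
claimants() is a hypothetical helper and SAMPLE is made-up data for
illustration, not output from a real host.

```python
def claimants(iomem_text, bdf="0000:00:02.0"):
    """Return (range, owner) pairs nested under the device's resources."""
    found = []
    device_indent = None  # indentation of the enclosing device entry, if any
    for line in iomem_text.splitlines():
        if " : " not in line:
            continue
        indent = len(line) - len(line.lstrip(" "))
        addr_range, name = line.strip().split(" : ", 1)
        if device_indent is not None and indent > device_indent:
            # Nested below a resource owned by the device: a claimant.
            found.append((addr_range, name))
            continue
        device_indent = indent if name == bdf else None
    return found


# Made-up sample in the format of /proc/iomem.
SAMPLE = """\
80000000-dfffffff : PCI Bus 0000:00
  c0000000-cfffffff : 0000:00:02.0
    c0000000-c07e8fff : efifb
  d0000000-d0ffffff : 0000:00:02.0
"""

for addr_range, owner in claimants(SAMPLE):
    print(addr_range, owner)
```

On a real host the text would come from reading /proc/iomem, which typically
needs root privileges to show real addresses.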

Additionally, IGD devices are known to generate small numbers of DMAR faults
when initially assigned.  It is believed that this is simply the IGD attempting
to access the reserved GTT space after reset, which it no longer has access to
when accessed from userspace.  So long as the DMAR faults are small in number
and, most importantly, not ongoing, they are not an indication of an error.

Finally, analog VGA output (as opposed to digital outputs like HDMI,
DVI, or DisplayPort) may be unsupported in some use cases.  In the author's
experience, even DP to VGA adapters can be troublesome while adapters between
digital formats work well.


Options
=======
* x-igd-opregion=[*on*|off]
  Copy the host IGD OpRegion and expose it to the guest via fw_cfg

* x-igd-lpc=[on|*off*]
  Create a dummy LPC bridge at 00:1f.0 with the host VID/DID (i440fx only)

* x-igd-legacy-mode=[on|off|*auto*]
  Enable/disable legacy mode

* x-igd-gms=[hex, default 0]
  Override the DSM region size in the GGC register; 0 means use the host value.
  Use this only when the DSM size cannot be changed through the
  'DVMT Pre-Allocated' option in host BIOS.


Examples
========
* Adding IGD with automatic legacy mode support
  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0

* Adding IGD with OpRegion and LPC ID hack, but without VGA ranges
  (For UEFI guests)
  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-lpc=on,romfile=efi_oprom.rom


Guest firmware
==============
Guest firmware is responsible for setting up the OpRegion and the Base of Data
Stolen Memory (BDSM) in guest address space. IGD passthrough support imposes
two fw_cfg requirements on the VM firmware:

1) "etc/igd-opregion"

   This fw_cfg file exposes the OpRegion for the IGD device.  A reserved
   region should be created below 4GB (4KB alignment recommended), large
   enough to hold the fw_cfg file, and the content of this file copied
   to it.  The dword based address of this reserved memory region must also
   be written to the ASLS register at offset 0xFC on the IGD device.  It is
   recommended that firmware make use of this fw_cfg entry for any
   PCI class VGA device with Intel vendor ID.  The behavior with multiple
   such devices within a VM is undefined.

2) "etc/igd-bdsm-size"

   This fw_cfg file contains an 8-byte, little endian integer indicating
   the size of the reserved memory region required for IGD stolen memory.
   Firmware must allocate a reserved memory region of this size below 4GB,
   with the required 1MB alignment.  Additionally the base address of this
   reserved region must be written to the dword BDSM register in PCI config
   space of the IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices using
   64-bit BDSM).  As this support is related to running the IGD ROM, which
   has other dependencies on the device appearing at guest address 00:02.0,
   it's expected that this fw_cfg file is only relevant to a single PCI
   class VGA device with Intel vendor ID, appearing at PCI bus address 00:02.0.

   Starting from Meteor Lake, IGD devices access stolen memory via the MMIO
   BAR2 (LMEMBAR), and the BDSM register in config space has been removed.
   There is no need for guest firmware to allocate data stolen memory in guest
   address space and write its address to the BDSM register. The value of this
   fw_cfg file is 0 in such a case.
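
The handling of "etc/igd-bdsm-size" described above can be sketched as
follows. This is an illustration under stated assumptions, not firmware code:
parse_bdsm_size(), place_bdsm() and bdsm_register_offset() are hypothetical
names, and allocating top-down from a caller-supplied limit is just one
possible placement strategy.

```python
import struct

ONE_MB = 1 << 20
FOUR_GB = 1 << 32


def parse_bdsm_size(blob):
    """Decode the 8-byte little endian size from "etc/igd-bdsm-size"."""
    (size,) = struct.unpack("<Q", blob)
    return size


def place_bdsm(size, top):
    """Pick a 1MB-aligned base so that [base, base + size) ends at or below top."""
    if size == 0:
        return None  # nothing to allocate, e.g. Meteor Lake and later
    if top > FOUR_GB:
        raise ValueError("the region must lie below 4GB")
    if size > top:
        raise ValueError("the region does not fit below the given limit")
    return (top - size) & ~(ONE_MB - 1)  # round down to a 1MB boundary


def bdsm_register_offset(generation):
    """Config space offset of BDSM: 0xC0 (64-bit) on Gen 11+, else 0x5C."""
    return 0xC0 if generation >= 11 else 0x5C


size = parse_bdsm_size(struct.pack("<Q", 64 * ONE_MB))
base = place_bdsm(size, top=0x80000000)
print(hex(base), hex(bdsm_register_offset(9)))
```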

Upstream SeaBIOS has OpRegion and BDSM (pre-Gen11 devices only) support.
However, equivalent support has not been accepted by upstream EDK2/OVMF. A
recommended solution is to create a virtual OpRom with the following DXE
drivers:

* IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (required)
* IntelGopDriver: Closed-source Intel GOP driver
* PlatformGopPolicy: Protocol required by IntelGopDriver

IntelGopDriver and PlatformGopPolicy are only required when enabling GOP on
IGD.

The original IgdAssignmentDxe can be found at [3]. An Intel-maintained version
with PlatformGopPolicy for industrial computing is at [4]. There is also an
unofficially maintained version with newer Gen11+ device support at [5].
You need to build them with EDK2.

Intel never released the IntelGopDriver to the public. If you are an Intel
Premier Support customer, you may contact Intel support to get one, as [4]
says; otherwise you can try extracting it from your host firmware using
"UEFI BIOS Updater" [6].

Once you have all the required DXE drivers, an Option ROM can be generated
with the EfiRom utility in EDK2, using
  EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \
  -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi


Known issues
============
When using OVMF as guest firmware, you may encounter the following warning:
warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 0x7fd336000000) = -22 (Invalid argument)

Solution:
Set the host physical address bits to the IOMMU address width using
  -cpu host,host-phys-bits-limit=<IOMMU address width>
Or in libvirt XML with
  <cpu>
    <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/>
  </cpu>
The IOMMU address width can be determined with
  echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 0x3F0000) >> 16) + 1 ))
See https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more details.
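
The same computation as the shell one-liner above, written out step by step.
iommu_address_width() and read_cap() are hypothetical helper names, and the
capability value in the usage line is an illustrative example rather than one
taken from a specific machine.

```python
CAP_PATH = "/sys/devices/virtual/iommu/dmar0/intel-iommu/cap"


def iommu_address_width(cap_hex):
    """Extract bits 21:16 of the capability register and add one."""
    cap = int(cap_hex, 16)  # the sysfs file holds a hex string
    return ((cap & 0x3F0000) >> 16) + 1


def read_cap(path=CAP_PATH):
    """Read the capability register of the first IOMMU from sysfs."""
    with open(path) as f:
        return f.read().strip()


# Illustrative value; on a real host use iommu_address_width(read_cap()).
print(iommu_address_width("1c0000c40660462"))
```

The result is the number to pass as host-phys-bits-limit.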


Memory View
===========
IGD has its own address space. To use system RAM as VRAM, a single-level page
table named the Global Graphics Translation Table (GTT) is used for address
translation. Each page table entry points to a 4KB page. The illustration
below shows the translation flow on IGD with 64-bit GTT PTEs.

(PTE_SIZE == 8)                +-------------+---+
                               |   Address   | V |  V: Valid Bit
                               +-------------+---+
                               | ...         |   |
IGD:0x01ae9010           0xd740| 0x70ffc000  | 1 |  Mem:0x42ba3e010^
-----------------------> 0xd748| 0x42ba3e000 | 1 +------------------>
(addr >> 12) * PTE_SIZE  0xd750| 0x42ba3f000 | 1 |
                               | ...         |   |
                               +-------------+---+
^ The address may be remapped by IOMMU
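
The lookup in the illustration can be reproduced with a few lines of
arithmetic. In this sketch the GTT is modelled as a dictionary from PTE offset
to an (address, valid) pair holding the three entries shown above; the real
PTE bit layout is not modelled, and pte_offset() and translate() are
hypothetical names.

```python
PAGE_SHIFT = 12  # each PTE maps a 4KB page
PTE_SIZE = 8     # 64-bit GTT PTEs

# The three entries from the illustration: offset -> (page address, valid).
GTT = {
    0xd740: (0x70ffc000, True),
    0xd748: (0x42ba3e000, True),
    0xd750: (0x42ba3f000, True),
}


def pte_offset(igd_addr):
    """Offset of the PTE that covers an IGD address."""
    return (igd_addr >> PAGE_SHIFT) * PTE_SIZE


def translate(gtt, igd_addr):
    """Translate an IGD address to a memory address (before any IOMMU remap)."""
    page_base, valid = gtt[pte_offset(igd_addr)]
    if not valid:
        raise LookupError("GTT entry is not valid")
    return page_base + (igd_addr & ((1 << PAGE_SHIFT) - 1))


print(hex(pte_offset(0x01ae9010)), hex(translate(GTT, 0x01ae9010)))
```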

The memory region that stores the GTT is called GTT Stolen Memory (GSM). It is
located right below the Data Stolen Memory (DSM). Accessing this region
directly is not allowed; any such access will immediately freeze the whole
system. The only way to access it is through the second half of MMIO BAR0.

The Data Stolen Memory is reserved by firmware, and acts as the VRAM in pre-OS
environments. In QEMU, guest firmware (SeaBIOS/OVMF) is responsible for
reserving a contiguous region and programming its base address into the BDSM
register, then letting the VBIOS/GOP driver initialize this region. The
illustration below shows how DSM is mapped.

       IGD Addr Space                 Host Addr Space         Guest Addr Space
       +-------------+                +-------------+         +-------------+
       |             |                |             |         |             |
       |             |                |             |         |             |
       |             |                +-------------+         +-------------+
       |             |                | Data Stolen |         | Data Stolen |
       |             |                |   (Guest)   |         |   (Guest)   |
       |             |  +------------>+-------------+<------->+-------------+<--Guest BDSM
       |             |  | Passthrough |             | EPT     |             |   Emulated by QEMU
DSMSIZE+-------------+  | with IOMMU  |             | Mapping |             |   Programmed by guest FW
       |             |  |             |             |         |             |
       |             |  |             |             |         |             |
      0+-------------+--+             |             |         |             |
                        |             +-------------+         |             |
                        |             | Data Stolen |         +-------------+
                        |             |   (Host)    |
                        +------------>+-------------+<--Host BDSM
                          Non-        |             |   "real" one in HW
                          Passthrough |             |   Programmed by host FW
                                      +-------------+

Footnotes
=========
[1] https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html
[2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override
[3] https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935
    The Tianocore bugzilla has been down since Jan 2025.
[4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 0001-0004
[5] https://github.com/tomitamoeko/VfioIgdPkg
[6] https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357