PCI EXPRESS GUIDELINES
======================

1. Introduction
================
This document proposes best practices on how to use PCI Express (PCIe) / PCI
devices in PCI Express based machines and explains the reasoning behind
them.

Note that the PCIe features are available only when using the 'q35'
machine type on x86 architecture and the 'virt' machine type on AArch64.
Other machine types do not use PCIe at this time.

The following presentations accompany this document:
 (1) Q35 overview.
     https://wiki.qemu.org/images/4/4e/Q35.pdf
 (2) A comparison between PCI and PCI Express technologies.
     https://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf

Note: The usage examples are not intended to replace the full
documentation; please use QEMU help to retrieve all options.

2. Device placement strategy
============================
QEMU does not have a clear socket-device matching mechanism
and allows any PCI/PCI Express device to be plugged into any
PCI/PCI Express slot.
Plugging a PCI device into a PCI Express slot might not always work and
is unusual anyway, since it cannot be done on "bare metal".
Plugging a PCI Express device into a PCI slot will hide the Extended
Configuration Space and is therefore also not recommended.

The recommendation is to separate the PCI Express and PCI hierarchies.
PCI Express devices should be plugged only into PCI Express Root Ports and
PCI Express Downstream Ports.

2.1 Root Bus (pcie.0)
=====================
Place only the following kinds of devices directly on the Root Complex:
    (1) PCI Devices (e.g. network card, graphics card, IDE controller),
        not controllers. Place only legacy PCI devices on
        the Root Complex. These will be considered Integrated Endpoints.
        Note: Integrated Endpoints are not hot-pluggable.

        Although the PCI Express spec does not forbid PCI Express devices as
        Integrated Endpoints, existing hardware mostly integrates legacy PCI
        devices with the Root Complex. Guest OSes are suspected to behave
        strangely when PCI Express devices are integrated
        with the Root Complex.

    (2) PCI Express Root Ports (pcie-root-port), for starting exclusively
        PCI Express hierarchies.

    (3) PCI Express to PCI Bridge (pcie-pci-bridge), for starting legacy PCI
        hierarchies.

    (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
        are needed.

   pcie.0 bus
   ----------------------------------------------------------------------------
        |                |                    |                  |
   -----------   ------------------   -------------------   --------------
   | PCI Dev |   | PCIe Root Port |   | PCIe-PCI Bridge |   |  pxb-pcie  |
   -----------   ------------------   -------------------   --------------

2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
          -device <dev>[,bus=pcie.0]
2.1.2 To expose a new PCI Express Root Bus use:
          -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
      PCI Express Root Ports and PCI Express to PCI Bridges can be
      connected to the pcie.1 bus:
          -device pcie-root-port,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
          -device pcie-pci-bridge,id=pcie_pci_bridge1,bus=pcie.1


2.2 PCI Express only hierarchy
==============================
Always use PCI Express Root Ports to start PCI Express hierarchies.

A PCI Express Root Bus supports up to 32 devices. Since each
PCI Express Root Port is a function and a multi-function
device may support up to 8 functions, the maximum possible
number of PCI Express Root Ports per PCI Express Root Bus is
32 * 8 = 256.

Prefer grouping PCI Express Root Ports into multi-function devices
to keep a simple flat hierarchy that is enough for most scenarios.
Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
if there is no more room for PCI Express Root Ports.
See section 4 for further justification.

Plug only PCI Express devices into PCI Express Ports.


   pcie.0 bus
   ----------------------------------------------------------------------------------
        |                 |                                    |
   -------------    -------------                        -------------
   | Root Port |    | Root Port |                        | Root Port |
   -------------    -------------                        -------------
         |                            -------------------------|------------------------
    ------------                      |                 -----------------              |
    | PCIe Dev |                      |    PCI Express  | Upstream Port |              |
    ------------                      |      Switch     -----------------              |
                                      |                  |            |                |
                                      |    -------------------    -------------------  |
                                      |    | Downstream Port |    | Downstream Port |  |
                                      |    -------------------    -------------------  |
                                      -------------|-----------------------|------------
                                             ------------
                                             | PCIe Dev |
                                             ------------

2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
          -device pcie-root-port,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
          -device <dev>,bus=root_port1
2.2.2 Using multi-function PCI Express Root Ports:
      -device pcie-root-port,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \
      -device pcie-root-port,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
      -device pcie-root-port,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0]
2.2.3 Plugging a PCI Express device into a Switch:
      -device pcie-root-port,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] \
      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1] \
      -device <dev>,bus=downstream_port1

Notes:
    - The (slot, chassis) pair is mandatory and must be unique for each
      PCI Express Root Port. slot defaults to 0 when not specified.
    - The 'addr' parameter can be 0 for all the examples above.


2.3 PCI only hierarchy
======================
Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
but, as explained in section 5, a legacy PCI device placed there
cannot be hot-unplugged.
Besides that, use PCI Express to PCI Bridges (pcie-pci-bridge) in
combination with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.

Prefer flat hierarchies. For most scenarios a single PCI Express to PCI Bridge
(having 32 slots) and several PCI-PCI Bridges attached to it
(each also supporting 32 slots) will support hundreds of legacy devices.
The recommendation is to populate one PCI-PCI Bridge under the
PCI Express to PCI Bridge until it is full and only then plug in a new
PCI-PCI Bridge.

   pcie.0 bus
   ----------------------------------------------
        |                            |
   -----------               -------------------
   | PCI Dev |               | PCIe-PCI Bridge |
   -----------               -------------------
                               |            |
                  ------------------    ------------------
                  | PCI-PCI Bridge |    | PCI-PCI Bridge |
                  ------------------    ------------------
                                         |           |
                                    -----------     -----------
                                    | PCI Dev |     | PCI Dev |
                                    -----------     -----------

2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
      -device <dev>[,bus=pcie.0]
2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
      -device pcie-pci-bridge,id=pcie_pci_bridge1[,bus=pcie.0] \
      -device pci-bridge,id=pci_bridge1,bus=pcie_pci_bridge1[,chassis_nr=x][,addr=y] \
      -device <dev>,bus=pci_bridge1[,addr=x]
      Note that 'addr' cannot be 0 unless the shpc=off parameter is passed to
      the PCI Bridge/PCI Express to PCI Bridge.

3. IO space issues
===================
The PCI Express Root Ports and PCI Express Downstream Ports are seen by
Firmware/Guest OS as PCI-PCI Bridges.
As required by the PCI spec, a 4K IO range should be reserved for each
such Port, even though only one (multifunction) device can be plugged
into each Port. This results in poor IO space utilization.

The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
by not allocating IO space for each PCI Express Root / PCI Express
Downstream Port if:
    (1) the port is empty, or
    (2) the device behind the port has no IO BARs.

The IO space is very limited (65536 byte-wide IO ports) and may even be
fragmented by fixed IO ports owned by platform devices. This results in
at most 10 PCI Express Root Ports or PCI Express Downstream Ports per system
if devices with IO BARs are used in the PCI Express hierarchy. The
proposed device placement strategy solves this issue by using only
PCI Express devices within the PCI Express hierarchy.

The PCI Express spec requires that PCI Express devices work properly
without using IO ports. The PCI hierarchy has no such limitations.


4. Bus numbers issues
======================
Each PCI domain can have at most 256 buses, and the QEMU PCI Express
machines do not support multiple PCI domains even if extra Root
Complexes (pxb-pcie) are used.

Each element of the PCI Express hierarchy (Root Complexes,
PCI Express Root Ports, PCI Express Downstream/Upstream Ports)
uses one bus number. Since only one (multifunction) device
can be attached to a PCI Express Root Port or PCI Express Downstream
Port, it is advised to plan in advance for the expected number of
devices to prevent bus number starvation.

Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI
Express hierarchy) means the hierarchy does not spend bus numbers on
Upstream Ports.

The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
number space.
All bus numbers assigned to the buses recursively behind a
given pxb-pcie device's root bus must fit between the bus_nr property of
that pxb-pcie device, and the lowest of the higher bus_nr properties
that the command line sets for other pxb-pcie devices.


5. Hot-plug
============
The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
do not support hot-plug, so any devices plugged into Root Complexes
cannot be hot-plugged/hot-unplugged:
    (1) PCI Express Integrated Endpoints
    (2) PCI Express Root Ports
    (3) PCI Express to PCI Bridges
    (4) pxb-pcie

Be aware that PCI Express Downstream Ports can't be hot-plugged into
an existing PCI Express Upstream Port.

PCI devices can be hot-plugged into PCI Express to PCI and PCI-PCI Bridges.
PCI hot-plug into a PCI-PCI Bridge is ACPI-based, whereas hot-plug into
a PCI Express to PCI Bridge is SHPC-based. Both can work side by side with
the PCI Express native hot-plug.

PCI Express devices can be natively hot-plugged/hot-unplugged into/from
PCI Express Root Ports (and PCI Express Downstream Ports).

5.1 Planning for hot-plug:
    (1) PCI hierarchy
        Leave enough PCI-PCI Bridge slots empty or add one
        or more empty PCI-PCI Bridges to the PCI Express to PCI Bridge.

        For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
        4K IO space and 2M MMIO range to be used for all devices behind it.
        An appropriate PCI capability is designed for this purpose; see
        pcie_pci_bridge.txt.

        Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
        per system, don't use more than 9 PCI-PCI Bridges, leaving 4K for the
        Integrated Endpoints. (The PCI Express hierarchy needs no IO space.)

    (2) PCI Express hierarchy:
        Leave enough PCI Express Root Ports empty.
        Use multifunction PCI Express Root Ports (up to 8 ports per
        pcie.0 slot) on the Root Complex(es) to keep the
        hierarchy as flat as possible, thereby saving PCI bus numbers.
        Don't use PCI Express Switches if you don't have
        to: each one of those uses an extra PCI bus (for its Upstream Port)
        that could be put to better use with another Root Port or Downstream
        Port, which may come in handy for hot-plugging another device.


5.2 Hot-plug example:
Using HMP: (add -monitor stdio to QEMU command line)
  device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id>


6. Device assignment
====================
Host devices are mostly PCI Express and should be plugged only into
PCI Express Root Ports or PCI Express Downstream Ports.
PCI-PCI Bridge slots can be used for legacy PCI host devices.

6.1 How to detect if a device is PCI Express:
  > lspci -s 03:00.0 -v (as root)

    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
    Flags: bus master, fast devsel, latency 0, IRQ 50
    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [c8] Power Management version 3
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [40] Express Endpoint, MSI 00
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
    Capabilities: [14c] Latency Tolerance Reporting
    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014

If you can see the "Express Endpoint" capability in the
output, then the device is indeed PCI Express.


7. Virtio devices
=================
Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints
will remain PCI and have transitional behaviour by default.
Transitional virtio devices work in both IO and MMIO modes depending on
the guest support. The Guest firmware will assign both IO and MMIO resources
to transitional virtio devices.

Virtio devices plugged into PCI Express ports are PCI Express devices and
have "1.0" behaviour by default, without IO support.
In both cases the disable-legacy and disable-modern properties can be used
to override the behaviour.

Note that setting disable-legacy=off will enable legacy mode for
PCI Express virtio devices, causing them to require IO space. Given the
limited IO space available, this may quickly lead to resource exhaustion
and is therefore strongly discouraged.


8. Conclusion
==============
The proposal offers a usage model that is easy to understand and follow
and at the same time overcomes the PCI Express architecture limitations.
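
As an illustration of the multi-function Root Port grouping recommended in
sections 2.2 and 5.1, the following shell sketch prints the -device arguments
for a given number of Root Ports on pcie.0, packing 8 functions per device.
It only uses the pcie-root-port options shown in section 2.2.2. The helper
name root_port_args, the rp<N> ids, and the default starting device number (2)
are assumptions made for this example; check that the chosen device numbers
are actually free on the machine type in use.

```shell
#!/bin/sh
# Sketch: print "-device" arguments for COUNT PCI Express Root Ports on
# pcie.0, packed 8 functions per device as in section 2.2.2.
# root_port_args, the rp<N> ids and the default first device number (2)
# are illustrative assumptions, not QEMU conventions.
root_port_args() {
    count=$1
    first_dev=${2:-2}    # assumed to be a free device number on pcie.0
    i=0
    while [ "$i" -lt "$count" ]; do
        n=$((i + 1))
        # addr is <device>.<function>, written in hex to avoid ambiguity
        addr=$(printf '0x%x.0x%x' "$((first_dev + i / 8))" "$((i % 8))")
        # a unique chassis per port keeps the (slot, chassis) pair unique
        opts="pcie-root-port,id=rp$n,bus=pcie.0,chassis=$n,addr=$addr"
        # function 0 of each device carries multifunction=on
        if [ "$((i % 8))" -eq 0 ]; then
            opts="$opts,multifunction=on"
        fi
        printf '%s %s\n' -device "$opts"
        i=$((i + 1))
    done
}

root_port_args 3
```

Called as "root_port_args 3", the helper emits three Root Ports at functions
0, 1 and 2 of device 2, with multifunction=on only on function 0. The output
can be appended to a QEMU command line, and any ports left empty remain
available for hot-plug.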