PCI EXPRESS GUIDELINES
======================

1. Introduction
================
This document proposes best practices for using PCI Express/PCI devices
in PCI Express based machines and explains the reasoning behind them.

The following presentations accompany this document:
 (1) Q35 overview.
     https://wiki.qemu.org/images/4/4e/Q35.pdf
 (2) A comparison between PCI and PCI Express technologies.
     https://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf

Note: The usage examples are not intended to replace the full
documentation; please use QEMU help to retrieve all options.

2. Device placement strategy
============================
QEMU does not have a clear socket-device matching mechanism
and allows any PCI/PCI Express device to be plugged into any
PCI/PCI Express slot.
Plugging a PCI device into a PCI Express slot might not always work and
is unusual anyway, since it cannot be done on "bare metal".
Plugging a PCI Express device into a PCI slot will hide the Extended
Configuration Space and is therefore also not recommended.

The recommendation is to separate the PCI Express and PCI hierarchies.
PCI Express devices should be plugged only into PCI Express Root Ports and
PCI Express Downstream ports.

2.1 Root Bus (pcie.0)
=====================
Place only the following kinds of devices directly on the Root Complex:
    (1) PCI Devices (e.g. network card, graphics card, IDE controller),
        not controllers. Place only legacy PCI devices on
        the Root Complex. These will be considered Integrated Endpoints.
        Note: Integrated Endpoints are not hot-pluggable.

        Although the PCI Express spec does not forbid PCI Express devices as
        Integrated Endpoints, existing hardware mostly integrates legacy PCI
        devices with the Root Complex. Guest OSes are suspected to behave
        strangely when PCI Express devices are integrated
        with the Root Complex.

    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
        hierarchies.

    (3) PCI Express to PCI Bridge (pcie-pci-bridge), for starting legacy PCI
        hierarchies.

    (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
        are needed.

   pcie.0 bus
   ----------------------------------------------------------------------------
        |                |                    |                  |
   -----------   ------------------   -------------------   --------------
   | PCI Dev |   | PCIe Root Port |   | PCIe-PCI Bridge |   |  pxb-pcie  |
   -----------   ------------------   -------------------   --------------

2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
          -device <dev>[,bus=pcie.0]
2.1.2 To expose a new PCI Express Root Bus use:
          -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
      PCI Express Root Ports and PCI Express to PCI bridges can be
      connected to the pcie.1 bus:
          -device ioh3420,id=root_port1,bus=pcie.1[,chassis=x][,slot=y][,addr=z] \
          -device pcie-pci-bridge,id=pcie_pci_bridge1,bus=pcie.1


2.2 PCI Express only hierarchy
==============================
Always use PCI Express Root Ports to start PCI Express hierarchies.

A PCI Express Root Bus supports up to 32 devices. Since each
PCI Express Root Port is a function and a multi-function
device may support up to 8 functions, the maximum possible
number of PCI Express Root Ports per PCI Express Root Bus is 256.

Prefer grouping PCI Express Root Ports into multi-function devices
to keep a simple flat hierarchy that is enough for most scenarios.
Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
if there is no more room for PCI Express Root Ports.
See section 4 for further justification.

Plug only PCI Express devices into PCI Express Ports.
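
The 32 x 8 = 256 arithmetic and the multi-function grouping recommended
above can be sketched as a small Python helper that emits -device arguments.
This is only an illustration: the function name, the root_portN ids and the
chassis/slot numbering scheme are assumptions made for this sketch, and it
does not check for root-bus slots already occupied by the machine's built-in
devices. Only the ioh3420 device and its bus, addr, chassis, slot and
multifunction properties are taken from this document.

```python
MAX_SLOTS = 32       # device slots on one PCI Express Root Bus
MAX_FUNCTIONS = 8    # functions per multi-function device


def root_port_args(count, first_slot=1, bus="pcie.0"):
    """Return QEMU arguments for `count` ioh3420 Root Ports, packed
    8 per root-bus slot as functions of multi-function devices.

    Hypothetical helper: ids and chassis/slot values are illustrative.
    """
    capacity = (MAX_SLOTS - first_slot) * MAX_FUNCTIONS
    if not 0 <= count <= capacity:
        raise ValueError(f"count must be between 0 and {capacity}")

    args = []
    for i in range(count):
        dev, fn = divmod(i, MAX_FUNCTIONS)
        opts = [
            "ioh3420",
            f"id=root_port{i + 1}",
            f"bus={bus}",
            "chassis=1",                       # same chassis for all ports...
            f"slot={i + 1}",                   # ...so slot makes the pair unique
            f"addr={first_slot + dev:#x}.{fn}",
        ]
        if fn == 0:
            # Function 0 announces that the slot holds a multi-function device.
            opts.append("multifunction=on")
        args += ["-device", ",".join(opts)]
    return args


if __name__ == "__main__":
    print(" \\\n".join(
        " ".join(pair) for pair in zip(*[iter(root_port_args(10))] * 2)
    ))
```

Calling root_port_args(10) fills all eight functions of one root-bus slot and
two functions of the next, which keeps the hierarchy flat without a Switch.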


   pcie.0 bus
   ----------------------------------------------------------------------------------
         |                |                                    |
   -------------    -------------                        -------------
   | Root Port |    | Root Port |                        | Root Port |
   -------------    -------------                        -------------
         |                            -------------------------|------------------------
    ------------                      |                -----------------               |
    | PCIe Dev |                      | PCI Express    | Upstream Port |               |
    ------------                      |   Switch       -----------------               |
                                      |                 |             |                |
                                      |   -------------------     -------------------  |
                                      |   | Downstream Port |     | Downstream Port |  |
                                      |   -------------------     -------------------  |
                                      -------------|-----------------------|------------
                                             ------------
                                             | PCIe Dev |
                                             ------------

2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
          -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
          -device <dev>,bus=root_port1
2.2.2 Using multi-function PCI Express Root Ports:
          -device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \
          -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
          -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0]
2.2.3 Plugging a PCI Express device into a Switch:
          -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
          -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] \
          -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1] \
          -device <dev>,bus=downstream_port1

Notes:
    - (slot, chassis) pair is mandatory and must be unique for each
      PCI Express Root Port. slot defaults to 0 when not specified.
    - 'addr' parameter can be 0 for all the examples above.


2.3 PCI only hierarchy
======================
Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
but, as mentioned in section 5, doing so means the legacy PCI
device in question cannot be hot-unplugged.
Besides that, use PCI Express to PCI Bridges (pcie-pci-bridge) in
combination with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.

Prefer flat hierarchies. For most scenarios a single PCI Express to PCI Bridge
(having 32 slots) and several PCI-PCI Bridges attached to it
(each also supporting 32 slots) will support hundreds of legacy devices.
The recommendation is to populate one PCI-PCI Bridge under the
PCI Express to PCI Bridge until it is full and then plug in a new PCI-PCI Bridge.

   pcie.0 bus
   ----------------------------------------------
        |                             |
   -----------               -------------------
   | PCI Dev |               | PCIe-PCI Bridge |
   -----------               -------------------
                               |            |
                  ------------------    ------------------
                  | PCI-PCI Bridge |    | PCI-PCI Bridge |
                  ------------------    ------------------
                                         |           |
                                  -----------     -----------
                                  | PCI Dev |     | PCI Dev |
                                  -----------     -----------

2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
          -device <dev>[,bus=pcie.0]
2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
          -device pcie-pci-bridge,id=pcie_pci_bridge1[,bus=pcie.0] \
          -device pci-bridge,id=pci_bridge1,bus=pcie_pci_bridge1[,chassis_nr=x][,addr=y] \
          -device <dev>,bus=pci_bridge1[,addr=x]
      Note that 'addr' cannot be 0 unless the shpc=off parameter is passed to
      the PCI Bridge/PCI Express to PCI Bridge.

3. IO space issues
===================
The PCI Express Root Ports and PCI Express Downstream ports are seen by
Firmware/Guest OS as PCI-PCI Bridges. As required by the PCI spec, each
such Port should have a 4K IO range reserved for it, even though only one
(multifunction) device can be plugged into each Port. This results in
poor IO space utilization.
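
As a rough illustration of why one 4K reservation per port exhausts the IO
space so quickly, the arithmetic can be sketched in Python. The 64K total
and the figure of roughly 40K usable space are the ones this document gives
elsewhere (the latter is an approximation from the hot-plug planning
section); the constant and function names are made up for this sketch.

```python
IO_SPACE_BYTES = 64 * 1024    # 65536 byte-wide IO ports in total
PORT_IO_WINDOW = 4 * 1024     # IO range reserved for each port/bridge


def max_io_windows(usable_io_bytes, window=PORT_IO_WINDOW):
    """Number of whole IO windows that fit into the usable IO space."""
    return usable_io_bytes // window


# Theoretical upper bound if the whole IO space were free and contiguous.
upper_bound = max_io_windows(IO_SPACE_BYTES)

# Approximate practical figure: about 40K usable once fixed platform
# IO ports are accounted for (this document's estimate, not a measurement).
practical = max_io_windows(40 * 1024)

print(upper_bound, practical)   # prints "16 10"
```

Even the theoretical bound of 16 windows is far below the 256 Root Ports
that a single Root Bus could otherwise hold.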

The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
by not allocating IO space for each PCI Express Root / PCI Express
Downstream port if:
    (1) the port is empty, or
    (2) the device behind the port has no IO BARs.

The IO space is very limited, to 65536 byte-wide IO ports, and may even be
fragmented by fixed IO ports owned by platform devices. This results in at most
10 PCI Express Root Ports or PCI Express Downstream Ports per system
if devices with IO BARs are used in the PCI Express hierarchy. The
proposed device placement strategy solves this issue by using only
PCI Express devices within the PCI Express hierarchy.

The PCI Express spec requires that PCI Express devices work properly
without using IO ports. The PCI hierarchy has no such limitations.


4. Bus numbers issues
======================
Each PCI domain can have at most 256 buses, and the QEMU PCI Express
machines do not support multiple PCI domains even if extra Root
Complexes (pxb-pcie) are used.

Each element of the PCI Express hierarchy (Root Complexes,
PCI Express Root Ports, PCI Express Downstream/Upstream ports)
uses one bus number. Since only one (multifunction) device
can be attached to a PCI Express Root Port or PCI Express Downstream
Port, it is advised to plan in advance for the expected number of
devices to prevent bus number starvation.

Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI
Express hierarchy) means the hierarchy does not spend bus numbers on
Upstream Ports.

The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
number space. All bus numbers assigned to the buses recursively behind a
given pxb-pcie device's root bus must fit between the bus_nr property of
that pxb-pcie device, and the lowest of the higher bus_nr properties
that the command line sets for other pxb-pcie devices.


5. Hot-plug
============
The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
do not support hot-plug, so any devices plugged into Root Complexes
cannot be hot-plugged/hot-unplugged:
    (1) PCI Express Integrated Endpoints
    (2) PCI Express Root Ports
    (3) PCI Express to PCI Bridges
    (4) pxb-pcie

Be aware that PCI Express Downstream Ports can't be hot-plugged into
an existing PCI Express Upstream Port.

PCI devices can be hot-plugged into PCI Express to PCI and PCI-PCI Bridges.
PCI hot-plug into PCI-PCI bridges is ACPI-based, whereas hot-plug into
PCI Express to PCI bridges is SHPC-based. Both can work side by side with
the PCI Express native hot-plug.

PCI Express devices can be natively hot-plugged/hot-unplugged into/from
PCI Express Root Ports (and PCI Express Downstream Ports).

5.1 Planning for hot-plug:
    (1) PCI hierarchy
        Leave enough PCI-PCI Bridge slots empty or add one
        or more empty PCI-PCI Bridges to the PCI Express to PCI Bridge.

        For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
        4K IO space and 2M MMIO range to be used for all devices behind it.
        A PCI capability has been designed for this; see pcie_pci_bridge.txt.

        Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
        per system, don't use more than 9 PCI-PCI Bridges, leaving 4K for the
        Integrated Endpoints. (The PCI Express Hierarchy needs no IO space).

    (2) PCI Express hierarchy:
        Leave enough PCI Express Root Ports empty. Use multifunction
        PCI Express Root Ports (up to 8 ports per pcie.0 slot)
        on the Root Complex(es), for keeping the
        hierarchy as flat as possible, thereby saving PCI bus numbers.
        Don't use PCI Express Switches if you don't have
        to: each one of those uses an extra PCI bus (for its Upstream Port)
        that could be put to better use with another Root Port or Downstream
        Port, which may come in handy for hot-plugging another device.


5.2 Hot-plug example:
Using HMP: (add -monitor stdio to QEMU command line)
  device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id>


6. Device assignment
====================
Host devices are mostly PCI Express and should be plugged only into
PCI Express Root Ports or PCI Express Downstream Ports.
PCI-PCI Bridge slots can be used for legacy PCI host devices.

6.1 How to detect if a device is PCI Express:
  > lspci -s 03:00.0 -v (as root)

    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
    Flags: bus master, fast devsel, latency 0, IRQ 50
    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [c8] Power Management version 3
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [40] Express Endpoint, MSI 00
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
    Capabilities: [14c] Latency Tolerance Reporting
    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014

If you can see the "Express Endpoint" capability in the
output, then the device is indeed PCI Express.


7. Virtio devices
=================
Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints
will remain PCI and have transitional behavior by default.
Transitional virtio devices work in both IO and MMIO modes depending on
the guest support. The Guest firmware will assign both IO and MMIO resources
to transitional virtio devices.

Virtio devices plugged into PCI Express ports are PCI Express devices and
have "1.0" behavior by default without IO support.
In both cases the disable-legacy and disable-modern properties can be used
to override the behavior.

Note that setting disable-legacy=off will enable legacy mode for
PCI Express virtio devices, causing them to require IO space. Given the
limited IO space available, this may quickly lead to resource exhaustion
and is therefore strongly discouraged.


8. Conclusion
==============
The proposal offers a usage model that is easy to understand and follow
and at the same time overcomes the PCI Express architecture limitations.