Compute Express Link (CXL)
==========================
From the view of a single host, CXL is an interconnect standard that
targets accelerators and memory devices attached to a CXL host.
This description focuses on those aspects visible either to
software running on a QEMU emulated host or to the internals of
functional emulation. As such, it skips over many of the
electrical and protocol elements that are of more interest
for real hardware and dominate more general introductions to CXL.
It also completely ignores the fabric management aspects of CXL
by considering only a single host and a static configuration.

CXL shares many concepts and much of the infrastructure of PCI Express:
CXL Host Bridges have CXL Root Ports, which may be directly
attached to CXL or PCI End Points. Alternatively, there may be CXL Switches
with CXL and PCI Endpoints attached below them. In many cases additional
control and capabilities are exposed via PCI Express interfaces.
This sharing of interfaces, and hence emulation code, is reflected
in how the devices are emulated in QEMU. In most cases the various
CXL elements are built upon an equivalent PCIe device.

CXL devices support the following interfaces:

* Most conventional PCIe interfaces

  - Configuration space access
  - BAR mapped memory accesses used for registers and mailboxes.
  - MSI/MSI-X
  - AER
  - DOE mailboxes
  - IDE
  - Many other PCI express defined interfaces.

* Memory operations

  - Equivalent of accessing DRAM / NVDIMMs. Any access / feature
    supported by the host for normal memory should also work for
    CXL attached memory devices.

* Cache operations. These are mostly irrelevant to QEMU emulation as
  QEMU is not emulating a coherency protocol. Any emulation related
  to these will be device specific and is out of the scope of this
  document.

CXL 2.0 Device Types
--------------------
CXL 2.0 End Points are often categorized into three types.

**Type 1:** These support coherent caching of host memory. An example might
be a crypto accelerator. They may also have device private memory accessible
via means such as PCI memory reads and writes to BARs.

**Type 2:** These support coherent caching of host memory and host
managed device memory (HDM) for which the coherency protocol is managed
by the host. This is a complex topic, so for more information on CXL
coherency see the CXL 2.0 specification.

**Type 3 Memory devices:** These devices act as a means of attaching
additional memory (HDM) to a CXL host including both volatile and
persistent memory. The CXL topology may support interleaving across a
number of Type 3 memory devices using HDM Decoders in the host, host
bridge, switch upstream port and endpoints.

Scope of CXL emulation in QEMU
------------------------------
The focus of CXL emulation is CXL revision 2.0 and later. Earlier CXL
revisions defined a smaller set of features, leaving much of the control
interface as implementation defined or device specific. This made generic
emulation challenging, with host specific firmware being responsible
for setup and the Endpoints being presented to operating systems
as Root Complex Integrated End Points. CXL rev 2.0 looks a lot
more like PCI Express, with fully specified discoverability
of the CXL topology.

CXL System components
---------------------
A CXL system is made up of a Host with a number of 'standard components',
the control and capabilities of which are discoverable by system software
using means described in the CXL 2.0 specification.

CXL Fixed Memory Windows (CFMW)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A CFMW consists of a particular range of Host Physical Address space
which is routed to particular CXL Host Bridges.
At time of generic
software initialization it will have a particular interleaving
configuration and associated Quality of Service Throttling Group (QTG).
This information is available to system software when making
decisions about how to configure interleave across available CXL
memory devices. It is provided as CFMW Structures (CFMWS) in
the CXL Early Discovery Table (CEDT), an ACPI table.

Note: QTG 0 is the only one currently supported in QEMU.

CXL Host Bridge (CXL HB)
~~~~~~~~~~~~~~~~~~~~~~~~
A CXL host bridge is similar to the PCIe equivalent, but with a
specification defined register interface called CXL Host Bridge
Component Registers (CHBCR). The location of this CHBCR MMIO
space is described to system software via a CXL Host Bridge
Structure (CHBS) in the CEDT ACPI table. The actual interfaces
are identical to those used for other parts of the CXL hierarchy
as CXL Component Registers in PCI BARs.

Interfaces provided include:

* Configuration of HDM Decoders to route CXL Memory accesses with
  a particular Host Physical Address range to the target port
  below which the CXL device servicing that address lies. This
  may be a mapping to a single Root Port (RP) or across a set of
  target RPs.

CXL Root Ports (CXL RP)
~~~~~~~~~~~~~~~~~~~~~~~
A CXL Root Port serves the same purpose as a PCIe Root Port.
There are a number of CXL specific Designated Vendor Specific
Extended Capabilities (DVSEC) in PCIe Configuration Space
and associated component register access via PCI BARs.

CXL Switch
~~~~~~~~~~
Not yet implemented in QEMU.

Here we consider a simple CXL switch with only a single
virtual hierarchy. Whilst more complex devices exist, their
visibility to a particular host is generally the same as for
a simple switch design. Hosts often have no awareness
of complex rerouting and device pooling; they simply see
devices being hot added or hot removed.

A CXL switch has a similar architecture to those in PCIe,
with a single upstream port, internal PCI bus and multiple
downstream ports.

Both the CXL upstream and downstream ports have CXL specific
DVSECs in configuration space, and component registers in PCI
BARs. The Upstream Port has the configuration interfaces for
the HDM decoders which route incoming memory accesses to the
appropriate downstream port.

CXL Memory Devices - Type 3
~~~~~~~~~~~~~~~~~~~~~~~~~~~
CXL type 3 devices use a PCI class code and are intended to be supported
by a generic operating system driver. They have HDM decoders,
though in these EP devices the decoder is responsible not for
routing but for translation of the incoming host physical address (HPA)
into a Device Physical Address (DPA).

CXL Memory Interleave
---------------------
To understand the interaction of different CXL hardware components which
are emulated in QEMU, let us consider a memory read in a fully configured
CXL topology. Note that system software is responsible for configuration
of all components with the exception of the CFMWs. System software is
responsible for allocating appropriate ranges from within the CFMWs
and exposing those via normal memory configurations as would be done
for system RAM.

Example system topology.
x marks the match in each decoder level::

  |<------------------SYSTEM PHYSICAL ADDRESS MAP (1)----------------->|
  |    __________   __________________________________   __________    |
  |   |          | |                                  | |          |   |
  |   | CFMW 0   | |  CXL Fixed Memory Window 1       | | CFMW 2   |   |
  |   | HB0 only | |  Configured to interleave        | | HB1 only |   |
  |   |          | |  memory accesses across HB0/HB1  | |          |   |
  |   |__________| |_____x____________________________| |__________|   |
           |             |                     |             |
           |             |                     |             |
           |             |                     |             |
           |       Interleave Decoder          |             |
           |       Matches this HB             |             |
           \_____________|                     |_____________/
               __________|__________      _____|_______________
              |                     |    |                     |
       (2)    | CXL HB 0            |    | CXL HB 1            |
              | HB IntLv Decoders   |    | HB IntLv Decoders   |
              | PCI/CXL Root Bus 0c |    | PCI/CXL Root Bus de |
              |                     |    |                     |
              |___x_________________|    |_____________________|
                  |                |       |               |
                  |                |       |               |
       A HB 0 HDM Decoder          |       |               |
       matches this Port           |       |               |
                  |                |       |               |
       ___________|___   __________|__   __|_________   ___|_________
   (3)|  Root Port 0  | | Root Port 1 | | Root Port 2| | Root Port 3 |
      |  Appears in   | | Appears in  | | Appears in | | Appears in  |
      |  PCI topology | | PCI topology| | PCI topo   | | PCI topo    |
      |  as 0c:00.0   | | as 0c:01.0  | | as de:00.0 | | as de:01.0  |
      |_______________| |_____________| |____________| |_____________|
            |                  |               |              |
            |                  |               |              |
       _____|_________   ______|______   ______|_____   ______|_______
   (4)|     x         | |             | |            | |              |
      | CXL Type3 0   | | CXL Type3 1 | | CXL Type3 2| | CXL Type3 3  |
      |               | |             | |            | |              |
      | PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...)  |
      | Decoder to go | |             | |            | |              |
      | from host PA  | | PCI 0e:00.0 | | PCI df:00.0| | PCI e0:00.0  |
      | to device PA  | |             | |            | |              |
      | PCI as 0d:00.0| |             | |            | |              |
      |_______________| |_____________| |____________| |______________|

Notes:

(1) **3 CXL Fixed Memory Windows (CFMW)** corresponding to different
    ranges of the system physical address map.
    Each CFMW has a
    particular interleave setup across the CXL Host Bridges (HB).
    CFMW0 provides uninterleaved access to HB0, CFMW2 provides
    uninterleaved access to HB1. CFMW1 provides interleaved memory access
    across HB0 and HB1.

(2) **Two CXL Host Bridges**. Each of these has 2 CXL Root Ports and
    programmable HDM decoders to route memory accesses either to
    a single port or interleave them across multiple ports.
    A complex configuration here might be to use the following HDM
    decoders in HB0. HDM0 routes CFMW0 requests to RP0 and hence
    part of CXL Type3 0. HDM1 routes CFMW0 requests from a
    different region of the CFMW0 PA range to RP1 and hence part
    of CXL Type3 1. HDM2 routes yet another PA range from within
    CFMW0 to be interleaved across RP0 and RP1, providing 2 way
    interleave of part of the memory provided by CXL Type3 0 and
    CXL Type3 1. HDM3 routes those interleaved accesses from
    CFMW1 that target HB0 to RP0 and another part of the memory of
    CXL Type3 0 (as part of a 2 way interleave at the system level
    across, for example, CXL Type3 0 and CXL Type3 2).
    HDM4 is used to enable system wide 4 way interleave across all
    the present CXL Type3 devices, by interleaving those (interleaved)
    requests that HB0 receives from CFMW1 across RP0 and
    RP1 and hence to yet more regions of the memory of the
    attached Type3 devices. Note this is a representative subset
    of the full range of possible HDM decoder configurations in this
    topology.

(3) **Four CXL Root Ports.** In this case the CXL Type3 devices are
    directly attached to these ports.

(4) **Four CXL Type3 memory expansion devices.** These will each have
    HDM decoders, but in this case rather than performing interleave
    they will take the Host Physical Addresses of accesses and map
    them to their own local Device Physical Address Space (DPA).
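
The address arithmetic performed by the decoders described in the notes
above can be sketched as follows. This is a minimal illustration and not
QEMU's implementation: it assumes a single interleave level, a power-of-two
number of ways, and a decoder whose base maps to the start of the device's
DPA space. The function names and the window base address are hypothetical.

```python
def interleave_target(hpa, base, ways, granularity):
    """Index of the interleave target (e.g. host bridge) that services hpa."""
    assert ways > 0 and ways & (ways - 1) == 0, "sketch handles power-of-two ways only"
    offset = hpa - base                    # offset of the access within the window
    return (offset // granularity) % ways  # granules are dealt out round-robin


def hpa_to_dpa(hpa, base, ways, granularity, dpa_base=0):
    """Device Physical Address seen by the target that services hpa."""
    assert ways > 0 and ways & (ways - 1) == 0, "sketch handles power-of-two ways only"
    offset = hpa - base
    granule = offset // granularity        # window-wide granule number
    within = offset % granularity          # byte offset inside that granule
    # Each target holds every ways-th granule, packed contiguously in its DPA space.
    return dpa_base + (granule // ways) * granularity + within


# Hypothetical window base (an assumption); 2-way interleave at 8 KiB granularity.
WINDOW_BASE = 0x1_0000_0000
for off in (0x0000, 0x2000, 0x4010):
    hpa = WINDOW_BASE + off
    print(hex(off), interleave_target(hpa, WINDOW_BASE, 2, 8192),
          hex(hpa_to_dpa(hpa, WINDOW_BASE, 2, 8192)))
```

With these assumed values, offset 0x2000 selects target 1 at DPA 0, while
offset 0x4010 returns to target 0 at DPA 0x2010. Multi-level interleave
(host bridge, then root port) applies the same idea at each level with
different address bits.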

Example command lines
---------------------
A very simple setup with just one directly attached CXL Type 3 device::

  qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
  ...
  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
  -cxl-fixed-memory-window targets.0=cxl.1,size=4G

A setup suitable for 4 way interleave. Only one fixed window is provided, to enable 2 way
interleave across 2 CXL host bridges. Each host bridge has 2 CXL Root Ports, each with
a CXL Type3 device directly attached (no switches)::

  qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max \
  ...
  -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
  -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \
  -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \
  -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M \
  -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
  -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \
  -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \
  -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M \
  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
  -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
  -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
  -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
  -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \
  -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
  -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \
  -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
  -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
  -cxl-fixed-memory-window targets.0=cxl.1,targets.1=cxl.2,size=4G,interleave-granularity=8k

Kernel Configuration Options
----------------------------

In Linux 5.18 the following options are necessary to make use of
OS management of CXL memory devices as described here.

* CONFIG_CXL_BUS
* CONFIG_CXL_PCI
* CONFIG_CXL_ACPI
* CONFIG_CXL_PMEM
* CONFIG_CXL_MEM
* CONFIG_CXL_PORT
* CONFIG_CXL_REGION

References
----------

 - Consortium website for specifications etc:
   http://www.computeexpresslink.org
 - Compute Express Link Revision 2 specification, October 2020
 - CEDT CFMWS & QTG _DSM ECN May 2021