==============
NVMe Emulation
==============

QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
``nvme-subsys`` devices.

See the following sections for specific information on

  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
    `Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
    Protection`_.

Adding NVMe Devices
===================

Controller Emulation
--------------------

The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
specification. All mandatory features are implemented with a couple of
exceptions and limitations:

  * Accounting numbers in the SMART/Health log page are reset when the device
    is power cycled.
  * Interrupt Coalescing is not supported and is disabled by default.

The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
following parameters:

.. code-block:: console

    -drive file=nvm.img,if=none,id=nvm
    -device nvme,serial=deadbeef,drive=nvm

There are a number of optional general parameters for the ``nvme`` device. Some
are mentioned here, but see ``-device nvme,help`` to list all possible
parameters.

``max_ioqpairs=UINT32`` (default: ``64``)
  Set the maximum number of allowed I/O queue pairs. This replaces the
  deprecated ``num_queues`` parameter.

``msix_qsize=UINT16`` (default: ``65``)
  The number of MSI-X vectors that the device should support.

``mdts=UINT8`` (default: ``7``)
  Set the Maximum Data Transfer Size of the device.

``use-intel-id`` (default: ``off``)
  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
  previously used.

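
The ``mdts`` value is an exponent rather than a byte count. As noted for
``zoned.zasl`` under `Zoned Namespaces`_, it is specified as a power of two in
units of the minimum memory page size (CAP.MPSMIN). The following is a minimal
sketch of that arithmetic. The ``mdts_bytes`` helper is hypothetical, and the
4 KiB page size is an assumption standing in for CAP.MPSMIN.

.. code-block:: shell

    #!/bin/sh
    # Hypothetical helper: convert an mdts exponent into a byte count.
    # Usage: mdts_bytes MDTS [PAGE_SIZE]
    # PAGE_SIZE defaults to 4096, an assumed value for CAP.MPSMIN.
    mdts_bytes() {
        mdts=$1
        page_size=${2:-4096}
        if [ "$mdts" -eq 0 ]; then
            # In the NVMe specification, an MDTS of 0 reports no limit.
            echo "unlimited"
            return 0
        fi
        echo $(( (1 << mdts) * page_size ))
    }

    mdts_bytes 7    # default mdts with 4 KiB pages: 524288 bytes (512 KiB)

Under these assumptions, the default ``mdts=7`` corresponds to a maximum
transfer of 512 KiB per command.
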

Additional Namespaces
---------------------

In the simplest possible invocation sketched above, the device only supports a
single namespace with the namespace identifier ``1``. To support multiple
namespaces and additional features, the ``nvme-ns`` device must be used.

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=deadbeef
   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1
   -drive file=nvm-2.img,if=none,id=nvm-2
   -device nvme-ns,drive=nvm-2

The namespaces defined by the ``nvme-ns`` device will attach to the most
recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
identifiers are allocated automatically, starting from ``1``.

There are a number of parameters available:

``nsid`` (default: ``0``)
  Explicitly set the namespace identifier.

``uuid`` (default: *autogenerated*)
  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
  descriptor in the Namespace Identification Descriptor List.

``eui64``
  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
  Since machine type 6.1, a non-zero default value is used if the parameter
  is not provided. For earlier machine types the field defaults to 0.

``bus``
  If more than one ``nvme`` device is defined, this parameter may be used to
  attach the namespace to a specific ``nvme`` device (identified by an ``id``
  parameter on the controller device).

NVM Subsystems
--------------

Additional features become available if the controller device (``nvme``) is
linked to an NVM Subsystem device (``nvme-subsys``).

The NVM Subsystem emulation allows features such as shared namespaces and
multipath I/O.

.. code-block:: console

   -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
   -device nvme,serial=a,subsys=nvme-subsys-0
   -device nvme,serial=b,subsys=nvme-subsys-0

This will create an NVM subsystem with two controllers. Having controllers
linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:

``shared`` (default: ``on`` since 6.2)
  Specifies that the namespace will be attached to all controllers in the
  subsystem. If set to ``off``, the namespace will remain a private namespace
  and may only be attached to a single controller at a time. Shared namespaces
  are always automatically attached to all controllers (also when controllers
  are hotplugged).

``detached`` (default: ``off``)
  If set to ``on``, the namespace will be available in the subsystem, but
  not attached to any controllers initially. A shared namespace with this set
  to ``on`` will never be automatically attached to controllers.

Thus, adding

.. code-block:: console

   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1,nsid=1
   -drive file=nvm-2.img,if=none,id=nvm-2
   -device nvme-ns,drive=nvm-2,nsid=3,shared=off,detached=on

will cause NSID 1 to be a shared namespace that is initially attached to both
controllers. NSID 3 will be a private namespace due to ``shared=off`` and only
attachable to a single controller at a time. Additionally, it will not be
attached to any controller initially (due to ``detached=on``) or to hotplugged
controllers.

Optional Features
=================

Controller Memory Buffer
------------------------

``nvme`` device parameters related to the Controller Memory Buffer support:

``cmb_size_mb=UINT32`` (default: ``0``)
  This adds a Controller Memory Buffer of the given size at offset zero in BAR
  2.

``legacy-cmb`` (default: ``off``)
  By default, the device uses the "v1.4 scheme" for the Controller Memory
  Buffer support (i.e., the CMB is initially disabled and must be explicitly
  enabled by the host). Set this to ``on`` to behave as a v1.3 device with
  respect to the CMB.

Simple Copy
-----------

The device includes support for TP 4065 ("Simple Copy Command"). A number of
additional ``nvme-ns`` device parameters may be used to control the Copy
command limits:

``mssrl=UINT16`` (default: ``128``)
  Set the Maximum Single Source Range Length (``MSSRL``). This is the maximum
  number of logical blocks that may be specified in each source range.

``mcl=UINT32`` (default: ``128``)
  Set the Maximum Copy Length (``MCL``). This is the maximum number of logical
  blocks that may be specified in a Copy command (the total for all source
  ranges).

``msrc=UINT8`` (default: ``127``)
  Set the Maximum Source Range Count (``MSRC``). This is the maximum number of
  source ranges that may be used in a Copy command. This is a 0's based value.

Zoned Namespaces
----------------

A namespace may be "Zoned" as defined by TP 4053 ("Zoned Namespaces"). Set
``zoned=on`` on an ``nvme-ns`` device to configure it as a zoned namespace.

The namespace may be configured with additional parameters:

``zoned.zone_size=SIZE`` (default: ``128MiB``)
  Define the zone size (``ZSZE``).

``zoned.zone_capacity=SIZE`` (default: ``0``)
  Define the zone capacity (``ZCAP``). If left at the default (``0``), the zone
  capacity will equal the zone size.

``zoned.descr_ext_size=UINT32`` (default: ``0``)
  Set the Zone Descriptor Extension Size (``ZDES``). Must be a multiple of 64
  bytes.

``zoned.cross_read=BOOL`` (default: ``off``)
  Set to ``on`` to allow reads to cross zone boundaries.

``zoned.max_active=UINT32`` (default: ``0``)
  Set the maximum number of active resources (``MAR``). The default (``0``)
  allows all zones to be active.

``zoned.max_open=UINT32`` (default: ``0``)
  Set the maximum number of open resources (``MOR``). The default (``0``)
  allows all zones to be open. If ``zoned.max_active`` is specified, this value
  must be less than or equal to that.

``zoned.zasl=UINT8`` (default: ``0``)
  Set the maximum data transfer size for the Zone Append command. Like
  ``mdts``, the value is specified as a power of two (2^n) and is in units of
  the minimum memory page size (CAP.MPSMIN). With the default value (``0``),
  this property inherits the ``mdts`` value.

Metadata
--------

The virtual namespace device supports LBA metadata in the form of separate
metadata (``MPTR``-based) and extended LBAs.

``ms=UINT16`` (default: ``0``)
  Defines the number of metadata bytes per LBA.

``mset=UINT8`` (default: ``0``)
  Set to ``1`` to enable extended LBAs.

End-to-End Data Protection
--------------------------

The virtual namespace device supports DIF- and DIX-based protection information
(depending on ``mset``).

``pi=UINT8`` (default: ``0``)
  Enable protection information of the specified type (type ``1``, ``2`` or
  ``3``).

``pil=UINT8`` (default: ``0``)
  Controls the location of the protection information within the metadata. Set
  to ``1`` to transfer protection information as the first eight bytes of
  metadata. Otherwise, the protection information is transferred as the last
  eight bytes.

Virtualization Enhancements and SR-IOV (Experimental Support)
-------------------------------------------------------------

The ``nvme`` device supports Single Root I/O Virtualization and Sharing
along with Virtualization Enhancements.
The controller has to be linked to
an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV.

A number of parameters are present (**please note that they may be
subject to change**):

``sriov_max_vfs`` (default: ``0``)
  Indicates the maximum number of PCIe virtual functions supported
  by the controller. Specifying a non-zero value enables reporting of both
  SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities
  by the NVMe device. Virtual function controllers will not report SR-IOV.

``sriov_vq_flexible``
  Indicates the total number of flexible queue resources assignable to all
  the secondary controllers. Implicitly sets the number of the primary
  controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``.

``sriov_vi_flexible``
  Indicates the total number of flexible interrupt resources assignable to
  all the secondary controllers. Implicitly sets the number of the primary
  controller's private resources to ``(msix_qsize - sriov_vi_flexible)``.

``sriov_max_vi_per_vf`` (default: ``0``)
  Indicates the maximum number of virtual interrupt resources assignable
  to a secondary controller. The default ``0`` resolves to
  ``(sriov_vi_flexible / sriov_max_vfs)``.

``sriov_max_vq_per_vf`` (default: ``0``)
  Indicates the maximum number of virtual queue resources assignable to
  a secondary controller. The default ``0`` resolves to
  ``(sriov_vq_flexible / sriov_max_vfs)``.

The simplest possible invocation enables the capability to set up one VF
controller and assign an admin queue, an IO queue, and an MSI-X interrupt.

.. code-block:: console

   -device nvme-subsys,id=subsys0
   -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,
    sriov_vq_flexible=2,sriov_vi_flexible=1

The minimum steps required to configure a functional NVMe secondary
controller are:

  * unbind flexible resources from the primary controller

.. code-block:: console

   nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
   nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0

  * perform a Function Level Reset on the primary controller to actually
    release the resources

.. code-block:: console

   echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset

  * enable VF

.. code-block:: console

   echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs

  * assign the flexible resources to the VF and set it ONLINE

.. code-block:: console

   nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0

  * bind the NVMe driver to the VF

.. code-block:: console

   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
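
The steps above can be collected into a single guest-side script. The following
is only a sketch: the ``PF``, ``VF`` and ``CTRL`` variables, the ``DRY_RUN``
switch and the helper functions are hypothetical names introduced for
illustration, and their defaults simply repeat the example addresses used in
this section. By default the script prints the commands instead of running
them. Executing them for real is assumed to require root privileges and the
``nvme`` command-line tool in the guest.

.. code-block:: shell

    #!/bin/sh
    # Assumed defaults, taken from the example addresses in this section.
    PF=${PF:-0000:01:00.0}      # PCI address of the primary controller
    VF=${VF:-0000:01:00.1}      # PCI address of the virtual function
    CTRL=${CTRL:-/dev/nvme0}    # character device of the primary controller
    DRY_RUN=${DRY_RUN:-1}       # 1: print commands only, 0: execute them

    # Run a command, or only print it in dry-run mode.
    run() {
        if [ "$DRY_RUN" = "1" ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    # Write a value to a sysfs attribute, or print the equivalent command.
    write_sysfs() {
        if [ "$DRY_RUN" = "1" ]; then
            echo "echo $1 > $2"
        else
            echo "$1" > "$2"
        fi
    }

    configure_vf() {
        # Unbind flexible resources from the primary controller.
        run nvme virt-mgmt "$CTRL" -c 0 -r 1 -a 1 -n 0 || return 1
        run nvme virt-mgmt "$CTRL" -c 0 -r 0 -a 1 -n 0 || return 1
        # Function Level Reset on the primary controller releases them.
        write_sysfs 1 "/sys/bus/pci/devices/$PF/reset" || return 1
        # Enable one VF.
        write_sysfs 1 "/sys/bus/pci/devices/$PF/sriov_numvfs" || return 1
        # Assign the flexible resources to the VF and set it ONLINE.
        run nvme virt-mgmt "$CTRL" -c 1 -r 1 -a 8 -n 1 || return 1
        run nvme virt-mgmt "$CTRL" -c 1 -r 0 -a 8 -n 2 || return 1
        run nvme virt-mgmt "$CTRL" -c 1 -r 0 -a 9 -n 0 || return 1
        # Bind the NVMe driver to the VF.
        write_sysfs "$VF" /sys/bus/pci/drivers/nvme/bind
    }

    configure_vf

Running the script with ``DRY_RUN=0`` executes the commands in order and stops
at the first one that fails.
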