==============
NVMe Emulation
==============

QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
``nvme-subsys`` devices.

See the following sections for specific information on:

  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
    `Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
    Protection`_.

Adding NVMe Devices
===================

Controller Emulation
--------------------

The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
specification. All mandatory features are implemented with a couple of exceptions
and limitations:

  * Accounting numbers in the SMART/Health log page are reset when the device
    is power cycled.
  * Interrupt Coalescing is not supported and is disabled by default.

The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
following parameters:

.. code-block:: console

    -drive file=nvm.img,if=none,id=nvm
    -device nvme,serial=deadbeef,drive=nvm

There are a number of optional general parameters for the ``nvme`` device. Some
are mentioned here, but see ``-device nvme,help`` to list all possible
parameters.

``max_ioqpairs=UINT32`` (default: ``64``)
  Set the maximum number of allowed I/O queue pairs. This replaces the
  deprecated ``num_queues`` parameter.

``msix_qsize=UINT16`` (default: ``65``)
  The number of MSI-X vectors that the device should support.

``mdts=UINT8`` (default: ``7``)
  Set the Maximum Data Transfer Size of the device.

``use-intel-id`` (default: ``off``)
  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
  previously used.
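
As an illustration, a controller with fewer queues and a smaller transfer limit
could be sketched as follows. The values are arbitrary and chosen only for this
example. ``msix_qsize`` is set to one more than ``max_ioqpairs`` on the
assumption that one extra vector is wanted for the admin queue, mirroring the
relationship between the two defaults.

.. code-block:: console

    -drive file=nvm.img,if=none,id=nvm
    -device nvme,serial=deadbeef,drive=nvm,max_ioqpairs=8,msix_qsize=9,mdts=5

Assuming ``mdts`` is a power of two in units of the minimum memory page size,
and assuming a 4 KiB page, ``mdts=5`` would correspond to a limit of
2^5 * 4 KiB = 128 KiB per transfer.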

Additional Namespaces
---------------------

In the simplest possible invocation sketched above, the device only supports a
single namespace with the namespace identifier ``1``. To support multiple
namespaces and additional features, the ``nvme-ns`` device must be used.

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=deadbeef
   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1
   -drive file=nvm-2.img,if=none,id=nvm-2
   -device nvme-ns,drive=nvm-2

The namespaces defined by the ``nvme-ns`` device will attach to the most
recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
identifiers are allocated automatically, starting from ``1``.

There are a number of parameters available:

``nsid`` (default: ``0``)
  Explicitly set the namespace identifier.

``uuid`` (default: *autogenerated*)
  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
  descriptor in the Namespace Identification Descriptor List.

``eui64``
  Set the EUI-64 of the namespace. This will be reported as an "IEEE Extended
  Unique Identifier" descriptor in the Namespace Identification Descriptor List.
  Since machine type 6.1 a non-zero default value is used if the parameter
  is not provided. For earlier machine types the field defaults to 0.

``bus``
  If more than one ``nvme`` device is defined, this parameter may be used to
  attach the namespace to a specific ``nvme`` device (identified by an ``id``
  parameter on the controller device).
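
For example, with two controllers defined, a namespace could be attached to the
first one rather than the most recently defined one along these lines. The
``id`` values, serial numbers and image name are placeholders chosen for this
sketch.

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=a
   -device nvme,id=nvme-ctrl-1,serial=b
   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1,bus=nvme-ctrl-0,nsid=2

Without ``bus=nvme-ctrl-0``, the namespace would attach to ``nvme-ctrl-1``,
since that is the most recently defined controller.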

NVM Subsystems
--------------

Additional features become available if the controller device (``nvme``) is
linked to an NVM Subsystem device (``nvme-subsys``).

The NVM Subsystem emulation allows features such as shared namespaces and
multipath I/O.

.. code-block:: console

   -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
   -device nvme,serial=a,subsys=nvme-subsys-0
   -device nvme,serial=b,subsys=nvme-subsys-0

This will create an NVM subsystem with two controllers. Having controllers
linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:

``shared`` (default: ``on`` since 6.2)
  Specifies that the namespace will be attached to all controllers in the
  subsystem. If set to ``off``, the namespace will remain a private namespace
  and may only be attached to a single controller at a time. Shared namespaces
  are always automatically attached to all controllers (also when controllers
  are hotplugged).

``detached`` (default: ``off``)
  If set to ``on``, the namespace will be available in the subsystem, but
  not attached to any controllers initially. A shared namespace with this set
  to ``on`` will never be automatically attached to controllers.

Thus, adding

.. code-block:: console

   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1,nsid=1
   -drive file=nvm-2.img,if=none,id=nvm-2
   -device nvme-ns,drive=nvm-2,nsid=3,shared=off,detached=on

will cause NSID 1 to be a shared namespace that is initially attached to both
controllers. NSID 3 will be a private namespace due to ``shared=off`` and only
attachable to a single controller at a time. Additionally it will not be
attached to any controller initially (due to ``detached=on``) or to hotplugged
controllers.

Optional Features
=================

Controller Memory Buffer
------------------------

``nvme`` device parameters related to the Controller Memory Buffer support:

``cmb_size_mb=UINT32`` (default: ``0``)
  This adds a Controller Memory Buffer of the given size at offset zero in BAR
  2.

``legacy-cmb`` (default: ``off``)
  By default, the device uses the "v1.4 scheme" for the Controller Memory
  Buffer support (i.e., the CMB is initially disabled and must be explicitly
  enabled by the host). Set this to ``on`` to behave as a v1.3 device with
  respect to the CMB.
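
A minimal sketch of enabling a 64 MiB CMB, reusing the drive from the simplest
invocation (the size is an arbitrary illustrative value):

.. code-block:: console

    -drive file=nvm.img,if=none,id=nvm
    -device nvme,serial=deadbeef,drive=nvm,cmb_size_mb=64

Appending ``legacy-cmb=on`` to the ``nvme`` device would select the v1.3
behavior instead.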

Simple Copy
-----------

The device includes support for TP 4065 ("Simple Copy Command"). A number of
additional ``nvme-ns`` device parameters may be used to control the Copy
command limits:

``mssrl=UINT16`` (default: ``128``)
  Set the Maximum Single Source Range Length (``MSSRL``). This is the maximum
  number of logical blocks that may be specified in each source range.

``mcl=UINT32`` (default: ``128``)
  Set the Maximum Copy Length (``MCL``). This is the maximum number of logical
  blocks that may be specified in a Copy command (the total for all source
  ranges).

``msrc=UINT8`` (default: ``127``)
  Set the Maximum Source Range Count (``MSRC``). This is the maximum number of
  source ranges that may be used in a Copy command. This is a 0's based value.
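
As a sketch, tighter Copy limits could be set on a namespace as follows. The
values, ``id`` names and image name are illustrative only. Since ``msrc`` is a
0's based value, ``msrc=15`` would allow 16 source ranges.

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=deadbeef
   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1,mssrl=64,mcl=256,msrc=15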

Zoned Namespaces
----------------

A namespace may be "Zoned" as defined by TP 4053 ("Zoned Namespaces"). Set
``zoned=on`` on an ``nvme-ns`` device to configure it as a zoned namespace.

The namespace may be configured with additional parameters:

``zoned.zone_size=SIZE`` (default: ``128MiB``)
  Define the zone size (``ZSZE``).

``zoned.zone_capacity=SIZE`` (default: ``0``)
  Define the zone capacity (``ZCAP``). If left at the default (``0``), the zone
  capacity will equal the zone size.

``zoned.descr_ext_size=UINT32`` (default: ``0``)
  Set the Zone Descriptor Extension Size (``ZDES``). Must be a multiple of 64
  bytes.

``zoned.cross_read=BOOL`` (default: ``off``)
  Set to ``on`` to allow reads to cross zone boundaries.

``zoned.max_active=UINT32`` (default: ``0``)
  Set the maximum number of active resources (``MAR``). The default (``0``)
  allows all zones to be active.

``zoned.max_open=UINT32`` (default: ``0``)
  Set the maximum number of open resources (``MOR``). The default (``0``)
  allows all zones to be open. If ``zoned.max_active`` is specified, this value
  must be less than or equal to that.

``zoned.zasl=UINT8`` (default: ``0``)
  Set the maximum data transfer size for the Zone Append command. Like
  ``mdts``, the value is specified as a power of two (2^n) and is in units of
  the minimum memory page size (CAP.MPSMIN). The default value (``0``)
  has this property inherit the ``mdts`` value.
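
Putting some of these together, a zoned namespace might be configured roughly
like this. The image name, ``id`` values and sizes are placeholders for the
sketch. ``zoned.max_open`` is kept at or below ``zoned.max_active``, as the
parameter description requires.

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=deadbeef
   -drive file=zns.img,if=none,id=zns
   -device nvme-ns,drive=zns,zoned=on,zoned.zone_size=64M,zoned.zone_capacity=48M,zoned.max_active=32,zoned.max_open=16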

Metadata
--------

The virtual namespace device supports LBA metadata in the form of separate
metadata (``MPTR``-based) and extended LBAs.

``ms=UINT16`` (default: ``0``)
  Defines the number of metadata bytes per LBA.

``mset=UINT8`` (default: ``0``)
  Set to ``1`` to enable extended LBAs.
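
For instance, a namespace carrying 8 bytes of metadata per LBA as extended LBAs
could be sketched as follows (the ``id`` values and image name are
placeholders):

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=deadbeef
   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1,ms=8,mset=1

Leaving ``mset`` at its default would keep the same 8 bytes as separate
(``MPTR``-based) metadata.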

End-to-End Data Protection
--------------------------

The virtual namespace device supports DIF- and DIX-based protection information
(depending on ``mset``).

``pi=UINT8`` (default: ``0``)
  Enable protection information of the specified type (type ``1``, ``2`` or
  ``3``).

``pil=UINT8`` (default: ``0``)
  Controls the location of the protection information within the metadata. Set
  to ``1`` to transfer protection information as the first eight bytes of
  metadata. Otherwise, the protection information is transferred as the last
  eight bytes.
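
As a sketch, Type 1 protection information placed in the first eight bytes of
the metadata might be configured like this. The example assumes 16 bytes of
metadata per LBA, so that the first and last eight bytes are distinct; the
``id`` values and image name are placeholders.

.. code-block:: console

   -device nvme,id=nvme-ctrl-0,serial=deadbeef
   -drive file=nvm-1.img,if=none,id=nvm-1
   -device nvme-ns,drive=nvm-1,ms=16,pi=1,pil=1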