
4 Vhost-user Protocol
11 version 2 or later. See the COPYING file in the top-level
26 The protocol defines 2 sides of the communication, *front-end* and
27 *back-end*. The *front-end* is the application that shares its virtqueues, in
28 our case QEMU. The *back-end* is the consumer of the virtqueues.
30 In the current implementation QEMU is the *front-end*, and the *back-end*
33 or a block device back-end processing read & write to a virtual
34 disk. In order to facilitate interoperability between various back-end
38 The *front-end* and *back-end* can be either a client (i.e. connecting) or
42 --------------------------------------
44 While vhost-user was initially developed targeting Linux, nowadays it
47 - A way for requesting shared memory represented by a file descriptor
51 - AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
54 - Either eventfd or pipe/pipe2. On platforms where eventfd is not
55 available, QEMU will automatically fall back to pipe2 or, as a last
57 sending events by reading or writing (respectively) an 8-byte value
58 to the corresponding file descriptor. The 8-byte value itself has no meaning and
66 A vhost-user message consists of 3 header fields and a payload.
68 +---------+-------+------+---------+
70 +---------+-------+------+---------+
73 ------
75 :request: 32-bit type of the request
77 :flags: 32-bit bit field
79 - Lower 2 bits are the version (currently 0x01)
80 - Bit 2 is the reply flag - needs to be sent on each reply from the back-end
81 - Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
84 :size: 32-bit size of the payload
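A minimal C sketch of the header described above. It assumes three 32-bit fields in host (native) byte order, followed by the payload. The struct and macro names (``vu_header``, ``VU_FLAG_REPLY`` and so on) are hypothetical, not QEMU's own definitions. The sketch only shows how the version and the reply/need_reply bits combine in ``flags``:

```c
#include <stdint.h>

/* Hypothetical names; bit positions follow the header description above. */
#define VU_VERSION_MASK     0x3u        /* lower 2 bits: protocol version */
#define VU_VERSION          0x1u        /* current version */
#define VU_FLAG_REPLY       (1u << 2)   /* set on every back-end reply */
#define VU_FLAG_NEED_REPLY  (1u << 3)   /* front-end requests an ack */

struct vu_header {
    uint32_t request;   /* request type */
    uint32_t flags;     /* version plus reply/need_reply bits */
    uint32_t size;      /* payload size in bytes */
};

/* Header for a front-end request, optionally asking for an ack. */
static struct vu_header vu_make_request(uint32_t request, uint32_t size,
                                        int need_reply)
{
    struct vu_header h = { request, VU_VERSION, size };
    if (need_reply)
        h.flags |= VU_FLAG_NEED_REPLY;
    return h;
}

/* Reject headers carrying an unexpected version. */
static int vu_version_ok(const struct vu_header *h)
{
    return (h->flags & VU_VERSION_MASK) == VU_VERSION;
}

/* Header for the back-end's reply: same request type, reply bit set. */
static struct vu_header vu_make_reply(const struct vu_header *req,
                                      uint32_t size)
{
    struct vu_header h = { req->request, VU_VERSION | VU_FLAG_REPLY, size };
    return h;
}
```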
87 -------
91 A single 64-bit integer
94 +-----+
96 +-----+
98 :u64: a 64-bit unsigned integer
103 +-------+-----+
105 +-------+-----+
107 :index: a 32-bit index
109 :num: a 32-bit number
114 +-------------+---------------------+
116 +-------------+---------------------+
118 :vring index: 32-bit index of the respective virtqueue
120 :index in avail ring: 32-bit value, of which currently only the lower 16
123 - Bits 0–15: Index of the next *Available Ring* descriptor that the
124 back-end will process. This is a free-running index that is not
126 - Bits 16–31: Reserved (set to zero)
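For split virtqueues, the ``num`` value carries only the 16-bit free-running avail index. A short sketch with hypothetical helper names follows. Rejecting a value whose reserved bits are non-zero is a design choice made here for illustration; the text above only says those bits are set to zero:

```c
#include <stdint.h>

/* Bits 0-15: next avail index; bits 16-31 stay zero. */
static uint32_t split_vring_base_encode(uint16_t next_avail)
{
    return (uint32_t)next_avail;
}

/* Returns 0 on success, -1 if reserved bits are set (assumed policy). */
static int split_vring_base_decode(uint32_t num, uint16_t *next_avail)
{
    if (num >> 16)
        return -1;
    *next_avail = (uint16_t)(num & 0xffffu);
    return 0;
}
```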
131 +-------------+--------------------+
133 +-------------+--------------------+
135 :vring index: 32-bit index of the respective virtqueue
137 :descriptor indices: 32-bit value:
139 - Bits 0–14: Index of the next *Available Ring* descriptor that the
140 back-end will process. This is a free-running index that is not
142 - Bit 15: Driver (Available) Ring Wrap Counter
143 - Bits 16–30: Index of the entry in the *Used Ring* where the back-end
144 will place the next descriptor. This is a free-running index that
146 - Bit 31: Device (Used) Ring Wrap Counter
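The packed-virtqueue layout above packs two 15-bit indices and two wrap counters into one 32-bit value. The sketch below encodes and decodes that value. The struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

struct packed_vring_base {
    uint16_t avail_idx;   /* bits 0-14 */
    bool     avail_wrap;  /* bit 15: driver (available) ring wrap counter */
    uint16_t used_idx;    /* bits 16-30 */
    bool     used_wrap;   /* bit 31: device (used) ring wrap counter */
};

static uint32_t packed_vring_base_encode(const struct packed_vring_base *b)
{
    return  (uint32_t)(b->avail_idx & 0x7fffu)
          | ((uint32_t)b->avail_wrap << 15)
          | ((uint32_t)(b->used_idx & 0x7fffu) << 16)
          | ((uint32_t)b->used_wrap << 31);
}

static struct packed_vring_base packed_vring_base_decode(uint32_t num)
{
    struct packed_vring_base b;
    b.avail_idx  = (uint16_t)(num & 0x7fffu);
    b.avail_wrap = (num >> 15) & 1u;
    b.used_idx   = (uint16_t)((num >> 16) & 0x7fffu);
    b.used_wrap  = (num >> 31) & 1u;
    return b;
}
```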
151 +-------+-------+------------+------+-----------+-----+
153 +-------+-------+------------+------+-----------+-----+
155 :index: a 32-bit vring index
157 :flags: 32-bit vring flags
159 :descriptor: a 64-bit ring address of the vring descriptor table
161 :used: a 64-bit ring address of the vring used ring
163 :available: a 64-bit ring address of the vring available ring
165 :log: a 64-bit guest address for logging
175 +---------------+------+--------------+-------------+
177 +---------------+------+--------------+-------------+
179 :guest address: a 64-bit guest address of the region
181 :size: a 64-bit size
183 :user address: a 64-bit user address
185 :mmap offset: a 64-bit offset where region starts in the mapped memory
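With these four fields, a back-end can translate a guest physical address into a local pointer. The sketch below assumes one common approach: the back-end maps each region's file descriptor from file offset 0 with length ``size + mmap offset``, then adds ``mmap offset`` when translating. ``vu_region``, ``mmap_base`` and ``vu_gpa_to_va`` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

struct vu_region {
    uint64_t guest_addr;   /* guest physical start of the region */
    uint64_t size;         /* region size in bytes */
    uint64_t user_addr;    /* front-end virtual address (unused here) */
    uint64_t mmap_offset;  /* where the region starts in the mapping */
    uint8_t *mmap_base;    /* assumed: back-end's mmap() of the fd at offset 0 */
};

/* Return a pointer for [gpa, gpa + len), or NULL if no region covers it. */
static void *vu_gpa_to_va(const struct vu_region *regs, size_t n,
                          uint64_t gpa, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        const struct vu_region *r = &regs[i];
        /* Overflow-safe containment check. */
        if (gpa >= r->guest_addr && len <= r->size &&
            gpa - r->guest_addr <= r->size - len)
            return r->mmap_base + r->mmap_offset + (gpa - r->guest_addr);
    }
    return NULL;
}
```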
189 fields at the end.
191 +---------------+------+--------------+-------------+----------------+-------+
193 +---------------+------+--------------+-------------+----------------+-------+
195 :xen mmap flags: a 32-bit bit field
197 - Bit 0 is set for Xen foreign memory mapping.
198 - Bit 1 is set for Xen grant memory mapping.
199 - Bit 8 is set if the memory region can not be mapped in advance, and memory
201 back-end. The back-end shouldn't try to map the entire region at once, as the
202 front-end may not allow it. The back-end should rather map only the required
205 :domid: a 32-bit Xen hypervisor specific domain id.
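The ``xen mmap flags`` bits can be tested as follows. The macro and function names are hypothetical. The check that exactly one mapping type (foreign or grant) is set is an assumption made for illustration; it is not stated above:

```c
#include <stdbool.h>
#include <stdint.h>

#define XEN_MMAP_FOREIGN    (1u << 0)  /* Xen foreign memory mapping */
#define XEN_MMAP_GRANT      (1u << 1)  /* Xen grant memory mapping */
#define XEN_MMAP_ON_DEMAND  (1u << 8)  /* region cannot be mapped in advance */

/* True if the back-end may map the whole region up front. */
static bool xen_can_map_upfront(uint32_t xen_mmap_flags)
{
    return !(xen_mmap_flags & XEN_MMAP_ON_DEMAND);
}

/* Assumed sanity check: exactly one of the two mapping types is set. */
static bool xen_flags_valid(uint32_t xen_mmap_flags)
{
    uint32_t type = xen_mmap_flags & (XEN_MMAP_FOREIGN | XEN_MMAP_GRANT);
    return type == XEN_MMAP_FOREIGN || type == XEN_MMAP_GRANT;
}
```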
210 +---------+--------+
212 +---------+--------+
214 :padding: 64-bit
221 +-------------+---------+---------+-----+---------+
223 +-------------+---------+---------+-----+---------+
225 :num regions: a 32-bit number of regions
227 :padding: 32-bit
234 +----------+------------+
236 +----------+------------+
238 :log size: a 64-bit size of area used for logging
240 :log offset: a 64-bit offset from start of supplied file descriptor where
247 +------+------+--------------+-------------------+------+
249 +------+------+--------------+-------------------+------+
251 :iova: a 64-bit I/O virtual address programmed by the guest
253 :size: a 64-bit size
255 :user address: a 64-bit user address
257 :permissions flags: an 8-bit value:
258 - 0: No access
259 - 1: Read access
260 - 2: Write access
261 - 3: Read/Write access
263 :type: an 8-bit IOTLB message type:
264 - 1: IOTLB miss
265 - 2: IOTLB update
266 - 3: IOTLB invalidate
267 - 4: IOTLB access fail
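The permission values form a bitmask (read = 1, write = 2, read/write = 3), so a back-end can check a cached entry with a single AND. The sketch below looks up an I/O virtual address in a small table of entries built from IOTLB update messages. The names are hypothetical. A ``false`` result marks the point where a back-end would send an IOTLB miss request to the front-end:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VU_ACCESS_RO 1u  /* read access */
#define VU_ACCESS_WO 2u  /* write access */
#define VU_ACCESS_RW 3u  /* read/write access */

/* Entry created from an IOTLB update (type 2) message. */
struct iotlb_entry {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;   /* user address */
    uint8_t  perm;    /* permission flags */
};

/* Translate iova for the requested access; false means miss or denied. */
static bool iotlb_translate(const struct iotlb_entry *tlb, size_t n,
                            uint64_t iova, uint8_t access, uint64_t *uaddr)
{
    for (size_t i = 0; i < n; i++) {
        const struct iotlb_entry *e = &tlb[i];
        if (iova >= e->iova && iova - e->iova < e->size &&
            (e->perm & access) == access) {
            *uaddr = e->uaddr + (iova - e->iova);
            return true;
        }
    }
    return false;
}
```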
272 +--------+------+-------+---------+
274 +--------+------+-------+---------+
276 :offset: a 32-bit offset of virtio device's configuration space
278 :size: a 32-bit configuration space access size in bytes
280 :flags: a 32-bit value:
281 - 0: Vhost front-end messages used for writable fields
282 - 1: Vhost front-end messages used for live migration
290 +-----+------+--------+
292 +-----+------+--------+
294 :u64: a 64-bit integer containing the vring index and flags
296 :size: a 64-bit size of this area
298 :offset: a 64-bit offset of this area from the start of the
304 +-----------+-------------+------------+------------+
306 +-----------+-------------+------------+------------+
308 :mmap size: a 64-bit size of area to track inflight I/O
310 :mmap offset: a 64-bit offset of this area from the start
313 :num queues: a 16-bit number of virtqueues
315 :queue size: a 16-bit size of virtqueues
320 +------+
322 +------+
324 :UUID: 16 bytes UUID, whose first three components (a 32-bit value, then
325 two 16-bit values) are stored in big endian.
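Because the first three UUID components are big endian, a reader must assemble them byte by byte rather than copying them into native integers. In the sketch below, the field names follow RFC 4122 (``time_low``, ``time_mid``, ``time_hi_and_version``); they are an assumption here, since the text above does not name the components:

```c
#include <stdint.h>

struct uuid_fields {
    uint32_t time_low;             /* bytes 0-3, big endian */
    uint16_t time_mid;             /* bytes 4-5, big endian */
    uint16_t time_hi_and_version;  /* bytes 6-7, big endian */
};

/* Decode the big-endian head of a 16-byte UUID; bytes 8-15 are a plain array. */
static struct uuid_fields uuid_decode_head(const uint8_t u[16])
{
    struct uuid_fields f;
    f.time_low = ((uint32_t)u[0] << 24) | ((uint32_t)u[1] << 16) |
                 ((uint32_t)u[2] << 8)  |  (uint32_t)u[3];
    f.time_mid = (uint16_t)((u[4] << 8) | u[5]);
    f.time_hi_and_version = (uint16_t)((u[6] << 8) | u[7]);
    return f;
}
```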
330 +--------------------+-----------------+
332 +--------------------+-----------------+
334 :transfer direction: a 32-bit enum, describing the direction in which
337 - 0: Save: Transfer the state from the back-end to the front-end,
339 - 1: Load: Transfer the state from the front-end to the back-end,
342 :migration phase: a 32-bit enum, describing the state in which the VM
345 - 0: Stopped (in the period after the transfer of memory-mapped
346 regions before switch-over to the destination): The VM guest is
347 stopped, and the vhost-user device is suspended (see
354 -----------
356 In QEMU the vhost-user message is implemented with the following struct:
380 The protocol for vhost-user is based on the existing implementation of
382 Unix domain socket implementing vhost-user has an equivalent ioctl to
385 The communication consists of the *front-end* sending message requests and
386 the *back-end* sending message replies. Most of the requests don't require
400 There are several messages that the front-end sends with file descriptors passed
414 If the *front-end* is unable to send the full message or receives a wrong
418 If the *back-end* detects some error such as incompatible features, it may also
422 allows full backwards compatibility on both front-end and back-end. As
423 older back-ends don't support negotiating protocol features, a feature
430 <https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
434 This reserved feature bit was reused by the vhost-user protocol to add
435 vhost-user protocol feature negotiation in a backwards compatible
436 fashion. Old vhost-user front-end and back-end implementations continue to
437 work even though they are not aware of vhost-user protocol feature
441 -----------
445 * While a ring is stopped, the back-end must not process the ring at
450 * started and disabled: The back-end must process the ring without
452 in the disabled state the back-end must not supply any new RX packets,
455 * started and enabled: The back-end must process the ring normally, i.e.
458 Each ring is initialized in a stopped and disabled state. The back-end
461 ``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
468 the front-end without ``VHOST_USER_F_PROTOCOL_FEATURES`` set, the
469 back-end must enable all rings immediately.
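The two independent per-ring flags (started/stopped, enabled/disabled) can be modelled as a small state machine. The sketch below uses hypothetical names. As the full specification describes, it starts a ring on a kick and stops it on ``VHOST_USER_GET_VRING_BASE``. It also enables every ring when ``VHOST_USER_SET_FEATURES`` arrives without ``VHOST_USER_F_PROTOCOL_FEATURES``:

```c
#include <stdbool.h>
#include <stddef.h>

enum vring_mode {
    VRING_IDLE,             /* stopped: do not process the ring */
    VRING_NO_SIDE_EFFECTS,  /* started + disabled: e.g. no new RX, drop TX */
    VRING_NORMAL,           /* started + enabled: process normally */
};

struct vring_state {
    bool started;
    bool enabled;
};

/* Each ring begins stopped and disabled. */
static void vring_init(struct vring_state *vq)
{
    vq->started = false;
    vq->enabled = false;
}

static void vring_on_kick(struct vring_state *vq)     { vq->started = true; }
static void vring_on_get_base(struct vring_state *vq) { vq->started = false; }
static void vring_on_enable(struct vring_state *vq, bool on) { vq->enabled = on; }

/* SET_FEATURES without VHOST_USER_F_PROTOCOL_FEATURES enables all rings. */
static void vrings_on_set_features(struct vring_state *vqs, size_t n,
                                   bool protocol_features)
{
    if (!protocol_features)
        for (size_t i = 0; i < n; i++)
            vqs[i].enabled = true;
}

static enum vring_mode vring_current_mode(const struct vring_state *vq)
{
    if (!vq->started)
        return VRING_IDLE;
    return vq->enabled ? VRING_NORMAL : VRING_NO_SIDE_EFFECTS;
}
```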
471 While processing the rings (whether they are enabled or not), the back-end
484 * not send any messages to the front-end,
485 * still process and reply to messages from the front-end.
488 ----------------------
490 Many devices have a fixed number of virtqueues. In this case the front-end
492 back-end.
495 number of virtqueues is chosen by the back-end. The number can depend on host
496 resource availability or back-end implementation details. Such devices are called
499 Multiple queue support allows the back-end to advertise the maximum number of
500 queues. This is treated as a protocol extension, hence the back-end has to
504 The max number of queues the back-end supports can be queried with message
505 ``VHOST_USER_GET_QUEUE_NUM``. The front-end should stop when the number of requested
508 As all queues share one connection, the front-end uses a unique index for each
511 The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
512 vhost-user-net has historically automatically enabled the first queue pair.
514 Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
518 Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
523 ---------
525 During live migration, the front-end may need to track the modifications
526 the back-end makes to the memory mapped regions. The front-end should mark
530 To start/stop logging of data/used ring writes, the front-end may send
544 ``VHOST_USER_SET_LOG_BASE`` message when the back-end has
550 covers from address 0 to the maximum of guest regions. In pseudo-code,
570 ancillary data, it may be used to inform the front-end that the log has
577 In postcopy migration the back-end is started before all the memory has
579 accessing pages that have yet to be received. The back-end opens a
580 'userfault'-fd and registers the memory with it; this fd is then
581 passed back over to the front-end. The front-end services requests on the
584 back-end. The front-end indicates support for this via the
589 Migrating back-end state
593 back-end, called the source, to another back-end, called the
595 operation without requiring the driver to re-initialize the device at
599 Generally, the front-end is connected to a virtual machine guest (which
601 and destination, and therefore will have an implementation-specific
603 provides functionality to have the front-end include the back-end's
604 state in this transfer operation so the back-end does not need to
606 complete state, including vhost-user devices' states, contained within a
609 To do this, the back-end state is transferred from back-end to front-end
616 it from the back-end to the front-end. On the destination, the data
617 is loaded, transferring it from the front-end to the back-end.
620 after the transfer of memory-mapped regions before switch-over to the
626 The nature of the channel is implementation-defined, but it must
627 generally behave like a pipe: The writing end will write all the data it
628 has into it, signalling the end of data by closing its end. The reading
629 end must read all of this data (until encountering the end of file) and
632 * When saving, the writing end is the source back-end, and the reading
633 end is the source front-end. After reading the state data from the
634 channel, the source front-end must transfer it to the destination
635 front-end through an implementation-defined mechanism.
637 * When loading, the writing end is the destination front-end, and the
638 reading end is the destination back-end. After reading the state data
639 from the channel, the destination back-end must deserialize its
645 This driver will not perform any re-initialization steps, but continue
646 to use the device as if no migration had occurred. The vhost-user
647 front-end, however, will re-initialize the vhost state on the
649 to a vhost-user back-end: This includes, for example, setting up memory
655 front-end has seen all data transferred (when the transfer FD has been
657 verify that data transfer was successful in the back-end, too. The
658 back-end responds once it knows whether the transfer and processing were
662 -------------
664 The front-end sends a list of vhost memory regions to the back-end using the
686 -------------
689 front-end sends IOTLB entries update & invalidation by sending
690 ``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct
697 (3), the I/O virtual address and the size. On success, the back-end is
698 expected to reply with a zero payload, non-zero otherwise.
700 The back-end relies on the back-end communication channel (see :ref:`Back-end
703 requests to the front-end with a ``struct vhost_iotlb_msg`` as
708 the permissions flags. For synchronization purposes, the back-end may
709 rely on the reply-ack feature, so the front-end may send a reply when
710 the operation is completed if the reply-ack feature is negotiated and
711 the back-end requests a reply. For miss events, a completed operation means
712 either the front-end sent an update message containing the IOTLB entry
713 containing the requested address and permission, or the front-end sent nothing if
716 The front-end isn't expected to take the initiative to send IOTLB update
717 messages, as the back-end sends IOTLB miss messages for the guest virtual
722 Back-end communication
723 ----------------------
725 An optional communication channel is provided if the back-end declares
727 back-end to make requests to the front-end.
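File descriptors travel as ``SCM_RIGHTS`` ancillary data on an ``AF_UNIX`` socket, and this channel carries at most 8 descriptors per message. Below is a minimal POSIX sketch of sending and receiving such a message. ``vu_send_with_fds``, ``vu_recv_with_fds`` and ``VU_MAX_FDS`` are hypothetical names, and the sketch leaves out framing of the vhost-user header itself:

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define VU_MAX_FDS 8  /* per-message descriptor limit */

/* Send len bytes plus nfds descriptors as SCM_RIGHTS ancillary data. */
static ssize_t vu_send_with_fds(int sock, const void *buf, size_t len,
                                const int *fds, size_t nfds)
{
    union { char buf[CMSG_SPACE(sizeof(int) * VU_MAX_FDS)];
            struct cmsghdr align; } ctrl;
    struct iovec iov = { (void *)buf, len };
    struct msghdr msg = { 0 };

    if (nfds > VU_MAX_FDS)
        return -1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if (nfds > 0) {
        memset(&ctrl, 0, sizeof(ctrl));
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = CMSG_SPACE(sizeof(int) * nfds);
        struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type = SCM_RIGHTS;
        c->cmsg_len = CMSG_LEN(sizeof(int) * nfds);
        memcpy(CMSG_DATA(c), fds, sizeof(int) * nfds);
    }
    return sendmsg(sock, &msg, 0);
}

/* Receive up to len bytes; fds must have room for VU_MAX_FDS entries. */
static ssize_t vu_recv_with_fds(int sock, void *buf, size_t len,
                                int *fds, size_t *nfds)
{
    union { char buf[CMSG_SPACE(sizeof(int) * VU_MAX_FDS)];
            struct cmsghdr align; } ctrl;
    struct iovec iov = { buf, len };
    struct msghdr msg = { 0 };

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    ssize_t n = recvmsg(sock, &msg, 0);
    *nfds = 0;
    if (n < 0)
        return n;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            size_t k = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
            memcpy(fds + *nfds, CMSG_DATA(c), k * sizeof(int));
            *nfds += k;
        }
    }
    return n;
}
```

The control buffer is sized for the 8-descriptor limit, so the kernel truncates (``MSG_CTRUNC``) rather than overflowing it if a peer sends more.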
731 A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end
735 negotiated, the back-end can send file descriptors (at most 8 descriptors in
736 each message) to the front-end via ancillary data using this fd communication
740 ---------------------
742 To support reconnecting after restart or crash, the back-end may need to
747 out-of-order because some entries which store the information of
750 this problem, the back-end needs to allocate an extra buffer to store this
751 information of inflight descriptors and share it with front-end for
754 between front-end and back-end. And the format of this buffer is described
757 +---------------+---------------+-----+---------------+
759 +---------------+---------------+-----+---------------+
761 N is the number of available virtqueues. The back-end could get it from num
770 * Only available for head-descriptor. */
781 * Only available for head-descriptor. */
794 * The back-end could get it from queue size field of VhostUserInflight. */
811 #. Get the next available head-descriptor index from available ring, ``i``
821 1. Get corresponding used head-descriptor index, ``i``
858 * Only available for head-descriptor. */
868 * Only available for head-descriptor. */
872 * Only available for head-descriptor. */
876 * Only available for head-descriptor. */
901 * The back-end could get it from queue size field of VhostUserInflight. */
960 1. Get corresponding used head-descriptor entry from descriptor ring,
993 back-end has submitted the buffer to guest driver before crash, so
994 it has to commit the in-progress update), set ``old_free_head``,
1000 (roll back any in-progress update)
1008 In-band notifications
1009 ---------------------
1012 have the kick, call and error (if used) signals done via in-band
1022 the former is necessary for getting a message channel from the back-end
1023 to the front-end, while the latter needs to be used with the in-band
1026 use case.) As it has no other way of signalling this error, the back-end
1028 ``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band
1032 -----------------
1057 Front-end message types
1058 -----------------------
1067 Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
1079 back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
1095 Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
1112 Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
1122 as the front-end that owns the session. This can be used on the *back-end*
1133 rings, but some back-ends interpreted it to also discard connection
1135 that back-ends either ignore this message, or use it to disable all
1144 Sets the memory map regions on the back-end so it can translate the
1151 regions to the front-end. The back-end must have mmap'd the regions but
1157 reply back to the list of mappings with an empty
1170 When the back-end has ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature,
1227 * For a split virtqueue, returns only the 16-bit next descriptor
1260 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1279 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1298 Bits (0-7) of the payload contain the vring index. Bit 8 is the
1313 Query how many queues the back-end supports.
1325 Signal the back-end to enable or disable corresponding vring.
1336 Ask the vhost-user back-end to broadcast a fake RARP to notify the migration
1344 back-end to construct and broadcast the fake RARP.
1360 If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
1361 respond with zero in case the specified MTU is valid, or non-zero
1370 Set the socket file descriptor for back-end initiated requests. It is passed
1377 ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
1378 respond with zero for success, non-zero otherwise.
1388 The front-end sends such requests to update and invalidate entries in the
1389 device IOTLB. The back-end has to acknowledge the request with sending
1390 zero as ``u64`` payload for success, non-zero otherwise.
1401 Set the endianness of a VQ for legacy devices. Little-endian is
1402 indicated with state.num set to 0 and big-endian is indicated with
1409 configuration (i.e. before the front-end starts the VQ).
1418 submitted by the vhost-user front-end to fetch the contents of the
1419 virtio device configuration space. The vhost-user back-end's payload size
1420 MUST match the front-end's request, and the vhost-user back-end uses a zero-length
1421 payload to indicate an error to the vhost-user front-end. The vhost-user
1422 front-end may cache the contents to avoid repeated
1432 submitted by the vhost-user front-end when the Guest changes the virtio
1434 on the destination host. The vhost-user back-end must check the flags
1435 field, and back-ends MUST NOT accept SET_CONFIG for read-only
1444 Create a session for crypto operation. The back-end must return
1470 When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
1471 advises the back-end that a migration with postcopy enabled is underway;
1472 the back-end must open a userfaultfd for later use. Note that at this
1480 The front-end advises the back-end that a transition to postcopy mode has
1481 happened. The back-end must ensure that shared memory is registered
1482 with userfaultfd to cause faulting of non-present pages.
1492 The front-end advises that postcopy migration has now completed. The back-end
1497 is sent at the end of the migration, after
1509 been successfully negotiated, this message is submitted by the front-end to
1510 get a shared buffer from back-end. The shared buffer will be used to
1511 track inflight I/O by back-end. QEMU should retrieve a new one when vm
1521 been successfully negotiated, this message is submitted by the front-end to
1522 send the shared inflight buffer back to the back-end so that the back-end
1532 ancillary data. The GPU protocol is used to inform the front-end of
1533 rendering state and updates. See vhost-user-gpu.rst for details.
1541 Ask the vhost-user back-end to disable all rings and reset all
1543 reinitialized. The back-end retains ownership of the device
1547 feature is set by the back-end.
1557 submitted by the front-end to indicate that a buffer was added to
1559 descriptor or having the back-end rely on polling.
1571 by the front-end to the back-end. The back-end should return the message with a
1573 QEMU to expose to the guest. The value returned by the back-end
1585 by the front-end to the back-end. The message payload contains a memory
1587 the back-end device must map in. When the
1591 update the memory tables of the back-end device.
1596 In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
1597 replies with the bases of the memory mapped region to the front-end.
1609 by the front-end to the back-end. The message payload contains a memory
1611 the back-end device must unmap. When the
1615 update the memory tables of the back-end device.
1621 compatibility with existing incorrect implementations, the back-end MAY
1623 passed, the back-end MUST close it without using it otherwise.
1632 successfully negotiated, this message is submitted by the front-end to
1633 notify the back-end with updated device status as defined in the Virtio
1643 successfully negotiated, this message is submitted by the front-end to
1644 query the back-end for its device status as defined in the Virtio
1655 in the exporter's cache, this message is submitted by the front-end
1656 to retrieve a given dma-buf fd from a given back-end, determined by
1657 the requested UUID. The back-end will reply, passing the fd, when the operation
1666 Front-end and back-end negotiate a channel over which to transfer the
1667 back-end's internal state during migration. Either side (front-end or
1668 back-end) may create the channel. The nature of this channel is not
1674 * For the writing end, it must allow writing the whole back-end state
1675 sequentially. Closing the file descriptor signals the end of
1678 * For the reading end, it must allow reading the whole back-end state
1679 sequentially. The end of file signals the end of the transfer.
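A plain ``pipe()`` already has these semantics: the writer writes everything and closes its end, and the reader reads until ``read()`` returns 0. The sketch below shows both ends. ``state_write_all`` and ``state_read_all`` are hypothetical names. Serialising and deserialising the state itself is left out, and the reader leaves closing its descriptor to the caller:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Writing end: write the whole state, then close to signal end of data. */
static int state_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}

/* Reading end: read until EOF; returns a malloc'd buffer or NULL on error. */
static char *state_read_all(int fd, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;
    for (;;) {
        if (len == cap) {                 /* grow geometrically */
            char *nb = realloc(buf, cap * 2);
            if (!nb) {
                free(buf);
                return NULL;
            }
            buf = nb;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n == 0)
            break;                        /* EOF: writer closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(buf);
            return NULL;
        }
        len += (size_t)n;
    }
    *out_len = len;
    return buf;
}
```

When saving, the source back-end would act as the writer and the source front-end as the reader; when loading, the roles move to the destination front-end and back-end respectively.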
1684 Initially, the front-end creates a channel along with such an FD. It
1685 passes the FD to the back-end as ancillary data of a
1686 ``VHOST_USER_SET_DEVICE_STATE_FD`` message. The back-end may create a
1687 different transfer channel, passing the respective FD back to the
1688 front-end as ancillary data of the reply. If so, the front-end must
1689 then discard its channel and use the one provided by the back-end.
1691 Whether the back-end should use its own channel is decided
1694 for more efficient processing on at least one end, e.g. through
1695 zero-copy, is considered more efficient and thus preferred. If the
1696 back-end can provide such a channel, it should decide to use it.
1699 transfer, as described in the :ref:`Migrating back-end state
1703 file descriptor for a back-end-provided channel is returned: Bits 0–7
1704 are 0 on success, and non-zero on error. Bit 8 is the invalid FD
1706 When this flag is not set, the front-end must use the returned file
1707 descriptor as its end of the transfer channel. The back-end must not
1719 After transferring the back-end's internal state during migration (see
1720 the :ref:`Migrating back-end state <migrating_backend_state>`
1721 section), check whether the back-end was able to successfully fully
1725 non-zero value is an error.
1730 Back-end message types
1731 ----------------------
1733 For this type of message, the request is sent by the back-end and the reply
1734 is sent by the front-end.
1743 The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
1745 negotiated, and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end
1747 non-zero otherwise. This request should be sent only when
1757 When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, vhost-user
1758 back-end sends such messages to notify that the virtio device's
1761 message to the back-end to get the latest content. If
1762 ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
1763 ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
1764 operation is successfully completed, or non-zero otherwise.
1781 MMIO region. The back-end sends this request to tell QEMU to de-register
1797 submitted by the back-end to indicate that a buffer was used from
1799 descriptor or having the front-end rely on polling.
1811 submitted by the back-end to indicate that an error occurred on the
1813 set by the front-end via ``VHOST_USER_SET_VRING_ERR``.
1826 table. The back-end device gets associated with a UUID in the shared table.
1827 The back-end is responsible for keeping its own table with exported dma-buf fds.
1828 When another back-end tries to import the resource associated with the UUID,
1829 it will send a message to the front-end, which will act as a proxy to the
1830 exporter back-end. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and
1831 the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must
1832 respond with zero when operation is successfully completed, or non-zero
1843 by the back-end to remove itself from the virtio-dmabuf shared
1844 table API. Only the back-end owning the entry (i.e., the one that first added
1846 The shared table will remove the back-end device associated with
1848 back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond
1849 with zero when operation is successfully completed, or non-zero otherwise.
1859 by the back-ends to retrieve a given dma-buf fd from the virtio-dmabuf
1861 when the operation is successful, or non-zero otherwise. Note that if the
1867 -------------------------------
1869 The original vhost-user specification only demands replies for certain
1871 commands are sent over an ``ioctl()`` call and block until the back-end
1876 back-end MUST respond with a Payload ``VhostUserMsg`` indicating success
1877 or failure. The payload should be set to zero on success or non-zero
1881 of the command. Today, QEMU is expected to terminate the main vhost-user
1885 For the message types that already solicit a reply from the back-end,
1895 vhost-user back-ends can provide various devices & services and may
1900 back-end programs and facilitate interoperability.
1902 Each back-end installed on a host system should come with at least one
1903 JSON file that conforms to the vhost-user.json schema. Each file
1904 informs the management applications about the back-end type, and binary
1906 picking the highest priority back-end when multiple match the search
1909 If the back-end is not capable of enabling a requested feature on the
1911 failed, the back-end should fail to start early and exit with a status
1914 The back-end program must not daemonize itself, but it may be
1922 The back-end program must end (as quickly and cleanly as possible) when
1929 --socket-path=PATH
1931 This option specifies the location of the vhost-user Unix domain socket.
1932 It is incompatible with --fd.
1934 --fd=FDNUM
1936 When this argument is given, the back-end program is started with the
1937 vhost-user socket as file descriptor FDNUM. It is incompatible with
1938 --socket-path.
1940 --print-capabilities
1942 Output to stdout the back-end capabilities in JSON format, and then
1944 the back-end program should not perform its normal function. The
1948 The JSON output is described in the ``vhost-user.json`` schema, by
1956 "feature-a",
1957 "feature-b"
1961 vhost-user-input
1962 ----------------
1966 --evdev-path=PATH
1972 --no-grab
1978 vhost-user-gpu
1979 --------------
1983 --render-node=PATH
1989 --virgl
1995 vhost-user-blk
1996 --------------
2000 --blk-file=PATH
2006 --read-only
2008 Enable read-only.