.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines two sides of the communication, *front-end* and
*back-end*. The *front-end* is the application that shares its virtqueues, in
our case QEMU. The *back-end* is the consumer of the virtqueues.

In the current implementation QEMU is the *front-end*, and the *back-end*
is the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device back-end processing reads and writes to a virtual
disk. In order to facilitate interoperability between various back-end
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

The *front-end* and *back-end* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Support for platforms other than Linux
--------------------------------------

While vhost-user was initially developed targeting Linux, nowadays it
is supported on any platform that provides the following features:

- A way to request shared memory represented by a file descriptor
  so it can be passed over a UNIX domain socket and then mapped by the
  other process.

- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
  exchange messages through them, including ancillary data when needed.

- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor will be used for receiving or
  sending events by reading or writing (respectively) an 8-byte value
  from or to the corresponding descriptor. The 8-byte value itself has
  no meaning and should not be interpreted.
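
As a minimal sketch of the event mechanism described above, the
following helpers write and read the 8-byte value on an event
descriptor. The names ``notify_event`` and ``consume_event`` are
hypothetical and are not taken from QEMU. With an eventfd both helpers
use the same descriptor; with a pipe, the sender uses the write end and
the receiver uses the read end.

.. code:: c

  #include <stdint.h>
  #include <unistd.h>

  /* Signal an event by writing one 8-byte value. The value itself
   * carries no meaning; 1 is used because eventfd rejects some values. */
  static int notify_event(int fd)
  {
      uint64_t value = 1;
      ssize_t n = write(fd, &value, sizeof(value));

      return n == (ssize_t)sizeof(value) ? 0 : -1;
  }

  /* Consume an event by reading one 8-byte value and ignoring it. */
  static int consume_event(int fd)
  {
      uint64_t value;
      ssize_t n = read(fd, &value, sizeof(value));

      return n == (ssize_t)sizeof(value) ? 0 : -1;
  }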

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the back-end
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload
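
The header layout can be illustrated with a small sketch. The struct
and macro names below are assumptions made for this example (QEMU uses
its own definitions); only the field widths and bit positions are taken
from the description above.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative names; bit positions follow the header description. */
  #define HDR_VERSION_MASK  0x3u        /* lower 2 bits: version */
  #define HDR_VERSION       0x1u        /* current version */
  #define HDR_REPLY         (1u << 2)   /* bit 2: reply flag */
  #define HDR_NEED_REPLY    (1u << 3)   /* bit 3: need_reply flag */

  struct msg_header {
      uint32_t request;   /* type of the request */
      uint32_t flags;     /* version and flag bits */
      uint32_t size;      /* size of the payload in bytes */
  };

  static uint32_t hdr_version(const struct msg_header *h)
  {
      return h->flags & HDR_VERSION_MASK;
  }

  static bool hdr_is_reply(const struct msg_header *h)
  {
      return (h->flags & HDR_REPLY) != 0;
  }

  static bool hdr_needs_reply(const struct msg_header *h)
  {
      return (h->flags & HDR_NEED_REPLY) != 0;
  }

  /* Flags a back-end would put on a reply: version plus reply flag. */
  static uint32_t hdr_reply_flags(void)
  {
      return HDR_VERSION | HDR_REPLY;
  }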

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring descriptor index for split virtqueues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------------------+
| vring index | index in avail ring |
+-------------+---------------------+

:vring index: 32-bit index of the respective virtqueue

:index in avail ring: 32-bit value, of which currently only the lower 16
  bits are used:

  - Bits 0–15: Index of the next *Available Ring* descriptor that the
    back-end will process.  This is a free-running index that is not
    wrapped by the ring size.
  - Bits 16–31: Reserved (set to zero)

Vring descriptor indices for packed virtqueues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+--------------------+
| vring index | descriptor indices |
+-------------+--------------------+

:vring index: 32-bit index of the respective virtqueue

:descriptor indices: 32-bit value:

  - Bits 0–14: Index of the next *Available Ring* descriptor that the
    back-end will process.  This is a free-running index that is not
    wrapped by the ring size.
  - Bit 15: Driver (Available) Ring Wrap Counter
  - Bits 16–30: Index of the entry in the *Used Ring* where the back-end
    will place the next descriptor.  This is a free-running index that
    is not wrapped by the ring size.
  - Bit 31: Device (Used) Ring Wrap Counter
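
The bit layout of the packed-virtqueue ``descriptor indices`` value can
be sketched as a pair of pack/unpack helpers. The struct and function
names are hypothetical; only the bit positions come from the list above.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  struct packed_indices {
      uint16_t avail_idx;    /* bits 0-14 */
      bool     avail_wrap;   /* bit 15: driver ring wrap counter */
      uint16_t used_idx;     /* bits 16-30 */
      bool     used_wrap;    /* bit 31: device ring wrap counter */
  };

  /* Split the 32-bit value into its four components. */
  static struct packed_indices unpack_indices(uint32_t v)
  {
      struct packed_indices p;

      p.avail_idx  = (uint16_t)(v & 0x7fff);
      p.avail_wrap = ((v >> 15) & 1) != 0;
      p.used_idx   = (uint16_t)((v >> 16) & 0x7fff);
      p.used_wrap  = ((v >> 31) & 1) != 0;
      return p;
  }

  /* Combine the four components back into the 32-bit value. */
  static uint32_t pack_indices(struct packed_indices p)
  {
      return (uint32_t)(p.avail_idx & 0x7fff)
           | ((uint32_t)p.avail_wrap << 15)
           | ((uint32_t)(p.used_idx & 0x7fff) << 16)
           | ((uint32_t)p.used_wrap << 31);
  }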

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------------+------+-----------+-----+
| index | flags | descriptor | used | available | log |
+-------+-------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

.. _memory_region_description:

Memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: a 64-bit offset where region starts in the mapped memory

When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
successfully negotiated, the memory region description contains two extra
fields at the end.

+---------------+------+--------------+-------------+----------------+-------+
| guest address | size | user address | mmap offset | xen mmap flags | domid |
+---------------+------+--------------+-------------+----------------+-------+

:xen mmap flags: a 32-bit bit field

- Bit 0 is set for Xen foreign memory mapping.
- Bit 1 is set for Xen grant memory mapping.
- Bit 8 is set if the memory region cannot be mapped in advance, and memory
  areas within this region must be mapped / unmapped only when required by the
  back-end. The back-end shouldn't try to map the entire region at once, as the
  front-end may not allow it. The back-end should rather map only the required
  amount of memory at once and unmap it after it is used.

:domid: a 32-bit Xen hypervisor specific domain id.

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+--------+
| padding | region |
+---------+--------+

:padding: 64-bit

:region: region is represented by :ref:`Memory region description <memory_region_description>`.

Multiple Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

:regions: regions field contains 8 regions of type :ref:`Memory region description <memory_region_description>`.

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: a 64-bit size of area used for logging

:log offset: a 64-bit offset from start of supplied file descriptor where
             logging starts (i.e. where guest address 0 would be
             logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:

  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:

  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:

  - 0: Vhost front-end messages used for writable fields
  - 1: Vhost front-end messages used for live migration

:payload: Size bytes array holding the contents of the virtio
          device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer containing the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
         supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
              of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

VhostUserShared
^^^^^^^^^^^^^^^

+------+
| UUID |
+------+

:UUID: 16-byte UUID, whose first three components (a 32-bit value, then
  two 16-bit values) are stored in big endian.
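
To illustrate the UUID byte order, the sketch below serializes a UUID
into its 16-byte wire form. The struct and its field names (borrowed
from the usual RFC 4122 naming) are assumptions for this example; the
text above only specifies that the first three components are big
endian, and the remaining 8 bytes are copied unchanged here.

.. code:: c

  #include <stdint.h>
  #include <string.h>

  /* Hypothetical in-memory form of a UUID, split into components. */
  struct uuid_fields {
      uint32_t time_low;
      uint16_t time_mid;
      uint16_t time_hi_and_version;
      uint8_t  rest[8];   /* remaining bytes, copied verbatim */
  };

  /* Store the first three components in big endian, then the rest. */
  static void uuid_to_wire(const struct uuid_fields *u, uint8_t out[16])
  {
      out[0] = (uint8_t)(u->time_low >> 24);
      out[1] = (uint8_t)(u->time_low >> 16);
      out[2] = (uint8_t)(u->time_low >> 8);
      out[3] = (uint8_t)u->time_low;
      out[4] = (uint8_t)(u->time_mid >> 8);
      out[5] = (uint8_t)u->time_mid;
      out[6] = (uint8_t)(u->time_hi_and_version >> 8);
      out[7] = (uint8_t)u->time_hi_and_version;
      memcpy(&out[8], u->rest, 8);
  }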

Device state transfer parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------------------+-----------------+
| transfer direction | migration phase |
+--------------------+-----------------+

:transfer direction: a 32-bit enum, describing the direction in which
  the state is transferred:

  - 0: Save: Transfer the state from the back-end to the front-end,
    which happens on the source side of migration
  - 1: Load: Transfer the state from the front-end to the back-end,
    which happens on the destination side of migration

:migration phase: a 32-bit enum, describing the state in which the VM
  guest and devices are:

  - 0: Stopped (in the period after the transfer of memory-mapped
    regions before switch-over to the destination): The VM guest is
    stopped, and the vhost-user device is suspended (see
    :ref:`Suspended device state <suspended_device_state>`).

  In the future, additional phases might be added e.g. to allow
  iterative migration while the device is running.

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of the *front-end* sending message requests and
the *back-end* sending message replies. Most of the requests don't require
replies, except for the following requests:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.

There are several messages that the front-end sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_ADD_MEM_REG``
* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
* ``VHOST_USER_SET_DEVICE_STATE_FD``
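
Passing a file descriptor in the ancillary data of a message uses the
standard ``SCM_RIGHTS`` mechanism of Unix domain sockets. The following
is a generic sketch of that mechanism for a single descriptor, not the
code QEMU uses; ``send_with_fd`` and ``recv_with_fd`` are hypothetical
helper names.

.. code:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  /* Send len bytes plus one descriptor as SCM_RIGHTS ancillary data. */
  static ssize_t send_with_fd(int sock, const void *buf, size_t len, int fd)
  {
      struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
      union {
          char data[CMSG_SPACE(sizeof(int))];
          struct cmsghdr align;   /* forces correct alignment */
      } ctrl;
      struct msghdr msg;
      struct cmsghdr *cmsg;

      memset(&ctrl, 0, sizeof(ctrl));
      memset(&msg, 0, sizeof(msg));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = ctrl.data;
      msg.msg_controllen = sizeof(ctrl.data);

      cmsg = CMSG_FIRSTHDR(&msg);
      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

      return sendmsg(sock, &msg, 0);
  }

  /* Receive up to len bytes; *fd gets the passed descriptor or -1. */
  static ssize_t recv_with_fd(int sock, void *buf, size_t len, int *fd)
  {
      struct iovec iov = { .iov_base = buf, .iov_len = len };
      union {
          char data[CMSG_SPACE(sizeof(int))];
          struct cmsghdr align;
      } ctrl;
      struct msghdr msg;
      struct cmsghdr *cmsg;
      ssize_t n;

      memset(&ctrl, 0, sizeof(ctrl));
      memset(&msg, 0, sizeof(msg));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = ctrl.data;
      msg.msg_controllen = sizeof(ctrl.data);

      *fd = -1;
      n = recvmsg(sock, &msg, 0);
      if (n < 0) {
          return n;
      }
      for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
          if (cmsg->cmsg_level == SOL_SOCKET &&
              cmsg->cmsg_type == SCM_RIGHTS) {
              memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
              break;
          }
      }
      return n;
  }

A real implementation would also need to handle several descriptors per
message and close descriptors it did not expect.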

If the *front-end* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
can be implemented.

If the *back-end* detects some error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both front-end and back-end.  As
older back-ends don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature
bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits
<https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
VIRTIO devices do not advertise this feature bit and therefore VIRTIO
drivers cannot negotiate it.

This reserved feature bit was reused by the vhost-user protocol to add
vhost-user protocol feature negotiation in a backwards compatible
fashion. Old vhost-user front-end and back-end implementations continue to
work even though they are not aware of vhost-user protocol feature
negotiation.

Ring states
-----------

Rings have two independent states: started/stopped, and enabled/disabled.

* While a ring is stopped, the back-end must not process the ring at
  all, regardless of whether it is enabled or disabled.  The
  enabled/disabled state should still be tracked, though, so it can come
  into effect once the ring is started.

* started and disabled: The back-end must process the ring without
  causing any side effects.  For example, for a networking device,
  in the disabled state the back-end must not supply any new RX packets,
  but must process and discard any TX packets.

* started and enabled: The back-end must process the ring normally, i.e.
  process all requests and execute them.

Each ring is initialized in a stopped and disabled state.  The back-end
must start a ring upon receiving a kick (that is, detecting that the file
descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and stop a ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.

In addition, upon receiving a ``VHOST_USER_SET_FEATURES`` message from
the front-end without ``VHOST_USER_F_PROTOCOL_FEATURES`` set, the
back-end must enable all rings immediately.

While processing the rings (whether they are enabled or not), the back-end
must support changing some configuration aspects on the fly.
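
The two independent ring states can be modelled with two booleans, as
in the sketch below. All names here are made up for illustration; the
transitions mirror the rules in this section (a kick starts the ring,
``VHOST_USER_GET_VRING_BASE`` stops it, and
``VHOST_USER_SET_VRING_ENABLE`` only changes the enabled flag).

.. code:: c

  #include <stdbool.h>

  enum ring_mode {
      RING_IDLE,      /* stopped: do not process the ring at all */
      RING_DISCARD,   /* started and disabled: no side effects */
      RING_NORMAL,    /* started and enabled: process normally */
  };

  struct ring_state {
      bool started;
      bool enabled;
  };

  /* Each ring is initialized stopped and disabled. */
  static void ring_init(struct ring_state *r)
  {
      r->started = false;
      r->enabled = false;
  }

  /* Kick on the kick fd, or in-band VHOST_USER_VRING_KICK. */
  static void ring_on_kick(struct ring_state *r)
  {
      r->started = true;
  }

  /* VHOST_USER_GET_VRING_BASE stops the ring. */
  static void ring_on_get_vring_base(struct ring_state *r)
  {
      r->started = false;
  }

  /* VHOST_USER_SET_VRING_ENABLE is tracked even while stopped. */
  static void ring_on_set_vring_enable(struct ring_state *r, bool enable)
  {
      r->enabled = enable;
  }

  /* Decide how the back-end should treat the ring right now. */
  static enum ring_mode ring_current_mode(const struct ring_state *r)
  {
      if (!r->started) {
          return RING_IDLE;
      }
      return r->enabled ? RING_NORMAL : RING_DISCARD;
  }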

.. _suspended_device_state:

Suspended device state
^^^^^^^^^^^^^^^^^^^^^^

While all vrings are stopped, the device is *suspended*.  In addition to
not processing any vring (because they are stopped), the device must:

* not write to any guest memory regions,
* not send any notifications to the guest,
* not send any messages to the front-end,
* still process and reply to messages from the front-end.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues.  In this case the front-end
already knows the number of available virtqueues without communicating with the
back-end.

Some devices do not have a fixed number of virtqueues.  Instead the maximum
number of virtqueues is chosen by the back-end.  The number can depend on host
resource availability or back-end implementation details.  Such devices are called
multiple queue devices.

Multiple queue support allows the back-end to advertise the maximum number of
queues.  This is treated as a protocol extension, hence the back-end has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The maximum number of queues the back-end supports can be queried with the
message ``VHOST_USER_GET_QUEUE_NUM``. The front-end should stop when the number
of requested queues is bigger than that.

As all queues share one connection, the front-end uses a unique index for each
queue in the sent message to identify a specified queue.

The front-end enables queues by sending the message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues.  Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the front-end may need to track the modifications
the back-end makes to the memory mapped regions. The back-end should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, the front-end may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed to by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of the
``VHOST_USER_SET_LOG_BASE`` message when the back-end has the
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` and
should be large enough to cover all known guest addresses. The log starts
at the supplied offset in the supplied file descriptor.  The log
covers from address 0 to the maximum of guest regions. In pseudo-code,
to mark the page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << page % 8

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.
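
A possible C rendering of the pseudo-code above, using C11 atomics so
that concurrent writers do not lose bits, is sketched here. The helper
names are hypothetical, and treating the mapped log as an array of
``atomic_uchar`` is an assumption of this example; ``dirty_log`` is
expected to already point at the log start (mapping base plus the
supplied offset).

.. code:: c

  #include <stdatomic.h>
  #include <stdint.h>

  #define VHOST_LOG_PAGE 0x1000

  /* Atomically set the dirty bit of one page number. */
  static void log_mark_page(atomic_uchar *dirty_log, uint64_t page)
  {
      atomic_fetch_or(&dirty_log[page / 8],
                      (unsigned char)(1u << (page % 8)));
  }

  /* Mark the page containing guest physical address addr. */
  static void log_mark_dirty(atomic_uchar *dirty_log, uint64_t addr)
  {
      log_mark_page(dirty_log, addr / VHOST_LOG_PAGE);
  }

  /* Mark every page touched by the byte range [addr, addr + len). */
  static void log_mark_range(atomic_uchar *dirty_log, uint64_t addr,
                             uint64_t len)
  {
      uint64_t page, last;

      if (len == 0) {
          return;
      }
      last = (addr + len - 1) / VHOST_LOG_PAGE;
      for (page = addr / VHOST_LOG_PAGE; page <= last; page++) {
          log_mark_page(dirty_log, page);
      }
  }

A caller would additionally have to check that the computed byte index
stays within the log size supplied by the front-end.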

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the front-end that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source (:ref:`Suspended device state <suspended_device_state>`). No
further updates may be made before rings are restarted.

In postcopy migration the back-end is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received.  The back-end opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the front-end.  The front-end services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
back-end.  The front-end indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

.. _migrating_backend_state:

Migrating back-end state
^^^^^^^^^^^^^^^^^^^^^^^^

Migrating device state involves transferring the state from one
back-end, called the source, to another back-end, called the
destination.  After migration, the destination transparently resumes
operation without requiring the driver to re-initialize the device at
the VIRTIO level.  If the migration fails, then the source can
transparently resume operation until another migration attempt is made.

Generally, the front-end is connected to a virtual machine guest (which
contains the driver), which has its own state to transfer between source
and destination, and therefore will have an implementation-specific
mechanism to do so.  The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature
provides functionality to have the front-end include the back-end's
state in this transfer operation so the back-end does not need to
implement its own mechanism, and so the virtual machine may have its
complete state, including vhost-user devices' states, contained within a
single stream of data.

To do this, the back-end state is transferred from back-end to front-end
on the source side, and vice versa on the destination side.  This
transfer happens over a channel that is negotiated using the
``VHOST_USER_SET_DEVICE_STATE_FD`` message.  This message has two
parameters:

* Direction of transfer: On the source, the data is saved, transferring
  it from the back-end to the front-end.  On the destination, the data
  is loaded, transferring it from the front-end to the back-end.

* Migration phase: Currently, the only supported phase is the period
  after the transfer of memory-mapped regions before switch-over to the
  destination, when both the source and destination devices are
  suspended (:ref:`Suspended device state <suspended_device_state>`).
  In the future, additional phases might be supported to allow iterative
  migration while the device is running.

The nature of the channel is implementation-defined, but it must
generally behave like a pipe: The writing end will write all the data it
has into it, signalling the end of data by closing its end.  The reading
end must read all of this data (until encountering the end of file) and
process it.

* When saving, the writing end is the source back-end, and the reading
  end is the source front-end.  After reading the state data from the
  channel, the source front-end must transfer it to the destination
  front-end through an implementation-defined mechanism.

* When loading, the writing end is the destination front-end, and the
  reading end is the destination back-end.  After reading the state data
  from the channel, the destination back-end must deserialize its
  internal state from that data and set itself up to allow the driver to
  seamlessly resume operation on the VIRTIO level.
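
The reading end's obligation to consume the channel until end of file
can be sketched as follows. ``read_state`` is a hypothetical helper
that simply collects all bytes into a heap buffer; how the bytes are
then deserialized is device specific and not shown.

.. code:: c

  #include <errno.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* Read from fd until end of file. On success returns 0 and stores a
   * malloc'ed buffer in *out and its length in *out_len. */
  static int read_state(int fd, uint8_t **out, size_t *out_len)
  {
      size_t cap = 4096, len = 0;
      uint8_t *buf = malloc(cap);

      if (!buf) {
          return -1;
      }
      for (;;) {
          ssize_t n;

          if (len == cap) {
              /* Buffer full: double its capacity. */
              uint8_t *tmp = realloc(buf, cap * 2);
              if (!tmp) {
                  free(buf);
                  return -1;
              }
              buf = tmp;
              cap *= 2;
          }
          n = read(fd, buf + len, cap - len);
          if (n == 0) {
              break;              /* writer closed its end */
          }
          if (n < 0) {
              if (errno == EINTR) {
                  continue;       /* interrupted, retry */
              }
              free(buf);
              return -1;
          }
          len += (size_t)n;
      }
      *out = buf;
      *out_len = len;
      return 0;
  }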

Seamlessly resuming operation means that the migration must be
transparent to the guest driver, which operates on the VIRTIO level.
This driver will not perform any re-initialization steps, but continue
to use the device as if no migration had occurred.  The vhost-user
front-end, however, will re-initialize the vhost state on the
destination, following the usual protocol for establishing a connection
to a vhost-user back-end: This includes, for example, setting up memory
mappings and kick and call FDs as necessary, negotiating protocol
features, or setting the initial vring base indices (to the same value
as on the source side, so that operation can resume).

Both on the source and on the destination side, after the respective
front-end has seen all data transferred (when the transfer FD has been
closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to
verify that data transfer was successful in the back-end, too.  The
back-end responds once it knows whether the transfer and processing was
successful or not.

Memory access
-------------

The front-end sends a list of vhost memory regions to the back-end using the
``VHOST_USER_SET_MEM_TABLE`` message.  Each region has two base
addresses: a guest address and a user address.

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory.  The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs).  They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.
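
The address lookup described above can be sketched as a linear search
over the region table. The struct below is a hypothetical back-end side
record: it assumes the back-end has already mapped each region and
stored in ``local_base`` the local pointer that corresponds to the
start of the region. With ``VIRTIO_F_IOMMU_PLATFORM`` negotiated, an
IOVA would first be translated to a user address via the IOTLB and then
resolved with ``user_to_local``.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  struct mem_region {
      uint64_t guest_addr;   /* guest address of the region */
      uint64_t size;         /* size of the region in bytes */
      uint64_t user_addr;    /* user address of the region */
      uint8_t *local_base;   /* where this process mapped the region */
  };

  /* Resolve a guest address to a local pointer, or NULL if unmapped. */
  static void *guest_to_local(const struct mem_region *r, size_t n,
                              uint64_t guest_addr)
  {
      for (size_t i = 0; i < n; i++) {
          if (guest_addr >= r[i].guest_addr &&
              guest_addr - r[i].guest_addr < r[i].size) {
              return r[i].local_base + (guest_addr - r[i].guest_addr);
          }
      }
      return NULL;
  }

  /* Resolve a user address to a local pointer, or NULL if unmapped. */
  static void *user_to_local(const struct mem_region *r, size_t n,
                             uint64_t user_addr)
  {
      for (size_t i = 0; i < n; i++) {
          if (user_addr >= r[i].user_addr &&
              user_addr - r[i].user_addr < r[i].size) {
              return r[i].local_base + (user_addr - r[i].user_addr);
          }
      }
      return NULL;
  }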

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
front-end sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the back-end is
expected to reply with a zero payload, non-zero otherwise.

The back-end relies on the back-end communication channel (see :ref:`Back-end
communication <backend_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_BACKEND_IOTLB_MSG``
requests to the front-end with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags. For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags.  For synchronization purposes, the back-end may
rely on the reply-ack feature, so the front-end may send a reply when
operation is completed if the reply-ack feature is negotiated and
the back-end requests a reply. For miss events, completed operation means
either the front-end sent an update message containing the IOTLB entry
for the requested address and permission, or the front-end sent nothing if
the IOTLB miss message is invalid (invalid IOVA or permission).

The front-end isn't expected to take the initiative to send IOTLB update
messages, as the back-end sends IOTLB miss messages for the guest virtual
memory areas it needs to access.
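
The payloads described in this section can be sketched with a few
constructor helpers. ``struct iotlb_msg`` is a local stand-in whose
fields follow the IOTLB message table in the payload section; it is not
claimed to match the exact layout of ``struct vhost_iotlb_msg`` from
the Linux headers, and all enum and function names are assumptions of
this example.

.. code:: c

  #include <stdint.h>
  #include <string.h>

  /* Stand-in for the IOTLB payload; field order follows the table. */
  struct iotlb_msg {
      uint64_t iova;
      uint64_t size;
      uint64_t uaddr;
      uint8_t  perm;
      uint8_t  type;
  };

  enum { PERM_NONE = 0, PERM_RO = 1, PERM_WO = 2, PERM_RW = 3 };
  enum { TYPE_MISS = 1, TYPE_UPDATE = 2, TYPE_INVALIDATE = 3,
         TYPE_ACCESS_FAIL = 4 };

  /* Front-end to back-end: install a translation. */
  static struct iotlb_msg iotlb_update(uint64_t iova, uint64_t size,
                                       uint64_t uaddr, uint8_t perm)
  {
      struct iotlb_msg m;

      memset(&m, 0, sizeof(m));
      m.iova = iova;
      m.size = size;
      m.uaddr = uaddr;
      m.perm = perm;
      m.type = TYPE_UPDATE;
      return m;
  }

  /* Front-end to back-end: drop translations in [iova, iova + size). */
  static struct iotlb_msg iotlb_invalidate(uint64_t iova, uint64_t size)
  {
      struct iotlb_msg m;

      memset(&m, 0, sizeof(m));
      m.iova = iova;
      m.size = size;
      m.type = TYPE_INVALIDATE;
      return m;
  }

  /* Back-end to front-end: report a missing translation. */
  static struct iotlb_msg iotlb_miss(uint64_t iova, uint8_t perm)
  {
      struct iotlb_msg m;

      memset(&m, 0, sizeof(m));
      m.iova = iova;
      m.perm = perm;
      m.type = TYPE_MISS;
      return m;
  }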

.. _backend_communication:

Back-end communication
----------------------

An optional communication channel is provided if the back-end declares
the ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` protocol feature, to allow the
back-end to make requests to the front-end.

The fd is provided via ``VHOST_USER_SET_BACKEND_REQ_FD`` ancillary data.

A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end
using this fd communication channel.

If the ``VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD`` protocol feature is
negotiated, the back-end can send file descriptors (at most 8 descriptors in
each message) to the front-end via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, the back-end may need to
resubmit inflight I/Os. If the virtqueue is processed in order, this is
easily achieved by getting the inflight descriptors from the
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue). However, this does not work when descriptors are processed
out of order, because some entries which store the information of
inflight descriptors in the available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, the back-end needs to allocate an extra buffer to store this
information of inflight descriptors and share it with the front-end for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between front-end and back-end. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. The back-end can get it from the
``num queues`` field of ``VhostUserInflight``.

For split virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that tracks the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;
806
807To track inflight I/O, the queue region should be processed as follows:
808
809When receiving available buffers from the driver:
810
811#. Get the next available head-descriptor index from available ring, ``i``
812
813#. Set ``desc[i].counter`` to the value of global counter
814
815#. Increase global counter by 1
816
817#. Set ``desc[i].inflight`` to 1
818
819When supplying used buffers to the driver:
820
1. Get corresponding used head-descriptor index, ``i``
822
8232. Set ``desc[i].next`` to ``last_batch_head``
824
8253. Set ``last_batch_head`` to ``i``
826
827#. Steps 1,2,3 may be performed repeatedly if batching is possible
828
829#. Increase the ``idx`` value of used ring by the size of the batch
830
831#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0
832
833#. Set ``used_idx`` to the ``idx`` value of used ring
834
835When reconnecting:
836
#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (which means the ``inflight`` field of ``DescStateSplit``
   entries in the last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get the size of the last batch of ``DescStateSplit``
      entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry in the
      last batch list, which starts from ``last_batch_head``, to 0

   #. Set ``used_idx`` to the ``idx`` value of used ring
848
849#. Resubmit inflight ``DescStateSplit`` entries in order of their
850   counter value
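
The three procedures for split virtqueues can be sketched as follows. This
is a minimal illustration, not the code of any particular back-end: all
function names are hypothetical, the region is allocated with ``calloc()``
instead of being mapped from the shared inflight fd, and the memory barriers
a real implementation needs between the individual stores are omitted.

.. code:: c

  #include <assert.h>
  #include <stdint.h>
  #include <stdlib.h>

  typedef struct DescStateSplit {
      uint8_t inflight;
      uint8_t padding[5];
      uint16_t next;
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      uint64_t features;
      uint16_t version;
      uint16_t desc_num;
      uint16_t last_batch_head;
      uint16_t used_idx;
      DescStateSplit desc[];
  } QueueRegionSplit;

  /* Illustration only: a real back-end maps shared memory instead. */
  static QueueRegionSplit *inflight_region_new(uint16_t desc_num)
  {
      QueueRegionSplit *r = calloc(1, sizeof(*r) +
                                      desc_num * sizeof(r->desc[0]));
      if (r) {
          r->version = 1;
          r->desc_num = desc_num;
      }
      return r;
  }

  /* Receiving: record head descriptor i as inflight. */
  static void inflight_mark_available(QueueRegionSplit *r, uint16_t i,
                                      uint64_t *global_counter)
  {
      assert(i < r->desc_num);
      r->desc[i].counter = (*global_counter)++;
      r->desc[i].inflight = 1;
  }

  /* Supplying, steps 1-3: link used head i into the last-batch list. */
  static void inflight_link_used(QueueRegionSplit *r, uint16_t i)
  {
      assert(i < r->desc_num);
      r->desc[i].next = r->last_batch_head;
      r->last_batch_head = i;
  }

  /* Supplying, final steps: call after the used ring idx was advanced. */
  static void inflight_finish_batch(QueueRegionSplit *r, uint16_t batch,
                                    uint16_t ring_used_idx)
  {
      uint16_t i = r->last_batch_head;

      while (batch-- > 0) {
          r->desc[i].inflight = 0;
          i = r->desc[i].next;
      }
      r->used_idx = ring_used_idx;
  }

  /* Reconnecting, step 1: repair a batch interrupted by a crash. */
  static void inflight_recover(QueueRegionSplit *r, uint16_t ring_used_idx)
  {
      if (r->used_idx != ring_used_idx) {
          uint16_t batch = (uint16_t)(ring_used_idx - r->used_idx);
          inflight_finish_batch(r, batch, ring_used_idx);
      }
  }

  /* Reconnecting, step 2: fill out[] with inflight heads, oldest first. */
  static uint16_t inflight_collect(const QueueRegionSplit *r, uint16_t *out)
  {
      uint16_t n = 0;

      for (uint16_t i = 0; i < r->desc_num; i++) {
          uint16_t j;

          if (!r->desc[i].inflight) {
              continue;
          }
          /* Insertion sort by ascending counter value. */
          for (j = n++; j > 0 &&
               r->desc[out[j - 1]].counter > r->desc[i].counter; j--) {
              out[j] = out[j - 1];
          }
          out[j] = i;
      }
      return n;
  }

After a reconnect, a back-end following this sketch would call
``inflight_recover()`` with the ``idx`` value read from the used ring and
then resubmit the heads returned by ``inflight_collect()`` in that order.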
851
852For packed virtqueue, queue region can be implemented as:
853
854.. code:: c
855
856  typedef struct DescStatePacked {
857      /* Indicate whether this descriptor is inflight or not.
858       * Only available for head-descriptor. */
859      uint8_t inflight;
860
861      /* Padding */
862      uint8_t padding;
863
864      /* Link to the next free entry */
865      uint16_t next;
866
867      /* Link to the last entry of descriptor list.
868       * Only available for head-descriptor. */
869      uint16_t last;
870
871      /* The length of descriptor list.
872       * Only available for head-descriptor. */
873      uint16_t num;
874
875      /* Used to preserve the order of fetching available descriptors.
876       * Only available for head-descriptor. */
877      uint64_t counter;
878
879      /* The buffer id */
880      uint16_t id;
881
882      /* The descriptor flags */
883      uint16_t flags;
884
885      /* The buffer length */
886      uint32_t len;
887
888      /* The buffer address */
889      uint64_t addr;
890  } DescStatePacked;
891
892  typedef struct QueueRegionPacked {
893      /* The feature flags of this region. Now it's initialized to 0. */
894      uint64_t features;
895
896      /* The version of this region. It's 1 currently.
897       * Zero value indicates an uninitialized buffer */
898      uint16_t version;
899
900      /* The size of DescStatePacked array. It's equal to the virtqueue size.
901       * The back-end could get it from queue size field of VhostUserInflight. */
902      uint16_t desc_num;
903
904      /* The head of free DescStatePacked entry list */
905      uint16_t free_head;
906
907      /* The old head of free DescStatePacked entry list */
908      uint16_t old_free_head;
909
910      /* The used index of descriptor ring */
911      uint16_t used_idx;
912
913      /* The old used index of descriptor ring */
914      uint16_t old_used_idx;
915
916      /* Device ring wrap counter */
917      uint8_t used_wrap_counter;
918
919      /* The old device ring wrap counter */
920      uint8_t old_used_wrap_counter;
921
922      /* Padding */
923      uint8_t padding[7];
924
925      /* Used to track the state of each descriptor fetched from descriptor ring */
926      DescStatePacked desc[];
927  } QueueRegionPacked;
928
929To track inflight I/O, the queue region should be processed as follows:
930
931When receiving available buffers from the driver:
932
933#. Get the next available descriptor entry from descriptor ring, ``d``
934
935#. If ``d`` is head descriptor,
936
937   a. Set ``desc[old_free_head].num`` to 0
938
939   #. Set ``desc[old_free_head].counter`` to the value of global counter
940
941   #. Increase global counter by 1
942
943   #. Set ``desc[old_free_head].inflight`` to 1
944
945#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
946   ``free_head``
947
948#. Increase ``desc[old_free_head].num`` by 1
949
950#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
951   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
952   ``d.len``, ``d.flags``, ``d.id``
953
954#. Set ``free_head`` to ``desc[free_head].next``
955
956#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``
957
958When supplying used buffers to the driver:
959
9601. Get corresponding used head-descriptor entry from descriptor ring,
961   ``d``
962
9632. Get corresponding ``DescStatePacked`` entry, ``e``
964
9653. Set ``desc[e.last].next`` to ``free_head``
966
9674. Set ``free_head`` to the index of ``e``
968
969#. Steps 1,2,3,4 may be performed repeatedly if batching is possible
970
971#. Increase ``used_idx`` by the size of the batch and update
972   ``used_wrap_counter`` if needed
973
974#. Update ``d.flags``
975
976#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
977   in the batch to 0
978
979#. Set ``old_free_head``,  ``old_used_idx``, ``old_used_wrap_counter``
980   to ``free_head``, ``used_idx``, ``used_wrap_counter``
981
982When reconnecting:
983
#. If ``used_idx`` does not match ``old_used_idx`` (which means the
   ``inflight`` field of ``DescStatePacked`` entries in the last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (which
      means the back-end submitted the buffer to the guest driver before
      the crash, so it has to commit the in-progress update), set
      ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter`` to
      ``free_head``, ``used_idx``, ``used_wrap_counter``
997
998#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
999   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
1000   (roll back any in-progress update)
1001
1002#. Set the ``inflight`` field of each ``DescStatePacked`` entry in
1003   free list to 0
1004
1005#. Resubmit inflight ``DescStatePacked`` entries in order of their
1006   counter value
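
The commit-or-roll-back decision taken when reconnecting can be sketched as
below. Several things here are assumptions rather than part of this
specification: ``PackedIndices`` is a hypothetical subset of
``QueueRegionPacked``, the function names are illustrative, and the
comparison looks only at the AVAIL and USED bits of ``d.flags`` (bits 7 and
15 of a packed descriptor in the Virtio specification). Clearing the
``inflight`` fields of the free list and resubmitting are not shown.

.. code:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* AVAIL and USED flag bits of a packed virtqueue descriptor. */
  #define DESC_F_AVAIL (1u << 7)
  #define DESC_F_USED  (1u << 15)

  /* Hypothetical subset of QueueRegionPacked: only the fields that the
   * commit/roll-back decision touches. */
  typedef struct PackedIndices {
      uint16_t free_head;
      uint16_t old_free_head;
      uint16_t used_idx;
      uint16_t old_used_idx;
      uint8_t used_wrap_counter;
      uint8_t old_used_wrap_counter;
  } PackedIndices;

  /* Flags of a descriptor that is still available for the given wrap
   * counter: AVAIL equals the counter, USED is its inverse. */
  static uint16_t avail_flags(uint8_t wrap_counter)
  {
      return wrap_counter ? DESC_F_AVAIL : DESC_F_USED;
  }

  /* ring_flags: flags of the descriptor ring entry at old_used_idx. */
  static void packed_recover_indices(PackedIndices *s, uint16_t ring_flags)
  {
      const uint16_t mask = DESC_F_AVAIL | DESC_F_USED;

      assert(s != NULL);
      if (s->used_idx != s->old_used_idx &&
          (ring_flags & mask) != avail_flags(s->old_used_wrap_counter)) {
          /* The entry was already marked used: commit the update. */
          s->old_free_head = s->free_head;
          s->old_used_idx = s->used_idx;
          s->old_used_wrap_counter = s->used_wrap_counter;
      }
      /* Roll back; this is a no-op if the update was just committed. */
      s->free_head = s->old_free_head;
      s->used_idx = s->old_used_idx;
      s->used_wrap_counter = s->old_used_wrap_counter;
  }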
1007
1008In-band notifications
1009---------------------
1010
1011In some limited situations (e.g. for simulation) it is desirable to
1012have the kick, call and error (if used) signals done via in-band
1013messages instead of asynchronous eventfd notifications. This can be
1014done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
1015protocol feature.
1016
Note that because too many messages on the sockets can
cause the sending application(s) to block, it is not advised to use
this feature unless absolutely necessary. It is also considered an
error to negotiate this feature without also negotiating
``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``.
The former is necessary for getting a message channel from the back-end
to the front-end, while the latter needs to be used with the in-band
notification messages to block until they are processed, both to avoid
blocking later and for proper processing (at least in the simulation
use case). As it has no other way of signalling this error, the back-end
should close the connection as a response to a
``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band
notifications feature flag without the other two.
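
A back-end could implement this check roughly as follows. This is only a
sketch: ``inband_notifications_valid()`` is a hypothetical helper name, and
the bit numbers are the ones listed under "Protocol features".

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

  static_assert(VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS < 64,
                "protocol feature bits must fit into a u64");

  /* Returns false if the feature set received with
   * VHOST_USER_SET_PROTOCOL_FEATURES enables in-band notifications
   * without both prerequisite features. */
  static bool inband_notifications_valid(uint64_t protocol_features)
  {
      const uint64_t inband =
          UINT64_C(1) << VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS;
      const uint64_t required =
          (UINT64_C(1) << VHOST_USER_PROTOCOL_F_BACKEND_REQ) |
          (UINT64_C(1) << VHOST_USER_PROTOCOL_F_REPLY_ACK);

      if (!(protocol_features & inband)) {
          return true;
      }
      return (protocol_features & required) == required;
  }

When the helper returns ``false``, the back-end would close the connection,
since the protocol offers no other way to report the error.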
1030
1031Protocol features
1032-----------------
1033
1034.. code:: c
1035
1036  #define VHOST_USER_PROTOCOL_F_MQ                    0
1037  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
1038  #define VHOST_USER_PROTOCOL_F_RARP                  2
1039  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
1040  #define VHOST_USER_PROTOCOL_F_MTU                   4
1041  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
1042  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
1043  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
1044  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
1045  #define VHOST_USER_PROTOCOL_F_CONFIG                9
1046  #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10
1047  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
1048  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
1049  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
1050  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
1051  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
1052  #define VHOST_USER_PROTOCOL_F_STATUS               16
1053  #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
1054  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT        18
1055  #define VHOST_USER_PROTOCOL_F_DEVICE_STATE         19
1056
1057Front-end message types
1058-----------------------
1059
1060``VHOST_USER_GET_FEATURES``
1061  :id: 1
1062  :equivalent ioctl: ``VHOST_GET_FEATURES``
1063  :request payload: N/A
1064  :reply payload: ``u64``
1065
1066  Get from the underlying vhost implementation the features bitmask.
1067  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
1068  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
1069  ``VHOST_USER_SET_PROTOCOL_FEATURES``.
1070
1071``VHOST_USER_SET_FEATURES``
1072  :id: 2
1073  :equivalent ioctl: ``VHOST_SET_FEATURES``
1074  :request payload: ``u64``
1075  :reply payload: N/A
1076
1077  Enable features in the underlying vhost implementation using a
1078  bitmask.  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
1079  back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
1080  ``VHOST_USER_SET_PROTOCOL_FEATURES``.
1081
1082``VHOST_USER_GET_PROTOCOL_FEATURES``
1083  :id: 15
1084  :equivalent ioctl: ``VHOST_GET_FEATURES``
1085  :request payload: N/A
1086  :reply payload: ``u64``
1087
1088  Get the protocol feature bitmask from the underlying vhost
1089  implementation.  Only legal if feature bit
1090  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
1091  ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by
1092  ``VHOST_USER_SET_FEATURES``.
1093
1094.. Note::
1095   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
1096   support this message even before ``VHOST_USER_SET_FEATURES`` was
1097   called.
1098
1099``VHOST_USER_SET_PROTOCOL_FEATURES``
1100  :id: 16
1101  :equivalent ioctl: ``VHOST_SET_FEATURES``
1102  :request payload: ``u64``
1103  :reply payload: N/A
1104
1105  Enable protocol features in the underlying vhost implementation.
1106
1107  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
1108  ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by
1109  ``VHOST_USER_SET_FEATURES``.
1110
1111.. Note::
1112   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
1113   this message even before ``VHOST_USER_SET_FEATURES`` was called.
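
The legality rule for the two protocol feature messages can be sketched as
below. ``BackendFeatures`` and the function name are hypothetical, and the
bit number 30 is an assumption taken from QEMU's headers, as this section
does not state it. The check deliberately ignores the acknowledged feature
set, because the messages must be handled even before
``VHOST_USER_SET_FEATURES`` was called.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Assumption: bit number as used by QEMU's headers. */
  #define VHOST_USER_F_PROTOCOL_FEATURES 30

  /* Hypothetical back-end state. */
  typedef struct BackendFeatures {
      uint64_t offered;  /* sent in the VHOST_USER_GET_FEATURES reply */
      uint64_t acked;    /* received via VHOST_USER_SET_FEATURES, if any */
  } BackendFeatures;

  /* True if VHOST_USER_GET_PROTOCOL_FEATURES and
   * VHOST_USER_SET_PROTOCOL_FEATURES may be handled. */
  static bool protocol_features_msgs_legal(const BackendFeatures *f)
  {
      assert(f != NULL);
      return (f->offered >> VHOST_USER_F_PROTOCOL_FEATURES) & 1;
  }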
1114
1115``VHOST_USER_SET_OWNER``
1116  :id: 3
1117  :equivalent ioctl: ``VHOST_SET_OWNER``
1118  :request payload: N/A
1119  :reply payload: N/A
1120
  Issued when a new connection is established. It marks the sender
  as the front-end that owns the session. This can be used on the *back-end*
  as a "session start" flag.
1124
1125``VHOST_USER_RESET_OWNER``
1126  :id: 4
1127  :request payload: N/A
1128  :reply payload: N/A
1129
1130.. admonition:: Deprecated
1131
1132   This is no longer used. Used to be sent to request disabling all
1133   rings, but some back-ends interpreted it to also discard connection
1134   state (this interpretation would lead to bugs).  It is recommended
1135   that back-ends either ignore this message, or use it to disable all
1136   rings.
1137
1138``VHOST_USER_SET_MEM_TABLE``
1139  :id: 5
1140  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
1141  :request payload: multiple memory regions description
1142  :reply payload: (postcopy only) multiple memory regions description
1143
1144  Sets the memory map regions on the back-end so it can translate the
1145  vring addresses. In the ancillary data there is an array of file
1146  descriptors for each memory mapped region. The size and ordering of
1147  the fds matches the number and ordering of memory regions.
1148
1149  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
1150  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
1151  regions to the front-end.  The back-end must have mmap'd the regions but
1152  not yet accessed them and should not yet generate a userfault
1153  event.
1154
1155.. Note::
1156   ``NEED_REPLY_MASK`` is not set in this case.  QEMU will then
1157   reply back to the list of mappings with an empty
1158   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
1159   reception of this message may the guest start accessing the memory
1160   and generating faults.
1161
1162``VHOST_USER_SET_LOG_BASE``
1163  :id: 6
1164  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :request payload: ``u64``
1166  :reply payload: N/A
1167
1168  Sets logging shared memory space.
1169
  When the back-end has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
  feature, the log memory fd is provided in the ancillary data of the
  ``VHOST_USER_SET_LOG_BASE`` message, and the size and offset of the
  shared memory area are provided in the message.
1174
1175``VHOST_USER_SET_LOG_FD``
1176  :id: 7
1177  :equivalent ioctl: ``VHOST_SET_LOG_FD``
1178  :request payload: N/A
1179  :reply payload: N/A
1180
1181  Sets the logging file descriptor, which is passed as ancillary data.
1182
1183``VHOST_USER_SET_VRING_NUM``
1184  :id: 8
1185  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
1186  :request payload: vring state description
1187  :reply payload: N/A
1188
1189  Set the size of the queue.
1190
1191``VHOST_USER_SET_VRING_ADDR``
1192  :id: 9
1193  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
1194  :request payload: vring address description
1195  :reply payload: N/A
1196
1197  Sets the addresses of the different aspects of the vring.
1198
1199``VHOST_USER_SET_VRING_BASE``
1200  :id: 10
1201  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
1202  :request payload: vring descriptor index/indices
1203  :reply payload: N/A
1204
1205  Sets the next index to use for descriptors in this vring:
1206
1207  * For a split virtqueue, sets only the next descriptor index to
1208    process in the *Available Ring*.  The device is supposed to read the
1209    next index in the *Used Ring* from the respective vring structure in
1210    guest memory.
1211
1212  * For a packed virtqueue, both indices are supplied, as they are not
1213    explicitly available in memory.
1214
1215  Consequently, the payload type is specific to the type of virt queue
1216  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
1217  indices for packed virtqueues*).
1218
1219``VHOST_USER_GET_VRING_BASE``
1220  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
1222  :request payload: vring state description
1223  :reply payload: vring descriptor index/indices
1224
1225  Stops the vring and returns the current descriptor index or indices:
1226
1227    * For a split virtqueue, returns only the 16-bit next descriptor
1228      index to process in the *Available Ring*.  Note that this may
1229      differ from the available ring index in the vring structure in
1230      memory, which points to where the driver will put new available
1231      descriptors.  For the *Used Ring*, the device only needs the next
1232      descriptor index at which to put new descriptors, which is the
1233      value in the vring structure in memory, so this value is not
1234      covered by this message.
1235
1236    * For a packed virtqueue, neither index is explicitly available to
1237      read from memory, so both indices (as maintained by the device) are
1238      returned.
1239
1240  Consequently, the payload type is specific to the type of virt queue
1241  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
1242  indices for packed virtqueues*).
1243
1244  When and as long as all of a device's vrings are stopped, it is
1245  *suspended*, see :ref:`Suspended device state
1246  <suspended_device_state>`.
1247
1248  The request payload's *num* field is currently reserved and must be
1249  set to 0.
1250
1251``VHOST_USER_SET_VRING_KICK``
1252  :id: 12
1253  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
1254  :request payload: ``u64``
1255  :reply payload: N/A
1256
1257  Set the event file descriptor for adding buffers to the vring. It is
1258  passed in the ancillary data.
1259
1260  Bits (0-7) of the payload contain the vring index. Bit 8 is the
1261  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick. Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated,
  this message isn't necessary, as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message. It may, however, still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.
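
  The payload layout described above (vring index in bits 0-7, invalid FD
  flag in bit 8) could be decoded as in the following sketch. The macro,
  type and function names are illustrative only. The same layout is
  described for ``VHOST_USER_SET_VRING_CALL`` and
  ``VHOST_USER_SET_VRING_ERR``.

  .. code:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define VRING_IDX_MASK  UINT64_C(0xff)       /* bits 0-7 */
    #define VRING_NOFD_MASK (UINT64_C(1) << 8)   /* bit 8 */

    static_assert((VRING_IDX_MASK & VRING_NOFD_MASK) == 0,
                  "index and flag bits must not overlap");

    typedef struct VringFdMsg {
        uint8_t index;  /* which vring the message refers to */
        bool has_fd;    /* false: no fd in the ancillary data */
    } VringFdMsg;

    static VringFdMsg decode_vring_fd_payload(uint64_t payload)
    {
        VringFdMsg m = {
            .index = (uint8_t)(payload & VRING_IDX_MASK),
            .has_fd = !(payload & VRING_NOFD_MASK),
        };
        return m;
    }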
1269
1270``VHOST_USER_SET_VRING_CALL``
1271  :id: 13
1272  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
1273  :request payload: ``u64``
1274  :reply payload: N/A
1275
1276  Set the event file descriptor to signal when buffers are used. It is
1277  passed in the ancillary data.
1278
1279  Bits (0-7) of the payload contain the vring index. Bit 8 is the
1280  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated, this message
  isn't necessary, as the ``VHOST_USER_BACKEND_VRING_CALL`` message can be
  used. It may, however, still be used to set an event file descriptor
  or to enable polling.
1288
1289``VHOST_USER_SET_VRING_ERR``
1290  :id: 14
1291  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
1292  :request payload: ``u64``
1293  :reply payload: N/A
1294
  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.
1297
1298  Bits (0-7) of the payload contain the vring index. Bit 8 is the
1299  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated, this message
  isn't necessary, as the ``VHOST_USER_BACKEND_VRING_ERR`` message can be
  used. It may, however, still be used to set an event file descriptor
  (which will be preferred over the message).
1306
1307``VHOST_USER_GET_QUEUE_NUM``
1308  :id: 17
1309  :equivalent ioctl: N/A
1310  :request payload: N/A
  :reply payload: ``u64``
1312
1313  Query how many queues the back-end supports.
1314
1315  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
1316  is set in queried protocol features by
1317  ``VHOST_USER_GET_PROTOCOL_FEATURES``.
1318
1319``VHOST_USER_SET_VRING_ENABLE``
1320  :id: 18
1321  :equivalent ioctl: N/A
1322  :request payload: vring state description
1323  :reply payload: N/A
1324
1325  Signal the back-end to enable or disable corresponding vring.
1326
1327  This request should be sent only when
1328  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
1329
1330``VHOST_USER_SEND_RARP``
1331  :id: 19
1332  :equivalent ioctl: N/A
1333  :request payload: ``u64``
1334  :reply payload: N/A
1335
  Ask the vhost-user back-end to broadcast a fake RARP to signal that the
  migration has terminated, for guests that do not support GUEST_ANNOUNCE.
1338
1339  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
1340  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
1341  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
1342  ``VHOST_USER_GET_PROTOCOL_FEATURES``.  The first 6 bytes of the
1343  payload contain the mac address of the guest to allow the vhost user
1344  back-end to construct and broadcast the fake RARP.
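
  Because the MAC address occupies the first 6 bytes of the payload as
  laid out in memory, it can be copied in and out without any byte
  swapping. A minimal sketch, with hypothetical helper names:

  .. code:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define MAC_LEN 6

    static_assert(MAC_LEN <= sizeof(uint64_t),
                  "the MAC address must fit into the u64 payload");

    /* Front-end side: place the MAC in the first 6 payload bytes. */
    static uint64_t rarp_payload_from_mac(const uint8_t mac[MAC_LEN])
    {
        uint64_t payload = 0;

        memcpy(&payload, mac, MAC_LEN);
        return payload;
    }

    /* Back-end side: read the MAC back from the payload. */
    static void rarp_payload_get_mac(uint64_t payload, uint8_t mac[MAC_LEN])
    {
        memcpy(mac, &payload, MAC_LEN);
    }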
1345
1346``VHOST_USER_NET_SET_MTU``
1347  :id: 20
1348  :equivalent ioctl: N/A
1349  :request payload: ``u64``
1350  :reply payload: N/A
1351
1352  Set host MTU value exposed to the guest.
1353
1354  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
1355  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
1356  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
1357  ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
1358  ``VHOST_USER_GET_PROTOCOL_FEATURES``.
1359
1360  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
1361  respond with zero in case the specified MTU is valid, or non-zero
1362  otherwise.
1363
1364``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
1365  :id: 21
1366  :equivalent ioctl: N/A
1367  :request payload: N/A
1368  :reply payload: N/A
1369
1370  Set the socket file descriptor for back-end initiated requests. It is passed
1371  in the ancillary data.
1372
1373  This request should be sent only when
1374  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` is present in
1376  ``VHOST_USER_GET_PROTOCOL_FEATURES``.  If
1377  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
1378  respond with zero for success, non-zero otherwise.
1379
1380``VHOST_USER_IOTLB_MSG``
1381  :id: 22
1382  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
1383  :request payload: ``struct vhost_iotlb_msg``
1384  :reply payload: ``u64``
1385
1386  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
1387
  The front-end sends such requests to update and invalidate entries in the
  device IOTLB. The back-end has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when the ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.
1394
1395``VHOST_USER_SET_VRING_ENDIAN``
1396  :id: 23
1397  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
1398  :request payload: vring state description
1399  :reply payload: N/A
1400
1401  Set the endianness of a VQ for legacy devices. Little-endian is
1402  indicated with state.num set to 0 and big-endian is indicated with
1403  state.num set to 1. Other values are invalid.
1404
1405  This request should be sent only when
1406  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Back-ends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the front-end starts the VQ).
1410
1411``VHOST_USER_GET_CONFIG``
1412  :id: 24
1413  :equivalent ioctl: N/A
1414  :request payload: virtio device config space
1415  :reply payload: virtio device config space
1416
  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end to fetch the contents of the
  virtio device configuration space. The vhost-user back-end's payload size
  MUST match the front-end's request. The vhost-user back-end uses a
  zero-length payload to indicate an error to the vhost-user front-end. The
  vhost-user front-end may cache the contents to avoid repeated
  ``VHOST_USER_GET_CONFIG`` calls.
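
  On the front-end side, the two rules about the reply size could be
  checked as in the sketch below. The ``MsgHeader`` type mirrors the
  request/flags/size message header, but its name, the enum and the
  function are hypothetical.

  .. code:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical view of the vhost-user message header. */
    typedef struct MsgHeader {
        uint32_t request;
        uint32_t flags;
        uint32_t size;  /* payload size in bytes */
    } MsgHeader;

    typedef enum ConfigReplyStatus {
        CONFIG_REPLY_OK,
        CONFIG_REPLY_BACKEND_ERROR,  /* zero-length payload */
        CONFIG_REPLY_SIZE_MISMATCH,  /* size differs from the request */
    } ConfigReplyStatus;

    static ConfigReplyStatus check_get_config_reply(const MsgHeader *req,
                                                    const MsgHeader *reply)
    {
        assert(req != NULL && reply != NULL);
        if (reply->size == 0) {
            return CONFIG_REPLY_BACKEND_ERROR;
        }
        if (reply->size != req->size) {
            return CONFIG_REPLY_SIZE_MISMATCH;
        }
        return CONFIG_REPLY_OK;
    }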
1424
1425``VHOST_USER_SET_CONFIG``
1426  :id: 25
1427  :equivalent ioctl: N/A
1428  :request payload: virtio device config space
1429  :reply payload: N/A
1430
  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end when the guest changes the virtio
  device configuration space. It can also be used for live migration
  on the destination host. The vhost-user back-end must check the flags
  field, and back-ends MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.
1437
1438``VHOST_USER_CREATE_CRYPTO_SESSION``
1439  :id: 26
1440  :equivalent ioctl: N/A
1441  :request payload: crypto session description
1442  :reply payload: crypto session description
1443
1444  Create a session for crypto operation. The back-end must return
1445  the session id, 0 or positive for success, negative for failure.
1446  This request should be sent only when
1447  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
1448  successfully negotiated.  It's a required feature for crypto
1449  devices.
1450
1451``VHOST_USER_CLOSE_CRYPTO_SESSION``
1452  :id: 27
1453  :equivalent ioctl: N/A
1454  :request payload: ``u64``
1455  :reply payload: N/A
1456
1457  Close a session for crypto operation which was previously
1458  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.
1459
1460  This request should be sent only when
1461  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
1462  successfully negotiated.  It's a required feature for crypto
1463  devices.
1464
1465``VHOST_USER_POSTCOPY_ADVISE``
1466  :id: 28
1467  :request payload: N/A
1468  :reply payload: userfault fd
1469
  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
  advises the back-end that a migration with postcopy enabled is underway;
  the back-end must open a userfaultfd for later use.  Note that at this
  stage the migration is still in precopy mode.
1474
1475``VHOST_USER_POSTCOPY_LISTEN``
1476  :id: 29
1477  :request payload: N/A
1478  :reply payload: N/A
1479
  The front-end advises the back-end that a transition to postcopy mode has
  happened.  The back-end must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.
1483
1484  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
1485  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.
1486
1487``VHOST_USER_POSTCOPY_END``
1488  :id: 30
1489  :request payload: N/A
1490  :reply payload: ``u64``
1491
1492  The front-end advises that postcopy migration has now completed.  The back-end
1493  must disable the userfaultfd. The reply is an acknowledgement
1494  only.
1495
1496  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
1497  is sent at the end of the migration, after
1498  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.
1499
1500  The value returned is an error indication; 0 is success.
1501
1502``VHOST_USER_GET_INFLIGHT_FD``
1503  :id: 31
1504  :equivalent ioctl: N/A
1505  :request payload: inflight description
1506  :reply payload: N/A
1507
  When the ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  get a shared buffer from the back-end. The shared buffer will be used by
  the back-end to track inflight I/O. QEMU should retrieve a new one when
  the VM is reset.
1513
1514``VHOST_USER_SET_INFLIGHT_FD``
1515  :id: 32
1516  :equivalent ioctl: N/A
1517  :request payload: inflight description
1518  :reply payload: N/A
1519
  When the ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  send the shared inflight buffer back to the back-end so that the back-end
  can recover inflight I/O after a crash or restart.
1524
1525``VHOST_USER_GPU_SET_SOCKET``
1526  :id: 33
1527  :equivalent ioctl: N/A
1528  :request payload: N/A
1529  :reply payload: N/A
1530
1531  Sets the GPU protocol socket file descriptor, which is passed as
1532  ancillary data. The GPU protocol is used to inform the front-end of
1533  rendering state and updates. See vhost-user-gpu.rst for details.
1534
1535``VHOST_USER_RESET_DEVICE``
1536  :id: 34
1537  :equivalent ioctl: N/A
1538  :request payload: N/A
1539  :reply payload: N/A
1540
1541  Ask the vhost user back-end to disable all rings and reset all
1542  internal device state to the initial state, ready to be
1543  reinitialized. The back-end retains ownership of the device
1544  throughout the reset operation.
1545
1546  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
1547  feature is set by the back-end.
1548
1549``VHOST_USER_VRING_KICK``
1550  :id: 35
1551  :equivalent ioctl: N/A
1552  :request payload: vring state description
1553  :reply payload: N/A
1554
1555  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1556  feature has been successfully negotiated, this message may be
1557  submitted by the front-end to indicate that a buffer was added to
1558  the vring instead of signalling it using the vring's kick file
1559  descriptor or having the back-end rely on polling.
1560
1561  The state.num field is currently reserved and must be set to 0.
1562
1563``VHOST_USER_GET_MAX_MEM_SLOTS``
1564  :id: 36
1565  :equivalent ioctl: N/A
1566  :request payload: N/A
  :reply payload: ``u64``
1568
1569  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
1570  feature has been successfully negotiated, this message is submitted
1571  by the front-end to the back-end. The back-end should return the message with a
1572  u64 payload containing the maximum number of memory slots for
1573  QEMU to expose to the guest. The value returned by the back-end
1574  will be capped at the maximum number of ram slots which can be
1575  supported by the target platform.
1576
1577``VHOST_USER_ADD_MEM_REG``
1578  :id: 37
1579  :equivalent ioctl: N/A
  :request payload: single memory region description
  :reply payload: (postcopy only) single memory region description
1582
  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must map in. Together with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.
1592
1593  Exactly one file descriptor from which the memory is mapped is
1594  passed in the ancillary data.
1595
1596  In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
1597  replies with the bases of the memory mapped region to the front-end.
1598  For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
1599  They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.
1600
1601``VHOST_USER_REM_MEM_REG``
1602  :id: 38
1603  :equivalent ioctl: N/A
  :request payload: single memory region description
  :reply payload: N/A
1606
1607  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
1608  feature has been successfully negotiated, this message is submitted
1609  by the front-end to the back-end. The message payload contains a memory
1610  region descriptor struct, describing a region of guest memory which
  the back-end device must unmap. Together with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.
1616
1617  The memory region to be removed is identified by its guest address,
1618  user address and size. The mmap offset is ignored.
1619
1620  No file descriptors SHOULD be passed in the ancillary data. For
1621  compatibility with existing incorrect implementations, the back-end MAY
1622  accept messages with one file descriptor. If a file descriptor is
1623  passed, the back-end MUST close it without using it otherwise.
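
  The matching rule above can be sketched as follows. This is a toy model
  of a back-end's region table, not code from any real back-end; the
  ``Region`` tuple and ``remove_region`` helper are hypothetical names.

  .. code:: python

    from typing import NamedTuple


    class Region(NamedTuple):
        guest_addr: int
        size: int
        user_addr: int
        mmap_offset: int


    def remove_region(regions, request):
        """Drop the region matching guest address, user address and size.

        The mmap offset of the request is deliberately not compared.
        Returns True if a region was removed, False otherwise.
        """
        key = (request.guest_addr, request.user_addr, request.size)
        for i, r in enumerate(regions):
            if (r.guest_addr, r.user_addr, r.size) == key:
                del regions[i]
                return True
        return False


    table = [Region(0x0, 0x1000, 0x7F0000000000, 0x2000)]
    # The request carries a different mmap offset, which is ignored.
    removed = remove_region(table, Region(0x0, 0x1000, 0x7F0000000000, 0))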
1624
1625``VHOST_USER_SET_STATUS``
1626  :id: 39
1627  :equivalent ioctl: VHOST_VDPA_SET_STATUS
1628  :request payload: ``u64``
1629  :reply payload: N/A
1630
1631  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
1632  successfully negotiated, this message is submitted by the front-end to
1633  notify the back-end with updated device status as defined in the Virtio
1634  specification.
1635
1636``VHOST_USER_GET_STATUS``
1637  :id: 40
1638  :equivalent ioctl: VHOST_VDPA_GET_STATUS
1639  :request payload: N/A
1640  :reply payload: ``u64``
1641
1642  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
1643  successfully negotiated, this message is submitted by the front-end to
1644  query the back-end for its device status as defined in the Virtio
1645  specification.
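
  For illustration, the ``u64`` carried by ``VHOST_USER_SET_STATUS`` and
  ``VHOST_USER_GET_STATUS`` can be decoded with a small helper such as the
  one below. The bit values are the device status bits from the Virtio
  specification; ``decode_status`` is a hypothetical helper name.

  .. code:: python

    # Device status bits as defined in the Virtio specification.
    STATUS_BITS = {
        1: "ACKNOWLEDGE",
        2: "DRIVER",
        4: "DRIVER_OK",
        8: "FEATURES_OK",
        64: "DEVICE_NEEDS_RESET",
        128: "FAILED",
    }


    def decode_status(status):
        """Return the names of the status bits set in a status payload."""
        return [name for bit, name in STATUS_BITS.items() if status & bit]


    names = decode_status(0x0F)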
1646
1647``VHOST_USER_GET_SHARED_OBJECT``
1648  :id: 41
1649  :equivalent ioctl: N/A
1650  :request payload: ``struct VhostUserShared``
1651  :reply payload: dmabuf fd
1652
1653  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1654  feature has been successfully negotiated, and the UUID is found
1655  in the exporters cache, this message is submitted by the front-end
1656  to retrieve a given dma-buf fd from a given back-end, determined by
  the requested UUID. The back-end replies with the fd when the operation
  is successful, or with no fd otherwise.
1659
1660``VHOST_USER_SET_DEVICE_STATE_FD``
1661  :id: 42
1662  :equivalent ioctl: N/A
1663  :request payload: device state transfer parameters
1664  :reply payload: ``u64``
1665
1666  Front-end and back-end negotiate a channel over which to transfer the
1667  back-end's internal state during migration.  Either side (front-end or
1668  back-end) may create the channel.  The nature of this channel is not
1669  restricted or defined in this document, but whichever side creates it
  must create a file descriptor that is provided to the other side,
  allowing access to the channel.  This FD must behave as
1672  follows:
1673
1674  * For the writing end, it must allow writing the whole back-end state
1675    sequentially.  Closing the file descriptor signals the end of
1676    transfer.
1677
1678  * For the reading end, it must allow reading the whole back-end state
1679    sequentially.  The end of file signals the end of the transfer.
1680
1681  For example, the channel may be a pipe, in which case the two ends of
1682  the pipe fulfill these requirements respectively.
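
  A minimal sketch of the pipe case, run inside a single process purely
  for demonstration: the write end is written sequentially and then
  closed, and the read end reads until end of file. The helper name is
  hypothetical, and a real transfer would have the two ends in different
  processes.

  .. code:: python

    import os


    def transfer_state_over_pipe(state):
        """Write `state` into a pipe, close the write end, read to EOF.

        Only suitable for small states: a single process that writes
        more than the pipe capacity before reading would block.
        """
        read_fd, write_fd = os.pipe()
        with os.fdopen(write_fd, "wb") as writer:
            writer.write(state)  # sequential write of the whole state
        # Leaving the `with` block closed the write end: end of transfer.
        with os.fdopen(read_fd, "rb") as reader:
            return reader.read()  # returns once end of file is reached


    received = transfer_state_over_pipe(b"example back-end state")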
1683
1684  Initially, the front-end creates a channel along with such an FD.  It
1685  passes the FD to the back-end as ancillary data of a
1686  ``VHOST_USER_SET_DEVICE_STATE_FD`` message.  The back-end may create a
1687  different transfer channel, passing the respective FD back to the
1688  front-end as ancillary data of the reply.  If so, the front-end must
1689  then discard its channel and use the one provided by the back-end.
1690
  Whether the back-end should use its own channel depends on
  efficiency: if the channel is a pipe, both ends will most
1693  likely need to copy data into and out of it.  Any channel that allows
1694  for more efficient processing on at least one end, e.g. through
1695  zero-copy, is considered more efficient and thus preferred.  If the
1696  back-end can provide such a channel, it should decide to use it.
1697
1698  The request payload contains parameters for the subsequent data
1699  transfer, as described in the :ref:`Migrating back-end state
1700  <migrating_backend_state>` section.
1701
1702  The value returned is both an indication for success, and whether a
1703  file descriptor for a back-end-provided channel is returned: Bits 0–7
1704  are 0 on success, and non-zero on error.  Bit 8 is the invalid FD
1705  flag; this flag is set when there is no file descriptor returned.
1706  When this flag is not set, the front-end must use the returned file
1707  descriptor as its end of the transfer channel.  The back-end must not
1708  both indicate an error and return a file descriptor.
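
  The reply encoding described above can be decoded along these lines.
  This is only a sketch: the function name is hypothetical, and the check
  that the invalid FD flag agrees with the ancillary data actually
  received is an added assumption rather than something this document
  mandates.

  .. code:: python

    ERROR_MASK = 0xFF         # bits 0-7: zero on success, non-zero on error
    INVALID_FD_FLAG = 1 << 8  # bit 8: set when no file descriptor is returned


    def parse_set_device_state_fd_reply(value, fd_received):
        """Return (error_code, use_backend_fd) for a reply value.

        Raises ValueError for replies that the rules above exclude.
        """
        error = value & ERROR_MASK
        has_fd = not (value & INVALID_FD_FLAG)
        if error and has_fd:
            raise ValueError("back-end reported an error and returned an fd")
        if has_fd != fd_received:
            # Assumption: treat a flag/ancillary-data mismatch as an error.
            raise ValueError("invalid FD flag disagrees with ancillary data")
        return error, has_fd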
1709
1710  Using this function requires prior negotiation of the
1711  ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
1712
1713``VHOST_USER_CHECK_DEVICE_STATE``
1714  :id: 43
1715  :equivalent ioctl: N/A
1716  :request payload: N/A
1717  :reply payload: ``u64``
1718
1719  After transferring the back-end's internal state during migration (see
1720  the :ref:`Migrating back-end state <migrating_backend_state>`
  section), check whether the back-end was able to process the state
  successfully and completely.
1723
1724  The value returned indicates success or error; 0 is success, any
1725  non-zero value is an error.
1726
1727  Using this function requires prior negotiation of the
1728  ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
1729
1730Back-end message types
1731----------------------
1732
1733For this type of message, the request is sent by the back-end and the reply
1734is sent by the front-end.
1735
1736``VHOST_USER_BACKEND_IOTLB_MSG`` (previous name ``VHOST_USER_SLAVE_IOTLB_MSG``)
1737  :id: 1
1738  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
1739  :request payload: ``struct vhost_iotlb_msg``
1740  :reply payload: N/A
1741
1742  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
1743  The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
1744  access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end
  must respond with zero when the operation is successfully completed, or
  non-zero otherwise.  This request should be sent only when the
1748  ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
1749  negotiated.
1750
1751``VHOST_USER_BACKEND_CONFIG_CHANGE_MSG`` (previous name ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``)
1752  :id: 2
1753  :equivalent ioctl: N/A
1754  :request payload: N/A
1755  :reply payload: N/A
1756
  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  back-end sends such messages to notify that the virtio device's
  configuration space has changed. For host devices that support this
  feature, the host driver can then send a ``VHOST_USER_GET_CONFIG``
  message to the back-end to get the latest content. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
  ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
  the operation is successfully completed, or non-zero otherwise.
1765
1766``VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG`` (previous name ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``)
1767  :id: 3
1768  :equivalent ioctl: N/A
1769  :request payload: vring area description
1770  :reply payload: N/A
1771
1772  Sets host notifier for a specified queue. The queue index is
1773  contained in the ``u64`` field of the vring area description. The
1774  host notifier is described by the file descriptor (typically it's a
1775  VFIO device fd) which is passed as ancillary data and the size
1776  (which is mmap size and should be the same as host page size) and
1777  offset (which is mmap offset) carried in the vring area
1778  description. QEMU can mmap the file descriptor based on the size and
1779  offset to get a memory range. Registering a host notifier means
1780  mapping this memory range to the VM as the specified queue's notify
1781  MMIO region. The back-end sends this request to tell QEMU to de-register
1782  the existing notifier if any and register the new notifier if the
1783  request is sent with a file descriptor.
1784
1785  This request should be sent only when
1786  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
1787  successfully negotiated.
1788
1789``VHOST_USER_BACKEND_VRING_CALL`` (previous name ``VHOST_USER_SLAVE_VRING_CALL``)
1790  :id: 4
1791  :equivalent ioctl: N/A
1792  :request payload: vring state description
1793  :reply payload: N/A
1794
1795  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1796  feature has been successfully negotiated, this message may be
1797  submitted by the back-end to indicate that a buffer was used from
1798  the vring instead of signalling this using the vring's call file
  descriptor or having the front-end rely on polling.
1800
1801  The state.num field is currently reserved and must be set to 0.
1802
1803``VHOST_USER_BACKEND_VRING_ERR`` (previous name ``VHOST_USER_SLAVE_VRING_ERR``)
1804  :id: 5
1805  :equivalent ioctl: N/A
1806  :request payload: vring state description
1807  :reply payload: N/A
1808
1809  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1810  feature has been successfully negotiated, this message may be
1811  submitted by the back-end to indicate that an error occurred on the
1812  specific vring, instead of signalling the error file descriptor
1813  set by the front-end via ``VHOST_USER_SET_VRING_ERR``.
1814
1815  The state.num field is currently reserved and must be set to 0.
1816
1817``VHOST_USER_BACKEND_SHARED_OBJECT_ADD``
1818  :id: 6
1819  :equivalent ioctl: N/A
1820  :request payload: ``struct VhostUserShared``
1821  :reply payload: N/A
1822
1823  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1824  feature has been successfully negotiated, this message can be submitted
  by back-ends to add themselves as exporters to the virtio shared lookup
  table. The back-end device gets associated with a UUID in the shared table.
  The back-end is responsible for keeping its own table of exported dma-buf fds.
  When another back-end tries to import the resource associated with the UUID,
  it will send a message to the front-end, which will act as a proxy to the
  exporter back-end. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and
  the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must
  respond with zero when the operation is successfully completed, or non-zero
1833  otherwise.
1834
1835``VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE``
1836  :id: 7
1837  :equivalent ioctl: N/A
1838  :request payload: ``struct VhostUserShared``
1839  :reply payload: N/A
1840
1841  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1842  feature has been successfully negotiated, this message can be submitted
  by a back-end to remove itself from the virtio-dmabuf shared
  table. Only the back-end owning the entry (i.e., the one that first added
  it) has permission to remove it. Otherwise, the message is ignored.
  The shared table will remove the back-end device associated with
  the UUID. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the
  back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond
  with zero when the operation is successfully completed, or non-zero otherwise.
1850
1851``VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP``
1852  :id: 8
1853  :equivalent ioctl: N/A
1854  :request payload: ``struct VhostUserShared``
1855  :reply payload: dmabuf fd and ``u64``
1856
1857  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1858  feature has been successfully negotiated, this message can be submitted
  by back-ends to retrieve a given dma-buf fd from the virtio-dmabuf
  shared table given a UUID. The front-end replies with the fd and a zero
  when the operation is successful, or with non-zero otherwise. Note that if the
  operation fails, no fd is sent to the back-end.
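
  The ownership rule shared by the three shared-object messages can be
  modelled with a toy table like the one below. It is not QEMU's
  implementation: the class, its method names and the string back-end
  identifiers are hypothetical, and rejecting a second ``add`` for an
  existing UUID is an assumption made for the sketch.

  .. code:: python

    import uuid


    class SharedObjectTable:
        """Toy model of a UUID-keyed table of exporting back-ends."""

        def __init__(self):
            self._owners = {}  # UUID -> identifier of the exporting back-end

        def add(self, obj_uuid, backend):
            if obj_uuid in self._owners:
                return False  # assumption: first exporter keeps the entry
            self._owners[obj_uuid] = backend
            return True

        def remove(self, obj_uuid, backend):
            # Only the back-end that added the entry may remove it.
            if self._owners.get(obj_uuid) != backend:
                return False
            del self._owners[obj_uuid]
            return True

        def lookup(self, obj_uuid):
            """Return the exporting back-end, or None if unknown."""
            return self._owners.get(obj_uuid)


    table = SharedObjectTable()
    obj = uuid.uuid4()
    table.add(obj, "gpu-backend")
    ignored = table.remove(obj, "video-backend")  # not the owner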
1863
1864.. _reply_ack:
1865
1866VHOST_USER_PROTOCOL_F_REPLY_ACK
1867-------------------------------
1868
1869The original vhost-user specification only demands replies for certain
1870commands. This differs from the vhost protocol implementation where
1871commands are sent over an ``ioctl()`` call and block until the back-end
1872has completed.
1873
With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
back-end MUST respond with a ``VhostUserMsg`` whose payload indicates success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.
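
As a hedged illustration, the flag handling might look like the Python
sketch below. The ``need_reply`` bit position comes from the paragraph
above; the version bits (bits 0-1, value 1) and the reply flag (bit 2) are
taken from the header description elsewhere in this document, and the
helper names are hypothetical.

.. code:: python

  import struct

  VHOST_USER_VERSION = 0x1  # bits 0-1: protocol version
  REPLY_FLAG = 1 << 2       # bit 2: message is a reply
  NEED_REPLY_FLAG = 1 << 3  # bit 3: sender asks for an acknowledgement


  def request_flags(need_reply):
      """Compute the flags field of an outgoing request."""
      flags = VHOST_USER_VERSION
      if need_reply:
          flags |= NEED_REPLY_FLAG
      return flags


  def ack_succeeded(reply_flags, payload):
      """Interpret an acknowledgement: a u64 payload of zero is success."""
      if not reply_flags & REPLY_FLAG:
          raise ValueError("message is not marked as a reply")
      (value,) = struct.unpack("=Q", payload)
      return value == 0


  flags = request_flags(need_reply=True)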
1879
1880The reply payload gives QEMU a deterministic indication of the result
1881of the command. Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to be more
resilient for selected requests.
1884
1885For the message types that already solicit a reply from the back-end,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or the ``need_reply`` bit
1887being set brings no behavioural change. (See the Communication_
1888section for details.)
1889
1890.. _backend_conventions:
1891
1892Backend program conventions
1893===========================
1894
1895vhost-user back-ends can provide various devices & services and may
1896need to be configured manually depending on the use case. However, it
1897is a good idea to follow the conventions listed here when
1898possible. Users, QEMU or libvirt, can then rely on some common
1899behaviour to avoid heterogeneous configuration and management of the
1900back-end programs and facilitate interoperability.
1901
1902Each back-end installed on a host system should come with at least one
1903JSON file that conforms to the vhost-user.json schema. Each file
1904informs the management applications about the back-end type, and binary
1905location. In addition, it defines rules for management apps for
1906picking the highest priority back-end when multiple match the search
1907criteria (see ``@VhostUserBackend`` documentation in the schema file).
1908
1909If the back-end is not capable of enabling a requested feature on the
1910host (such as 3D acceleration with virgl), or the initialization
1911failed, the back-end should fail to start early and exit with a status
1912!= 0. It may also print a message to stderr for further details.
1913
1914The back-end program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
1916access to the system.
1917
1918File descriptors 0, 1 and 2 will exist, and have regular
1919stdin/stdout/stderr usage (they may have been redirected to /dev/null
1920by the management layer, or to a log handler).
1921
1922The back-end program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. It may eventually receive SIGKILL from
1924the management layer after a few seconds.
1925
1926The following command line options have an expected behaviour. They
are mandatory, unless explicitly stated otherwise:
1928
1929--socket-path=PATH
1930
  This option specifies the location of the vhost-user Unix domain socket.
1932  It is incompatible with --fd.
1933
1934--fd=FDNUM
1935
1936  When this argument is given, the back-end program is started with the
1937  vhost-user socket as file descriptor FDNUM. It is incompatible with
1938  --socket-path.
1939
1940--print-capabilities
1941
1942  Output to stdout the back-end capabilities in JSON format, and then
1943  exit successfully. Other options and arguments should be ignored, and
1944  the back-end program should not perform its normal function.  The
1945  capabilities can be reported dynamically depending on the host
1946  capabilities.
1947
1948The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``.  Example:
1950
1951.. code:: json
1952
1953  {
1954    "type": "foo",
1955    "features": [
1956      "feature-a",
1957      "feature-b"
1958    ]
1959  }
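
As a sketch of how a back-end might honour these conventions, the Python
fragment below parses the three common options and prints the example
capabilities shown above. The program name ``vhost-user-foo`` and the
helper names are hypothetical. Note that this simple version does not make
``--print-capabilities`` ignore other invalid arguments, which a real
back-end should do.

.. code:: python

  import argparse
  import json


  def build_parser():
      parser = argparse.ArgumentParser(prog="vhost-user-foo")
      # --socket-path and --fd are incompatible with each other.
      group = parser.add_mutually_exclusive_group()
      group.add_argument("--socket-path", metavar="PATH")
      group.add_argument("--fd", type=int, metavar="FDNUM")
      parser.add_argument("--print-capabilities", action="store_true")
      return parser


  def capabilities():
      """Return the capabilities of this example back-end as JSON text."""
      return json.dumps({"type": "foo", "features": ["feature-a", "feature-b"]})


  args = build_parser().parse_args(["--print-capabilities"])
  output = capabilities() if args.print_capabilities else None

A real program would print ``output`` to stdout and exit with status 0
before doing any other work.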
1960
1961vhost-user-input
1962----------------
1963
1964Command line options:
1965
1966--evdev-path=PATH
1967
  Specify the Linux input device.
1969
1970  (optional)
1971
1972--no-grab
1973
  Do not request exclusive access to the input device.
1975
1976  (optional)
1977
1978vhost-user-gpu
1979--------------
1980
1981Command line options:
1982
1983--render-node=PATH
1984
1985  Specify the GPU DRM render node.
1986
1987  (optional)
1988
1989--virgl
1990
1991  Enable virgl rendering support.
1992
1993  (optional)
1994
1995vhost-user-blk
1996--------------
1997
1998Command line options:
1999
2000--blk-file=PATH
2001
2002  Specify block device or file path.
2003
2004  (optional)
2005
2006--read-only
2007
2008  Enable read-only.
2009
2010  (optional)
2011