.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============
18
This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It communicates over a Unix domain
socket and shares file descriptors in the ancillary data of the
messages.
25
26The protocol defines 2 sides of the communication, *front-end* and
27*back-end*. The *front-end* is the application that shares its virtqueues, in
28our case QEMU. The *back-end* is the consumer of the virtqueues.
29
30In the current implementation QEMU is the *front-end*, and the *back-end*
31is the external process consuming the virtio queues, for example a
32software Ethernet switch running in user space, such as Snabbswitch,
33or a block device back-end processing read & write to a virtual
34disk. In order to facilitate interoperability between various back-end
35implementations, it is recommended to follow the :ref:`Backend program
36conventions <backend_conventions>`.
37
38The *front-end* and *back-end* can be either a client (i.e. connecting) or
39server (listening) in the socket communication.
40
41Support for platforms other than Linux
42--------------------------------------
43
44While vhost-user was initially developed targeting Linux, nowadays it
45is supported on any platform that provides the following features:
46
47- A way for requesting shared memory represented by a file descriptor
48  so it can be passed over a UNIX domain socket and then mapped by the
49  other process.
50
51- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
52  exchange messages through it, including ancillary data when needed.
53
- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor is used for receiving or sending
  events by reading an 8-byte value from it or writing an 8-byte value
  to it, respectively. The 8-byte value itself has no meaning and
  should not be interpreted.
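
As a minimal sketch of the event mechanism described above, the helpers
below signal and consume an event on a descriptor that may be an eventfd or
one end of a pipe. The names ``notify_event`` and ``consume_event`` are
illustrative and are not taken from QEMU. The value 1 is written because a
read on an eventfd blocks while its counter is zero, so writing 0 would not
deliver an event; the reader discards whatever it reads.

```c
#include <stdint.h>
#include <unistd.h>

/* Signal one event by writing an 8-byte value. The value is never
 * interpreted by the receiver. */
int notify_event(int fd)
{
    uint64_t one = 1;
    ssize_t n = write(fd, &one, sizeof(one));

    return n == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consume one event by reading an 8-byte value and discarding it. */
int consume_event(int fd)
{
    uint64_t ignored;
    ssize_t n = read(fd, &ignored, sizeof(ignored));

    return n == (ssize_t)sizeof(ignored) ? 0 : -1;
}
```

With a pipe, ``notify_event()`` is called on the write end and
``consume_event()`` on the read end; with an eventfd both calls use the same
descriptor.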
60
61Message Specification
62=====================
63
64.. Note:: All numbers are in the machine native byte order.
65
A vhost-user message consists of three header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the back-end
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload
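
The bit layout of the ``flags`` field can be sketched with a few helpers.
The macro and function names below are chosen for illustration only; they
follow the bit positions listed above and should be checked against the
implementation in use.

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_VERSION          0x1u
#define VHOST_USER_VERSION_MASK     0x3u       /* bits 0-1: version */
#define VHOST_USER_REPLY_MASK       (1u << 2)  /* bit 2: reply flag */
#define VHOST_USER_NEED_REPLY_MASK  (1u << 3)  /* bit 3: need_reply flag */

/* Flags for a request, optionally asking the peer to acknowledge it. */
uint32_t make_request_flags(bool need_reply)
{
    uint32_t flags = VHOST_USER_VERSION;

    if (need_reply) {
        flags |= VHOST_USER_NEED_REPLY_MASK;
    }
    return flags;
}

/* Flags for a reply: version plus the reply bit. */
uint32_t make_reply_flags(void)
{
    return VHOST_USER_VERSION | VHOST_USER_REPLY_MASK;
}

/* True if the version bits match the version this sketch speaks. */
bool flags_version_ok(uint32_t flags)
{
    return (flags & VHOST_USER_VERSION_MASK) == VHOST_USER_VERSION;
}

/* True if the message is marked as a reply. */
bool flags_is_reply(uint32_t flags)
{
    return (flags & VHOST_USER_REPLY_MASK) != 0;
}
```

A receiver would typically call ``flags_version_ok()`` before looking at any
other header field.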
85
86Payload
87-------
88
89Depending on the request type, **payload** can be:
90
91A single 64-bit integer
92^^^^^^^^^^^^^^^^^^^^^^^
93
94+-----+
95| u64 |
96+-----+
97
98:u64: a 64-bit unsigned integer
99
100A vring state description
101^^^^^^^^^^^^^^^^^^^^^^^^^
102
103+-------+-----+
104| index | num |
105+-------+-----+
106
107:index: a 32-bit index
108
109:num: a 32-bit number
110
111A vring descriptor index for split virtqueues
112^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
113
114+-------------+---------------------+
115| vring index | index in avail ring |
116+-------------+---------------------+
117
118:vring index: 32-bit index of the respective virtqueue
119
120:index in avail ring: 32-bit value, of which currently only the lower 16
121  bits are used:
122
123  - Bits 0–15: Index of the next *Available Ring* descriptor that the
124    back-end will process.  This is a free-running index that is not
125    wrapped by the ring size.
126  - Bits 16–31: Reserved (set to zero)
127
128Vring descriptor indices for packed virtqueues
129^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
130
131+-------------+--------------------+
132| vring index | descriptor indices |
133+-------------+--------------------+
134
135:vring index: 32-bit index of the respective virtqueue
136
137:descriptor indices: 32-bit value:
138
139  - Bits 0–14: Index of the next *Available Ring* descriptor that the
140    back-end will process.  This is a free-running index that is not
141    wrapped by the ring size.
142  - Bit 15: Driver (Available) Ring Wrap Counter
143  - Bits 16–30: Index of the entry in the *Used Ring* where the back-end
144    will place the next descriptor.  This is a free-running index that
145    is not wrapped by the ring size.
146  - Bit 31: Device (Used) Ring Wrap Counter
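
The packed layout above can be decoded and encoded with plain shifts and
masks. This is a sketch only; ``PackedVringBase`` and the two functions are
hypothetical names that do not appear in QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct PackedVringBase {
    uint16_t avail_idx;  /* bits 0-14 */
    bool avail_wrap;     /* bit 15: Driver (Available) Ring Wrap Counter */
    uint16_t used_idx;   /* bits 16-30 */
    bool used_wrap;      /* bit 31: Device (Used) Ring Wrap Counter */
} PackedVringBase;

/* Split the 32-bit "descriptor indices" value into its four parts. */
PackedVringBase packed_base_decode(uint32_t v)
{
    PackedVringBase b;

    b.avail_idx = v & 0x7fff;
    b.avail_wrap = (v >> 15) & 1;
    b.used_idx = (v >> 16) & 0x7fff;
    b.used_wrap = (v >> 31) & 1;
    return b;
}

/* Combine the four parts back into the 32-bit wire value. */
uint32_t packed_base_encode(PackedVringBase b)
{
    return (uint32_t)(b.avail_idx & 0x7fff) |
           ((uint32_t)b.avail_wrap << 15) |
           ((uint32_t)(b.used_idx & 0x7fff) << 16) |
           ((uint32_t)b.used_wrap << 31);
}
```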
147
148A vring address description
149^^^^^^^^^^^^^^^^^^^^^^^^^^^
150
151+-------+-------+------------+------+-----------+-----+
152| index | flags | descriptor | used | available | log |
153+-------+-------+------------+------+-----------+-----+
154
155:index: a 32-bit vring index
156
157:flags: a 32-bit vring flags
158
159:descriptor: a 64-bit ring address of the vring descriptor table
160
161:used: a 64-bit ring address of the vring used ring
162
163:available: a 64-bit ring address of the vring available ring
164
165:log: a 64-bit guest address for logging
166
167Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
168been negotiated. Otherwise it is a user address.
169
170Memory region description
171^^^^^^^^^^^^^^^^^^^^^^^^^
172
173+---------------+------+--------------+-------------+
174| guest address | size | user address | mmap offset |
175+---------------+------+--------------+-------------+
176
177:guest address: a 64-bit guest address of the region
178
179:size: a 64-bit size
180
181:user address: a 64-bit user address
182
183:mmap offset: 64-bit offset where region starts in the mapped memory
184
185When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
186successfully negotiated, the memory region description contains two extra
187fields at the end.
188
189+---------------+------+--------------+-------------+----------------+-------+
190| guest address | size | user address | mmap offset | xen mmap flags | domid |
191+---------------+------+--------------+-------------+----------------+-------+
192
193:xen mmap flags: 32-bit bit field
194
195- Bit 0 is set for Xen foreign memory mapping.
196- Bit 1 is set for Xen grant memory mapping.
- Bit 8 is set if the memory region cannot be mapped in advance, and memory
  areas within this region must be mapped / unmapped only when required by the
  back-end. The back-end should not try to map the entire region at once, as the
  front-end may not allow it. The back-end should instead map only the required
  amount of memory at once and unmap it after it is used.
202
203:domid: a 32-bit Xen hypervisor specific domain id.
204
205Single memory region description
206^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
207
208+---------+--------+
209| padding | region |
210+---------+--------+
211
212:padding: 64-bit
213
214A region is represented by Memory region description.
215
216Multiple Memory regions description
217^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
218
219+-------------+---------+---------+-----+---------+
220| num regions | padding | region0 | ... | region7 |
221+-------------+---------+---------+-----+---------+
222
223:num regions: a 32-bit number of regions
224
225:padding: 32-bit
226
227A region is represented by Memory region description.
228
229Log description
230^^^^^^^^^^^^^^^
231
232+----------+------------+
233| log size | log offset |
234+----------+------------+
235
236:log size: size of area used for logging
237
238:log offset: offset from start of supplied file descriptor where
239             logging starts (i.e. where guest address 0 would be
240             logged)
241
242An IOTLB message
243^^^^^^^^^^^^^^^^
244
245+------+------+--------------+-------------------+------+
246| iova | size | user address | permissions flags | type |
247+------+------+--------------+-------------------+------+
248
249:iova: a 64-bit I/O virtual address programmed by the guest
250
251:size: a 64-bit size
252
253:user address: a 64-bit user address
254
255:permissions flags: an 8-bit value:
256  - 0: No access
257  - 1: Read access
258  - 2: Write access
259  - 3: Read/Write access
260
261:type: an 8-bit IOTLB message type:
262  - 1: IOTLB miss
263  - 2: IOTLB update
264  - 3: IOTLB invalidate
265  - 4: IOTLB access fail
266
267Virtio device config space
268^^^^^^^^^^^^^^^^^^^^^^^^^^
269
270+--------+------+-------+---------+
271| offset | size | flags | payload |
272+--------+------+-------+---------+
273
:offset: a 32-bit offset into the virtio device's configuration space
275
276:size: a 32-bit configuration space access size in bytes
277
278:flags: a 32-bit value:
279  - 0: Vhost front-end messages used for writable fields
280  - 1: Vhost front-end messages used for live migration
281
:payload: an array of ``size`` bytes holding the contents of the virtio
          device's configuration space
284
285Vring area description
286^^^^^^^^^^^^^^^^^^^^^^
287
288+-----+------+--------+
289| u64 | size | offset |
290+-----+------+--------+
291
:u64: a 64-bit integer containing the vring index and flags
293
294:size: a 64-bit size of this area
295
296:offset: a 64-bit offset of this area from the start of the
297         supplied file descriptor
298
299Inflight description
300^^^^^^^^^^^^^^^^^^^^
301
302+-----------+-------------+------------+------------+
303| mmap size | mmap offset | num queues | queue size |
304+-----------+-------------+------------+------------+
305
306:mmap size: a 64-bit size of area to track inflight I/O
307
308:mmap offset: a 64-bit offset of this area from the start
309              of the supplied file descriptor
310
311:num queues: a 16-bit number of virtqueues
312
313:queue size: a 16-bit size of virtqueues
314
315VhostUserShared
316^^^^^^^^^^^^^^^
317
318+------+
319| UUID |
320+------+
321
:UUID: a 16-byte UUID, whose first three components (a 32-bit value, then
  two 16-bit values) are stored in big-endian byte order.
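
To illustrate the byte order, the sketch below serializes a UUID held as
separate fields into the 16-byte wire form. The ``UuidFields`` type and its
field names follow the usual RFC 4122 naming and are an assumption made for
this example, not a QEMU type.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory UUID, split into its components. */
typedef struct UuidFields {
    uint32_t time_low;             /* first component, 32 bits */
    uint16_t time_mid;             /* second component, 16 bits */
    uint16_t time_hi_and_version;  /* third component, 16 bits */
    uint8_t rest[8];               /* remaining bytes, already a byte string */
} UuidFields;

/* Write the first three components in big-endian order, then copy the rest. */
void uuid_to_wire(const UuidFields *u, uint8_t out[16])
{
    out[0] = (uint8_t)(u->time_low >> 24);
    out[1] = (uint8_t)(u->time_low >> 16);
    out[2] = (uint8_t)(u->time_low >> 8);
    out[3] = (uint8_t)u->time_low;
    out[4] = (uint8_t)(u->time_mid >> 8);
    out[5] = (uint8_t)u->time_mid;
    out[6] = (uint8_t)(u->time_hi_and_version >> 8);
    out[7] = (uint8_t)u->time_hi_and_version;
    memcpy(out + 8, u->rest, sizeof(u->rest));
}
```

Because the bytes are written one at a time, the result does not depend on
the byte order of the host.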
324
325Device state transfer parameters
326^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
327
328+--------------------+-----------------+
329| transfer direction | migration phase |
330+--------------------+-----------------+
331
332:transfer direction: a 32-bit enum, describing the direction in which
333  the state is transferred:
334
335  - 0: Save: Transfer the state from the back-end to the front-end,
336    which happens on the source side of migration
337  - 1: Load: Transfer the state from the front-end to the back-end,
338    which happens on the destination side of migration
339
340:migration phase: a 32-bit enum, describing the state in which the VM
341  guest and devices are:
342
343  - 0: Stopped (in the period after the transfer of memory-mapped
344    regions before switch-over to the destination): The VM guest is
345    stopped, and the vhost-user device is suspended (see
346    :ref:`Suspended device state <suspended_device_state>`).
347
348  In the future, additional phases might be added e.g. to allow
349  iterative migration while the device is running.
350
351C structure
352-----------
353
354In QEMU the vhost-user message is implemented with the following struct:
355
.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;
374
375Communication
376=============
377
378The protocol for vhost-user is based on the existing implementation of
379vhost for the Linux Kernel. Most messages that can be sent via the
380Unix domain socket implementing vhost-user have an equivalent ioctl to
381the kernel implementation.
382
383The communication consists of the *front-end* sending message requests and
384the *back-end* sending message replies. Most of the requests don't require
385replies. Here is a list of the ones that do:
386
387* ``VHOST_USER_GET_FEATURES``
388* ``VHOST_USER_GET_PROTOCOL_FEATURES``
389* ``VHOST_USER_GET_VRING_BASE``
390* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
391* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
392
393.. seealso::
394
395   :ref:`REPLY_ACK <reply_ack>`
396       The section on ``REPLY_ACK`` protocol extension.
397
398There are several messages that the front-end sends with file descriptors passed
399in the ancillary data:
400
401* ``VHOST_USER_ADD_MEM_REG``
402* ``VHOST_USER_SET_MEM_TABLE``
403* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
404* ``VHOST_USER_SET_LOG_FD``
405* ``VHOST_USER_SET_VRING_KICK``
406* ``VHOST_USER_SET_VRING_CALL``
407* ``VHOST_USER_SET_VRING_ERR``
408* ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
409* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
410* ``VHOST_USER_SET_DEVICE_STATE_FD``
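
Passing a file descriptor in the ancillary data of a message uses the
standard ``SCM_RIGHTS`` mechanism of Unix domain sockets. The following is a
sketch with hypothetical helper names (``send_with_fd``, ``recv_with_fd``)
that transfers a single descriptor alongside an opaque message buffer; it
does not reproduce QEMU's own socket code and does not handle short reads or
writes.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send len bytes from buf plus one descriptor as SCM_RIGHTS data. */
ssize_t send_with_fd(int sock, const void *buf, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    union {
        char bytes[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* forces correct alignment */
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.bytes;
    msg.msg_controllen = sizeof(control.bytes);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0);
}

/* Receive up to len bytes; *fd is the passed descriptor, or -1 if none. */
ssize_t recv_with_fd(int sock, void *buf, size_t len, int *fd)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    union {
        char bytes[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.bytes;
    msg.msg_controllen = sizeof(control.bytes);

    *fd = -1;
    n = recvmsg(sock, &msg, 0);
    if (n < 0) {
        return n;
    }
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(int))) {
            memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return n;
}
```

The receiver owns the descriptor it gets and is responsible for closing it.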
411
If the *front-end* is unable to send the full message or receives a wrong
reply, it will close the connection. An optional reconnection mechanism
can be implemented.

If the *back-end* detects an error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.
418
419Any protocol extensions are gated by protocol feature bits, which
420allows full backwards compatibility on both front-end and back-end.  As
421older back-ends don't support negotiating protocol features, a feature
422bit was dedicated for this purpose::
423
424  #define VHOST_USER_F_PROTOCOL_FEATURES 30
425
Note that ``VHOST_USER_F_PROTOCOL_FEATURES`` is the UNUSED (30) feature
bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits
<https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
VIRTIO devices do not advertise this feature bit and therefore VIRTIO
drivers cannot negotiate it.
431
432This reserved feature bit was reused by the vhost-user protocol to add
433vhost-user protocol feature negotiation in a backwards compatible
434fashion. Old vhost-user front-end and back-end implementations continue to
435work even though they are not aware of vhost-user protocol feature
436negotiation.
437
438Ring states
439-----------
440
441Rings have two independent states: started/stopped, and enabled/disabled.
442
443* While a ring is stopped, the back-end must not process the ring at
444  all, regardless of whether it is enabled or disabled.  The
445  enabled/disabled state should still be tracked, though, so it can come
446  into effect once the ring is started.
447
448* started and disabled: The back-end must process the ring without
449  causing any side effects.  For example, for a networking device,
450  in the disabled state the back-end must not supply any new RX packets,
451  but must process and discard any TX packets.
452
453* started and enabled: The back-end must process the ring normally, i.e.
454  process all requests and execute them.
455
Each ring is initialized in a stopped and disabled state.  The back-end
must start a ring upon receiving a kick (that is, detecting that the file
descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and stop a ring upon receiving
``VHOST_USER_GET_VRING_BASE``.
462
463Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
464
465In addition, upon receiving a ``VHOST_USER_SET_FEATURES`` message from
466the front-end without ``VHOST_USER_F_PROTOCOL_FEATURES`` set, the
467back-end must enable all rings immediately.
468
469While processing the rings (whether they are enabled or not), the back-end
470must support changing some configuration aspects on the fly.
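
The two independent ring states and the transitions described above can be
sketched as a small state holder. All type and function names here are
hypothetical; the sketch only encodes the rules stated in this section and
leaves out the actual ring processing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VHOST_USER_F_PROTOCOL_FEATURES 30

typedef enum RingAction {
    RING_IGNORE,           /* stopped: do not process the ring at all */
    RING_PROCESS_DISCARD,  /* started and disabled: no side effects */
    RING_PROCESS_NORMAL,   /* started and enabled */
} RingAction;

typedef struct RingState {
    bool started;
    bool enabled;
} RingState;

/* What the back-end may do with a ring in its current state. */
RingAction ring_action(const RingState *r)
{
    if (!r->started) {
        return RING_IGNORE;
    }
    return r->enabled ? RING_PROCESS_NORMAL : RING_PROCESS_DISCARD;
}

/* A kick (fd readable or in-band VHOST_USER_VRING_KICK) starts the ring. */
void ring_on_kick(RingState *r)
{
    r->started = true;
}

/* VHOST_USER_GET_VRING_BASE stops the ring. */
void ring_on_get_vring_base(RingState *r)
{
    r->started = false;
}

/* VHOST_USER_SET_VRING_ENABLE is tracked even while the ring is stopped. */
void ring_on_set_vring_enable(RingState *r, bool enable)
{
    r->enabled = enable;
}

/* VHOST_USER_SET_FEATURES without the protocol-features bit enables
 * every ring immediately. */
void rings_on_set_features(RingState *rings, size_t n, uint64_t features)
{
    if (!(features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
        for (size_t i = 0; i < n; i++) {
            rings[i].enabled = true;
        }
    }
}
```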
471
472.. _suspended_device_state:
473
474Suspended device state
475^^^^^^^^^^^^^^^^^^^^^^
476
477While all vrings are stopped, the device is *suspended*.  In addition to
478not processing any vring (because they are stopped), the device must:
479
480* not write to any guest memory regions,
481* not send any notifications to the guest,
482* not send any messages to the front-end,
483* still process and reply to messages from the front-end.
484
485Multiple queue support
486----------------------
487
488Many devices have a fixed number of virtqueues.  In this case the front-end
489already knows the number of available virtqueues without communicating with the
490back-end.
491
492Some devices do not have a fixed number of virtqueues.  Instead the maximum
493number of virtqueues is chosen by the back-end.  The number can depend on host
494resource availability or back-end implementation details.  Such devices are called
495multiple queue devices.
496
497Multiple queue support allows the back-end to advertise the maximum number of
498queues.  This is treated as a protocol extension, hence the back-end has to
499implement protocol features first. The multiple queues feature is supported
500only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.
501
The maximum number of queues the back-end supports can be queried with the
``VHOST_USER_GET_QUEUE_NUM`` message. The front-end should stop when the
number of requested queues exceeds that maximum.
505
506As all queues share one connection, the front-end uses a unique index for each
507queue in the sent message to identify a specified queue.
508
509The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
510vhost-user-net has historically automatically enabled the first queue pair.
511
512Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
513feature, even for devices with a fixed number of virtqueues, since it is simple
514to implement and offers a degree of introspection.
515
516Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
517devices with a fixed number of virtqueues.  Only true multiqueue devices
518require this protocol feature.
519
520Migration
521---------
522
During live migration, the front-end may need to track the modifications
the back-end makes to the memory-mapped regions. The back-end should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.
527
528To start/stop logging of data/used ring writes, the front-end may send
529messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
530``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
531flags set to 1/0, respectively.
532
All modifications to memory pointed to by the vring "descriptor" should
be marked. Modifications to the "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of the ring's flags.
536
537Dirty pages are of size::
538
539  #define VHOST_LOG_PAGE 0x1000
540
541The log memory fd is provided in the ancillary data of
542``VHOST_USER_SET_LOG_BASE`` message when the back-end has
543``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.
544
The size of the log is supplied as part of ``VhostUserMsg``; the log
should be large enough to cover all known guest addresses. The log starts
at the supplied offset in the supplied file descriptor and covers
addresses from 0 to the maximum of the guest regions. In pseudo-code,
to mark the page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)
553
554Where ``addr`` is the guest physical address.
555
556Use atomic operations, as the log may be concurrently manipulated.
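
A C11 sketch of the marking rule with atomic updates follows. It assumes
that ``log_base`` already points at the start of the log (that is, the
mapped file descriptor plus the supplied offset) and that the caller has
checked the address against the log size; both function names are
illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>

#define VHOST_LOG_PAGE 0x1000

/* Mark the page containing guest physical address addr as dirty. */
void log_mark_page(_Atomic uint8_t *log_base, uint64_t addr)
{
    uint64_t page = addr / VHOST_LOG_PAGE;

    atomic_fetch_or(&log_base[page / 8], (uint8_t)(1u << (page % 8)));
}

/* Mark every page touched by the byte range [addr, addr + len). */
void log_mark_range(_Atomic uint8_t *log_base, uint64_t addr, uint64_t len)
{
    uint64_t first, last;

    if (len == 0) {
        return;
    }
    first = addr / VHOST_LOG_PAGE;
    last = (addr + len - 1) / VHOST_LOG_PAGE;
    for (uint64_t page = first; page <= last; page++) {
        atomic_fetch_or(&log_base[page / 8], (uint8_t)(1u << (page % 8)));
    }
}
```

A write that straddles a page boundary sets one bit per page, which is why
``log_mark_range()`` derives the last page from the final byte of the range.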
557
558Note that when logging modifications to the used ring (when
559``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
560be used to calculate the log offset: the write to first byte of the
561used ring is logged at this offset from log start. Also note that this
562value might be outside the legal guest physical address range
563(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
564the bit offset of the last byte of the ring must fall within the size
565supplied by ``VhostUserLog``.
566
``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in its
ancillary data. The eventfd may be used to inform the front-end that the
log has been modified.
570
Once the source has finished migration, rings will be stopped by the
source (:ref:`Suspended device state <suspended_device_state>`). No
further updates may be made before the rings are restarted.
574
In postcopy migration the back-end is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received.  The back-end opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the front-end.  The front-end services requests on the
userfaultfd for pages that are accessed and, when a page is available,
performs WAKE ioctls on the userfaultfd to wake the stalled
back-end.  The front-end indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
584
585.. _migrating_backend_state:
586
587Migrating back-end state
588^^^^^^^^^^^^^^^^^^^^^^^^
589
590Migrating device state involves transferring the state from one
591back-end, called the source, to another back-end, called the
592destination.  After migration, the destination transparently resumes
593operation without requiring the driver to re-initialize the device at
594the VIRTIO level.  If the migration fails, then the source can
595transparently resume operation until another migration attempt is made.
596
597Generally, the front-end is connected to a virtual machine guest (which
598contains the driver), which has its own state to transfer between source
599and destination, and therefore will have an implementation-specific
600mechanism to do so.  The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature
601provides functionality to have the front-end include the back-end's
602state in this transfer operation so the back-end does not need to
603implement its own mechanism, and so the virtual machine may have its
604complete state, including vhost-user devices' states, contained within a
605single stream of data.
606
607To do this, the back-end state is transferred from back-end to front-end
608on the source side, and vice versa on the destination side.  This
609transfer happens over a channel that is negotiated using the
610``VHOST_USER_SET_DEVICE_STATE_FD`` message.  This message has two
611parameters:
612
613* Direction of transfer: On the source, the data is saved, transferring
614  it from the back-end to the front-end.  On the destination, the data
615  is loaded, transferring it from the front-end to the back-end.
616
617* Migration phase: Currently, the only supported phase is the period
618  after the transfer of memory-mapped regions before switch-over to the
619  destination, when both the source and destination devices are
620  suspended (:ref:`Suspended device state <suspended_device_state>`).
621  In the future, additional phases might be supported to allow iterative
622  migration while the device is running.
623
624The nature of the channel is implementation-defined, but it must
625generally behave like a pipe: The writing end will write all the data it
626has into it, signalling the end of data by closing its end.  The reading
627end must read all of this data (until encountering the end of file) and
628process it.
629
630* When saving, the writing end is the source back-end, and the reading
631  end is the source front-end.  After reading the state data from the
632  channel, the source front-end must transfer it to the destination
633  front-end through an implementation-defined mechanism.
634
635* When loading, the writing end is the destination front-end, and the
636  reading end is the destination back-end.  After reading the state data
637  from the channel, the destination back-end must deserialize its
638  internal state from that data and set itself up to allow the driver to
639  seamlessly resume operation on the VIRTIO level.
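
The pipe-like behaviour of the state channel can be sketched as follows: the
writing end writes everything and then closes its descriptor, and the reading
end reads until end of file. ``write_all`` and ``read_until_eof`` are
hypothetical helpers, and the sketch assumes a blocking descriptor.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
int write_all(int fd, const void *data, size_t len)
{
    const uint8_t *p = data;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read until end of file. On success, returns 0 and stores a malloc'ed
 * buffer (owned by the caller) in *out and its length in *out_len. */
int read_until_eof(int fd, uint8_t **out, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    uint8_t *buf = malloc(cap);

    if (!buf) {
        return -1;
    }
    for (;;) {
        ssize_t n;

        if (len == cap) {
            uint8_t *bigger = realloc(buf, cap * 2);

            if (!bigger) {
                free(buf);
                return -1;
            }
            buf = bigger;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            free(buf);
            return -1;
        }
        if (n == 0) {
            break;  /* the writer closed its end: transfer complete */
        }
        len += (size_t)n;
    }
    *out = buf;
    *out_len = len;
    return 0;
}
```

The writer signals the end of the state data by calling ``close()`` on its
descriptor after ``write_all()`` returns.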
640
641Seamlessly resuming operation means that the migration must be
642transparent to the guest driver, which operates on the VIRTIO level.
643This driver will not perform any re-initialization steps, but continue
644to use the device as if no migration had occurred.  The vhost-user
645front-end, however, will re-initialize the vhost state on the
646destination, following the usual protocol for establishing a connection
647to a vhost-user back-end: This includes, for example, setting up memory
648mappings and kick and call FDs as necessary, negotiating protocol
649features, or setting the initial vring base indices (to the same value
650as on the source side, so that operation can resume).
651
652Both on the source and on the destination side, after the respective
653front-end has seen all data transferred (when the transfer FD has been
654closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to
655verify that data transfer was successful in the back-end, too.  The
656back-end responds once it knows whether the transfer and processing was
657successful or not.
658
659Memory access
660-------------
661
662The front-end sends a list of vhost memory regions to the back-end using the
663``VHOST_USER_SET_MEM_TABLE`` message.  Each region has two base
664addresses: a guest address and a user address.
665
666Messages contain guest addresses and/or user addresses to reference locations
667within the shared memory.  The mapping of these addresses works as follows.
668
669User addresses map to the vhost memory region containing that user address.
670
671When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:
672
673* Guest addresses map to the vhost memory region containing that guest
674  address.
675
676When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:
677
678* Guest addresses are also called I/O virtual addresses (IOVAs).  They are
679  translated to user addresses via the IOTLB.
680
681* The vhost memory region guest address is not used.
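
For the case without ``VIRTIO_F_IOMMU_PLATFORM``, the lookup of a guest or
user address in the region table can be sketched as below. ``MappedRegion``
and both functions are hypothetical; ``local_base`` is assumed to be the
back-end's own pointer to the first byte of the region, that is, the result
of ``mmap()`` on the region's file descriptor with the mmap offset already
added.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct MappedRegion {
    uint64_t guest_addr;  /* "guest address" field of the region */
    uint64_t size;        /* "size" field of the region */
    uint64_t user_addr;   /* "user address" field of the region */
    uint8_t *local_base;  /* assumption: back-end mapping of the region */
} MappedRegion;

/* Return a local pointer for [addr, addr + len) if it lies entirely inside
 * a region whose base (guest or user) is selected by use_guest. */
static void *region_lookup(const MappedRegion *r, size_t n, uint64_t addr,
                           uint64_t len, int use_guest)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t base = use_guest ? r[i].guest_addr : r[i].user_addr;

        /* Written to avoid overflow in addr + len. */
        if (addr >= base && len <= r[i].size &&
            addr - base <= r[i].size - len) {
            return r[i].local_base + (addr - base);
        }
    }
    return NULL;
}

void *guest_to_local(const MappedRegion *r, size_t n, uint64_t gpa,
                     uint64_t len)
{
    return region_lookup(r, n, gpa, len, 1);
}

void *user_to_local(const MappedRegion *r, size_t n, uint64_t uaddr,
                    uint64_t len)
{
    return region_lookup(r, n, uaddr, len, 0);
}
```

A ``NULL`` result means the range is not fully covered by any single region
and must not be accessed.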
682
683IOMMU support
684-------------
685
When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
front-end sends IOTLB entry updates and invalidations to the back-end as
``VHOST_USER_IOTLB_MSG`` requests with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. The back-end is expected to
reply with a zero payload on success and a non-zero payload otherwise.
697
The back-end relies on the back-end communication channel (see the
:ref:`Back-end communication <backend_communication>` section below) to
send IOTLB miss and access failure events, by sending
``VHOST_USER_BACKEND_IOTLB_MSG`` requests to the front-end with a
``struct vhost_iotlb_msg`` as payload. For miss events, the ``iotlb``
payload has to be filled with the miss message type (1), the I/O virtual
address and the permissions flags. For access failure events, the
``iotlb`` payload has to be filled with the access failure message type
(4), the I/O virtual address and the permissions flags.  For
synchronization purposes, the back-end may rely on the reply-ack feature,
so the front-end may send a reply when the operation is completed if the
reply-ack feature is negotiated and the back-end requests a reply. For
miss events, a completed operation means that the front-end either sent
an update message with the IOTLB entry covering the requested address and
permission, or sent nothing because the IOTLB miss message was invalid
(invalid IOVA or permission).
713
714The front-end isn't expected to take the initiative to send IOTLB update
715messages, as the back-end sends IOTLB miss messages for the guest virtual
716memory areas it needs to access.
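
The fields that have to be filled for each IOTLB event can be summarized in
a sketch. ``IotlbMsg`` is a local stand-in defined here for illustration; it
mirrors the fields of the IOTLB message payload described earlier but is not
the kernel's ``struct vhost_iotlb_msg``, and the helper names are
hypothetical.

```c
#include <stdint.h>

enum { IOTLB_PERM_NONE = 0, IOTLB_PERM_READ = 1,
       IOTLB_PERM_WRITE = 2, IOTLB_PERM_RW = 3 };

enum { IOTLB_MISS = 1, IOTLB_UPDATE = 2,
       IOTLB_INVALIDATE = 3, IOTLB_ACCESS_FAIL = 4 };

/* Illustrative stand-in for the IOTLB message payload. */
typedef struct IotlbMsg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t perm;
    uint8_t type;
} IotlbMsg;

/* Front-end to back-end: type, IOVA, size, user address, permissions. */
IotlbMsg iotlb_make_update(uint64_t iova, uint64_t size, uint64_t uaddr,
                           uint8_t perm)
{
    IotlbMsg m = { .iova = iova, .size = size, .uaddr = uaddr,
                   .perm = perm, .type = IOTLB_UPDATE };
    return m;
}

/* Front-end to back-end: type, IOVA and size. */
IotlbMsg iotlb_make_invalidate(uint64_t iova, uint64_t size)
{
    IotlbMsg m = { .iova = iova, .size = size, .type = IOTLB_INVALIDATE };
    return m;
}

/* Back-end to front-end: type, IOVA and the permissions that were needed. */
IotlbMsg iotlb_make_miss(uint64_t iova, uint8_t perm)
{
    IotlbMsg m = { .iova = iova, .perm = perm, .type = IOTLB_MISS };
    return m;
}

/* Back-end to front-end: type, IOVA and the permissions that failed. */
IotlbMsg iotlb_make_access_fail(uint64_t iova, uint8_t perm)
{
    IotlbMsg m = { .iova = iova, .perm = perm, .type = IOTLB_ACCESS_FAIL };
    return m;
}
```

Fields that a given event does not use are left at zero in this sketch.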
717
718.. _backend_communication:
719
720Back-end communication
721----------------------
722
723An optional communication channel is provided if the back-end declares
724``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` protocol feature, to allow the
725back-end to make requests to the front-end.
726
727The fd is provided via ``VHOST_USER_SET_BACKEND_REQ_FD`` ancillary data.
728
729A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end
730using this fd communication channel.
731
If the ``VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD`` protocol feature is
negotiated, the back-end can send file descriptors (at most 8 descriptors in
each message) to the front-end via ancillary data using this fd communication
channel.
736
737Inflight I/O tracking
738---------------------
739
To support reconnecting after a restart or crash, the back-end may need to
resubmit inflight I/Os. If the virtqueue is processed in order, this is
easily achieved by getting the inflight descriptors from the
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue). However, this does not work when descriptors are processed
out of order, because some entries which store the information of
inflight descriptors in the available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, the back-end needs to allocate an extra buffer to store this
information about inflight descriptors and share it with the front-end for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between the front-end and the back-end. The format of this buffer is
described below:
754
755+---------------+---------------+-----+---------------+
756| queue0 region | queue1 region | ... | queueN region |
757+---------------+---------------+-----+---------------+
758
N is the number of available virtqueues. The back-end can get it from the
``num queues`` field of ``VhostUserInflight``.
761
762For split virtqueue, queue region can be implemented as:
763
.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of the list that tracks the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from the available ring, ``i``

#. Set ``desc[i].counter`` to the value of the global counter

#. Increase the global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get the corresponding used head-descriptor index, ``i``

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1, 2 and 3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of the used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of the used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of the
   used ring (which means the ``inflight`` field of the ``DescStateSplit``
   entries in the last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of the
      used ring to get the size of the last batch of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry in the last
      batch list, which starts from ``last_batch_head``, to 0

   #. Set ``used_idx`` to the ``idx`` value of the used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value
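
The split-virtqueue bookkeeping above could be sketched in C roughly as
follows. This is a minimal illustration under stated assumptions, not the
code of any existing back-end: the helper names (``region_new``,
``inflight_get``, ``inflight_batch_add``, ``inflight_batch_commit``,
``inflight_recover``) and the process-local ``global_counter`` are
hypothetical, and the region is heap-allocated here, whereas a real
region lives in the shared inflight buffer.

.. code:: c

  #include <stdint.h>
  #include <stdlib.h>

  /* Field layout copied from the structures above. */
  typedef struct DescStateSplit {
      uint8_t inflight;
      uint8_t padding[5];
      uint16_t next;
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      uint64_t features;
      uint16_t version;
      uint16_t desc_num;
      uint16_t last_batch_head;
      uint16_t used_idx;
      DescStateSplit desc[];
  } QueueRegionSplit;

  /* Hypothetical process-local counter used to order fetched heads. */
  static uint64_t global_counter;

  /* Hypothetical allocator; a real region lives in the shared buffer. */
  static QueueRegionSplit *region_new(uint16_t desc_num)
  {
      QueueRegionSplit *r = calloc(1, sizeof(*r) +
                                      desc_num * sizeof(DescStateSplit));
      if (r) {
          r->version = 1;
          r->desc_num = desc_num;
      }
      return r;
  }

  /* Receiving: head index i was just fetched from the available ring. */
  static void inflight_get(QueueRegionSplit *r, uint16_t i)
  {
      r->desc[i].counter = global_counter++;
      r->desc[i].inflight = 1;
  }

  /* Supplying, steps 1-3: link head i into the last-batch list. */
  static void inflight_batch_add(QueueRegionSplit *r, uint16_t i)
  {
      r->desc[i].next = r->last_batch_head;
      r->last_batch_head = i;
  }

  /* Supplying, final steps: ring_idx is the used ring's idx after it was
   * advanced by the batch size. Clears inflight for every entry of the
   * batch, then records the new idx in used_idx. */
  static void inflight_batch_commit(QueueRegionSplit *r, uint16_t ring_idx)
  {
      uint16_t n = (uint16_t)(ring_idx - r->used_idx);
      uint16_t i = r->last_batch_head;

      while (n--) {
          r->desc[i].inflight = 0;
          i = r->desc[i].next;
      }
      r->used_idx = ring_idx;
  }

  /* Reconnecting: finish an interrupted commit, then write the inflight
   * head indices to out[] (room for desc_num entries) in ascending
   * counter order. Returns the number of entries to resubmit. */
  static uint16_t inflight_recover(QueueRegionSplit *r, uint16_t ring_idx,
                                   uint16_t *out)
  {
      uint16_t n = 0;

      inflight_batch_commit(r, ring_idx);
      for (uint16_t i = 0; i < r->desc_num; i++) {
          if (r->desc[i].inflight) {
              out[n++] = i;
          }
      }
      /* Insertion sort by counter; adequate for a sketch. */
      for (int a = 1; a < n; a++) {
          uint16_t key = out[a];
          int b = a - 1;

          while (b >= 0 && r->desc[out[b]].counter > r->desc[key].counter) {
              out[b + 1] = out[b];
              b--;
          }
          out[b + 1] = key;
      }
      return n;
  }

Because the batch size is derived from the difference between the used
ring's ``idx`` and ``used_idx``, the same routine can serve both the normal
completion path and recovery. For example, if heads 3, 5 and 1 were
fetched in that order and the back-end crashed after advancing the used
ring for head 5 but before clearing its ``inflight`` flag,
``inflight_recover(r, 1, out)`` would report heads 3 and 1, in that order.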

For a packed virtqueue, the queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from the descriptor ring, ``d``

#. If ``d`` is a head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of the global counter

   #. Increase the global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is the last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is the last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get the corresponding used head-descriptor entry from the descriptor ring,
   ``d``

2. Get the corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1, 2, 3 and 4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (which means the
   ``inflight`` field of the ``DescStatePacked`` entries in the last batch
   may be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (which means
      the back-end submitted the buffer to the guest driver before the
      crash, so it has to commit the in-progress update), set
      ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter`` to
      ``free_head``, ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in the
   free list to 0

#. Resubmit inflight ``DescStatePacked`` entries in order of their
   counter value
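
The commit-or-roll-back decision taken when reconnecting could be
sketched as below. This is only an illustration under assumptions:
``PackedCursor`` is a hypothetical subset of ``QueueRegionPacked``,
``avail_flags`` and ``packed_reconcile`` are hypothetical helper names,
and comparing only the ``AVAIL`` and ``USED`` bits is one possible reading
of "calculate the available flags". The two flag bit positions are taken
from the Virtio specification.

.. code:: c

  #include <stdint.h>

  /* Packed-virtqueue descriptor flag bits (Virtio specification). */
  #define VIRTQ_DESC_F_AVAIL (1u << 7)
  #define VIRTQ_DESC_F_USED  (1u << 15)

  /* Hypothetical subset of QueueRegionPacked: only the fields used here. */
  typedef struct PackedCursor {
      uint16_t free_head, old_free_head;
      uint16_t used_idx, old_used_idx;
      uint8_t used_wrap_counter, old_used_wrap_counter;
  } PackedCursor;

  /* AVAIL/USED bits of a descriptor that is available but not yet used,
   * for the given wrap counter. */
  static uint16_t avail_flags(uint8_t wrap)
  {
      return wrap ? VIRTQ_DESC_F_AVAIL : VIRTQ_DESC_F_USED;
  }

  /* d_flags: flags of the descriptor ring entry at old_used_idx. */
  static void packed_reconcile(PackedCursor *c, uint16_t d_flags)
  {
      const uint16_t mask = VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED;

      if (c->used_idx != c->old_used_idx &&
          (d_flags & mask) != avail_flags(c->old_used_wrap_counter)) {
          /* The driver can already see the used buffer: commit. */
          c->old_free_head = c->free_head;
          c->old_used_idx = c->used_idx;
          c->old_used_wrap_counter = c->used_wrap_counter;
      }
      /* Roll back any in-progress update (no effect after a commit). */
      c->free_head = c->old_free_head;
      c->used_idx = c->old_used_idx;
      c->used_wrap_counter = c->old_used_wrap_counter;
  }

Clearing the ``inflight`` fields of the free list and resubmitting the
remaining entries would follow, and are left out of this sketch.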

In-band notifications
---------------------

In some limited situations (e.g. for simulation) it is desirable to
have the kick, call and error (if used) signals done via in-band
messages instead of asynchronous eventfd notifications. This can be
done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
protocol feature.

Note that because too many messages on the sockets can
cause the sending application(s) to block, it is not advised to use
this feature unless absolutely necessary. It is also considered an
error to negotiate this feature without also negotiating
``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``.
The former is necessary for getting a message channel from the back-end
to the front-end, while the latter needs to be used with the in-band
notification messages to block until they are processed, both to avoid
blocking later and for proper processing (at least in the simulation
use case). As it has no other way of signalling this error, the back-end
should close the connection as a response to a
``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band
notifications feature flag without the other two.
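
A back-end could check this dependency when it handles
``VHOST_USER_SET_PROTOCOL_FEATURES``, along the lines of the sketch
below. The helper name ``inband_features_ok`` is hypothetical; the bit
numbers are the ones listed under *Protocol features*.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

  /* Hypothetical helper: returns false when the in-band notifications
   * bit is set without both BACKEND_REQ and REPLY_ACK, in which case
   * the back-end should close the connection. */
  static bool inband_features_ok(uint64_t features)
  {
      const uint64_t inband =
          1ULL << VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS;
      const uint64_t deps =
          (1ULL << VHOST_USER_PROTOCOL_F_BACKEND_REQ) |
          (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);

      if (!(features & inband)) {
          return true;
      }
      return (features & deps) == deps;
  }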

Protocol features
-----------------

.. code:: c

  #define VHOST_USER_PROTOCOL_F_MQ                    0
  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
  #define VHOST_USER_PROTOCOL_F_RARP                  2
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_MTU                   4
  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
  #define VHOST_USER_PROTOCOL_F_CONFIG                9
  #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10
  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
  #define VHOST_USER_PROTOCOL_F_STATUS               16
  #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT        18
  #define VHOST_USER_PROTOCOL_F_DEVICE_STATE         19

Front-end message types
-----------------------

``VHOST_USER_GET_FEATURES``
  :id: 1
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :request payload: N/A
  :reply payload: ``u64``

  Get the features bitmask from the underlying vhost implementation.
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :id: 2
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :request payload: ``u64``
  :reply payload: N/A

  Enable features in the underlying vhost implementation using a
  bitmask.  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :id: 15
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :request payload: N/A
  :reply payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation.  Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by
  ``VHOST_USER_SET_FEATURES``.

.. Note::
   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
   support this message even before ``VHOST_USER_SET_FEATURES`` was
   called.

``VHOST_USER_SET_PROTOCOL_FEATURES``
  :id: 16
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :request payload: ``u64``
  :reply payload: N/A

  Enable protocol features in the underlying vhost implementation.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by
  ``VHOST_USER_SET_FEATURES``.

.. Note::
   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
   this message even before ``VHOST_USER_SET_FEATURES`` was called.

``VHOST_USER_SET_OWNER``
  :id: 3
  :equivalent ioctl: ``VHOST_SET_OWNER``
  :request payload: N/A
  :reply payload: N/A

  Issued when a new connection is established. It marks the sender
  as the front-end that owns the session. This can be used on the *back-end*
  as a "session start" flag.

``VHOST_USER_RESET_OWNER``
  :id: 4
  :request payload: N/A
  :reply payload: N/A

.. admonition:: Deprecated

   This is no longer used. It used to be sent to request disabling all
   rings, but some back-ends interpreted it to also discard connection
   state (this interpretation would lead to bugs).  It is recommended
   that back-ends either ignore this message, or use it to disable all
   rings.

``VHOST_USER_SET_MEM_TABLE``
  :id: 5
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :request payload: multiple memory regions description
  :reply payload: (postcopy only) multiple memory regions description

  Sets the memory map regions on the back-end so it can translate the
  vring addresses. In the ancillary data there is an array of file
  descriptors for each memory mapped region. The size and ordering of
  the fds matches the number and ordering of memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the front-end.  The back-end must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault
  event.

.. Note::
   ``NEED_REPLY_MASK`` is not set in this case.  QEMU will then
   reply back to the list of mappings with an empty
   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
   reception of this message may the guest start accessing the memory
   and generating faults.

``VHOST_USER_SET_LOG_BASE``
  :id: 6
  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :request payload: ``u64``
  :reply payload: N/A

  Sets logging shared memory space.

  When the back-end has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
  feature, the log memory fd is provided in the ancillary data of the
  ``VHOST_USER_SET_LOG_BASE`` message, and the size and offset of the
  shared memory area are provided in the message.

``VHOST_USER_SET_LOG_FD``
  :id: 7
  :equivalent ioctl: ``VHOST_SET_LOG_FD``
  :request payload: N/A
  :reply payload: N/A

  Sets the logging file descriptor, which is passed as ancillary data.

``VHOST_USER_SET_VRING_NUM``
  :id: 8
  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
  :request payload: vring state description
  :reply payload: N/A

  Set the size of the queue.

``VHOST_USER_SET_VRING_ADDR``
  :id: 9
  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
  :request payload: vring address description
  :reply payload: N/A

  Sets the addresses of the different aspects of the vring.

``VHOST_USER_SET_VRING_BASE``
  :id: 10
  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
  :request payload: vring descriptor index/indices
  :reply payload: N/A

  Sets the next index to use for descriptors in this vring:

  * For a split virtqueue, sets only the next descriptor index to
    process in the *Available Ring*.  The device is supposed to read the
    next index in the *Used Ring* from the respective vring structure in
    guest memory.

  * For a packed virtqueue, both indices are supplied, as they are not
    explicitly available in memory.

  Consequently, the payload type is specific to the type of virt queue
  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
  indices for packed virtqueues*).

``VHOST_USER_GET_VRING_BASE``
  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
  :request payload: vring state description
  :reply payload: vring descriptor index/indices

  Stops the vring and returns the current descriptor index or indices:

    * For a split virtqueue, returns only the 16-bit next descriptor
      index to process in the *Available Ring*.  Note that this may
      differ from the available ring index in the vring structure in
      memory, which points to where the driver will put new available
      descriptors.  For the *Used Ring*, the device only needs the next
      descriptor index at which to put new descriptors, which is the
      value in the vring structure in memory, so this value is not
      covered by this message.

    * For a packed virtqueue, neither index is explicitly available to
      read from memory, so both indices (as maintained by the device) are
      returned.

  Consequently, the payload type is specific to the type of virt queue
  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
  indices for packed virtqueues*).

  When and as long as all of a device’s vrings are stopped, it is
  *suspended*, see :ref:`Suspended device state
  <suspended_device_state>`.

  The request payload’s *num* field is currently reserved and must be
  set to 0.

``VHOST_USER_SET_VRING_KICK``
  :id: 12
  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor for adding buffers to the vring. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick. Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated,
  this message isn't necessary, as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message. It may, however, still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.
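
  The payload layout shared by ``VHOST_USER_SET_VRING_KICK``,
  ``VHOST_USER_SET_VRING_CALL`` and ``VHOST_USER_SET_VRING_ERR`` could be
  decoded as in the following sketch. The mask names, the
  ``VringFdPayload`` type and ``vring_fd_payload_decode`` are illustrative
  names chosen for this example, not part of the protocol.

  .. code:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative mask names for the layout described above. */
    #define VRING_IDX_MASK  0xffULL      /* bits 0-7: vring index */
    #define VRING_NOFD_MASK (1ULL << 8)  /* bit 8: invalid FD flag */

    typedef struct VringFdPayload {
        uint8_t index;  /* vring index */
        bool has_fd;    /* false when the invalid FD flag is set */
    } VringFdPayload;

    static VringFdPayload vring_fd_payload_decode(uint64_t u64)
    {
        VringFdPayload p = {
            .index = (uint8_t)(u64 & VRING_IDX_MASK),
            .has_fd = !(u64 & VRING_NOFD_MASK),
        };
        return p;
    }

  A receiver would only look for a file descriptor in the ancillary data
  when ``has_fd`` is true.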

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated, this message
  isn't necessary, as the ``VHOST_USER_BACKEND_VRING_CALL`` message can be
  used. It may, however, still be used to set an event file descriptor
  or to enable polling.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated, this message
  isn't necessary, as the ``VHOST_USER_BACKEND_VRING_ERR`` message can be
  used. It may, however, still be used to set an event file descriptor
  (which will be preferred over the message).

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: ``u64``

  Query how many queues the back-end supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in the protocol features queried by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  Signal the back-end to enable or disable the corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Ask the vhost-user back-end to broadcast a fake RARP to notify that the
  migration has terminated, for guests that do not support
  ``GUEST_ANNOUNCE``.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.  The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  back-end to construct and broadcast the fake RARP.
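
  Extracting the MAC address could look like the sketch below. It
  assumes the ``u64`` payload is held in memory exactly as it was
  received, so that its first 6 bytes can be copied out directly; both
  helper names are hypothetical.

  .. code:: c

    #include <stdint.h>
    #include <string.h>

    /* Copy the first 6 bytes of the payload, as stored in memory. */
    static void rarp_payload_get_mac(uint64_t payload, uint8_t mac[6])
    {
        memcpy(mac, &payload, 6);
    }

    /* Counterpart for the sender: place the MAC in the first 6 bytes. */
    static uint64_t rarp_payload_from_mac(const uint8_t mac[6])
    {
        uint64_t payload = 0;

        memcpy(&payload, mac, 6);
        return payload;
    }

  Using ``memcpy`` on the in-memory representation avoids making the
  result depend on the host byte order, as long as sender and receiver
  treat the payload the same way.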

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Set host MTU value exposed to the guest.

  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
  respond with zero in case the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
  :id: 21
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Set the socket file descriptor for back-end initiated requests. It is passed
  in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.  If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
  respond with zero for success, non-zero otherwise.

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :request payload: ``struct vhost_iotlb_msg``
  :reply payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  The front-end sends such requests to update and invalidate entries in the
  device IOTLB. The back-end has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when the ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :request payload: vring state description
  :reply payload: N/A

  Set the endianness of a VQ for legacy devices. Little-endian is
  indicated with ``state.num`` set to 0 and big-endian is indicated with
  ``state.num`` set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Back-ends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the front-end starts the VQ).
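
  A back-end might validate ``state.num`` and then read ring fields in
  the selected byte order, roughly as sketched here. Both helper names
  are hypothetical, and reading a 16-bit field byte by byte is just one
  way of handling either endianness.

  .. code:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Returns 0 and sets *big_endian, or -1 if num is not 0 or 1. */
    static int vq_endian_from_num(uint32_t num, bool *big_endian)
    {
        if (num > 1) {
            return -1;
        }
        *big_endian = (num == 1);
        return 0;
    }

    /* Read a 16-bit ring field stored at p in the VQ's endianness. */
    static uint16_t vq_read16(const uint8_t *p, bool big_endian)
    {
        return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                          : (uint16_t)((p[1] << 8) | p[0]);
    }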

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :request payload: virtio device config space
  :reply payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end to fetch the contents of the
  virtio device configuration space. The vhost-user back-end's payload size
  MUST match the front-end's request; the vhost-user back-end uses a
  zero-length payload to indicate an error to the vhost-user front-end.
  The vhost-user front-end may cache the contents to avoid repeated
  ``VHOST_USER_GET_CONFIG`` calls.

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :request payload: virtio device config space
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end when the guest changes the virtio
  device configuration space; it can also be used for live migration
  on the destination host. The vhost-user back-end must check the flags
  field, and back-ends MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :request payload: crypto session description
  :reply payload: crypto session description

  Create a session for crypto operation. The back-end must return
  the session id, 0 or positive for success, negative for failure.
  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated.  It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated.  It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :request payload: N/A
  :reply payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
  advises the back-end that a migration with postcopy enabled is underway;
  the back-end must open a userfaultfd for later use.  Note that at this
  stage the migration is still in precopy mode.

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :request payload: N/A
  :reply payload: N/A

  The front-end advises the back-end that a transition to postcopy mode has
  happened.  The back-end must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :request payload: N/A
  :reply payload: ``u64``

  The front-end advises that postcopy migration has now completed.  The back-end
  must disable the userfaultfd. The reply is an acknowledgement
  only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :request payload: inflight description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  get a shared buffer from the back-end. The shared buffer will be used by
  the back-end to track inflight I/O. QEMU should retrieve a new one when
  the VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :request payload: inflight description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  send the shared inflight buffer back to the back-end so that the back-end
  can get inflight I/O after a crash or restart.

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data. The GPU protocol is used to inform the front-end of
  rendering state and updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Ask the vhost-user back-end to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The back-end retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the back-end.

``VHOST_USER_VRING_KICK``
  :id: 35
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the front-end to indicate that a buffer was added to
  the vring instead of signalling it using the vring's kick file
  descriptor or having the back-end rely on polling.

  The ``state.num`` field is currently reserved and must be set to 0.

``VHOST_USER_GET_MAX_MEM_SLOTS``
  :id: 36
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The back-end should return the message with a
  ``u64`` payload containing the maximum number of memory slots for
  QEMU to expose to the guest. The value returned by the back-end
  will be capped at the maximum number of RAM slots which can be
  supported by the target platform.

``VHOST_USER_ADD_MEM_REG``
  :id: 37
  :equivalent ioctl: N/A
  :request payload: single memory region description
  :reply payload: (postcopy only) single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must map in. Together with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.

  Exactly one file descriptor from which the memory is mapped is
  passed in the ancillary data.

  In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
  replies with the bases of the memory mapped region to the front-end.
  For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
  They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.

``VHOST_USER_REM_MEM_REG``
  :id: 38
  :equivalent ioctl: N/A
  :request payload: single memory region description
  :reply payload: N/A
1604
  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must unmap. Together with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.
1614
1615  The memory region to be removed is identified by its guest address,
1616  user address and size. The mmap offset is ignored.
1617
1618  No file descriptors SHOULD be passed in the ancillary data. For
1619  compatibility with existing incorrect implementations, the back-end MAY
1620  accept messages with one file descriptor. If a file descriptor is
1621  passed, the back-end MUST close it without using it otherwise.
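
  The matching and file descriptor rules above can be sketched as
  follows. This is an illustration only: the ``Region`` type, the
  ``remove_region`` helper and the list-based region table are
  hypothetical, and a real back-end would also unmap the memory.

  .. code:: python

    import os
    from dataclasses import dataclass


    @dataclass
    class Region:
        guest_addr: int
        size: int
        user_addr: int
        mmap_offset: int


    def remove_region(regions, request, stray_fds=()):
        """Drop the region matching `request`; return it, or None if absent."""
        # A descriptor sent by a non-conforming front-end is closed unused.
        for fd in stray_fds:
            os.close(fd)
        wanted = (request.guest_addr, request.user_addr, request.size)
        for index, region in enumerate(regions):
            # The mmap offset is deliberately left out of the comparison.
            if (region.guest_addr, region.user_addr, region.size) == wanted:
                return regions.pop(index)
        return None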
1622
1623``VHOST_USER_SET_STATUS``
1624  :id: 39
1625  :equivalent ioctl: VHOST_VDPA_SET_STATUS
1626  :request payload: ``u64``
1627  :reply payload: N/A
1628
1629  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
1630  successfully negotiated, this message is submitted by the front-end to
1631  notify the back-end with updated device status as defined in the Virtio
1632  specification.
1633
1634``VHOST_USER_GET_STATUS``
1635  :id: 40
1636  :equivalent ioctl: VHOST_VDPA_GET_STATUS
1637  :request payload: N/A
1638  :reply payload: ``u64``
1639
1640  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
1641  successfully negotiated, this message is submitted by the front-end to
1642  query the back-end for its device status as defined in the Virtio
1643  specification.
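
  For illustration, the ``u64`` carried by ``VHOST_USER_SET_STATUS`` and
  ``VHOST_USER_GET_STATUS`` could be decoded as below. The bit values are
  the device status bits of the Virtio specification; the
  ``DeviceStatus`` type and the ``decode_status`` helper are hypothetical
  names, and masking to the low byte is an assumption of this sketch.

  .. code:: python

    import enum


    class DeviceStatus(enum.IntFlag):
        ACKNOWLEDGE = 1
        DRIVER = 2
        DRIVER_OK = 4
        FEATURES_OK = 8
        DEVICE_NEEDS_RESET = 64
        FAILED = 128


    def decode_status(value):
        """Interpret the low byte of the u64 payload as device status bits."""
        return DeviceStatus(value & 0xFF)


    status = decode_status(0x0F)
    driver_ready = DeviceStatus.DRIVER_OK in status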
1644
1645``VHOST_USER_GET_SHARED_OBJECT``
1646  :id: 41
1647  :equivalent ioctl: N/A
1648  :request payload: ``struct VhostUserShared``
1649  :reply payload: dmabuf fd
1650
  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, and the UUID is found
  in the exporters cache, this message is submitted by the front-end
  to retrieve the dma-buf fd identified by the requested UUID from the
  back-end that exported it. The back-end replies with the fd when the
  operation is successful, or with no fd otherwise.
1657
1658``VHOST_USER_SET_DEVICE_STATE_FD``
1659  :id: 42
1660  :equivalent ioctl: N/A
1661  :request payload: device state transfer parameters
1662  :reply payload: ``u64``
1663
1664  Front-end and back-end negotiate a channel over which to transfer the
1665  back-end’s internal state during migration.  Either side (front-end or
1666  back-end) may create the channel.  The nature of this channel is not
  restricted or defined in this document, but whichever side creates it
  must create a file descriptor that gives access to the channel and
  pass it to the other side.  This FD must behave as follows:
1671
1672  * For the writing end, it must allow writing the whole back-end state
1673    sequentially.  Closing the file descriptor signals the end of
1674    transfer.
1675
1676  * For the reading end, it must allow reading the whole back-end state
1677    sequentially.  The end of file signals the end of the transfer.
1678
1679  For example, the channel may be a pipe, in which case the two ends of
1680  the pipe fulfill these requirements respectively.
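
  A minimal sketch of the pipe case, run within a single process for
  brevity: the writing end sends the whole state and closes its
  descriptor, and the reading end reads until end of file. The helper
  names and the state bytes are made up for the example. In practice the
  two ends live in different processes, and a state larger than the pipe
  buffer needs a concurrent reader.

  .. code:: python

    import os


    def write_state(fd, state):
        """Write the whole state; closing the FD signals the end of transfer."""
        with os.fdopen(fd, "wb") as channel:
            channel.write(state)


    def read_state(fd):
        """Read sequentially until end of file."""
        with os.fdopen(fd, "rb") as channel:
            return channel.read()


    read_fd, write_fd = os.pipe()
    write_state(write_fd, b"example back-end state")
    received = read_state(read_fd)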
1681
1682  Initially, the front-end creates a channel along with such an FD.  It
1683  passes the FD to the back-end as ancillary data of a
1684  ``VHOST_USER_SET_DEVICE_STATE_FD`` message.  The back-end may create a
1685  different transfer channel, passing the respective FD back to the
1686  front-end as ancillary data of the reply.  If so, the front-end must
1687  then discard its channel and use the one provided by the back-end.
1688
  Whether the back-end uses its own channel should be decided
  based on efficiency: If the channel is a pipe, both ends will most
  likely need to copy data into and out of it.  Any channel that allows
  for more efficient processing on at least one end, e.g. through
  zero-copy, is considered more efficient and thus preferred.  If the
  back-end can provide such a channel, it should use it.
1695
1696  The request payload contains parameters for the subsequent data
1697  transfer, as described in the :ref:`Migrating back-end state
1698  <migrating_backend_state>` section.
1699
1700  The value returned is both an indication for success, and whether a
1701  file descriptor for a back-end-provided channel is returned: Bits 0–7
1702  are 0 on success, and non-zero on error.  Bit 8 is the invalid FD
1703  flag; this flag is set when there is no file descriptor returned.
1704  When this flag is not set, the front-end must use the returned file
1705  descriptor as its end of the transfer channel.  The back-end must not
1706  both indicate an error and return a file descriptor.
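
  The reply value could be interpreted along these lines. This is a
  sketch with hypothetical names, in which ``fd`` stands for a
  descriptor received as ancillary data of the reply, if any.

  .. code:: python

    ERROR_MASK = 0xFF          # bits 0-7: zero on success
    INVALID_FD_FLAG = 1 << 8   # bit 8: set when no FD is returned


    def parse_state_fd_reply(value, fd=None):
        """Return (error, fd_to_use); fd_to_use is None if none was returned."""
        error = value & ERROR_MASK
        fd_returned = not value & INVALID_FD_FLAG
        if error and fd_returned:
            raise ValueError("reply indicates both an error and an FD")
        if fd_returned and fd is None:
            raise ValueError("reply announces an FD but none was received")
        return error, (fd if fd_returned else None)

  With a result of ``(0, None)`` the front-end keeps using its own
  channel; with ``(0, fd)`` it switches to the back-end's channel.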
1707
1708  Using this function requires prior negotiation of the
1709  ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
1710
1711``VHOST_USER_CHECK_DEVICE_STATE``
1712  :id: 43
1713  :equivalent ioctl: N/A
1714  :request payload: N/A
1715  :reply payload: ``u64``
1716
1717  After transferring the back-end’s internal state during migration (see
1718  the :ref:`Migrating back-end state <migrating_backend_state>`
  section), check whether the back-end was able to process the state
  completely and successfully.
1721
1722  The value returned indicates success or error; 0 is success, any
1723  non-zero value is an error.
1724
1725  Using this function requires prior negotiation of the
1726  ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
1727
1728Back-end message types
1729----------------------
1730
1731For this type of message, the request is sent by the back-end and the reply
1732is sent by the front-end.
1733
1734``VHOST_USER_BACKEND_IOTLB_MSG`` (previous name ``VHOST_USER_SLAVE_IOTLB_MSG``)
1735  :id: 1
1736  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
1737  :request payload: ``struct vhost_iotlb_msg``
1738  :reply payload: N/A
1739
1740  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
1741  The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
1742  access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the
  front-end must respond with zero when the operation is successfully
  completed, or non-zero otherwise.  This request should be sent only when
  the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.
1748
1749``VHOST_USER_BACKEND_CONFIG_CHANGE_MSG`` (previous name ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``)
1750  :id: 2
1751  :equivalent ioctl: N/A
1752  :request payload: N/A
1753  :reply payload: N/A
1754
  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  back-end sends such messages to notify the front-end that the virtio
  device's configuration space has changed. For host devices that
  support this feature, the host driver can then send a
  ``VHOST_USER_GET_CONFIG`` message to the back-end to get the latest
  content. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the
  back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must
  respond with zero when the operation is successfully completed, or
  non-zero otherwise.
1763
1764``VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG`` (previous name ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``)
1765  :id: 3
1766  :equivalent ioctl: N/A
1767  :request payload: vring area description
1768  :reply payload: N/A
1769
1770  Sets host notifier for a specified queue. The queue index is
1771  contained in the ``u64`` field of the vring area description. The
1772  host notifier is described by the file descriptor (typically it's a
1773  VFIO device fd) which is passed as ancillary data and the size
1774  (which is mmap size and should be the same as host page size) and
1775  offset (which is mmap offset) carried in the vring area
1776  description. QEMU can mmap the file descriptor based on the size and
1777  offset to get a memory range. Registering a host notifier means
1778  mapping this memory range to the VM as the specified queue's notify
1779  MMIO region. The back-end sends this request to tell QEMU to de-register
1780  the existing notifier if any and register the new notifier if the
1781  request is sent with a file descriptor.
1782
1783  This request should be sent only when
1784  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
1785  successfully negotiated.
1786
1787``VHOST_USER_BACKEND_VRING_CALL`` (previous name ``VHOST_USER_SLAVE_VRING_CALL``)
1788  :id: 4
1789  :equivalent ioctl: N/A
1790  :request payload: vring state description
1791  :reply payload: N/A
1792
1793  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1794  feature has been successfully negotiated, this message may be
  submitted by the back-end to indicate that a buffer was used from
  the vring instead of signalling this using the vring's call file
  descriptor or having the front-end rely on polling.
1798
1799  The state.num field is currently reserved and must be set to 0.
1800
1801``VHOST_USER_BACKEND_VRING_ERR`` (previous name ``VHOST_USER_SLAVE_VRING_ERR``)
1802  :id: 5
1803  :equivalent ioctl: N/A
1804  :request payload: vring state description
1805  :reply payload: N/A
1806
1807  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
1808  feature has been successfully negotiated, this message may be
1809  submitted by the back-end to indicate that an error occurred on the
1810  specific vring, instead of signalling the error file descriptor
1811  set by the front-end via ``VHOST_USER_SET_VRING_ERR``.
1812
1813  The state.num field is currently reserved and must be set to 0.
1814
1815``VHOST_USER_BACKEND_SHARED_OBJECT_ADD``
1816  :id: 6
1817  :equivalent ioctl: N/A
1818  :request payload: ``struct VhostUserShared``
1819  :reply payload: N/A
1820
1821  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1822  feature has been successfully negotiated, this message can be submitted
  by a back-end to add itself as an exporter to the virtio shared lookup
  table. The back-end device gets associated with a UUID in the shared table.
  The back-end is responsible for keeping its own table of exported dma-buf
  fds. When another back-end tries to import the resource associated with the
  UUID, it will send a message to the front-end, which will act as a proxy to
  the exporter back-end. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated,
  and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must
  respond with zero when the operation is successfully completed, or non-zero
  otherwise.
1832
1833``VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE``
1834  :id: 7
1835  :equivalent ioctl: N/A
1836  :request payload: ``struct VhostUserShared``
1837  :reply payload: N/A
1838
1839  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1840  feature has been successfully negotiated, this message can be submitted
  by a back-end to remove itself from the virtio-dmabuf shared
  table. Only the back-end owning the entry (i.e., the one that first added
  it) will have permission to remove it. Otherwise, the message is ignored.
  The shared table will remove the back-end device associated with
  the UUID. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the
  back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond
  with zero when the operation is successfully completed, or non-zero
  otherwise.
1848
1849``VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP``
1850  :id: 8
1851  :equivalent ioctl: N/A
1852  :request payload: ``struct VhostUserShared``
1853  :reply payload: dmabuf fd and ``u64``
1854
1855  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
1856  feature has been successfully negotiated, this message can be submitted
  by a back-end to retrieve a given dma-buf fd from the virtio-dmabuf
  shared table given a UUID. The front-end will reply passing the fd and a
  zero when the operation is successful, or non-zero otherwise. Note that if
  the operation fails, no fd is sent to the back-end.
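
  The bookkeeping behind the three shared object messages can be
  pictured as a small front-end side table, sketched here with
  hypothetical names. How a duplicate UUID is handled and which values
  are returned to the caller are assumptions of this sketch, not
  requirements of the protocol.

  .. code:: python

    import uuid


    class SharedObjectTable:
        """Maps a 16-byte UUID to the back-end that exports the object."""

        def __init__(self):
            self._exporters = {}

        def add(self, backend, object_uuid):
            if object_uuid in self._exporters:
                return 1  # assumption: a duplicate UUID is reported as failure
            self._exporters[object_uuid] = backend
            return 0

        def remove(self, backend, object_uuid):
            # Only the back-end that added the entry may remove it.
            if self._exporters.get(object_uuid) != backend:
                return 1  # the entry is left untouched
            del self._exporters[object_uuid]
            return 0

        def lookup(self, object_uuid):
            """Return the exporting back-end to proxy to, or None."""
            return self._exporters.get(object_uuid)


    table = SharedObjectTable()
    key = uuid.uuid4().bytes
    table.add("backend-a", key)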
1861
1862.. _reply_ack:
1863
1864VHOST_USER_PROTOCOL_F_REPLY_ACK
1865-------------------------------
1866
1867The original vhost-user specification only demands replies for certain
1868commands. This differs from the vhost protocol implementation where
1869commands are sent over an ``ioctl()`` call and block until the back-end
1870has completed.
1871
With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
back-end MUST respond with a ``VhostUserMsg`` whose payload indicates
success or failure. The payload should be set to zero on success or
non-zero on failure, unless the message already has an explicit reply body.
1877
1878The reply payload gives QEMU a deterministic indication of the result
of the command. Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to be
more resilient for selected requests.
1882
For the message types that already solicit a reply from the back-end,
negotiating ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or setting the
``need_reply`` bit brings no behavioural change. (See the Communication_
section for details.)
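
As a rough sketch of the flag handling, the following assumes the header
layout defined earlier in this document (``u32`` request, ``u32`` flags,
``u32`` size, with the version in the lower 2 bits of the flags, the
reply flag in bit 2 and ``need_reply`` in bit 3) and host byte order. All
function names are hypothetical.

.. code:: python

  import struct

  VERSION = 0x1
  REPLY_FLAG = 1 << 2
  NEED_REPLY_FLAG = 1 << 3

  HEADER = struct.Struct("=III")  # request, flags, payload size
  U64 = struct.Struct("=Q")


  def with_need_reply(msg):
      """Return a copy of `msg` with the need_reply flag set in its header."""
      request, flags, size = HEADER.unpack_from(msg)
      return HEADER.pack(request, flags | NEED_REPLY_FLAG, size) + msg[HEADER.size:]


  def build_ack(request, ok):
      """Build the receiver's acknowledgement: a u64 that is zero on success."""
      return HEADER.pack(request, VERSION | REPLY_FLAG, U64.size) + U64.pack(0 if ok else 1)


  def ack_succeeded(reply):
      """Check an acknowledgement received by the sender."""
      _request, flags, size = HEADER.unpack_from(reply)
      if not flags & REPLY_FLAG or size != U64.size:
          raise ValueError("not a u64 reply")
      (value,) = U64.unpack_from(reply, HEADER.size)
      return value == 0


  # Example: a VHOST_USER_SET_STATUS request (id 39) that asks for an ack.
  request_msg = with_need_reply(HEADER.pack(39, VERSION, U64.size) + U64.pack(0x0F))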
1887
1888.. _backend_conventions:
1889
1890Backend program conventions
1891===========================
1892
1893vhost-user back-ends can provide various devices & services and may
1894need to be configured manually depending on the use case. However, it
1895is a good idea to follow the conventions listed here when
1896possible. Users, QEMU or libvirt, can then rely on some common
1897behaviour to avoid heterogeneous configuration and management of the
1898back-end programs and facilitate interoperability.
1899
1900Each back-end installed on a host system should come with at least one
1901JSON file that conforms to the vhost-user.json schema. Each file
1902informs the management applications about the back-end type, and binary
1903location. In addition, it defines rules for management apps for
1904picking the highest priority back-end when multiple match the search
1905criteria (see ``@VhostUserBackend`` documentation in the schema file).
1906
1907If the back-end is not capable of enabling a requested feature on the
1908host (such as 3D acceleration with virgl), or the initialization
1909failed, the back-end should fail to start early and exit with a status
1910!= 0. It may also print a message to stderr for further details.
1911
1912The back-end program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.
1915
1916File descriptors 0, 1 and 2 will exist, and have regular
1917stdin/stdout/stderr usage (they may have been redirected to /dev/null
1918by the management layer, or to a log handler).
1919
1920The back-end program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. Eventually, it may receive SIGKILL from
the management layer after a few seconds.
1923
The following command line options have an expected behaviour. They
are mandatory, unless explicitly stated otherwise:
1926
1927--socket-path=PATH
1928
  This option specifies the location of the vhost-user Unix domain socket.
  It is incompatible with --fd.
1931
1932--fd=FDNUM
1933
1934  When this argument is given, the back-end program is started with the
1935  vhost-user socket as file descriptor FDNUM. It is incompatible with
1936  --socket-path.
1937
1938--print-capabilities
1939
1940  Output to stdout the back-end capabilities in JSON format, and then
1941  exit successfully. Other options and arguments should be ignored, and
1942  the back-end program should not perform its normal function.  The
1943  capabilities can be reported dynamically depending on the host
1944  capabilities.
1945
The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``.  Example:
1948
1949.. code:: json
1950
1951  {
1952    "type": "foo",
1953    "features": [
1954      "feature-a",
1955      "feature-b"
1956    ]
1957  }
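
The option handling described above might be sketched as follows. The
program name ``vhost-user-foo``, the reported capabilities and the
``main`` function are placeholders, and the code that would serve the
socket is omitted.

.. code:: python

  import argparse
  import json
  import sys


  def build_parser():
      parser = argparse.ArgumentParser(prog="vhost-user-foo")
      # --socket-path and --fd are incompatible with each other.
      group = parser.add_mutually_exclusive_group()
      group.add_argument("--socket-path", metavar="PATH")
      group.add_argument("--fd", type=int, metavar="FDNUM")
      parser.add_argument("--print-capabilities", action="store_true")
      return parser


  def main(argv):
      # Checked first so that every other option and argument is ignored.
      if "--print-capabilities" in argv:
          print(json.dumps({"type": "foo",
                            "features": ["feature-a", "feature-b"]}, indent=2))
          return 0
      args = build_parser().parse_args(argv)
      if args.socket_path is None and args.fd is None:
          print("one of --socket-path or --fd is required", file=sys.stderr)
          return 1
      # A real back-end would now serve the vhost-user socket.
      return 0

A launcher script would typically end with
``sys.exit(main(sys.argv[1:]))`` so that the return value becomes the
exit status.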
1958
1959vhost-user-input
1960----------------
1961
1962Command line options:
1963
1964--evdev-path=PATH
1965
  Specify the Linux input device.
1967
1968  (optional)
1969
1970--no-grab
1971
  Do not request exclusive access to the input device.
1973
1974  (optional)
1975
1976vhost-user-gpu
1977--------------
1978
1979Command line options:
1980
1981--render-node=PATH
1982
1983  Specify the GPU DRM render node.
1984
1985  (optional)
1986
1987--virgl
1988
1989  Enable virgl rendering support.
1990
1991  (optional)
1992
1993vhost-user-blk
1994--------------
1995
1996Command line options:
1997
1998--blk-file=PATH
1999
2000  Specify block device or file path.
2001
2002  (optional)
2003
2004--read-only
2005
2006  Enable read-only.
2007
2008  (optional)
2009