===================
Vhost-user Protocol
===================
:Copyright: 2014 Virtual Open Systems Sarl.
:Licence: This work is licensed under the terms of the GNU GPL,
          version 2 or later. See the COPYING file in the top-level
          directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines 2 sides of the communication, *master* and
*slave*. *Master* is the application that shares its virtqueues, in
our case QEMU. *Slave* is the consumer of the virtqueues.

In the current implementation QEMU is the *master*, and the *slave* is
the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device backend processing read & write to a virtual
disk. In order to facilitate interoperability between various backend
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

*Master* and *slave* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.
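
As an informal illustration of the three header fields, the sketch below
models the header in C and shows how the ``flags`` bits described in the
Header section might be set and checked. The struct name, the macro names
and both helper functions are assumptions made for this example; they are
not definitions taken from this specification.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative names for the flag bits listed in the Header section. */
  #define VHOST_USER_VERSION_MASK    0x3u       /* lower 2 bits: version */
  #define VHOST_USER_VERSION         0x1u       /* current version */
  #define VHOST_USER_REPLY_MASK      (1u << 2)  /* bit 2: reply flag */
  #define VHOST_USER_NEED_REPLY_MASK (1u << 3)  /* bit 3: need_reply flag */

  /* Hypothetical header-only view of a message; the payload follows it. */
  struct vhost_user_hdr {
      uint32_t request;  /* type of the request */
      uint32_t flags;    /* version and reply bits */
      uint32_t size;     /* size of the payload in bytes */
  };

  /* Build the flags value for a request sent by the master. */
  static uint32_t make_request_flags(bool need_reply)
  {
      uint32_t flags = VHOST_USER_VERSION;

      if (need_reply) {
          flags |= VHOST_USER_NEED_REPLY_MASK;
      }
      return flags;
  }

  /* Check that a received header carries the expected version and is
   * marked as a reply. */
  static bool is_valid_reply(const struct vhost_user_hdr *hdr)
  {
      return (hdr->flags & VHOST_USER_VERSION_MASK) == VHOST_USER_VERSION &&
             (hdr->flags & VHOST_USER_REPLY_MASK) != 0;
  }

The header layout itself is shown in the table that follows.
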

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the slave
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------------+------+-----------+-----+
| index | flags | descriptor | used | available | log |
+-------+-------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: a 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

A region is:

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: size of area used for logging

:log offset: offset from start of supplied file descriptor where
             logging starts (i.e. where guest address 0 would be
             logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:
  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:
  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:
  - 0: Vhost master messages used for writeable fields
  - 1: Vhost master messages used for live migration

:payload: Size bytes array holding the contents of the virtio
          device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer containing the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
         supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
              of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of *master* sending message requests and
*slave* sending message replies. Most of the requests don't require
replies.
Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.

There are several messages that the master sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_SLAVE_REQ_FD``
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

If *master* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
can be implemented.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both master and slave. As
older slaves don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Starting and stopping rings
---------------------------

Client must only process each ring when it is started.

Client must only pass data between the ring and the backend, when the
ring is enabled.

If ring is started but disabled, client must process the ring without
talking to the backend.

For example, for a networking device, in the disabled state client
must not supply any new RX packets, but must process and discard any
TX packets.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring is initialized in an enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state. Client must not pass data to/from the
backend until ring is enabled by ``VHOST_USER_SET_VRING_ENABLE`` with
parameter 1, or after it has been disabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 0.

Each ring is initialized in a stopped state; client must not process
it until ring is started, or after it has been stopped.

Client must start ring upon receiving a kick (that is, detecting that
file descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK``, and stop ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

While processing the rings (whether they are enabled or not), client
must support changing some configuration aspects on the fly.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the master
already knows the number of available virtqueues without communicating with the
slave.

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the slave. The number can depend on host
resource availability or slave implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the slave to advertise the maximum number of
queues. This is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The max number of queues the slave supports can be queried with message
``VHOST_USER_GET_QUEUE_NUM``. Master should stop when the number of requested
queues is bigger than that.

As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue.
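
The queue-count check described above could look roughly like the
following on the master side for a multiple queue device. This is only a
sketch: ``mq_request_ok`` and its parameters are hypothetical names, and
the values are assumed to come from the replies to
``VHOST_USER_GET_PROTOCOL_FEATURES`` and ``VHOST_USER_GET_QUEUE_NUM``.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_PROTOCOL_F_MQ 0

  /* Hypothetical helper: return true if the master may go ahead with
   * `requested` queues on a multiple queue device.
   *
   * protocol_features: bitmask reported by the slave
   * slave_max_queues:  value reported for VHOST_USER_GET_QUEUE_NUM
   * requested:         number of queues the master wants to use */
  static bool mq_request_ok(uint64_t protocol_features,
                            uint64_t slave_max_queues,
                            uint64_t requested)
  {
      /* Without the MQ protocol feature the maximum cannot be queried. */
      if (!(protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ))) {
          return false;
      }
      /* The master should stop if it asks for more than the slave offers. */
      return requested <= slave_max_queues;
  }
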

The master enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Slaves should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Masters must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues. Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, server may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
``VHOST_USER_SET_LOG_BASE`` message when the slave has
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` which
should be large enough to cover all known guest addresses. Log starts
at the supplied offset in the supplied file descriptor. The log
covers from address 0 to the maximum of guest regions.
In pseudo-code,
to mark page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the master that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source. No further update must be done before rings are restarted.

In postcopy migration the slave is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The slave opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the master. The master services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
slave. The client indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

Memory access
-------------

The master sends a list of vhost memory regions to the slave using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.
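
To make the region fields concrete, here is a minimal sketch of how a
slave might find the region that contains a guest address and compute
where that address lies within the mapped file descriptor. It assumes
``VIRTIO_F_IOMMU_PLATFORM`` has not been negotiated and that the fd is
mapped from offset 0. The struct and function names are hypothetical;
only the four fields come from the memory regions description.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* One entry of the memory regions description. */
  struct mem_region {
      uint64_t guest_addr;   /* guest address of the region */
      uint64_t size;         /* size of the region in bytes */
      uint64_t user_addr;    /* user address of the region */
      uint64_t mmap_offset;  /* where the region starts in the mapping */
  };

  /* Return the region containing guest address `gpa`, or NULL if no
   * region covers it. */
  static const struct mem_region *
  find_region(const struct mem_region *regions, uint32_t num, uint64_t gpa)
  {
      for (uint32_t i = 0; i < num; i++) {
          const struct mem_region *r = &regions[i];

          if (gpa >= r->guest_addr && gpa - r->guest_addr < r->size) {
              return r;
          }
      }
      return NULL;
  }

  /* Offset of `gpa` from the start of the mapping of the region's fd.
   * The caller must have checked that `r` contains `gpa`. */
  static uint64_t region_file_offset(const struct mem_region *r, uint64_t gpa)
  {
      return r->mmap_offset + (gpa - r->guest_addr);
  }
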

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
master sends IOTLB entries update & invalidation by sending
``VHOST_USER_IOTLB_MSG`` requests to the slave with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the slave is
expected to reply with a zero payload, non-zero otherwise.

The slave relies on the slave communication channel (see :ref:`Slave
communication <slave_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_SLAVE_IOTLB_MSG``
requests to the master with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags.
For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the slave may
rely on the reply-ack feature, so the master may send a reply when
operation is completed if the reply-ack feature is negotiated and
the slave requests a reply. For miss events, completed operation means
either master sent an update message containing the IOTLB entry
containing requested address and permission, or master sent nothing if
the IOTLB miss message is invalid (invalid IOVA or permission).

The master isn't expected to take the initiative to send IOTLB update
messages, as the slave sends IOTLB miss messages for the guest virtual
memory areas it needs to access.

.. _slave_communication:

Slave communication
-------------------

An optional communication channel is provided if the slave declares
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` protocol feature, to allow the
slave to make requests to the master.

The fd is provided via ``VHOST_USER_SET_SLAVE_REQ_FD`` ancillary data.

A slave may then send ``VHOST_USER_SLAVE_*`` messages to the master
using this fd communication channel.

If ``VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD`` protocol feature is
negotiated, slave can send file descriptors (at most 8 descriptors in
each message) to master via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, slave may need to
resubmit inflight I/Os. If virtqueue is processed in order, we can
easily achieve that by getting the inflight descriptors from
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue).
However, this does not work when we process descriptors
out-of-order because some entries which store the information of
inflight descriptors in available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, slave needs to allocate an extra buffer to store this
information of inflight descriptors and share it with master for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between master and slave. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. Slave could get it from num
queues field of ``VhostUserInflight``.

For split virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that track the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[0];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, ``i``

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1,2,3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (means the inflight field of ``DescStateSplit`` entries in
   last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
      list which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value

For packed virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[0];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1,2,3,4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (means the
   ``inflight`` field of ``DescStatePacked`` entries in last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (means
      slave has submitted the buffer to guest driver before crash, so
      it has to commit the in-progress update), set ``old_free_head``,
      ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``,
      ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in
   free list to 0

#. Resubmit inflight ``DescStatePacked`` entries in order of their
   counter value

Protocol features
-----------------

.. code:: c

  #define VHOST_USER_PROTOCOL_F_MQ              0
  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD       1
  #define VHOST_USER_PROTOCOL_F_RARP            2
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK       3
  #define VHOST_USER_PROTOCOL_F_MTU             4
  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ       5
  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN    6
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION  7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT       8
  #define VHOST_USER_PROTOCOL_F_CONFIG          9
  #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD  10
  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER  11
  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE   13

Master message types
--------------------

``VHOST_USER_GET_FEATURES``
  :id: 1
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :master payload: N/A
  :slave payload: ``u64``

  Get from the underlying vhost implementation the features bitmask.
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals slave support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :id: 2
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable features in the underlying vhost implementation using a
  bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  slave support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :id: 15
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :master payload: N/A
  :slave payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation. Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

.. Note::
   Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must
   support this message even before ``VHOST_USER_SET_FEATURES`` was
   called.
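
The legality rule for ``VHOST_USER_GET_PROTOCOL_FEATURES`` can be
illustrated with a small check on the ``u64`` returned by
``VHOST_USER_GET_FEATURES``. This is a sketch only; the two helper
functions are hypothetical names and are not part of the protocol.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

  /* True if the slave's feature bitmask allows the master to send
   * VHOST_USER_GET_PROTOCOL_FEATURES. */
  static bool may_query_protocol_features(uint64_t features)
  {
      return (features >> VHOST_USER_F_PROTOCOL_FEATURES) & 1;
  }

  /* True if protocol feature number `bit` is set in the bitmask
   * returned for VHOST_USER_GET_PROTOCOL_FEATURES. */
  static bool has_protocol_feature(uint64_t protocol_features, unsigned bit)
  {
      return bit < 64 && ((protocol_features >> bit) & 1);
  }

A master would typically call the first helper once after
``VHOST_USER_GET_FEATURES`` and use the second one before sending any
message that is gated by a protocol feature bit.
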
829 830``VHOST_USER_SET_PROTOCOL_FEATURES`` 831 :id: 16 832 :equivalent ioctl: ``VHOST_SET_FEATURES`` 833 :master payload: ``u64`` 834 835 Enable protocol features in the underlying vhost implementation. 836 837 Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in 838 ``VHOST_USER_GET_FEATURES``. 839 840.. Note:: 841 Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must support 842 this message even before ``VHOST_USER_SET_FEATURES`` was called. 843 844``VHOST_USER_SET_OWNER`` 845 :id: 3 846 :equivalent ioctl: ``VHOST_SET_OWNER`` 847 :master payload: N/A 848 849 Issued when a new connection is established. It sets the current 850 *master* as an owner of the session. This can be used on the *slave* 851 as a "session start" flag. 852 853``VHOST_USER_RESET_OWNER`` 854 :id: 4 855 :master payload: N/A 856 857.. admonition:: Deprecated 858 859 This is no longer used. Used to be sent to request disabling all 860 rings, but some clients interpreted it to also discard connection 861 state (this interpretation would lead to bugs). It is recommended 862 that clients either ignore this message, or use it to disable all 863 rings. 864 865``VHOST_USER_SET_MEM_TABLE`` 866 :id: 5 867 :equivalent ioctl: ``VHOST_SET_MEM_TABLE`` 868 :master payload: memory regions description 869 :slave payload: (postcopy only) memory regions description 870 871 Sets the memory map regions on the slave so it can translate the 872 vring addresses. In the ancillary data there is an array of file 873 descriptors for each memory mapped region. The size and ordering of 874 the fds matches the number and ordering of memory regions. 875 876 When ``VHOST_USER_POSTCOPY_LISTEN`` has been received, 877 ``SET_MEM_TABLE`` replies with the bases of the memory mapped 878 regions to the master. The slave must have mmap'd the regions but 879 not yet accessed them and should not yet generate a userfault 880 event. 881 882.. Note:: 883 ``NEED_REPLY_MASK`` is not set in this case. 
QEMU will then 884 reply back to the list of mappings with an empty 885 ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon 886 reception of this message may the guest start accessing the memory 887 and generating faults. 888 889``VHOST_USER_SET_LOG_BASE`` 890 :id: 6 891 :equivalent ioctl: ``VHOST_SET_LOG_BASE`` 892 :master payload: u64 893 :slave payload: N/A 894 895 Sets logging shared memory space. 896 897 When slave has ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature, 898 the log memory fd is provided in the ancillary data of 899 ``VHOST_USER_SET_LOG_BASE`` message, the size and offset of shared 900 memory area provided in the message. 901 902``VHOST_USER_SET_LOG_FD`` 903 :id: 7 904 :equivalent ioctl: ``VHOST_SET_LOG_FD`` 905 :master payload: N/A 906 907 Sets the logging file descriptor, which is passed as ancillary data. 908 909``VHOST_USER_SET_VRING_NUM`` 910 :id: 8 911 :equivalent ioctl: ``VHOST_SET_VRING_NUM`` 912 :master payload: vring state description 913 914 Set the size of the queue. 915 916``VHOST_USER_SET_VRING_ADDR`` 917 :id: 9 918 :equivalent ioctl: ``VHOST_SET_VRING_ADDR`` 919 :master payload: vring address description 920 :slave payload: N/A 921 922 Sets the addresses of the different aspects of the vring. 923 924``VHOST_USER_SET_VRING_BASE`` 925 :id: 10 926 :equivalent ioctl: ``VHOST_SET_VRING_BASE`` 927 :master payload: vring state description 928 929 Sets the base offset in the available vring. 930 931``VHOST_USER_GET_VRING_BASE`` 932 :id: 11 933 :equivalent ioctl: ``VHOST_USER_GET_VRING_BASE`` 934 :master payload: vring state description 935 :slave payload: vring state description 936 937 Get the available vring base offset. 938 939``VHOST_USER_SET_VRING_KICK`` 940 :id: 12 941 :equivalent ioctl: ``VHOST_SET_VRING_KICK`` 942 :master payload: ``u64`` 943 944 Set the event file descriptor for adding buffers to the vring. It is 945 passed in the ancillary data. 946 947 Bits (0-7) of the payload contain the vring index. 
  Bit 8 is the invalid FD flag. This flag is set when there is no file
  descriptor in the ancillary data. This signals that polling should be
  used instead of waiting for a kick.

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :master payload: ``u64``

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :master payload: ``u64``

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data.

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: ``u64``

  Query how many queues the backend supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in queried protocol features by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :master payload: vring state description

  Signal the slave to enable or disable the corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Ask the vhost-user backend to broadcast a fake RARP to notify that the
  migration has terminated, for guests that do not support GUEST_ANNOUNCE.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  backend to construct and broadcast the fake RARP.

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Set host MTU value exposed to the guest.

  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, slave must
  respond with zero in case the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_SLAVE_REQ_FD``
  :id: 21
  :equivalent ioctl: N/A
  :master payload: N/A

  Set the socket file descriptor for slave initiated requests. It is passed
  in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, slave must
  respond with zero for success, non-zero otherwise.

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :master payload: ``struct vhost_iotlb_msg``
  :slave payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  Master sends such requests to update and invalidate entries in the
  device IOTLB.
  The slave has to acknowledge the request by sending zero as the
  ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :master payload: vring state description

  Set the endianness of a VQ for legacy devices. Little-endian is
  indicated with state.num set to 0 and big-endian is indicated with
  state.num set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Backends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the master starts the VQ).

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master to fetch the contents of the
  virtio device configuration space. The vhost-user slave's payload
  size MUST match the master's request; the vhost-user slave uses a
  zero-length payload to indicate an error to the vhost-user master.
  The vhost-user master may cache the contents to avoid repeated
  ``VHOST_USER_GET_CONFIG`` calls.

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master when the Guest changes the virtio
  device configuration space and also can be used for live migration
  on the destination host.
  The vhost-user slave must check the flags field, and slaves MUST NOT
  accept SET_CONFIG for read-only configuration space fields unless the
  live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :master payload: crypto session description
  :slave payload: crypto session description

  Create a session for crypto operation. The server side must return
  the session id, 0 or positive for success, negative for failure.
  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :master payload: N/A
  :slave payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the master
  advises slave that a migration with postcopy enabled is underway;
  the slave must open a userfaultfd for later use. Note that at this
  stage the migration is still in precopy mode.

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :master payload: N/A

  Master advises slave that a transition to postcopy mode has
  happened. The slave must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :slave payload: ``u64``

  Master advises that postcopy migration has now completed. The slave
  must disable the userfaultfd. The response is an acknowledgement
  only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :master payload: inflight description

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by master to
  get a shared buffer from slave. The shared buffer will be used to
  track inflight I/O by slave. QEMU should retrieve a new one when the
  VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :master payload: inflight description

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by master to
  send the shared inflight buffer back to slave so that slave could
  get inflight I/O after a crash or restart.

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :master payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data. The GPU protocol is used to inform the master of
  rendering state and updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: N/A

  Ask the vhost user backend to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The backend retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the backend.

Slave message types
-------------------

``VHOST_USER_SLAVE_IOTLB_MSG``
  :id: 1
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :slave payload: ``struct vhost_iotlb_msg``
  :master payload: N/A

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
  Slave sends such requests to notify of an IOTLB miss, or an IOTLB
  access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and slave set the ``VHOST_USER_NEED_REPLY`` flag, master
  must respond with zero when operation is successfully completed, or
  non-zero otherwise. This request should be sent only when
  ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.

``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``
  :id: 2
  :equivalent ioctl: N/A
  :slave payload: N/A
  :master payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, vhost-user
  slave sends such messages to notify that the virtio device's
  configuration space has changed. For those host devices which can
  support such a feature, the host driver can send a
  ``VHOST_USER_GET_CONFIG`` message to slave to get the latest
  content. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and
  slave set the ``VHOST_USER_NEED_REPLY`` flag, master must respond
  with zero when operation is successfully completed, or non-zero
  otherwise.

``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
  :id: 3
  :equivalent ioctl: N/A
  :slave payload: vring area description
  :master payload: N/A

  Sets host notifier for a specified queue. The queue index is
  contained in the ``u64`` field of the vring area description.
  The host notifier is described by the file descriptor (typically it's a
  VFIO device fd) which is passed as ancillary data and the size
  (which is mmap size and should be the same as host page size) and
  offset (which is mmap offset) carried in the vring area
  description. QEMU can mmap the file descriptor based on the size and
  offset to get a memory range. Registering a host notifier means
  mapping this memory range to the VM as the specified queue's notify
  MMIO region. Slave sends this request to tell QEMU to de-register
  the existing notifier if any and register the new notifier if the
  request is sent with a file descriptor.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
  successfully negotiated.

.. _reply_ack:

VHOST_USER_PROTOCOL_F_REPLY_ACK
-------------------------------

The original vhost-user specification only demands replies for certain
commands. This differs from the vhost protocol implementation where
commands are sent over an ``ioctl()`` call and block until the client
has completed.

With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
client MUST respond with a Payload ``VhostUserMsg`` indicating success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.

The response payload gives QEMU a deterministic indication of the result
of the command. Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In future, QEMU could be taught to be more
resilient for selective requests.
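
As a rough illustration of the flag handling described above, the
following Python sketch packs a request with ``need_reply`` set and
builds the matching ``u64`` acknowledgement. It assumes the header
layout given in the Message Specification section (three 32-bit fields
in native byte order). The constant and function names are chosen for
this example only and are not taken from any particular implementation.

.. code:: python

  import struct

  # Flag layout from the Header section: the lower 2 bits carry the
  # version, bit 2 is the reply flag, bit 3 is the need_reply flag.
  # These names are illustrative assumptions.
  VHOST_USER_VERSION = 0x1
  VHOST_USER_REPLY_MASK = 1 << 2
  VHOST_USER_NEED_REPLY_MASK = 1 << 3

  # request, flags, size: three 32-bit fields, native byte order.
  HEADER = struct.Struct("=III")
  U64 = struct.Struct("=Q")


  def pack_request(request, payload=b"", need_reply=False):
      """Build a request, optionally asking for a REPLY_ACK response."""
      flags = VHOST_USER_VERSION
      if need_reply:
          flags |= VHOST_USER_NEED_REPLY_MASK
      return HEADER.pack(request, flags, len(payload)) + payload


  def pack_ack(request, status):
      """Build the u64 reply: zero on success, non-zero on failure."""
      flags = VHOST_USER_VERSION | VHOST_USER_REPLY_MASK
      return HEADER.pack(request, flags, U64.size) + U64.pack(status)


  def parse_ack(message):
      """Return (request, status) from a reply built by pack_ack()."""
      request, flags, size = HEADER.unpack_from(message)
      if not flags & VHOST_USER_REPLY_MASK:
          raise ValueError("reply flag (bit 2) is not set")
      if size != U64.size:
          raise ValueError("unexpected payload size for an ack")
      (status,) = U64.unpack_from(message, HEADER.size)
      return request, status

For instance, ``pack_request(20, U64.pack(1500), need_reply=True)``
would describe a ``VHOST_USER_NET_SET_MTU`` request (id 20) for which
the sender expects an acknowledgement, and ``pack_ack(20, 0)`` the
reply reporting success. Passing file descriptors in ancillary data is
outside the scope of this sketch.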

For the message types that already solicit a reply from the client,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or need_reply bit
being set brings no behavioural change. (See the Communication_
section for details.)

.. _backend_conventions:

Backend program conventions
===========================

vhost-user backends can provide various devices & services and may
need to be configured manually depending on the use case. However, it
is a good idea to follow the conventions listed here when
possible. Users, QEMU or libvirt, can then rely on some common
behaviour to avoid heterogeneous configuration and management of the
backend programs and facilitate interoperability.

Each backend installed on a host system should come with at least one
JSON file that conforms to the vhost-user.json schema. Each file
informs the management applications about the backend type, and binary
location. In addition, it defines rules for management apps for
picking the highest priority backend when multiple match the search
criteria (see ``@VhostUserBackend`` documentation in the schema file).

If the backend is not capable of enabling a requested feature on the
host (such as 3D acceleration with virgl), or the initialization
failed, the backend should fail to start early and exit with a status
!= 0. It may also print a message to stderr for further details.

The backend program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.

File descriptors 0, 1 and 2 will exist, and have regular
stdin/stdout/stderr usage (they may have been redirected to /dev/null
by the management layer, or to a log handler).

The backend program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received.
Eventually, it may receive SIGKILL from the management layer after a
few seconds.

The following command line options have an expected behaviour. They
are mandatory, unless explicitly said differently:

--socket-path=PATH

  This option specifies the location of the vhost-user Unix domain socket.
  It is incompatible with --fd.

--fd=FDNUM

  When this argument is given, the backend program is started with the
  vhost-user socket as file descriptor FDNUM. It is incompatible with
  --socket-path.

--print-capabilities

  Output to stdout the backend capabilities in JSON format, and then
  exit successfully. Other options and arguments should be ignored, and
  the backend program should not perform its normal function. The
  capabilities can be reported dynamically depending on the host
  capabilities.

The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``. Example:

.. code:: json

  {
    "type": "foo",
    "features": [
      "feature-a",
      "feature-b"
    ]
  }

vhost-user-input
----------------

Command line options:

--evdev-path=PATH

  Specify the Linux input device.

  (optional)

--no-grab

  Do not request exclusive access to the input device.

  (optional)

vhost-user-gpu
--------------

Command line options:

--render-node=PATH

  Specify the GPU DRM render node.

  (optional)

--virgl

  Enable virgl rendering support.

  (optional)

vhost-user-blk
--------------

Command line options:

--blk-file=PATH

  Specify block device or file path.

  (optional)

--read-only

  Enable read-only.

  (optional)
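
To show how the generic options and the vhost-user-blk options above
might fit together, here is a minimal Python sketch of a backend's
argument handling. It is only an illustration under stated
assumptions: the program name, the helper names and the capabilities
content (copied from the placeholder JSON example above) are
hypothetical, and the actual device serving is omitted.

.. code:: python

  import argparse
  import json
  import sys


  def build_parser():
      parser = argparse.ArgumentParser(prog="vhost-user-blk-sketch")
      # --socket-path and --fd are incompatible with each other.
      transport = parser.add_mutually_exclusive_group()
      transport.add_argument("--socket-path", metavar="PATH")
      transport.add_argument("--fd", type=int, metavar="FDNUM")
      parser.add_argument("--print-capabilities", action="store_true")
      # Device-specific, optional options.
      parser.add_argument("--blk-file", metavar="PATH")
      parser.add_argument("--read-only", action="store_true")
      return parser


  def capabilities():
      # Placeholder values taken from the example in this document; a
      # real backend would report what the schema and host allow.
      return {"type": "foo", "features": ["feature-a", "feature-b"]}


  def main(argv):
      # --print-capabilities wins over everything else: other options
      # and arguments are ignored and no normal work is performed.
      if "--print-capabilities" in argv:
          print(json.dumps(capabilities(), indent=2))
          return 0
      args = build_parser().parse_args(argv)
      if args.socket_path is None and args.fd is None:
          print("one of --socket-path or --fd is required",
                file=sys.stderr)
          return 1
      # Serving the device on the chosen socket is omitted here.
      return 0

A program built this way would call ``sys.exit(main(sys.argv[1:]))``.
Checking for ``--print-capabilities`` before parsing is one simple way
to ignore the remaining arguments; a real backend would also need to
handle SIGTERM as described earlier.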