.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines two sides of the communication, *master* and
*slave*. *Master* is the application that shares its virtqueues, in
our case QEMU. *Slave* is the consumer of the virtqueues.

In the current implementation QEMU is the *master*, and the *slave* is
the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device backend processing reads and writes to a virtual
disk. In order to facilitate interoperability between various backend
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

*Master* and *slave* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Support for platforms other than Linux
--------------------------------------

While vhost-user was initially developed targeting Linux, nowadays it
is supported on any platform that provides the following features:

- A way for requesting shared memory represented by a file descriptor
  so it can be passed over a UNIX domain socket and then mapped by the
  other process.

- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
  exchange messages through them, including ancillary data when needed.

- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor will be used for receiving or
  sending events by reading or writing (respectively) an 8-byte value
  from or to it. The 8-byte value itself has no meaning and
  should not be interpreted.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the slave
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------+------------+------+-----------+-----+
| index | flags | size | descriptor | used | available | log |
+-------+-------+------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: a 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

A region is:

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+---------------+------+--------------+-------------+
| padding | guest address | size | user address | mmap offset |
+---------+---------------+------+--------------+-------------+

:padding: 64-bit

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: size of area used for logging

:log offset: offset from start of supplied file descriptor where
  logging starts (i.e.
  where guest address 0 would be
  logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:
  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:
  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:
  - 0: Vhost master messages used for writeable fields
  - 1: Vhost master messages used for live migration

:payload: Size bytes array holding the contents of the virtio
  device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer contains vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
  supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
  of the supplied file descriptor

:num queues: a
  16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of *master* sending message requests and
*slave* sending message replies. Most of the requests don't require
replies. Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.
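
When deciding whether a received message is a reply, or whether the
sender asked for an acknowledgement, the ``flags`` header field can be
tested with a few masks. The following is a minimal sketch of the bit
layout given in the Header section; the macro and function names are
illustrative assumptions and are not defined by this document.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Flag layout as given in the Header section. All names here are
   * hypothetical and chosen only for this example. */
  #define VU_VERSION_MASK     0x3u       /* lower 2 bits hold the version */
  #define VU_VERSION          0x1u       /* current version is 0x01 */
  #define VU_REPLY_FLAG       (1u << 2)  /* set on each reply from the slave */
  #define VU_NEED_REPLY_FLAG  (1u << 3)  /* need_reply, see REPLY_ACK */

  /* True if the version bits match the current protocol version. */
  static inline bool vu_version_ok(uint32_t flags)
  {
      return (flags & VU_VERSION_MASK) == VU_VERSION;
  }

  /* True if the message is a reply. */
  static inline bool vu_is_reply(uint32_t flags)
  {
      return (flags & VU_REPLY_FLAG) != 0;
  }

  /* True if the sender set the need_reply flag. */
  static inline bool vu_needs_reply(uint32_t flags)
  {
      return (flags & VU_NEED_REPLY_FLAG) != 0;
  }

  /* Flags a slave would place in the header of a reply. */
  static inline uint32_t vu_reply_flags(void)
  {
      return VU_VERSION | VU_REPLY_FLAG;
  }

A receiver would typically call ``vu_version_ok()`` first and close the
connection on a mismatch, in line with the error handling described in
this section.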

There are several messages that the master sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_SLAVE_REQ_FD``
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

If *master* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
can be implemented.

If *slave* detects some error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both master and slave. As
older slaves don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Starting and stopping rings
---------------------------

Client must only process each ring when it is started.

Client must only pass data between the ring and the backend, when the
ring is enabled.

If ring is started but disabled, client must process the ring without
talking to the backend.

For example, for a networking device, in the disabled state client
must not supply any new RX packets, but must process and discard any
TX packets.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring is initialized in an enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state.
Client must not pass data to/from the
backend until ring is enabled by ``VHOST_USER_SET_VRING_ENABLE`` with
parameter 1, or after it has been disabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 0.

Each ring is initialized in a stopped state; client must not process
it until ring is started, or after it has been stopped.

Client must start ring upon receiving a kick (that is, detecting that
file descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and stop ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

While processing the rings (whether they are enabled or not), client
must support changing some configuration aspects on the fly.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the master
already knows the number of available virtqueues without communicating with the
slave.

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the slave. The number can depend on host
resource availability or slave implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the slave to advertise the maximum number of
queues. This is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The maximum number of queues the slave supports can be queried with message
``VHOST_USER_GET_QUEUE_NUM``. Master should stop when the number of requested
queues is bigger than that.

As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue.
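
Because every ring is addressed by its index on the shared connection,
a backend typically keeps a small piece of state per ring. The sketch
below shows one way to track the started/stopped and enabled/disabled
rules from the "Starting and stopping rings" section. It is only an
illustration: the structure and function names are hypothetical, and a
real backend would hold one such structure per queue index.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Per-ring state; names are assumptions made for this example. */
  struct ring_state {
      bool started;   /* ring may be processed */
      bool enabled;   /* ring may exchange data with the backend */
  };

  /* Every ring starts stopped. It starts enabled only when
   * VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated. */
  static void ring_init(struct ring_state *r, bool protocol_features)
  {
      r->started = false;
      r->enabled = !protocol_features;
  }

  /* A kick on the fd set by VHOST_USER_SET_VRING_KICK, or the in-band
   * VHOST_USER_VRING_KICK message, starts the ring. */
  static void ring_on_kick(struct ring_state *r)
  {
      r->started = true;
  }

  /* VHOST_USER_GET_VRING_BASE stops the ring. */
  static void ring_on_get_vring_base(struct ring_state *r)
  {
      r->started = false;
  }

  /* VHOST_USER_SET_VRING_ENABLE: parameter 1 enables, 0 disables. */
  static void ring_on_set_vring_enable(struct ring_state *r, uint32_t num)
  {
      r->enabled = (num == 1);
  }

  /* The ring may be processed whenever it is started. */
  static bool ring_may_process(const struct ring_state *r)
  {
      return r->started;
  }

  /* Data may reach the backend only if the ring is started and enabled. */
  static bool ring_may_use_backend(const struct ring_state *r)
  {
      return r->started && r->enabled;
  }

For a networking device, a ring for which ``ring_may_process()`` is true
but ``ring_may_use_backend()`` is false corresponds to the case where TX
packets are processed and discarded and no new RX packets are supplied.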

The master enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Slaves should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Masters must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues. Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, server may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
``VHOST_USER_SET_LOG_BASE`` message when the slave has
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` which
should be large enough to cover all known guest addresses. Log starts
at the supplied offset in the supplied file descriptor. The log
covers from address 0 to the maximum of guest regions.
In pseudo-code,
to mark page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the master that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source. No further update must be done before rings are restarted.

In postcopy migration the slave is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The slave opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the master. The master services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
slave. The client indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

Memory access
-------------

The master sends a list of vhost memory regions to the slave using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.
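
A backend usually keeps the regions received in
``VHOST_USER_SET_MEM_TABLE`` in a small table and looks addresses up by
range. The following is a minimal sketch of such a lookup by either base
address. It assumes the backend has mapped each region's file descriptor
starting at file offset 0, so that the region's data begins
``mmap offset`` bytes into the mapping. The structure and function names
are hypothetical.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* One entry of the memory regions description (names are assumptions). */
  struct mem_region {
      uint64_t guest_address;
      uint64_t size;
      uint64_t user_address;
      uint64_t mmap_offset;
  };

  /* Return the region containing the given guest address, or NULL. */
  static const struct mem_region *
  find_region_by_guest(const struct mem_region *regions, size_t n,
                       uint64_t addr)
  {
      for (size_t i = 0; i < n; i++) {
          const struct mem_region *r = &regions[i];
          /* Subtract first so that base + size cannot overflow. */
          if (addr >= r->guest_address && addr - r->guest_address < r->size) {
              return r;
          }
      }
      return NULL;
  }

  /* Return the region containing the given user address, or NULL. */
  static const struct mem_region *
  find_region_by_user(const struct mem_region *regions, size_t n,
                      uint64_t addr)
  {
      for (size_t i = 0; i < n; i++) {
          const struct mem_region *r = &regions[i];
          if (addr >= r->user_address && addr - r->user_address < r->size) {
              return r;
          }
      }
      return NULL;
  }

  /* Offset of a guest address inside the mapping of the region's fd,
   * assuming the fd was mapped from file offset 0. */
  static uint64_t region_offset_of_guest(const struct mem_region *r,
                                         uint64_t addr)
  {
      return r->mmap_offset + (addr - r->guest_address);
  }

Adding the returned offset to the base pointer of the backend's own
mapping yields a pointer the backend can dereference.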

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
master sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the slave with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the slave is
expected to reply with a zero payload, non-zero otherwise.

The slave relies on the slave communication channel (see :ref:`Slave
communication <slave_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_SLAVE_IOTLB_MSG``
requests to the master with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags.
For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the slave may
rely on the reply-ack feature, so the master may send a reply when
operation is completed if the reply-ack feature is negotiated and
the slave requests a reply. For miss events, completed operation means
either master sent an update message containing the IOTLB entry
containing requested address and permission, or master sent nothing if
the IOTLB miss message is invalid (invalid IOVA or permission).

The master isn't expected to take the initiative to send IOTLB update
messages, as the slave sends IOTLB miss messages for the guest virtual
memory areas it needs to access.

.. _slave_communication:

Slave communication
-------------------

An optional communication channel is provided if the slave declares
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` protocol feature, to allow the
slave to make requests to the master.

The fd is provided via ``VHOST_USER_SET_SLAVE_REQ_FD`` ancillary data.

A slave may then send ``VHOST_USER_SLAVE_*`` messages to the master
using this fd communication channel.

If ``VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD`` protocol feature is
negotiated, slave can send file descriptors (at most 8 descriptors in
each message) to master via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, slave may need to
resubmit inflight I/Os. If virtqueue is processed in order, we can
easily achieve that by getting the inflight descriptors from
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue).
However, it can't work when we process descriptors
out-of-order because some entries which store the information of
inflight descriptors in available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overridden by new entries. To solve
this problem, slave needs to allocate an extra buffer to store this
information of inflight descriptors and share it with master for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between master and slave. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. Slave could get it from num
queues field of ``VhostUserInflight``.

For split virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that tracks the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, ``i``

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1,2,3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (means the inflight field of ``DescStateSplit`` entries in
   last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
      list which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value

For packed virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1,2,3,4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (means the
   ``inflight`` field of ``DescStatePacked`` entries in last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (means
      slave has submitted the buffer to guest driver before crash, so
      it has to commit the in-progress update), set ``old_free_head``,
      ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``,
      ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in
   free list to 0

#. Resubmit inflight ``DescStatePacked`` entries in order of their
   counter value

In-band notifications
---------------------

In some limited situations (e.g. for simulation) it is desirable to
have the kick, call and error (if used) signals done via in-band
messages instead of asynchronous eventfd notifications. This can be
done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
protocol feature.

Note that due to the fact that too many messages on the sockets can
cause the sending application(s) to block, it is not advised to use
this feature unless absolutely necessary.
It is also considered an
error to negotiate this feature without also negotiating
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``.
The former is necessary for getting a message channel from the slave
to the master, while the latter needs to be used with the in-band
notification messages to block until they are processed, both to avoid
blocking later and for proper processing (at least in the simulation
use case). As it has no other way of signalling this error, the slave
should close the connection as a response to a
``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band
notifications feature flag without the other two.

Protocol features
-----------------

.. code:: c

  #define VHOST_USER_PROTOCOL_F_MQ                    0
  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
  #define VHOST_USER_PROTOCOL_F_RARP                  2
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_MTU                   4
  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
  #define VHOST_USER_PROTOCOL_F_CONFIG                9
  #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD        10
  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
  #define VHOST_USER_PROTOCOL_F_STATUS               16

Master message types
--------------------

``VHOST_USER_GET_FEATURES``
  :id: 1
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :master payload: N/A
  :slave payload: ``u64``

  Get from the underlying vhost implementation the features bitmask.
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals slave support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :id: 2
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable features in the underlying vhost implementation using a
  bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  slave support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :id: 15
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :master payload: N/A
  :slave payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation. Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

.. Note::
   Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must
   support this message even before ``VHOST_USER_SET_FEATURES`` was
   called.

``VHOST_USER_SET_PROTOCOL_FEATURES``
  :id: 16
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable protocol features in the underlying vhost implementation.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

.. Note::
   Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
   this message even before ``VHOST_USER_SET_FEATURES`` was called.

``VHOST_USER_SET_OWNER``
  :id: 3
  :equivalent ioctl: ``VHOST_SET_OWNER``
  :master payload: N/A

  Issued when a new connection is established. It sets the current
  *master* as an owner of the session. This can be used on the *slave*
  as a "session start" flag.

``VHOST_USER_RESET_OWNER``
  :id: 4
  :master payload: N/A

.. admonition:: Deprecated

   This is no longer used.
   Used to be sent to request disabling all rings, but some clients
   interpreted it to also discard connection state (this interpretation
   would lead to bugs). It is recommended that clients either ignore
   this message, or use it to disable all rings.

``VHOST_USER_SET_MEM_TABLE``
  :id: 5
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :master payload: memory regions description
  :slave payload: (postcopy only) memory regions description

  Sets the memory map regions on the slave so it can translate the
  vring addresses. In the ancillary data there is an array of file
  descriptors for each memory mapped region. The size and ordering of
  the fds matches the number and ordering of memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the master. The slave must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault
  event.

.. Note::
   ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
   reply back to the list of mappings with an empty
   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
   reception of this message may the guest start accessing the memory
   and generating faults.

``VHOST_USER_SET_LOG_BASE``
  :id: 6
  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :master payload: ``u64``
  :slave payload: N/A

  Sets logging shared memory space.

  When slave has ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature,
  the log memory fd is provided in the ancillary data of
  ``VHOST_USER_SET_LOG_BASE`` message, the size and offset of shared
  memory area provided in the message.

``VHOST_USER_SET_LOG_FD``
  :id: 7
  :equivalent ioctl: ``VHOST_SET_LOG_FD``
  :master payload: N/A

  Sets the logging file descriptor, which is passed as ancillary data.

``VHOST_USER_SET_VRING_NUM``
  :id: 8
  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
  :master payload: vring state description

  Set the size of the queue.

``VHOST_USER_SET_VRING_ADDR``
  :id: 9
  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
  :master payload: vring address description
  :slave payload: N/A

  Sets the addresses of the different aspects of the vring.

``VHOST_USER_SET_VRING_BASE``
  :id: 10
  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
  :master payload: vring state description

  Sets the base offset in the available vring.

``VHOST_USER_GET_VRING_BASE``
  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
  :master payload: vring state description
  :slave payload: vring state description

  Get the available vring base offset.

``VHOST_USER_SET_VRING_KICK``
  :id: 12
  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
  :master payload: ``u64``

  Set the event file descriptor for adding buffers to the vring. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick. Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated,
  this message isn't necessary, as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message. It may, however, still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :master payload: ``u64``

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
  isn't necessary, as the ``VHOST_USER_SLAVE_VRING_CALL`` message can be
  used. It may, however, still be used to set an event file descriptor
  or to enable polling.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :master payload: ``u64``

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
  isn't necessary, as the ``VHOST_USER_SLAVE_VRING_ERR`` message can be
  used. It may, however, still be used to set an event file descriptor
  (which will be preferred over the message).

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: ``u64``

  Query how many queues the backend supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in queried protocol features by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :master payload: vring state description

  Signal slave to enable or disable corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Ask the vhost-user backend to broadcast a fake RARP to notify that the
  migration has terminated, for guests that do not support
  GUEST_ANNOUNCE.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  backend to construct and broadcast the fake RARP.

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Set host MTU value exposed to the guest.

  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, slave must
  respond with zero in case the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_SLAVE_REQ_FD``
  :id: 21
  :equivalent ioctl: N/A
  :master payload: N/A

  Set the socket file descriptor for slave initiated requests. It is passed
  in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, slave must
  respond with zero for success, non-zero otherwise.
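
The ``u64`` payload shared by ``VHOST_USER_SET_VRING_KICK``,
``VHOST_USER_SET_VRING_CALL`` and ``VHOST_USER_SET_VRING_ERR`` (described
above) packs the vring index into bits 0-7 and the invalid FD flag into
bit 8. The following is a minimal sketch of how that layout could be
decoded and built. The macro and function names are illustrative only and
are not taken from any particular implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the payload layout described in the text. */
#define VRING_IDX_MASK   0xffULL       /* bits 0-7: vring index */
#define VRING_NOFD_MASK  (1ULL << 8)   /* bit 8: invalid FD flag */

/* Extract the vring index from a SET_VRING_{KICK,CALL,ERR} payload. */
static inline unsigned vring_payload_index(uint64_t payload)
{
    return (unsigned)(payload & VRING_IDX_MASK);
}

/* True if a file descriptor is expected in the ancillary data. */
static inline bool vring_payload_has_fd(uint64_t payload)
{
    return (payload & VRING_NOFD_MASK) == 0;
}

/* Build a payload for the given vring; has_fd == false sets bit 8. */
static inline uint64_t vring_payload_make(unsigned index, bool has_fd)
{
    uint64_t payload = (uint64_t)index & VRING_IDX_MASK;

    if (!has_fd) {
        payload |= VRING_NOFD_MASK;
    }
    return payload;
}
```

A receiver would typically check ``vring_payload_has_fd()`` before trying
to read a descriptor from the ancillary data.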

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :master payload: ``struct vhost_iotlb_msg``
  :slave payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  Master sends such requests to update and invalidate entries in the
  device IOTLB. The slave has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :master payload: vring state description

  Set the endianness of a VQ for legacy devices. Little-endian is
  indicated with state.num set to 0 and big-endian is indicated with
  state.num set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Backends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the master starts the VQ).

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master to fetch the contents of the
  virtio device configuration space. The vhost-user slave's payload
  size MUST match the master's request. The vhost-user slave uses a
  zero-length payload to indicate an error to the vhost-user master.
  The vhost-user master may cache the contents to avoid repeated
  ``VHOST_USER_GET_CONFIG`` calls.
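
The size rules for the ``VHOST_USER_GET_CONFIG`` reply above can be
expressed as a small check on the master side. This is only a sketch: the
enum and function names are hypothetical, and it assumes the sizes being
compared are the 32-bit ``size`` header fields of the request and of the
reply.

```c
#include <stdint.h>

/* Hypothetical classification of a GET_CONFIG reply. */
enum config_reply {
    CONFIG_REPLY_OK,        /* payload size matches the request */
    CONFIG_REPLY_ERROR,     /* slave signalled an error (zero length) */
    CONFIG_REPLY_MALFORMED  /* size mismatch, not allowed by the text */
};

/* Compare the reply's payload size against the request's payload size. */
static enum config_reply check_get_config_reply(uint32_t request_size,
                                                uint32_t reply_size)
{
    if (reply_size == 0) {
        return CONFIG_REPLY_ERROR;
    }
    if (reply_size != request_size) {
        return CONFIG_REPLY_MALFORMED;
    }
    return CONFIG_REPLY_OK;
}
```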

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master when the guest changes the virtio
  device configuration space. It can also be used for live migration
  on the destination host. The vhost-user slave must check the flags
  field, and slaves MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :master payload: crypto session description
  :slave payload: crypto session description

  Create a session for crypto operation. The server side must return
  the session id, 0 or positive for success, negative for failure.
  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :master payload: N/A
  :slave payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the master
  advises slave that a migration with postcopy enabled is underway;
  the slave must open a userfaultfd for later use. Note that at this
  stage the migration is still in precopy mode.

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :master payload: N/A

  Master advises slave that a transition to postcopy mode has
  happened. The slave must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :slave payload: ``u64``

  Master advises that postcopy migration has now completed. The slave
  must disable the userfaultfd. The response is an acknowledgement
  only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :master payload: inflight description

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by master to
  get a shared buffer from slave. The shared buffer will be used to
  track inflight I/O by slave. QEMU should retrieve a new one when the
  VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :master payload: inflight description

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by master to
  send the shared inflight buffer back to slave so that slave can
  get inflight I/O after a crash or restart.

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :master payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data.
  The GPU protocol is used to inform the master of
  rendering state and updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: N/A

  Ask the vhost user backend to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The backend retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the backend.

``VHOST_USER_VRING_KICK``
  :id: 35
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the master to indicate that a buffer was added to
  the vring instead of signalling it using the vring's kick file
  descriptor or having the slave rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_GET_MAX_MEM_SLOTS``
  :id: 36
  :equivalent ioctl: N/A
  :slave payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by master to the slave. The slave should return the message with a
  ``u64`` payload containing the maximum number of memory slots for
  QEMU to expose to the guest. The value returned by the backend
  will be capped at the maximum number of ram slots which can be
  supported by the target platform.
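
The capping rule for ``VHOST_USER_GET_MAX_MEM_SLOTS`` amounts to taking
the smaller of two limits. A minimal sketch, in which the function name
and the ``platform_max`` parameter are assumptions made for illustration:

```c
#include <stdint.h>

/*
 * Number of memory slots to expose to the guest: the value reported by
 * the backend, capped at what the target platform supports.
 */
static uint64_t effective_mem_slots(uint64_t backend_max,
                                    uint64_t platform_max)
{
    return backend_max < platform_max ? backend_max : platform_max;
}
```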

``VHOST_USER_ADD_MEM_REG``
  :id: 37
  :equivalent ioctl: N/A
  :slave payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the slave device must map in. When the
  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
  been successfully negotiated, along with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the slave device.

``VHOST_USER_REM_MEM_REG``
  :id: 38
  :equivalent ioctl: N/A
  :slave payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the slave device must unmap. When the
  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
  been successfully negotiated, along with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the slave device.

``VHOST_USER_SET_STATUS``
  :id: 39
  :equivalent ioctl: ``VHOST_VDPA_SET_STATUS``
  :slave payload: N/A
  :master payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the master to
  notify the backend with updated device status as defined in the Virtio
  specification.
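
The device status carried by ``VHOST_USER_SET_STATUS`` uses the status
bits of the Virtio specification. The sketch below shows one way a backend
might interpret it. The bit values are those of the Virtio specification;
the macro and function names, and the assumption that the status byte
occupies the low 8 bits of the ``u64`` payload, are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined in the Virtio specification. */
#define VIRTIO_STATUS_ACKNOWLEDGE        1
#define VIRTIO_STATUS_DRIVER             2
#define VIRTIO_STATUS_DRIVER_OK          4
#define VIRTIO_STATUS_FEATURES_OK        8
#define VIRTIO_STATUS_DEVICE_NEEDS_RESET 64
#define VIRTIO_STATUS_FAILED             128

/*
 * True if the driver has finished setup and no failure is flagged.
 * Assumes the status byte is in the low 8 bits of the payload.
 */
static bool status_driver_ready(uint64_t payload)
{
    uint8_t status = (uint8_t)payload;

    if (status & (VIRTIO_STATUS_FAILED | VIRTIO_STATUS_DEVICE_NEEDS_RESET)) {
        return false;
    }
    return (status & VIRTIO_STATUS_DRIVER_OK) != 0;
}
```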

``VHOST_USER_GET_STATUS``
  :id: 40
  :equivalent ioctl: ``VHOST_VDPA_GET_STATUS``
  :slave payload: ``u64``
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the master to
  query the backend for its device status as defined in the Virtio
  specification.


Slave message types
-------------------

``VHOST_USER_SLAVE_IOTLB_MSG``
  :id: 1
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :slave payload: ``struct vhost_iotlb_msg``
  :master payload: N/A

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
  Slave sends such requests to notify of an IOTLB miss, or an IOTLB
  access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and slave set the ``VHOST_USER_NEED_REPLY`` flag, master
  must respond with zero when operation is successfully completed, or
  non-zero otherwise. This request should be sent only when
  ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.

``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``
  :id: 2
  :equivalent ioctl: N/A
  :slave payload: N/A
  :master payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  slave sends such messages to notify that the virtio device's
  configuration space has changed. For those host devices which can
  support such a feature, the host driver can send a
  ``VHOST_USER_GET_CONFIG`` message to the slave to get the latest
  content. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and
  slave set the ``VHOST_USER_NEED_REPLY`` flag, master must respond
  with zero when operation is successfully completed, or non-zero
  otherwise.

``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
  :id: 3
  :equivalent ioctl: N/A
  :slave payload: vring area description
  :master payload: N/A

  Sets host notifier for a specified queue. The queue index is
  contained in the ``u64`` field of the vring area description. The
  host notifier is described by the file descriptor (typically it's a
  VFIO device fd) which is passed as ancillary data and the size
  (which is mmap size and should be the same as host page size) and
  offset (which is mmap offset) carried in the vring area
  description. QEMU can mmap the file descriptor based on the size and
  offset to get a memory range. Registering a host notifier means
  mapping this memory range to the VM as the specified queue's notify
  MMIO region. Slave sends this request to tell QEMU to de-register
  the existing notifier if any and register the new notifier if the
  request is sent with a file descriptor.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
  successfully negotiated.

``VHOST_USER_SLAVE_VRING_CALL``
  :id: 4
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the slave to indicate that a buffer was used from
  the vring instead of signalling this using the vring's call file
  descriptor or having the master rely on polling.

  The state.num field is currently reserved and must be set to 0.
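
In-band notification messages such as ``VHOST_USER_SLAVE_VRING_CALL``
depend on ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` having been
negotiated, which this specification only allows together with
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` and
``VHOST_USER_PROTOCOL_F_REPLY_ACK``. A slave could validate the bitmask
received in ``VHOST_USER_SET_PROTOCOL_FEATURES`` roughly as follows. The
bit numbers come from the protocol features list; the function name is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
#define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
#define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

/*
 * Returns false when in-band notifications are requested without the
 * two features they depend on; the slave should then close the
 * connection, as it has no other way of signalling the error.
 */
static bool inband_negotiation_valid(uint64_t protocol_features)
{
    const uint64_t inband =
        1ULL << VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS;
    const uint64_t required =
        (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ) |
        (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);

    if (!(protocol_features & inband)) {
        return true; /* nothing to check */
    }
    return (protocol_features & required) == required;
}
```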

``VHOST_USER_SLAVE_VRING_ERR``
  :id: 5
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the slave to indicate that an error occurred on the
  specific vring, instead of signalling the error file descriptor
  set by the master via ``VHOST_USER_SET_VRING_ERR``.

  The state.num field is currently reserved and must be set to 0.

.. _reply_ack:

VHOST_USER_PROTOCOL_F_REPLY_ACK
-------------------------------

The original vhost-user specification only demands replies for certain
commands. This differs from the vhost protocol implementation where
commands are sent over an ``ioctl()`` call and block until the client
has completed.

With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
client MUST respond with a Payload ``VhostUserMsg`` indicating success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.

The response payload gives QEMU a deterministic indication of the result
of the command. Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to
be more resilient for selective requests.

For the message types that already solicit a reply from the client,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or need_reply bit
being set brings no behavioural change. (See the Communication_
section for details.)

.. _backend_conventions:

Backend program conventions
===========================

vhost-user backends can provide various devices & services and may
need to be configured manually depending on the use case. However, it
is a good idea to follow the conventions listed here when
possible. Users, QEMU or libvirt, can then rely on some common
behaviour to avoid heterogeneous configuration and management of the
backend programs and facilitate interoperability.

Each backend installed on a host system should come with at least one
JSON file that conforms to the vhost-user.json schema. Each file
informs the management applications about the backend type, and binary
location. In addition, it defines rules for management apps for
picking the highest priority backend when multiple match the search
criteria (see ``@VhostUserBackend`` documentation in the schema file).

If the backend is not capable of enabling a requested feature on the
host (such as 3D acceleration with virgl), or the initialization
failed, the backend should fail to start early and exit with a status
!= 0. It may also print a message to stderr for further details.

The backend program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.

File descriptors 0, 1 and 2 will exist, and have regular
stdin/stdout/stderr usage (they may have been redirected to /dev/null
by the management layer, or to a log handler).

The backend program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. It may eventually be sent SIGKILL by
the management layer after a few seconds.

The following command line options have an expected behaviour.
They are mandatory, unless explicitly said differently:

--socket-path=PATH

  This option specifies the location of the vhost-user Unix domain socket.
  It is incompatible with --fd.

--fd=FDNUM

  When this argument is given, the backend program is started with the
  vhost-user socket as file descriptor FDNUM. It is incompatible with
  --socket-path.

--print-capabilities

  Output to stdout the backend capabilities in JSON format, and then
  exit successfully. Other options and arguments should be ignored, and
  the backend program should not perform its normal function. The
  capabilities can be reported dynamically depending on the host
  capabilities.

The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``. Example:

.. code:: json

  {
    "type": "foo",
    "features": [
      "feature-a",
      "feature-b"
    ]
  }

vhost-user-input
----------------

Command line options:

--evdev-path=PATH

  Specify the Linux input device.

  (optional)

--no-grab

  Do not request exclusive access to the input device.

  (optional)

vhost-user-gpu
--------------

Command line options:

--render-node=PATH

  Specify the GPU DRM render node.

  (optional)

--virgl

  Enable virgl rendering support.

  (optional)

vhost-user-blk
--------------

Command line options:

--blk-file=PATH

  Specify block device or file path.

  (optional)

--read-only

  Enable read-only.

  (optional)
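
The common options from the backend program conventions
(``--socket-path``, ``--fd`` and ``--print-capabilities``) could be
handled along the following lines with ``getopt_long()``. This is a
sketch, not a reference implementation: ``struct backend_opts`` and
``parse_backend_opts()`` are hypothetical names, and treating "neither
``--socket-path`` nor ``--fd`` given" as an error is an assumption of this
example.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the common backend options. */
struct backend_opts {
    const char *socket_path;  /* --socket-path=PATH, or NULL */
    int fd;                   /* --fd=FDNUM, or -1 */
    int print_capabilities;   /* non-zero if --print-capabilities */
};

/* Returns 0 on success, -1 on an invalid option combination. */
static int parse_backend_opts(int argc, char **argv, struct backend_opts *o)
{
    static const struct option longopts[] = {
        { "socket-path",        required_argument, NULL, 's' },
        { "fd",                 required_argument, NULL, 'f' },
        { "print-capabilities", no_argument,       NULL, 'c' },
        { NULL, 0, NULL, 0 }
    };
    int c, bad = 0;

    o->socket_path = NULL;
    o->fd = -1;
    o->print_capabilities = 0;
    optind = 1;  /* restart scanning from the first argument */
    opterr = 0;  /* do not let getopt print its own diagnostics */

    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case 's': o->socket_path = optarg; break;
        case 'f': o->fd = atoi(optarg); break;
        case 'c': o->print_capabilities = 1; break;
        default:  bad = 1; break;  /* unknown option or missing value */
        }
    }

    /* --print-capabilities: other options and arguments are ignored. */
    if (o->print_capabilities) {
        return 0;
    }
    if (bad) {
        return -1;
    }
    /* --socket-path and --fd are incompatible with each other. */
    if (o->socket_path && o->fd >= 0) {
        return -1;
    }
    /* Assumption: one way of obtaining the socket must be given. */
    if (!o->socket_path && o->fd < 0) {
        return -1;
    }
    return 0;
}
```

A backend would call ``parse_backend_opts(argc, argv, &opts)`` early in
``main()`` and exit with a non-zero status if it returns -1.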