.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines 2 sides of the communication, *master* and
*slave*. *Master* is the application that shares its virtqueues, in
our case QEMU. *Slave* is the consumer of the virtqueues.

In the current implementation QEMU is the *master*, and the *slave* is
the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device backend processing read & write to a virtual
disk. In order to facilitate interoperability between various backend
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

*Master* and *slave* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Support for platforms other than Linux
--------------------------------------

While vhost-user was initially developed targeting Linux, nowadays it
is supported on any platform that provides the following features:

- A way for requesting shared memory represented by a file descriptor
  so it can be passed over a UNIX domain socket and then mapped by the
  other process.

- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
  exchange messages through it, including ancillary data when needed.

- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor will be used for receiving or
  sending events by reading or writing (respectively) an 8-byte value
  to the corresponding file descriptor. The 8-byte value itself has no
  meaning and should not be interpreted.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the slave
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------------+------+-----------+-----+
| index | flags | descriptor | used | available | log |
+-------+-------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: a 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

A region is:

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+---------------+------+--------------+-------------+
| padding | guest address | size | user address | mmap offset |
+---------+---------------+------+--------------+-------------+

:padding: 64-bit

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: size of area used for logging

:log offset: offset from start of supplied file descriptor where
             logging starts (i.e. where guest address 0 would be
             logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:
  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:
  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:
  - 0: Vhost master messages used for writeable fields
  - 1: Vhost master messages used for live migration

:payload: Size bytes array holding the contents of the virtio
          device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer containing the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
         supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
              of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of *master* sending message requests and
*slave* sending message replies. Most of the requests don't require
replies. Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.
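
As a rough illustration of the 32-bit ``flags`` header field (version in the
lower 2 bits, reply flag in bit 2, need_reply flag in bit 3), the sketch below
builds flags for a request and for the matching reply. All macro and function
names here are assumptions made for this example; they are not defined by
this document. Clearing the need_reply bit on a reply is a choice of the
sketch, not a stated protocol requirement.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical names for the bit layout of the flags header field. */
  #define VU_VERSION_MASK     0x3u       /* lower 2 bits hold the version */
  #define VU_VERSION          0x1u       /* current protocol version */
  #define VU_REPLY_FLAG       (1u << 2)  /* set on each reply from the slave */
  #define VU_NEED_REPLY_FLAG  (1u << 3)  /* request asks for an explicit reply */

  /* Return true if the version bits carry the current version. */
  static bool vu_version_ok(uint32_t flags)
  {
      return (flags & VU_VERSION_MASK) == VU_VERSION;
  }

  /* Build the flags for a request sent by the master. */
  static uint32_t vu_request_flags(bool need_reply)
  {
      uint32_t flags = VU_VERSION;

      if (need_reply) {
          flags |= VU_NEED_REPLY_FLAG;
      }
      return flags;
  }

  /* Build the flags for the slave's reply to a request. */
  static uint32_t vu_reply_flags(uint32_t request_flags)
  {
      uint32_t flags = request_flags;

      assert(vu_version_ok(request_flags));
      flags &= ~VU_NEED_REPLY_FLAG;  /* assumption: a reply never asks for one */
      flags |= VU_REPLY_FLAG;
      return flags;
  }

A receiver could call ``vu_version_ok()`` on every incoming header before
looking at the payload, and close the connection if the check fails.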

There are several messages that the master sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_ADD_MEM_REG``
* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_SLAVE_REQ_FD``
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

If *master* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
can be implemented.

If *slave* detects some error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both master and slave. As
older slaves don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Starting and stopping rings
---------------------------

Client must only process each ring when it is started.

Client must only pass data between the ring and the backend when the
ring is enabled.

If ring is started but disabled, client must process the ring without
talking to the backend.

For example, for a networking device, in the disabled state client
must not supply any new RX packets, but must process and discard any
TX packets.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring is initialized in an enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state.
Client must not pass data to/from the
backend until ring is enabled by ``VHOST_USER_SET_VRING_ENABLE`` with
parameter 1, or after it has been disabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 0.

Each ring is initialized in a stopped state; client must not process
it until ring is started, or after it has been stopped.

Client must start ring upon receiving a kick (that is, detecting that
file descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and stop ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

While processing the rings (whether they are enabled or not), client
must support changing some configuration aspects on the fly.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the master
already knows the number of available virtqueues without communicating with the
slave.

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the slave. The number can depend on host
resource availability or slave implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the slave to advertise the maximum number of
queues. This is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The max number of queues the slave supports can be queried with message
``VHOST_USER_GET_QUEUE_NUM``. Master should stop when the number of requested
queues is bigger than that.

As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue.
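
The queue limit check described above could be sketched on the master side as
follows. This is only an illustration: the helper name and its parameters are
hypothetical, and the fallback to a device's fixed queue count when
``VHOST_USER_PROTOCOL_F_MQ`` is not negotiated is an assumption of the sketch.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_PROTOCOL_F_MQ 0

  /*
   * Hypothetical helper: decide whether the master may proceed with the
   * requested number of queues.
   *
   * protocol_features: bitmask of negotiated protocol features
   * slave_max_queues:  value returned for VHOST_USER_GET_QUEUE_NUM
   *                    (only meaningful when the MQ bit is negotiated)
   * fixed_queues:      queue count the master already knows for a device
   *                    with a fixed number of virtqueues (assumption)
   * requested_queues:  number of queues the master wants to use
   */
  static bool vu_queues_acceptable(uint64_t protocol_features,
                                   uint64_t slave_max_queues,
                                   uint64_t fixed_queues,
                                   uint64_t requested_queues)
  {
      uint64_t limit = fixed_queues;

      assert(requested_queues > 0);
      if (protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ)) {
          /* The slave advertised its own maximum. */
          limit = slave_max_queues;
      }
      return requested_queues <= limit;
  }

When the helper returns false, the master would stop rather than send
messages for queue indexes the slave cannot serve.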

The master enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Slaves should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Masters must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues. Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, server may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
``VHOST_USER_SET_LOG_BASE`` message when the slave has
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` which
should be large enough to cover all known guest addresses. Log starts
at the supplied offset in the supplied file descriptor. The log
covers from address 0 to the maximum of guest regions.
In pseudo-code,
to mark page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the master that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source. No further update must be done before rings are restarted.

In postcopy migration the slave is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The slave opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the master. The master services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctl's on the userfaultfd to wake the stalled
slave. The client indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

Memory access
-------------

The master sends a list of vhost memory regions to the slave using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.
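
To make the two base addresses concrete, here is a minimal sketch of how a
slave might turn a guest address or a user address into a pointer in its own
address space once it has mapped each region's file descriptor. The
``VuRegion`` type, its ``mmap_base`` field and both function names are
hypothetical. The sketch also assumes that the slave mapped each file
descriptor from file offset 0, so the region's data starts ``mmap offset``
bytes into the mapping.

.. code:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical per-region bookkeeping kept by the slave. */
  typedef struct VuRegion {
      uint64_t guest_addr;   /* guest address of the region */
      uint64_t size;         /* size of the region in bytes */
      uint64_t user_addr;    /* user address of the region (master side) */
      uint64_t mmap_offset;  /* where the region starts in the mapping */
      uint8_t *mmap_base;    /* assumption: address returned by mmap() */
  } VuRegion;

  /* Map a guest address to a local pointer, or NULL if no region holds it. */
  static void *vu_guest_to_local(const VuRegion *regions, size_t count,
                                 uint64_t guest_addr)
  {
      assert(regions != NULL || count == 0);
      for (size_t i = 0; i < count; i++) {
          const VuRegion *r = &regions[i];

          if (guest_addr >= r->guest_addr &&
              guest_addr - r->guest_addr < r->size) {
              return r->mmap_base + r->mmap_offset +
                     (guest_addr - r->guest_addr);
          }
      }
      return NULL;
  }

  /* Map a user address to a local pointer, or NULL if no region holds it. */
  static void *vu_user_to_local(const VuRegion *regions, size_t count,
                                uint64_t user_addr)
  {
      assert(regions != NULL || count == 0);
      for (size_t i = 0; i < count; i++) {
          const VuRegion *r = &regions[i];

          if (user_addr >= r->user_addr &&
              user_addr - r->user_addr < r->size) {
              return r->mmap_base + r->mmap_offset +
                     (user_addr - r->user_addr);
          }
      }
      return NULL;
  }

Both lookups use a subtraction before the size comparison so that an address
near the top of the 64-bit range cannot wrap around and pass the bounds check.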

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
master sends IOTLB entries update & invalidation by sending
``VHOST_USER_IOTLB_MSG`` requests to the slave with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the slave is
expected to reply with a zero payload, non-zero otherwise.

The slave relies on the slave communication channel (see :ref:`Slave
communication <slave_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_SLAVE_IOTLB_MSG``
requests to the master with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags.
For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the slave may
rely on the reply-ack feature, so the master may send a reply when
operation is completed if the reply-ack feature is negotiated and
the slave requests a reply. For miss events, completed operation means
either master sent an update message containing the IOTLB entry
containing requested address and permission, or master sent nothing if
the IOTLB miss message is invalid (invalid IOVA or permission).

The master isn't expected to take the initiative to send IOTLB update
messages, as the slave sends IOTLB miss messages for the guest virtual
memory areas it needs to access.

.. _slave_communication:

Slave communication
-------------------

An optional communication channel is provided if the slave declares
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` protocol feature, to allow the
slave to make requests to the master.

The fd is provided via ``VHOST_USER_SET_SLAVE_REQ_FD`` ancillary data.

A slave may then send ``VHOST_USER_SLAVE_*`` messages to the master
using this fd communication channel.

If ``VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD`` protocol feature is
negotiated, slave can send file descriptors (at most 8 descriptors in
each message) to master via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, slave may need to
resubmit inflight I/Os. If virtqueue is processed in order, we can
easily achieve that by getting the inflight descriptors from
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue).
However, it can't work when we process descriptors
out-of-order because some entries which store the information of
inflight descriptors in available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, slave needs to allocate an extra buffer to store this
information of inflight descriptors and share it with master for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between master and slave. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. Slave could get it from num
queues field of ``VhostUserInflight``.

For split virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that track the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, ``i``

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1,2,3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (means the inflight field of ``DescStateSplit`` entries in
   last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
      list which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value

For packed virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1,2,3,4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (means the
   ``inflight`` field of ``DescStatePacked`` entries in last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (means
      slave has submitted the buffer to guest driver before crash, so
      it has to commit the in-progress update), set ``old_free_head``,
      ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``,
      ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in
   free list to 0

#. Resubmit inflight ``DescStatePacked`` entries in order of their
   counter value

In-band notifications
---------------------

In some limited situations (e.g. for simulation) it is desirable to
have the kick, call and error (if used) signals done via in-band
messages instead of asynchronous eventfd notifications. This can be
done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
protocol feature.

Note that due to the fact that too many messages on the sockets can
cause the sending application(s) to block, it is not advised to use
this feature unless absolutely necessary.
It is also considered an 829error to negotiate this feature without also negotiating 830``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``, 831the former is necessary for getting a message channel from the slave 832to the master, while the latter needs to be used with the in-band 833notification messages to block until they are processed, both to avoid 834blocking later and for proper processing (at least in the simulation 835use case.) As it has no other way of signalling this error, the slave 836should close the connection as a response to a 837``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band 838notifications feature flag without the other two. 839 840Protocol features 841----------------- 842 843.. code:: c 844 845 #define VHOST_USER_PROTOCOL_F_MQ 0 846 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 847 #define VHOST_USER_PROTOCOL_F_RARP 2 848 #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 849 #define VHOST_USER_PROTOCOL_F_MTU 4 850 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5 851 #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN 6 852 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 853 #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 854 #define VHOST_USER_PROTOCOL_F_CONFIG 9 855 #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10 856 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11 857 #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12 858 #define VHOST_USER_PROTOCOL_F_RESET_DEVICE 13 859 #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14 860 #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 861 #define VHOST_USER_PROTOCOL_F_STATUS 16 862 863Master message types 864-------------------- 865 866``VHOST_USER_GET_FEATURES`` 867 :id: 1 868 :equivalent ioctl: ``VHOST_GET_FEATURES`` 869 :master payload: N/A 870 :slave payload: ``u64`` 871 872 Get from the underlying vhost implementation the features bitmask. 
873 Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals slave support 874 for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and 875 ``VHOST_USER_SET_PROTOCOL_FEATURES``. 876 877``VHOST_USER_SET_FEATURES`` 878 :id: 2 879 :equivalent ioctl: ``VHOST_SET_FEATURES`` 880 :master payload: ``u64`` 881 882 Enable features in the underlying vhost implementation using a 883 bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals 884 slave support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and 885 ``VHOST_USER_SET_PROTOCOL_FEATURES``. 886 887``VHOST_USER_GET_PROTOCOL_FEATURES`` 888 :id: 15 889 :equivalent ioctl: ``VHOST_GET_FEATURES`` 890 :master payload: N/A 891 :slave payload: ``u64`` 892 893 Get the protocol feature bitmask from the underlying vhost 894 implementation. Only legal if feature bit 895 ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in 896 ``VHOST_USER_GET_FEATURES``. 897 898.. Note:: 899 Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must 900 support this message even before ``VHOST_USER_SET_FEATURES`` was 901 called. 902 903``VHOST_USER_SET_PROTOCOL_FEATURES`` 904 :id: 16 905 :equivalent ioctl: ``VHOST_SET_FEATURES`` 906 :master payload: ``u64`` 907 908 Enable protocol features in the underlying vhost implementation. 909 910 Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in 911 ``VHOST_USER_GET_FEATURES``. 912 913.. Note:: 914 Slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must support 915 this message even before ``VHOST_USER_SET_FEATURES`` was called. 916 917``VHOST_USER_SET_OWNER`` 918 :id: 3 919 :equivalent ioctl: ``VHOST_SET_OWNER`` 920 :master payload: N/A 921 922 Issued when a new connection is established. It sets the current 923 *master* as an owner of the session. This can be used on the *slave* 924 as a "session start" flag. 925 926``VHOST_USER_RESET_OWNER`` 927 :id: 4 928 :master payload: N/A 929 930.. admonition:: Deprecated 931 932 This is no longer used. 
   Used to be sent to request disabling all
   rings, but some clients interpreted it to also discard connection
   state (this interpretation would lead to bugs). It is recommended
   that clients either ignore this message, or use it to disable all
   rings.

``VHOST_USER_SET_MEM_TABLE``
  :id: 5
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :master payload: memory regions description
  :slave payload: (postcopy only) memory regions description

  Sets the memory map regions on the slave so it can translate the
  vring addresses. In the ancillary data there is an array of file
  descriptors, one for each memory mapped region. The number and
  ordering of the fds match the number and ordering of memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the master. The slave must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault
  event.

.. Note::
   ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
   reply back to the list of mappings with an empty
   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
   reception of this message may the guest start accessing the memory
   and generating faults.

``VHOST_USER_SET_LOG_BASE``
  :id: 6
  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :master payload: u64
  :slave payload: N/A

  Sets logging shared memory space.

  When the slave has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
  feature, the log memory fd is provided in the ancillary data of the
  ``VHOST_USER_SET_LOG_BASE`` message, and the size and offset of the
  shared memory area are provided in the message.

``VHOST_USER_SET_LOG_FD``
  :id: 7
  :equivalent ioctl: ``VHOST_SET_LOG_FD``
  :master payload: N/A

  Sets the logging file descriptor, which is passed as ancillary data.
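
The address translation that ``VHOST_USER_SET_MEM_TABLE`` enables can be
sketched as follows. This is only an illustration, not taken from any
particular backend: ``struct mem_region`` and ``user_addr_to_local()``
are hypothetical names, and the sketch assumes that the slave mapped
each region's file descriptor from offset 0 and that the address being
translated is expressed in the master's user address space.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical bookkeeping for one region received in SET_MEM_TABLE. */
  struct mem_region {
      uint64_t guest_addr;   /* guest address of the region */
      uint64_t size;         /* size of the region in bytes */
      uint64_t user_addr;    /* address of the region in the master */
      uint64_t mmap_offset;  /* offset of the region inside the fd */
      uint8_t *mmap_base;    /* where this process mapped the fd (assumed
                                to be mapped starting at file offset 0) */
  };

  /* Translate a master user address into a local pointer.
   * Returns NULL when no region covers the address. */
  static void *user_addr_to_local(const struct mem_region *regions,
                                  size_t nregions, uint64_t user_addr)
  {
      for (size_t i = 0; i < nregions; i++) {
          const struct mem_region *r = &regions[i];

          if (user_addr >= r->user_addr &&
              user_addr - r->user_addr < r->size) {
              return r->mmap_base + r->mmap_offset +
                     (user_addr - r->user_addr);
          }
      }
      return NULL;
  }

A backend would typically run every address from
``VHOST_USER_SET_VRING_ADDR`` through such a lookup and reject the
request when the lookup fails.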

``VHOST_USER_SET_VRING_NUM``
  :id: 8
  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
  :master payload: vring state description

  Set the size of the queue.

``VHOST_USER_SET_VRING_ADDR``
  :id: 9
  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
  :master payload: vring address description
  :slave payload: N/A

  Sets the addresses of the different aspects of the vring.

``VHOST_USER_SET_VRING_BASE``
  :id: 10
  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
  :master payload: vring state description

  Sets the base offset in the available vring.

``VHOST_USER_GET_VRING_BASE``
  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
  :master payload: vring state description
  :slave payload: vring state description

  Get the available vring base offset.

``VHOST_USER_SET_VRING_KICK``
  :id: 12
  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
  :master payload: ``u64``

  Set the event file descriptor for adding buffers to the vring. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick. Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated,
  this message isn't necessary as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message; it may, however, still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :master payload: ``u64``

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
  isn't necessary as the ``VHOST_USER_SLAVE_VRING_CALL`` message can be
  used; it may, however, still be used to set an event file descriptor
  or to enable polling.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :master payload: ``u64``

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
  isn't necessary as the ``VHOST_USER_SLAVE_VRING_ERR`` message can be
  used; it may, however, still be used to set an event file descriptor
  (which will be preferred over the message).

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: u64

  Query how many queues the backend supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in the protocol features queried by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :master payload: vring state description

  Signal the slave to enable or disable the corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Ask the vhost-user backend to broadcast a fake RARP to notify that
  the migration has terminated, for guests that do not support
  GUEST_ANNOUNCE.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  backend to construct and broadcast the fake RARP.

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Set host MTU value exposed to the guest.

  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the slave must
  respond with zero in case the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_SLAVE_REQ_FD``
  :id: 21
  :equivalent ioctl: N/A
  :master payload: N/A

  Set the socket file descriptor for slave initiated requests. It is passed
  in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the slave must
  respond with zero for success, non-zero otherwise.
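
The ``u64`` payload shared by ``VHOST_USER_SET_VRING_KICK``,
``VHOST_USER_SET_VRING_CALL`` and ``VHOST_USER_SET_VRING_ERR`` above
packs the vring index into bits 0-7 and the invalid FD flag into bit 8.
A minimal decoding sketch could look like this; the macro, struct and
function names are illustrative and are not defined by this
specification.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative names for the bit layout described above. */
  #define VRING_IDX_MASK   0xffULL      /* bits 0-7: vring index */
  #define VRING_NOFD_MASK  (1ULL << 8)  /* bit 8: invalid FD flag */

  struct vring_fd_payload {
      unsigned int index;  /* which vring the message refers to */
      bool has_fd;         /* false when no fd is in the ancillary data */
  };

  static struct vring_fd_payload decode_vring_fd_payload(uint64_t payload)
  {
      struct vring_fd_payload p = {
          .index = (unsigned int)(payload & VRING_IDX_MASK),
          .has_fd = (payload & VRING_NOFD_MASK) == 0,
      };
      return p;
  }

When ``has_fd`` is false the receiver should not expect a descriptor in
the ancillary data and, for kick and call, falls back to polling as
described in the corresponding entries.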

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :master payload: ``struct vhost_iotlb_msg``
  :slave payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  Master sends such requests to update and invalidate entries in the
  device IOTLB. The slave has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :master payload: vring state description

  Set the endianness of a VQ for legacy devices. Little-endian is
  indicated with state.num set to 0 and big-endian is indicated with
  state.num set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Backends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the master starts the VQ).

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master to fetch the contents of the
  virtio device configuration space. The vhost-user slave's payload
  size MUST match the master's request; the slave uses a zero-length
  payload to indicate an error to the vhost-user master. The vhost-user
  master may cache the contents to avoid repeated
  ``VHOST_USER_GET_CONFIG`` calls.
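
The size rule for ``VHOST_USER_GET_CONFIG`` replies can be checked on
the master side roughly as shown below. This is a sketch under
assumptions: the enum and function names are hypothetical, both
arguments are taken from the ``size`` header field of the request and
of the reply, and treating a mismatched non-zero size as a protocol
violation is one possible policy rather than something this document
mandates.

.. code:: c

  #include <stdint.h>

  /* Hypothetical classification of a GET_CONFIG reply. */
  enum get_config_result {
      GET_CONFIG_OK,            /* sizes match, payload is usable */
      GET_CONFIG_SLAVE_ERROR,   /* zero-length payload: slave error */
      GET_CONFIG_BAD_SIZE,      /* sizes differ: rule was not followed */
  };

  static enum get_config_result
  check_get_config_reply(uint32_t request_size, uint32_t reply_size)
  {
      if (reply_size == 0) {
          return GET_CONFIG_SLAVE_ERROR;
      }
      if (reply_size != request_size) {
          return GET_CONFIG_BAD_SIZE;
      }
      return GET_CONFIG_OK;
  }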

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master when the guest changes the virtio
  device configuration space. It can also be used for live migration
  on the destination host. The vhost-user slave must check the flags
  field, and slaves MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :master payload: crypto session description
  :slave payload: crypto session description

  Create a session for crypto operation. The server side must return
  the session id, 0 or positive for success, negative for failure.
  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :master payload: N/A
  :slave payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the master
  advises the slave that a migration with postcopy enabled is underway;
  the slave must open a userfaultfd for later use. Note that at this
  stage the migration is still in precopy mode.

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :master payload: N/A

  Master advises slave that a transition to postcopy mode has
  happened. The slave must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :slave payload: ``u64``

  Master advises that postcopy migration has now completed. The slave
  must disable the userfaultfd. The response is an acknowledgement
  only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :master payload: inflight description

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the master
  to get a shared buffer from the slave. The shared buffer will be used
  by the slave to track inflight I/O. QEMU should retrieve a new one
  when the VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :master payload: inflight description

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the master
  to send the shared inflight buffer back to the slave so that the
  slave can recover inflight I/O after a crash or restart.

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :master payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data.
  The GPU protocol is used to inform the master of
  rendering state and updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: N/A

  Ask the vhost-user backend to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The backend retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the backend.

``VHOST_USER_VRING_KICK``
  :id: 35
  :equivalent ioctl: N/A
  :master payload: vring state description
  :slave payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the master to indicate that a buffer was added to
  the vring instead of signalling it using the vring's kick file
  descriptor or having the slave rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_GET_MAX_MEM_SLOTS``
  :id: 36
  :equivalent ioctl: N/A
  :slave payload: u64

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The slave should return the message with
  a u64 payload containing the maximum number of memory slots for
  QEMU to expose to the guest. The value returned by the backend
  will be capped at the maximum number of RAM slots which can be
  supported by the target platform.

``VHOST_USER_ADD_MEM_REG``
  :id: 37
  :equivalent ioctl: N/A
  :slave payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the slave device must map in. When the
  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
  been successfully negotiated, along with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the slave device.

  Exactly one file descriptor from which the memory is mapped is
  passed in the ancillary data.

  In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the slave
  replies with the bases of the memory mapped region to the master.
  For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
  They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.

``VHOST_USER_REM_MEM_REG``
  :id: 38
  :equivalent ioctl: N/A
  :slave payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the slave device must unmap. When the
  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
  been successfully negotiated, along with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the slave device.

  The memory region to be removed is identified by its guest address,
  user address and size. The mmap offset is ignored.

  No file descriptors SHOULD be passed in the ancillary data. For
  compatibility with existing incorrect implementations, the slave MAY
  accept messages with one file descriptor. If a file descriptor is
  passed, the slave MUST close it without using it otherwise.

``VHOST_USER_SET_STATUS``
  :id: 39
  :equivalent ioctl: ``VHOST_VDPA_SET_STATUS``
  :slave payload: N/A
  :master payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the master to
  notify the backend of the updated device status as defined in the
  Virtio specification.

``VHOST_USER_GET_STATUS``
  :id: 40
  :equivalent ioctl: ``VHOST_VDPA_GET_STATUS``
  :slave payload: ``u64``
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the master to
  query the backend for its device status as defined in the Virtio
  specification.


Slave message types
-------------------

``VHOST_USER_SLAVE_IOTLB_MSG``
  :id: 1
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :slave payload: ``struct vhost_iotlb_msg``
  :master payload: N/A

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
  Slave sends such requests to notify of an IOTLB miss, or an IOTLB
  access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and the slave set the ``VHOST_USER_NEED_REPLY`` flag, the
  master must respond with zero when the operation is successfully
  completed, or non-zero otherwise. This request should be sent only
  when ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.
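
The acknowledgement rule used by the slave messages in this section
depends on two things: whether ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` was
negotiated, and whether the sender set the need_reply flag (bit 3 of
the header ``flags`` field). A rough sketch of that check follows. The
``HDR_*`` macros and both function names are illustrative, and the
sketch assumes that the reply flag (bit 2) is set on acknowledgements
in both directions.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3

  /* Header flag layout; the macro names are illustrative only. */
  #define HDR_VERSION          0x1u       /* lower 2 bits: version */
  #define HDR_REPLY_FLAG       (1u << 2)  /* marks a message as a reply */
  #define HDR_NEED_REPLY_FLAG  (1u << 3)  /* sender wants an explicit ack */

  /* True when the receiver has to answer with a u64 status payload. */
  static bool ack_required(uint64_t protocol_features,
                           uint32_t request_flags)
  {
      return (protocol_features &
              (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK)) != 0 &&
             (request_flags & HDR_NEED_REPLY_FLAG) != 0;
  }

  /* Flags for the acknowledgement itself: version plus reply bit. */
  static uint32_t ack_flags(void)
  {
      return HDR_VERSION | HDR_REPLY_FLAG;
  }

The acknowledgement then carries a ``u64`` payload of zero on success
or non-zero on failure.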

``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``
  :id: 2
  :equivalent ioctl: N/A
  :slave payload: N/A
  :master payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  slave sends such messages to notify that the virtio device's
  configuration space has changed. For those host devices which can
  support such a feature, the host driver can send a
  ``VHOST_USER_GET_CONFIG`` message to the slave to get the latest
  content. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and
  the slave set the ``VHOST_USER_NEED_REPLY`` flag, the master must
  respond with zero when the operation is successfully completed, or
  non-zero otherwise.

``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
  :id: 3
  :equivalent ioctl: N/A
  :slave payload: vring area description
  :master payload: N/A

  Sets host notifier for a specified queue. The queue index is
  contained in the ``u64`` field of the vring area description. The
  host notifier is described by the file descriptor (typically it's a
  VFIO device fd) which is passed as ancillary data and the size
  (which is mmap size and should be the same as host page size) and
  offset (which is mmap offset) carried in the vring area
  description. QEMU can mmap the file descriptor based on the size and
  offset to get a memory range. Registering a host notifier means
  mapping this memory range to the VM as the specified queue's notify
  MMIO region. Slave sends this request to tell QEMU to de-register
  the existing notifier if any and register the new notifier if the
  request is sent with a file descriptor.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
  successfully negotiated.

``VHOST_USER_SLAVE_VRING_CALL``
  :id: 4
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the slave to indicate that a buffer was used from
  the vring instead of signalling this using the vring's call file
  descriptor or having the master rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_SLAVE_VRING_ERR``
  :id: 5
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the slave to indicate that an error occurred on the
  specific vring, instead of signalling the error file descriptor
  set by the master via ``VHOST_USER_SET_VRING_ERR``.

  The state.num field is currently reserved and must be set to 0.

.. _reply_ack:

VHOST_USER_PROTOCOL_F_REPLY_ACK
-------------------------------

The original vhost-user specification only demands replies for certain
commands. This differs from the vhost protocol implementation where
commands are sent over an ``ioctl()`` call and block until the client
has completed.

With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag to any command. This indicates that the
client MUST respond with a Payload ``VhostUserMsg`` indicating success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.

The response payload gives QEMU a deterministic indication of the result
of the command.
Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to
be more resilient for selective requests.

For the message types that already solicit a reply from the client,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or the need_reply
bit being set brings no behavioural change. (See the Communication_
section for details.)

.. _backend_conventions:

Backend program conventions
===========================

vhost-user backends can provide various devices & services and may
need to be configured manually depending on the use case. However, it
is a good idea to follow the conventions listed here when
possible. Users, QEMU or libvirt, can then rely on some common
behaviour to avoid heterogeneous configuration and management of the
backend programs and facilitate interoperability.

Each backend installed on a host system should come with at least one
JSON file that conforms to the vhost-user.json schema. Each file
informs the management applications about the backend type and binary
location. In addition, it defines rules for management apps for
picking the highest priority backend when multiple match the search
criteria (see ``@VhostUserBackend`` documentation in the schema file).

If the backend is not capable of enabling a requested feature on the
host (such as 3D acceleration with virgl), or the initialization
failed, the backend should fail to start early and exit with a status
!= 0. It may also print a message to stderr for further details.

The backend program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.

File descriptors 0, 1 and 2 will exist, and have regular
stdin/stdout/stderr usage (they may have been redirected to /dev/null
by the management layer, or to a log handler).

The backend program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. Eventually, it may receive SIGKILL from
the management layer after a few seconds.

The following command line options have an expected behaviour. They
are mandatory, unless explicitly said differently:

--socket-path=PATH

  This option specifies the location of the vhost-user Unix domain socket.
  It is incompatible with --fd.

--fd=FDNUM

  When this argument is given, the backend program is started with the
  vhost-user socket as file descriptor FDNUM. It is incompatible with
  --socket-path.

--print-capabilities

  Output to stdout the backend capabilities in JSON format, and then
  exit successfully. Other options and arguments should be ignored, and
  the backend program should not perform its normal function. The
  capabilities can be reported dynamically depending on the host
  capabilities.

The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``. Example:

.. code:: json

  {
    "type": "foo",
    "features": [
      "feature-a",
      "feature-b"
    ]
  }

vhost-user-input
----------------

Command line options:

--evdev-path=PATH

  Specify the Linux input device.

  (optional)

--no-grab

  Do not request exclusive access to the input device.

  (optional)

vhost-user-gpu
--------------

Command line options:

--render-node=PATH

  Specify the GPU DRM render node.

  (optional)

--virgl

  Enable virgl rendering support.

  (optional)

vhost-user-blk
--------------

Command line options:

--blk-file=PATH

  Specify block device or file path.

  (optional)

--read-only

  Enable read-only.

  (optional)
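
A vhost-user-blk backend following these conventions might parse its
command line along the lines of the sketch below. It is not taken from
any existing backend: ``struct blk_opts`` and ``parse_blk_opts()`` are
hypothetical names, and the sketch assumes that exactly one of
``--socket-path`` and ``--fd`` has to be given unless
``--print-capabilities`` is requested. It relies on ``getopt_long()``,
which is available on GNU and BSD systems.

.. code:: c

  #include <getopt.h>
  #include <stdbool.h>
  #include <stdlib.h>

  /* Hypothetical container for the parsed options. */
  struct blk_opts {
      const char *socket_path;  /* --socket-path=PATH, NULL if absent */
      int fd;                   /* --fd=FDNUM, -1 if absent */
      const char *blk_file;     /* --blk-file=PATH, NULL if absent */
      bool read_only;           /* --read-only */
      bool print_capabilities;  /* --print-capabilities */
  };

  /* Returns 0 on success and -1 on a usage error. */
  static int parse_blk_opts(int argc, char **argv, struct blk_opts *o)
  {
      static const struct option longopts[] = {
          { "socket-path",        required_argument, NULL, 's' },
          { "fd",                 required_argument, NULL, 'f' },
          { "print-capabilities", no_argument,       NULL, 'c' },
          { "blk-file",           required_argument, NULL, 'b' },
          { "read-only",          no_argument,       NULL, 'r' },
          { NULL, 0, NULL, 0 },
      };
      int c;

      *o = (struct blk_opts){ .fd = -1 };
      optind = 1;  /* restart scanning if called more than once */
      while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
          switch (c) {
          case 's': o->socket_path = optarg; break;
          case 'b': o->blk_file = optarg; break;
          case 'r': o->read_only = true; break;
          case 'c': o->print_capabilities = true; break;
          case 'f': {
              char *end;
              long v = strtol(optarg, &end, 10);

              if (*optarg == '\0' || *end != '\0' || v < 0) {
                  return -1;  /* FDNUM is not a valid descriptor number */
              }
              o->fd = (int)v;
              break;
          }
          default:
              return -1;      /* unknown option or missing argument */
          }
      }
      if (o->print_capabilities) {
          return 0;           /* other options are ignored in this mode */
      }
      /* --socket-path and --fd are incompatible; one is assumed required. */
      if ((o->socket_path != NULL) == (o->fd >= 0)) {
          return -1;
      }
      return 0;
  }

On a usage error the caller would print a message to stderr and exit
with a non-zero status, in line with the conventions above.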