.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines 2 sides of the communication, *front-end* and
*back-end*. The *front-end* is the application that shares its virtqueues, in
our case QEMU. The *back-end* is the consumer of the virtqueues.

In the current implementation QEMU is the *front-end*, and the *back-end*
is the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device back-end processing read & write to a virtual
disk. In order to facilitate interoperability between various back-end
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

The *front-end* and *back-end* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Support for platforms other than Linux
--------------------------------------

While vhost-user was initially developed targeting Linux, nowadays it
is supported on any platform that provides the following features:

- A way for requesting shared memory represented by a file descriptor
  so it can be passed over a UNIX domain socket and then mapped by the
  other process.

- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
  exchange messages through them, including ancillary data when needed.

- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor will be used for receiving or
  sending events by reading or writing (respectively) an 8-byte value
  on the corresponding file descriptor. The 8-byte value itself has no
  meaning and should not be interpreted.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the back-end
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------+------------+------+-----------+-----+
| index | flags | size | descriptor | used | available | log |
+-------+-------+------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: a 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

A region is:

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+---------------+------+--------------+-------------+
| padding | guest address | size | user address | mmap offset |
+---------+---------------+------+--------------+-------------+

:padding: 64-bit

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: size of area used for logging

:log offset: offset from start of supplied file descriptor where
             logging starts (i.e. where guest address 0 would be
             logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:
  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:
  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:
  - 0: Vhost front-end messages used for writable fields
  - 1: Vhost front-end messages used for live migration

:payload: Size bytes array holding the contents of the virtio
          device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer containing the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
         supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
              of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of the *front-end* sending message requests and
the *back-end* sending message replies. Most of the requests don't require
replies. Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.
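
As an illustration of the header described in the Message Specification
section, the following minimal sketch builds and checks the ``flags``
field. The names ``VU_*``, ``struct vu_header``, ``vu_make_flags`` and
``vu_header_is_valid`` are hypothetical and are not taken from QEMU; only
the bit positions and the three 32-bit header fields come from this
document.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical names; bit positions follow the Header section. */
  #define VU_VERSION          0x1u
  #define VU_VERSION_MASK     0x3u        /* lower 2 bits: version */
  #define VU_FLAG_REPLY       (1u << 2)   /* set on each back-end reply */
  #define VU_FLAG_NEED_REPLY  (1u << 3)   /* see REPLY_ACK */

  /* The three 32-bit header fields that precede the payload. */
  struct vu_header {
      uint32_t request;
      uint32_t flags;
      uint32_t size;      /* payload size in bytes */
  };

  /* Build the flags word for a request or a reply. */
  static uint32_t vu_make_flags(bool is_reply, bool need_reply)
  {
      uint32_t flags = VU_VERSION;

      if (is_reply) {
          flags |= VU_FLAG_REPLY;
      }
      if (need_reply) {
          flags |= VU_FLAG_NEED_REPLY;
      }
      return flags;
  }

  /* Accept a header only if the version matches and the payload fits. */
  static bool vu_header_is_valid(const struct vu_header *hdr,
                                 uint32_t max_payload)
  {
      assert(hdr != NULL);
      return (hdr->flags & VU_VERSION_MASK) == VU_VERSION &&
             hdr->size <= max_payload;
  }

A receiver would typically read the fixed-size header first, validate it
as above, and only then read ``size`` bytes of payload.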

There are several messages that the front-end sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_ADD_MEM_REG``
* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_SLAVE_REQ_FD``
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

If *front-end* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
can be implemented.

If *back-end* detects some error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both front-end and back-end. As
older back-ends don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature
bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits
<https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
VIRTIO devices do not advertise this feature bit and therefore VIRTIO
drivers cannot negotiate it.

This reserved feature bit was reused by the vhost-user protocol to add
vhost-user protocol feature negotiation in a backwards compatible
fashion. Old vhost-user front-end and back-end implementations continue to
work even though they are not aware of vhost-user protocol feature
negotiation.

Ring states
-----------

Rings can be in one of three states:

* stopped: the back-end must not process the ring at all.

* started but disabled: the back-end must process the ring without
  causing any side effects. For example, for a networking device,
  in the disabled state the back-end must not supply any new RX packets,
  but must process and discard any TX packets.

* started and enabled.

Each ring is initialized in a stopped state. The back-end must start
the ring upon receiving a kick (that is, detecting that the file descriptor
is readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK``
or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated,
and stop the ring upon receiving ``VHOST_USER_GET_VRING_BASE``.

Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring starts directly in the enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state and is enabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.

While processing the rings (whether they are enabled or not), the back-end
must support changing some configuration aspects on the fly.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the front-end
already knows the number of available virtqueues without communicating with the
back-end.

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the back-end. The number can depend on host
resource availability or back-end implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the back-end to advertise the maximum number of
queues. This is treated as a protocol extension, hence the back-end has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The max number of queues the back-end supports can be queried with message
``VHOST_USER_GET_QUEUE_NUM``. Front-end should stop when the number of requested
queues is bigger than that.

As all queues share one connection, the front-end uses a unique index for each
queue in the sent message to identify a specified queue.

The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues. Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the front-end may need to track the modifications
the back-end makes to the memory mapped regions. The front-end should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, the front-end may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.
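
Marking memory as dirty comes down to setting one bit per 4 KiB page in
the shared log. The sketch below is one way to do this with C11 atomics.
The helper names ``mark_dirty`` and ``mark_dirty_range`` are hypothetical,
and treating the mapped log as an array of ``_Atomic uint8_t`` is an
assumption about the platform rather than something the protocol mandates.

.. code:: c

  #include <stdatomic.h>
  #include <stdint.h>

  #define VHOST_LOG_PAGE 0x1000

  /* Hypothetical helper: set the dirty bit for the page containing addr. */
  static void mark_dirty(_Atomic uint8_t *log, uint64_t addr)
  {
      uint64_t page = addr / VHOST_LOG_PAGE;

      /* Atomic OR, since other writers may update the same byte. */
      atomic_fetch_or(&log[page / 8], (uint8_t)(1u << (page % 8)));
  }

  /* Hypothetical helper: mark every page touched by [addr, addr + len).
   * len is assumed to be non-zero. */
  static void mark_dirty_range(_Atomic uint8_t *log, uint64_t addr,
                               uint64_t len)
  {
      uint64_t first = addr / VHOST_LOG_PAGE;
      uint64_t last = (addr + len - 1) / VHOST_LOG_PAGE;

      for (uint64_t page = first; page <= last; page++) {
          atomic_fetch_or(&log[page / 8], (uint8_t)(1u << (page % 8)));
      }
  }

A write that straddles a page boundary dirties both pages, which is why
the range helper computes the last page from ``addr + len - 1``.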

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
``VHOST_USER_SET_LOG_BASE`` message when the back-end has
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` which
should be large enough to cover all known guest addresses. Log starts
at the supplied offset in the supplied file descriptor. The log
covers from address 0 to the maximum of guest regions. In pseudo-code,
to mark page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << page % 8

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the front-end that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source. No further update must be done before rings are restarted.

In postcopy migration the back-end is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The back-end opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the front-end. The front-end services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctl's on the userfaultfd to wake the stalled
back-end. The front-end indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

Memory access
-------------

The front-end sends a list of vhost memory regions to the back-end using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
front-end sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the back-end is
expected to reply with a zero payload, non-zero otherwise.

The back-end relies on the back-end communication channel (see :ref:`Back-end
communication <backend_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_SLAVE_IOTLB_MSG``
requests to the front-end with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags. For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the back-end may
rely on the reply-ack feature, so the front-end may send a reply when
the operation is completed if the reply-ack feature is negotiated and
the back-end requests a reply. For miss events, a completed operation means
either the front-end sent an update message containing the IOTLB entry
covering the requested address and permission, or the front-end sent nothing
if the IOTLB miss message is invalid (invalid IOVA or permission).

The front-end isn't expected to take the initiative to send IOTLB update
messages, as the back-end sends IOTLB miss messages for the guest virtual
memory areas it needs to access.

.. _backend_communication:

Back-end communication
----------------------

An optional communication channel is provided if the back-end declares
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` protocol feature, to allow the
back-end to make requests to the front-end.

The fd is provided via ``VHOST_USER_SET_SLAVE_REQ_FD`` ancillary data.

A back-end may then send ``VHOST_USER_SLAVE_*`` messages to the front-end
using this fd communication channel.
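
File descriptors such as the one carried by
``VHOST_USER_SET_SLAVE_REQ_FD`` travel as ``SCM_RIGHTS`` ancillary data on
the Unix domain socket. The following sketch shows the general POSIX
mechanism. The helper names ``send_with_fd`` and ``recv_with_fd`` are
hypothetical, and the sketch handles a single descriptor per message
rather than everything a real implementation would need.

.. code:: c

  #include <stddef.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  /* Control buffer sized and aligned for exactly one descriptor. */
  union fd_control {
      char buf[CMSG_SPACE(sizeof(int))];
      struct cmsghdr align;
  };

  /* Hypothetical helper: send len bytes from buf and attach fd. */
  static ssize_t send_with_fd(int sock, const void *buf, size_t len, int fd)
  {
      union fd_control control;
      struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
      struct msghdr msg;
      struct cmsghdr *cmsg;

      memset(&msg, 0, sizeof(msg));
      memset(&control, 0, sizeof(control));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = control.buf;
      msg.msg_controllen = sizeof(control.buf);

      cmsg = CMSG_FIRSTHDR(&msg);
      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;
      cmsg->cmsg_len = CMSG_LEN(sizeof(int));
      memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

      return sendmsg(sock, &msg, 0);
  }

  /* Hypothetical helper: receive data; *fd is -1 if none was attached. */
  static ssize_t recv_with_fd(int sock, void *buf, size_t len, int *fd)
  {
      union fd_control control;
      struct iovec iov = { .iov_base = buf, .iov_len = len };
      struct msghdr msg;
      struct cmsghdr *cmsg;
      ssize_t n;

      memset(&msg, 0, sizeof(msg));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = control.buf;
      msg.msg_controllen = sizeof(control.buf);

      *fd = -1;
      n = recvmsg(sock, &msg, 0);
      if (n < 0) {
          return n;
      }
      for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
           cmsg = CMSG_NXTHDR(&msg, cmsg)) {
          if (cmsg->cmsg_level == SOL_SOCKET &&
              cmsg->cmsg_type == SCM_RIGHTS) {
              memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
              break;
          }
      }
      return n;
  }

The received descriptor is a new number in the receiving process that
refers to the same open file description as the sender's descriptor.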

If ``VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD`` protocol feature is
negotiated, back-end can send file descriptors (at most 8 descriptors in
each message) to front-end via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, back-end may need to
resubmit inflight I/Os. If virtqueue is processed in order, we can
easily achieve that by getting the inflight descriptors from
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue). However, it can't work when we process descriptors
out-of-order because some entries which store the information of
inflight descriptors in available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overridden by new entries. To solve
this problem, the back-end needs to allocate an extra buffer to store this
information of inflight descriptors and share it with the front-end for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between front-end and back-end. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. The back-end could get it from num
queues field of ``VhostUserInflight``.

For split virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that track the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, i

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1,2,3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (means the inflight field of ``DescStateSplit`` entries in
   last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
      list which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value

For packed virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1,2,3,4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (means the
   ``inflight`` field of ``DescStatePacked`` entries in last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (means
      back-end has submitted the buffer to guest driver before crash, so
      it has to commit the in-progress update), set ``old_free_head``,
      ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``,
      ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in
   free list to 0

#. Resubmit inflight ``DescStatePacked`` entries in order of their
   counter value

In-band notifications
---------------------

In some limited situations (e.g. for simulation) it is desirable to
have the kick, call and error (if used) signals done via in-band
messages instead of asynchronous eventfd notifications. This can be
done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
protocol feature.

Note that due to the fact that too many messages on the sockets can
cause the sending application(s) to block, it is not advised to use
this feature unless absolutely necessary.
It is also considered an 837error to negotiate this feature without also negotiating 838``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``, 839the former is necessary for getting a message channel from the back-end 840to the front-end, while the latter needs to be used with the in-band 841notification messages to block until they are processed, both to avoid 842blocking later and for proper processing (at least in the simulation 843use case.) As it has no other way of signalling this error, the back-end 844should close the connection as a response to a 845``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band 846notifications feature flag without the other two. 847 848Protocol features 849----------------- 850 851.. code:: c 852 853 #define VHOST_USER_PROTOCOL_F_MQ 0 854 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 855 #define VHOST_USER_PROTOCOL_F_RARP 2 856 #define VHOST_USER_PROTOCOL_F_REPLY_ACK 3 857 #define VHOST_USER_PROTOCOL_F_MTU 4 858 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5 859 #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN 6 860 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 861 #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 862 #define VHOST_USER_PROTOCOL_F_CONFIG 9 863 #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD 10 864 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11 865 #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12 866 #define VHOST_USER_PROTOCOL_F_RESET_DEVICE 13 867 #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14 868 #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 869 #define VHOST_USER_PROTOCOL_F_STATUS 16 870 871Front-end message types 872----------------------- 873 874``VHOST_USER_GET_FEATURES`` 875 :id: 1 876 :equivalent ioctl: ``VHOST_GET_FEATURES`` 877 :request payload: N/A 878 :reply payload: ``u64`` 879 880 Get from the underlying vhost implementation the features bitmask. 
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :id: 2
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :request payload: ``u64``
  :reply payload: N/A

  Enable features in the underlying vhost implementation using a
  bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :id: 15
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :request payload: N/A
  :reply payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation. Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``. It does not need to be acknowledged by
  ``VHOST_USER_SET_FEATURES``.

.. Note::
   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
   support this message even before ``VHOST_USER_SET_FEATURES`` was
   called.

``VHOST_USER_SET_PROTOCOL_FEATURES``
  :id: 16
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :request payload: ``u64``
  :reply payload: N/A

  Enable protocol features in the underlying vhost implementation.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``. It does not need to be acknowledged by
  ``VHOST_USER_SET_FEATURES``.

.. Note::
   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
   this message even before ``VHOST_USER_SET_FEATURES`` was called.

``VHOST_USER_SET_OWNER``
  :id: 3
  :equivalent ioctl: ``VHOST_SET_OWNER``
  :request payload: N/A
  :reply payload: N/A

  Issued when a new connection is established. It marks the sender
  as the front-end that owns the session.
  This can be used on the *back-end*
  as a "session start" flag.

``VHOST_USER_RESET_OWNER``
  :id: 4
  :request payload: N/A
  :reply payload: N/A

.. admonition:: Deprecated

   This is no longer used. Used to be sent to request disabling all
   rings, but some back-ends interpreted it to also discard connection
   state (this interpretation would lead to bugs). It is recommended
   that back-ends either ignore this message, or use it to disable all
   rings.

``VHOST_USER_SET_MEM_TABLE``
  :id: 5
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :request payload: memory regions description
  :reply payload: (postcopy only) memory regions description

  Sets the memory map regions on the back-end so it can translate the
  vring addresses. In the ancillary data there is an array of file
  descriptors for each memory mapped region. The size and ordering of
  the fds matches the number and ordering of memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the front-end. The back-end must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault
  event.

.. Note::
   ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
   reply to the list of mappings with an empty
   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
   reception of this message may the guest start accessing the memory
   and generating faults.

``VHOST_USER_SET_LOG_BASE``
  :id: 6
  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :request payload: ``u64``
  :reply payload: N/A

  Sets logging shared memory space.

  When the back-end has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
  feature, the log memory fd is provided in the ancillary data of the
  ``VHOST_USER_SET_LOG_BASE`` message, and the size and offset of the
  shared memory area are provided in the message.
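
As an illustration of the address translation that
``VHOST_USER_SET_MEM_TABLE`` makes possible, the following is a minimal
sketch of a lookup over the regions a back-end has mapped. The
``struct mapped_region`` layout and the ``translate_user_addr`` helper are
hypothetical names introduced here; the sketch assumes that vring addresses
are expressed in the front-end's user address space and that each region
has already been mapped at ``local_base``.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-region bookkeeping kept by a back-end after mmap(). */
struct mapped_region {
    uint64_t guest_addr;  /* guest physical base of the region */
    uint64_t user_addr;   /* front-end virtual base of the region */
    uint64_t size;        /* region length in bytes */
    uint8_t *local_base;  /* where this back-end mapped the region's first byte */
};

/*
 * Translate the front-end virtual address range [addr, addr + len) into a
 * pointer in the back-end's own address space. Returns NULL when the range
 * is not fully contained in a single region.
 */
static void *translate_user_addr(const struct mapped_region *regions,
                                 size_t count, uint64_t addr, uint64_t len)
{
    for (size_t i = 0; i < count; i++) {
        const struct mapped_region *r = &regions[i];

        if (addr < r->user_addr) {
            continue;
        }
        uint64_t off = addr - r->user_addr;
        /* Compare against the remaining space to avoid overflow in off + len. */
        if (off < r->size && len <= r->size - off) {
            return r->local_base + off;
        }
    }
    return NULL;
}
```

A range that straddles the end of a region is rejected rather than
truncated, which is a design choice of this sketch and not a requirement
stated by the protocol.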

``VHOST_USER_SET_LOG_FD``
  :id: 7
  :equivalent ioctl: ``VHOST_SET_LOG_FD``
  :request payload: N/A
  :reply payload: N/A

  Sets the logging file descriptor, which is passed as ancillary data.

``VHOST_USER_SET_VRING_NUM``
  :id: 8
  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
  :request payload: vring state description
  :reply payload: N/A

  Set the size of the queue.

``VHOST_USER_SET_VRING_ADDR``
  :id: 9
  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
  :request payload: vring address description
  :reply payload: N/A

  Sets the addresses of the different aspects of the vring.

``VHOST_USER_SET_VRING_BASE``
  :id: 10
  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
  :request payload: vring state description
  :reply payload: N/A

  Sets the base offset in the available vring.

``VHOST_USER_GET_VRING_BASE``
  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
  :request payload: vring state description
  :reply payload: vring state description

  Get the available vring base offset.

``VHOST_USER_SET_VRING_KICK``
  :id: 12
  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor for adding buffers to the vring. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick.
  Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated
  this message isn't necessary as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message. It may, however, still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated this message
  isn't necessary as the ``VHOST_USER_SLAVE_VRING_CALL`` message can be
  used. It may, however, still be used to set an event file descriptor
  or to enable polling.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data.
  Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated this message
  isn't necessary as the ``VHOST_USER_SLAVE_VRING_ERR`` message can be
  used. It may, however, still be used to set an event file descriptor
  (which will be preferred over the message).

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: ``u64``

  Query how many queues the back-end supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in queried protocol features by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  Signal the back-end to enable or disable the corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Ask the vhost-user back-end to broadcast a fake RARP to notify that
  the migration has terminated, for guests that do not support
  GUEST_ANNOUNCE.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  back-end to construct and broadcast the fake RARP.

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Set host MTU value exposed to the guest.

  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
  respond with zero in case the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_SLAVE_REQ_FD``
  :id: 21
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Set the socket file descriptor for back-end initiated requests. It is passed
  in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
  respond with zero for success, non-zero otherwise.

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :request payload: ``struct vhost_iotlb_msg``
  :reply payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  The front-end sends such requests to update and invalidate entries in the
  device IOTLB. The back-end has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :request payload: vring state description
  :reply payload: N/A

  Set the endianness of a VQ for legacy devices.
  Little-endian is
  indicated with state.num set to 0 and big-endian is indicated with
  state.num set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Back-ends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the front-end starts the VQ).

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :request payload: virtio device config space
  :reply payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end to fetch the contents of the
  virtio device configuration space. The vhost-user back-end's payload
  size MUST match the front-end's request; the back-end uses a
  zero-length payload to indicate an error to the vhost-user front-end.
  The vhost-user front-end may cache the contents to avoid repeated
  ``VHOST_USER_GET_CONFIG`` calls.

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :request payload: virtio device config space
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end when the guest changes the virtio
  device configuration space. It can also be used for live migration
  on the destination host. The vhost-user back-end must check the flags
  field, and back-ends MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :request payload: crypto session description
  :reply payload: crypto session description

  Create a session for crypto operation.
  The back-end must return
  the session id, 0 or positive for success, negative for failure.
  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :request payload: N/A
  :reply payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
  advises the back-end that a migration with postcopy enabled is
  underway; the back-end must open a userfaultfd for later use. Note
  that at this stage the migration is still in precopy mode.

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :request payload: N/A
  :reply payload: N/A

  The front-end advises the back-end that a transition to postcopy mode
  has happened. The back-end must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :request payload: N/A
  :reply payload: ``u64``

  The front-end advises that postcopy migration has now completed. The back-end
  must disable the userfaultfd. The reply is an acknowledgement
  only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :request payload: inflight description
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  get a shared buffer from the back-end. The shared buffer will be used
  by the back-end to track inflight I/O. QEMU should retrieve a new one
  when the VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :request payload: inflight description
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  send the shared inflight buffer back to the back-end so that the back-end
  can retrieve inflight I/O after a crash or restart.

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data. The GPU protocol is used to inform the front-end of
  rendering state and updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Ask the vhost-user back-end to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The back-end retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the back-end.
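
Several of the messages above are only valid once a given protocol feature
has been negotiated. The following is a minimal sketch of how a back-end
might gate incoming requests on the negotiated feature mask, and reject an
inconsistent in-band notification setup as described in "In-band
notifications". The bit numbers and request ids are taken from this
document; the helper names (``PF``, ``required_protocol_feature``,
``request_allowed``, ``protocol_features_consistent``) are hypothetical,
and only a handful of requests are covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Protocol feature bit numbers, as listed under "Protocol features". */
#define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
#define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
#define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
#define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
#define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

/* Hypothetical helper: turn a bit number into a mask. */
#define PF(bit) (UINT64_C(1) << (bit))

/* Front-end request ids, as listed under "Front-end message types". */
enum {
    VHOST_USER_POSTCOPY_ADVISE = 28,
    VHOST_USER_POSTCOPY_LISTEN = 29,
    VHOST_USER_POSTCOPY_END    = 30,
    VHOST_USER_GET_INFLIGHT_FD = 31,
    VHOST_USER_SET_INFLIGHT_FD = 32,
    VHOST_USER_RESET_DEVICE    = 34,
};

/* Returns the required protocol feature bit, or -1 if this sketch knows none. */
static int required_protocol_feature(uint32_t request)
{
    switch (request) {
    case VHOST_USER_POSTCOPY_ADVISE:
    case VHOST_USER_POSTCOPY_LISTEN:
    case VHOST_USER_POSTCOPY_END:
        return VHOST_USER_PROTOCOL_F_PAGEFAULT;
    case VHOST_USER_GET_INFLIGHT_FD:
    case VHOST_USER_SET_INFLIGHT_FD:
        return VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
    case VHOST_USER_RESET_DEVICE:
        return VHOST_USER_PROTOCOL_F_RESET_DEVICE;
    default:
        return -1;
    }
}

/* True if the request may be handled given the negotiated protocol features. */
static bool request_allowed(uint32_t request, uint64_t negotiated)
{
    int bit = required_protocol_feature(request);

    return bit < 0 || (negotiated & PF(bit)) != 0;
}

/* In-band notifications require both SLAVE_REQ and REPLY_ACK. */
static bool protocol_features_consistent(uint64_t features)
{
    const uint64_t deps = PF(VHOST_USER_PROTOCOL_F_SLAVE_REQ) |
                          PF(VHOST_USER_PROTOCOL_F_REPLY_ACK);

    if ((features & PF(VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS)) == 0) {
        return true;
    }
    return (features & deps) == deps;
}
```

A back-end following the text above would close the connection when
``protocol_features_consistent()`` returns false for the mask received in
``VHOST_USER_SET_PROTOCOL_FEATURES``.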

``VHOST_USER_VRING_KICK``
  :id: 35
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the front-end to indicate that a buffer was added to
  the vring instead of signalling it using the vring's kick file
  descriptor or having the back-end rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_GET_MAX_MEM_SLOTS``
  :id: 36
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The back-end should return the message with a
  ``u64`` payload containing the maximum number of memory slots for
  QEMU to expose to the guest. The value returned by the back-end
  will be capped at the maximum number of RAM slots which can be
  supported by the target platform.

``VHOST_USER_ADD_MEM_REG``
  :id: 37
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must map in. Together with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.

  Exactly one file descriptor from which the memory is mapped is
  passed in the ancillary data.

  In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
  replies with the bases of the memory mapped region to the front-end.
  For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
  They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.

``VHOST_USER_REM_MEM_REG``
  :id: 38
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must unmap. Together with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.

  The memory region to be removed is identified by its guest address,
  user address and size. The mmap offset is ignored.

  No file descriptors SHOULD be passed in the ancillary data. For
  compatibility with existing incorrect implementations, the back-end MAY
  accept messages with one file descriptor. If a file descriptor is
  passed, the back-end MUST close it without using it otherwise.
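
The matching rule for ``VHOST_USER_REM_MEM_REG`` can be sketched as
follows. This is only an illustration: the ``struct mem_region_desc``
layout and the ``same_region`` and ``remove_region`` helpers are
hypothetical, and a real back-end would also unmap the region and close
any stray file descriptor received with the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of a single memory region description. */
struct mem_region_desc {
    uint64_t guest_addr;
    uint64_t size;
    uint64_t user_addr;
    uint64_t mmap_offset;
};

/* A region is identified by guest address, user address and size only. */
static bool same_region(const struct mem_region_desc *a,
                        const struct mem_region_desc *b)
{
    return a->guest_addr == b->guest_addr &&
           a->user_addr == b->user_addr &&
           a->size == b->size;   /* mmap_offset is deliberately ignored */
}

/*
 * Remove the entry matching req from table[0..*count). Returns true if an
 * entry was removed. Entry order is not preserved: the last entry is moved
 * into the freed slot.
 */
static bool remove_region(struct mem_region_desc *table, size_t *count,
                          const struct mem_region_desc *req)
{
    for (size_t i = 0; i < *count; i++) {
        if (same_region(&table[i], req)) {
            table[i] = table[*count - 1];
            (*count)--;
            return true;
        }
    }
    return false;
}
```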

``VHOST_USER_SET_STATUS``
  :id: 39
  :equivalent ioctl: ``VHOST_VDPA_SET_STATUS``
  :request payload: ``u64``
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the front-end to
  notify the back-end with updated device status as defined in the Virtio
  specification.

``VHOST_USER_GET_STATUS``
  :id: 40
  :equivalent ioctl: ``VHOST_VDPA_GET_STATUS``
  :request payload: N/A
  :reply payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the front-end to
  query the back-end for its device status as defined in the Virtio
  specification.


Back-end message types
----------------------

For this type of message, the request is sent by the back-end and the reply
is sent by the front-end.

``VHOST_USER_SLAVE_IOTLB_MSG``
  :id: 1
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :request payload: ``struct vhost_iotlb_msg``
  :reply payload: N/A

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
  The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
  access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the
  front-end must respond with zero when the operation is successfully
  completed, or non-zero otherwise.
  This request should be sent only when
  ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.

``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``
  :id: 2
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  back-end sends such messages to notify that the virtio device's
  configuration space has changed. For host devices that support this
  feature, the host driver can send a ``VHOST_USER_GET_CONFIG``
  message to the back-end to get the latest content. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
  ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
  the operation is successfully completed, or non-zero otherwise.

``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
  :id: 3
  :equivalent ioctl: N/A
  :request payload: vring area description
  :reply payload: N/A

  Sets host notifier for a specified queue. The queue index is
  contained in the ``u64`` field of the vring area description. The
  host notifier is described by the file descriptor (typically it's a
  VFIO device fd) which is passed as ancillary data and the size
  (which is mmap size and should be the same as host page size) and
  offset (which is mmap offset) carried in the vring area
  description. QEMU can mmap the file descriptor based on the size and
  offset to get a memory range. Registering a host notifier means
  mapping this memory range to the VM as the specified queue's notify
  MMIO region. The back-end sends this request to tell QEMU to de-register
  the existing notifier if any and register the new notifier if the
  request is sent with a file descriptor.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
  successfully negotiated.
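
The mapping step described for the host notifier can be sketched with a
plain ``mmap()`` call. This is an illustration under stated assumptions,
not QEMU's implementation: ``map_host_notifier`` is a hypothetical helper,
and rejecting any size other than exactly one host page is a stricter
policy than the "should" in the text above.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the notifier area described by (fd, size, offset). Returns the mapped
 * address, or NULL if the description is rejected or mmap() fails.
 */
static void *map_host_notifier(int fd, uint64_t size, uint64_t offset)
{
    long page = sysconf(_SC_PAGESIZE);

    /* Assumption of this sketch: exactly one host page, page-aligned offset. */
    if (page <= 0 || size != (uint64_t)page || offset % (uint64_t)page != 0) {
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, (off_t)offset);

    return addr == MAP_FAILED ? NULL : addr;
}
```

The returned range is what would then be exposed to the VM as the queue's
notify MMIO region; that step depends on the hypervisor and is outside this
sketch.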

``VHOST_USER_SLAVE_VRING_CALL``
  :id: 4
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the back-end to indicate that a buffer was used from
  the vring instead of signalling this using the vring's call file
  descriptor or having the front-end rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_SLAVE_VRING_ERR``
  :id: 5
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the back-end to indicate that an error occurred on the
  specific vring, instead of signalling the error file descriptor
  set by the front-end via ``VHOST_USER_SET_VRING_ERR``.

  The state.num field is currently reserved and must be set to 0.

.. _reply_ack:

VHOST_USER_PROTOCOL_F_REPLY_ACK
-------------------------------

The original vhost-user specification only demands replies for certain
commands. This differs from the vhost protocol implementation where
commands are sent over an ``ioctl()`` call and block until the back-end
has completed.

With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
back-end MUST respond with a Payload ``VhostUserMsg`` indicating success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.

The reply payload gives QEMU a deterministic indication of the result
of the command.
Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to be
more resilient for selective requests.

For the message types that already solicit a reply from the back-end,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or need_reply bit
being set brings no behavioural change. (See the Communication_
section for details.)

.. _backend_conventions:

Backend program conventions
===========================

vhost-user back-ends can provide various devices & services and may
need to be configured manually depending on the use case. However, it
is a good idea to follow the conventions listed here when
possible. Users, QEMU or libvirt, can then rely on some common
behaviour to avoid heterogeneous configuration and management of the
back-end programs and facilitate interoperability.

Each back-end installed on a host system should come with at least one
JSON file that conforms to the vhost-user.json schema. Each file
informs the management applications about the back-end type, and binary
location. In addition, it defines rules for management apps for
picking the highest priority back-end when multiple match the search
criteria (see ``@VhostUserBackend`` documentation in the schema file).

If the back-end is not capable of enabling a requested feature on the
host (such as 3D acceleration with virgl), or the initialization
failed, the back-end should fail to start early and exit with a status
!= 0. It may also print a message to stderr for further details.

The back-end program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.

File descriptors 0, 1 and 2 will exist, and have regular
stdin/stdout/stderr usage (they may have been redirected to /dev/null
by the management layer, or to a log handler).

The back-end program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. It may eventually receive SIGKILL from
the management layer after a few seconds.

The following command line options have an expected behaviour. They
are mandatory, unless explicitly said differently:

--socket-path=PATH

  This option specifies the location of the vhost-user Unix domain socket.
  It is incompatible with --fd.

--fd=FDNUM

  When this argument is given, the back-end program is started with the
  vhost-user socket as file descriptor FDNUM. It is incompatible with
  --socket-path.

--print-capabilities

  Output to stdout the back-end capabilities in JSON format, and then
  exit successfully. Other options and arguments should be ignored, and
  the back-end program should not perform its normal function. The
  capabilities can be reported dynamically depending on the host
  capabilities.

The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``. Example:

.. code:: json

  {
    "type": "foo",
    "features": [
      "feature-a",
      "feature-b"
    ]
  }

vhost-user-input
----------------

Command line options:

--evdev-path=PATH

  Specify the Linux input device.

  (optional)

--no-grab

  Do not request exclusive access to the input device.

  (optional)

vhost-user-gpu
--------------

Command line options:

--render-node=PATH

  Specify the GPU DRM render node.

  (optional)

--virgl

  Enable virgl rendering support.

  (optional)

vhost-user-blk
--------------

Command line options:

--blk-file=PATH

  Specify block device or file path.

  (optional)

--read-only

  Enable read-only.

  (optional)
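
As an illustration of the option conventions in this section, a
vhost-user-blk style back-end could parse its command line roughly as
follows. This is a sketch, not the code of any existing back-end:
``struct backend_opts`` and ``parse_backend_opts`` are hypothetical names,
and treating "neither ``--socket-path`` nor ``--fd``" as a usage error is
an assumption of this example.

```c
#include <getopt.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed options. */
struct backend_opts {
    const char *socket_path;   /* --socket-path=PATH, or NULL */
    int fd;                    /* --fd=FDNUM, or -1 when not given */
    bool print_capabilities;   /* --print-capabilities */
    const char *blk_file;      /* --blk-file=PATH, or NULL */
    bool read_only;            /* --read-only */
};

/* Returns 0 on success, -1 on a usage error. */
static int parse_backend_opts(int argc, char **argv, struct backend_opts *o)
{
    static const struct option longopts[] = {
        { "socket-path",        required_argument, NULL, 's' },
        { "fd",                 required_argument, NULL, 'f' },
        { "print-capabilities", no_argument,       NULL, 'c' },
        { "blk-file",           required_argument, NULL, 'b' },
        { "read-only",          no_argument,       NULL, 'r' },
        { NULL, 0, NULL, 0 }
    };
    bool bad = false;
    int c;

    memset(o, 0, sizeof(*o));
    o->fd = -1;
    optind = 1;   /* restart scanning so the function can be called again */
    opterr = 0;   /* keep getopt from printing its own diagnostics */

    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case 's': o->socket_path = optarg; break;
        case 'b': o->blk_file = optarg; break;
        case 'c': o->print_capabilities = true; break;
        case 'r': o->read_only = true; break;
        case 'f': {
            char *end;
            long v = strtol(optarg, &end, 10);

            if (*optarg == '\0' || *end != '\0' || v < 0 || v > INT_MAX) {
                bad = true;
            } else {
                o->fd = (int)v;
            }
            break;
        }
        default:
            bad = true;   /* unknown option or missing argument */
            break;
        }
    }

    /* With --print-capabilities, other options and arguments are ignored. */
    if (o->print_capabilities) {
        return 0;
    }
    /* --socket-path and --fd are incompatible; this sketch requires one. */
    if (bad || (o->socket_path != NULL) == (o->fd >= 0)) {
        return -1;
    }
    return 0;
}
```

A caller would typically print the capabilities JSON and exit when
``print_capabilities`` is set, and otherwise either connect to
``socket_path`` or use the inherited descriptor ``fd``.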