.. _vhost_user_proto:

===================
Vhost-user Protocol
===================
:Copyright: 2014 Virtual Open Systems Sarl.
:Copyright: 2019 Intel Corporation
:Licence: This work is licensed under the terms of the GNU GPL,
          version 2 or later. See the COPYING file in the top-level
          directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines 2 sides of the communication, *master* and
*slave*. *Master* is the application that shares its virtqueues, in
our case QEMU. *Slave* is the consumer of the virtqueues.

In the current implementation QEMU is the *master*, and the *slave* is
the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device backend processing reads and writes to a virtual
disk. In order to facilitate interoperability between various backend
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

*Master* and *slave* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the slave
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------------+------+-----------+-----+
| index | flags | descriptor | used | available | log |
+-------+-------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

A region is:

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+---------------+------+--------------+-------------+
| padding | guest address | size | user address | mmap offset |
+---------+---------------+------+--------------+-------------+

:padding: 64-bit

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: size of area used for logging

:log offset: offset from start of supplied file descriptor where
             logging starts (i.e. where guest address 0 would be
             logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:
  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:
  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:
  - 0: Vhost master messages used for writeable fields
  - 1: Vhost master messages used for live migration

:payload: an array of ``size`` bytes holding the contents of the virtio
          device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer that contains the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
         supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
              of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of *master* sending message requests and
*slave* sending message replies. Most of the requests don't require
replies. Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.
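
As a rough illustration of the header ``flags`` layout and of the reply
flag mentioned above, the following sketch builds and checks flag words.
The ``EX_`` macro names and the ``ex_`` functions are assumptions made
for this example and are not defined by this document; only the bit
positions are taken from the Header section.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative names; only the bit positions come from the text. */
  #define EX_VERSION_MASK     0x3u       /* lower 2 bits: version */
  #define EX_VERSION          0x1u       /* current version */
  #define EX_REPLY_FLAG       (1u << 2)  /* set on each reply from the slave */
  #define EX_NEED_REPLY_FLAG  (1u << 3)  /* see REPLY_ACK */

  /* Flags for a request sent by the master. */
  uint32_t ex_request_flags(bool need_reply)
  {
      uint32_t flags = EX_VERSION;

      if (need_reply) {
          flags |= EX_NEED_REPLY_FLAG;
      }
      return flags;
  }

  /* Flags for a reply sent by the slave. */
  uint32_t ex_reply_flags(void)
  {
      return EX_VERSION | EX_REPLY_FLAG;
  }

  /* Check that a received flag word is a reply of the expected version. */
  bool ex_is_valid_reply(uint32_t flags)
  {
      return (flags & EX_VERSION_MASK) == EX_VERSION &&
             (flags & EX_REPLY_FLAG) != 0;
  }

A master could apply a check like ``ex_is_valid_reply()`` to the replies
to the requests listed above before trusting the payload.
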

There are several messages that the master sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_SLAVE_REQ_FD``
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

If *master* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
can be implemented.

If *slave* detects some error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both master and slave. As
older slaves don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Starting and stopping rings
---------------------------

Client must only process each ring when it is started.

Client must only pass data between the ring and the backend when the
ring is enabled.

If ring is started but disabled, client must process the ring without
talking to the backend.

For example, for a networking device, in the disabled state client
must not supply any new RX packets, but must process and discard any
TX packets.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring is initialized in an enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state. Client must not pass data to/from the
backend until ring is enabled by ``VHOST_USER_SET_VRING_ENABLE`` with
parameter 1, or after it has been disabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 0.

Each ring is initialized in a stopped state; client must not process
it until ring is started, or after it has been stopped.

Client must start ring upon receiving a kick (that is, detecting that
file descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and stop ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

While processing the rings (whether they are enabled or not), client
must support changing some configuration aspects on the fly.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the master
already knows the number of available virtqueues without communicating with the
slave.

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the slave. The number can depend on host
resource availability or slave implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the slave to advertise the maximum number of
queues. This is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The max number of queues the slave supports can be queried with message
``VHOST_USER_GET_QUEUE_NUM``. Master should stop when the number of requested
queues is bigger than that.

As all queues share one connection, the master uses a unique index for each
queue in the sent message to identify a specified queue.
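
A minimal sketch of the master-side check for a multiple queue device
follows. It assumes the master already holds the negotiated protocol
feature bitmask and the value returned by ``VHOST_USER_GET_QUEUE_NUM``;
the ``ex_`` function names are hypothetical and only the bit number of
``VHOST_USER_PROTOCOL_F_MQ`` comes from this document.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_PROTOCOL_F_MQ 0

  /* True if the MQ bit is set in the protocol feature bitmask. */
  bool ex_has_mq(uint64_t protocol_features)
  {
      return ((protocol_features >> VHOST_USER_PROTOCOL_F_MQ) & 1) != 0;
  }

  /*
   * For a multiple queue device: true if the master may proceed with
   * `requested` queues, false if it should stop.
   */
  bool ex_queue_count_ok(uint64_t protocol_features,
                         uint64_t slave_max_queues,
                         uint64_t requested)
  {
      if (!ex_has_mq(protocol_features)) {
          return false;
      }
      return requested <= slave_max_queues;
  }
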

The master enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Slaves should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Masters must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues. Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, server may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed to by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
``VHOST_USER_SET_LOG_BASE`` message when the slave has
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` which
should be large enough to cover all known guest addresses. Log starts
at the supplied offset in the supplied file descriptor. The log
covers from address 0 to the maximum of guest regions. In pseudo-code,
to mark page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e. does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the master that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source. No further update must be done before rings are restarted.

In postcopy migration the slave is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The slave opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the master. The master services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
slave. The client indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

Memory access
-------------

The master sends a list of vhost memory regions to the slave using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.
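
A region lookup of this kind might be sketched as follows. The struct
and function names are hypothetical; the fields mirror the region
layout from the Message Specification. The sketch covers guest
addresses without ``VIRTIO_F_IOMMU_PLATFORM`` and assumes that the
slave maps each region's file descriptor starting at offset 0, so that
the byte offset inside the mapping is the ``mmap offset`` plus the
distance from the region's guest base address.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical in-memory form of one region from SET_MEM_TABLE. */
  struct ex_region {
      uint64_t guest_addr;   /* guest address of the region */
      uint64_t size;         /* size of the region in bytes */
      uint64_t user_addr;    /* user address in the master */
      uint64_t mmap_offset;  /* where the region starts in the mapping */
  };

  /* Return the region containing guest_addr, or NULL if there is none. */
  const struct ex_region *ex_find_by_guest_addr(const struct ex_region *regions,
                                                size_t n, uint64_t guest_addr)
  {
      for (size_t i = 0; i < n; i++) {
          const struct ex_region *r = &regions[i];

          /* Subtract first so that the range check cannot overflow. */
          if (guest_addr >= r->guest_addr &&
              guest_addr - r->guest_addr < r->size) {
              return r;
          }
      }
      return NULL;
  }

  /* Compute the byte offset of guest_addr inside the region's mapping. */
  int ex_guest_to_mmap_offset(const struct ex_region *regions, size_t n,
                              uint64_t guest_addr, uint64_t *offset_out)
  {
      const struct ex_region *r = ex_find_by_guest_addr(regions, n, guest_addr);

      if (r == NULL) {
          return -1;
      }
      *offset_out = r->mmap_offset + (guest_addr - r->guest_addr);
      return 0;
  }

The same search keyed on ``user_addr`` instead of ``guest_addr`` would
serve for user addresses.
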

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
master sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the slave with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the slave is
expected to reply with a zero payload, non-zero otherwise.

The slave relies on the slave communication channel (see :ref:`Slave
communication <slave_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_SLAVE_IOTLB_MSG``
requests to the master with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags. For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the slave may
rely on the reply-ack feature, so the master may send a reply when
operation is completed if the reply-ack feature is negotiated and
the slave requests a reply. For miss events, completed operation means
either master sent an update message containing the IOTLB entry
containing requested address and permission, or master sent nothing if
the IOTLB miss message is invalid (invalid IOVA or permission).

The master isn't expected to take the initiative to send IOTLB update
messages, as the slave sends IOTLB miss messages for the guest virtual
memory areas it needs to access.

.. _slave_communication:

Slave communication
-------------------

An optional communication channel is provided if the slave declares
``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` protocol feature, to allow the
slave to make requests to the master.

The fd is provided via ``VHOST_USER_SET_SLAVE_REQ_FD`` ancillary data.

A slave may then send ``VHOST_USER_SLAVE_*`` messages to the master
using this fd communication channel.

If ``VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD`` protocol feature is
negotiated, slave can send file descriptors (at most 8 descriptors in
each message) to master via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, slave may need to
resubmit inflight I/Os. If virtqueue is processed in order, we can
easily achieve that by getting the inflight descriptors from
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue).
However, it can't work when we process descriptors
out-of-order because some entries which store the information of
inflight descriptors in available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, the slave needs to allocate an extra buffer to store this
information of inflight descriptors and share it with master for
persistence. ``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between master and slave. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. Slave could get it from num
queues field of ``VhostUserInflight``.

For split virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that track the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, ``i``

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1,2,3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (means the inflight field of ``DescStateSplit`` entries in
   last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
      list which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value

For packed virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue
       * size. Slave could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1,2,3,4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (means the
   ``inflight`` field of ``DescStatePacked`` entries in last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #. If ``d.flags`` is not equal to the calculated flags value (means
      slave has submitted the buffer to guest driver before crash, so
      it has to commit the in-progress update), set ``old_free_head``,
      ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``,
      ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in
   free list to 0

#. Resubmit inflight ``DescStatePacked`` entries in order of their
   counter value

In-band notifications
---------------------

In some limited situations (e.g. for simulation) it is desirable to
have the kick, call and error (if used) signals done via in-band
messages instead of asynchronous eventfd notifications. This can be
done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
protocol feature.

Note that due to the fact that too many messages on the sockets can
cause the sending application(s) to block, it is not advised to use
this feature unless absolutely necessary.
It is also considered an error to negotiate this feature without also
negotiating ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` and
``VHOST_USER_PROTOCOL_F_REPLY_ACK``. The former is necessary for
getting a message channel from the slave to the master, while the
latter needs to be used with the in-band notification messages to
block until they are processed, both to avoid blocking later and for
proper processing (at least in the simulation use case). As it has no
other way of signalling this error, the slave should close the
connection as a response to a ``VHOST_USER_SET_PROTOCOL_FEATURES``
message that sets the in-band notifications feature flag without the
other two.

Protocol features
-----------------

.. code:: c

  #define VHOST_USER_PROTOCOL_F_MQ                    0
  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
  #define VHOST_USER_PROTOCOL_F_RARP                  2
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_MTU                   4
  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
  #define VHOST_USER_PROTOCOL_F_CONFIG                9
  #define VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD        10
  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
  #define VHOST_USER_PROTOCOL_F_STATUS               16

Master message types
--------------------

``VHOST_USER_GET_FEATURES``
  :id: 1
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :master payload: N/A
  :slave payload: ``u64``

  Get the features bitmask from the underlying vhost implementation.
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals slave support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :id: 2
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable features in the underlying vhost implementation using a
  bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  slave support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :id: 15
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :master payload: N/A
  :slave payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation. Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

.. Note::
   A slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must
   support this message even before ``VHOST_USER_SET_FEATURES`` is
   called.

``VHOST_USER_SET_PROTOCOL_FEATURES``
  :id: 16
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :master payload: ``u64``

  Enable protocol features in the underlying vhost implementation.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``.

.. Note::
   A slave that reported ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
   this message even before ``VHOST_USER_SET_FEATURES`` is called.

``VHOST_USER_SET_OWNER``
  :id: 3
  :equivalent ioctl: ``VHOST_SET_OWNER``
  :master payload: N/A

  Issued when a new connection is established. It sets the current
  *master* as an owner of the session. This can be used on the *slave*
  as a "session start" flag.

``VHOST_USER_RESET_OWNER``
  :id: 4
  :master payload: N/A

.. admonition:: Deprecated

   This is no longer used. It used to be sent to request disabling all
   rings, but some clients interpreted it to also discard connection
   state (this interpretation would lead to bugs). It is recommended
   that clients either ignore this message, or use it to disable all
   rings.

``VHOST_USER_SET_MEM_TABLE``
  :id: 5
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :master payload: memory regions description
  :slave payload: (postcopy only) memory regions description

  Sets the memory map regions on the slave so it can translate the
  vring addresses. In the ancillary data there is an array of file
  descriptors for each memory mapped region. The size and ordering of
  the fds matches the number and ordering of memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the master. The slave must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault
  event.

.. Note::
   ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
   reply back to the list of mappings with an empty
   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
   reception of this message may the guest start accessing the memory
   and generating faults.

``VHOST_USER_SET_LOG_BASE``
  :id: 6
  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :master payload: ``u64``
  :slave payload: N/A

  Sets logging shared memory space.

  When the slave has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
  feature, the log memory fd is provided in the ancillary data of the
  ``VHOST_USER_SET_LOG_BASE`` message, and the size and offset of the
  shared memory area are provided in the message.

``VHOST_USER_SET_LOG_FD``
  :id: 7
  :equivalent ioctl: ``VHOST_SET_LOG_FD``
  :master payload: N/A

  Sets the logging file descriptor, which is passed as ancillary data.
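
The dependency rule for ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
stated before the protocol features list can be checked with a few bit
tests. The following is a minimal sketch of how a slave might validate the
bitmask received in ``VHOST_USER_SET_PROTOCOL_FEATURES``. The helper names
``vu_has_feature`` and ``vu_protocol_features_valid`` are hypothetical and
not part of any existing implementation; only the bit numbers come from
this document.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Bit numbers copied from the "Protocol features" list. */
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_SLAVE_REQ             5
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

  /* Hypothetical helper: true if bit number `bit` is set in `features`. */
  static bool vu_has_feature(uint64_t features, unsigned int bit)
  {
      return ((features >> bit) & 1) != 0;
  }

  /*
   * Hypothetical helper: returns false when in-band notifications are
   * requested without both SLAVE_REQ and REPLY_ACK. In that case the
   * slave is expected to close the connection.
   */
  static bool vu_protocol_features_valid(uint64_t features)
  {
      if (!vu_has_feature(features,
                          VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS)) {
          return true;
      }
      return vu_has_feature(features, VHOST_USER_PROTOCOL_F_SLAVE_REQ) &&
             vu_has_feature(features, VHOST_USER_PROTOCOL_F_REPLY_ACK);
  }

A slave would typically call such a check once, when handling
``VHOST_USER_SET_PROTOCOL_FEATURES``, and drop the connection if it
returns false.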

``VHOST_USER_SET_VRING_NUM``
  :id: 8
  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
  :master payload: vring state description

  Set the size of the queue.

``VHOST_USER_SET_VRING_ADDR``
  :id: 9
  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
  :master payload: vring address description
  :slave payload: N/A

  Sets the addresses of the different aspects of the vring.

``VHOST_USER_SET_VRING_BASE``
  :id: 10
  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
  :master payload: vring state description

  Sets the base offset in the available vring.

``VHOST_USER_GET_VRING_BASE``
  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
  :master payload: vring state description
  :slave payload: vring state description

  Get the available vring base offset.

``VHOST_USER_SET_VRING_KICK``
  :id: 12
  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
  :master payload: ``u64``

  Set the event file descriptor for adding buffers to the vring. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick. Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated,
  this message isn't necessary, as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message. It may however still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :master payload: ``u64``

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
  isn't necessary, as the ``VHOST_USER_SLAVE_VRING_CALL`` message can be
  used. It may however still be used to set an event file descriptor
  or to enable polling.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :master payload: ``u64``

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` have been negotiated, this message
  isn't necessary, as the ``VHOST_USER_SLAVE_VRING_ERR`` message can be
  used. It may however still be used to set an event file descriptor
  (which will be preferred over the message).

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: ``u64``

  Query how many queues the backend supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in queried protocol features by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :master payload: vring state description

  Signal slave to enable or disable corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
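
The ``u64`` payload shared by ``VHOST_USER_SET_VRING_KICK``,
``VHOST_USER_SET_VRING_CALL`` and ``VHOST_USER_SET_VRING_ERR`` packs the
vring index into bits 0-7 and the invalid FD flag into bit 8. The sketch
below shows one way to decode it. The macro and function names are
illustrative assumptions, not identifiers defined by this specification.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative names; the bit positions come from the text above. */
  #define VU_VRING_IDX_MASK   0xffULL       /* bits 0-7: vring index */
  #define VU_VRING_NOFD_MASK  (1ULL << 8)   /* bit 8: invalid FD flag */

  /* Extract the vring index from a SET_VRING_{KICK,CALL,ERR} payload. */
  static unsigned int vu_vring_payload_index(uint64_t payload)
  {
      return (unsigned int)(payload & VU_VRING_IDX_MASK);
  }

  /*
   * True if a file descriptor is expected in the ancillary data, i.e.
   * the invalid FD flag is clear.
   */
  static bool vu_vring_payload_has_fd(uint64_t payload)
  {
      return (payload & VU_VRING_NOFD_MASK) == 0;
  }

When ``vu_vring_payload_has_fd()`` returns false for a kick or call
message, the receiver would fall back to polling (or to the in-band
messages, if negotiated) for that vring.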

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Ask the vhost-user backend to broadcast a fake RARP to notify that
  the migration has terminated, for guests that do not support
  GUEST_ANNOUNCE.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  backend to construct and broadcast the fake RARP.

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Set host MTU value exposed to the guest.

  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the slave must
  respond with zero if the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_SLAVE_REQ_FD``
  :id: 21
  :equivalent ioctl: N/A
  :master payload: N/A

  Set the socket file descriptor for slave initiated requests. It is passed
  in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_SLAVE_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the slave must
  respond with zero for success, non-zero otherwise.
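
For ``VHOST_USER_SEND_RARP``, the guest MAC address occupies the first 6
bytes of the ``u64`` payload. The sketch below assumes "first 6 bytes"
means the first 6 bytes of the payload as laid out in memory, which is
consistent with the native byte order rule but is an interpretation
rather than something this section spells out. The function names are
hypothetical.

.. code:: c

  #include <stdint.h>
  #include <string.h>

  #define VU_MAC_LEN 6

  /* Pack a MAC address into the first 6 bytes of a zeroed u64 payload. */
  static uint64_t vu_rarp_payload_from_mac(const uint8_t mac[VU_MAC_LEN])
  {
      uint64_t payload = 0;

      memcpy(&payload, mac, VU_MAC_LEN);
      return payload;
  }

  /* Copy the first 6 bytes of the payload back out as a MAC address. */
  static void vu_rarp_payload_get_mac(uint64_t payload,
                                      uint8_t mac[VU_MAC_LEN])
  {
      memcpy(mac, &payload, VU_MAC_LEN);
  }

Using ``memcpy`` on the in-memory representation keeps the two helpers
symmetric regardless of host endianness.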

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :master payload: ``struct vhost_iotlb_msg``
  :slave payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  Master sends such requests to update and invalidate entries in the
  device IOTLB. The slave has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :master payload: vring state description

  Set the endianness of a VQ for legacy devices. Little-endian is
  indicated with state.num set to 0 and big-endian is indicated with
  state.num set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Backends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the master starts the VQ).

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master to fetch the contents of the
  virtio device configuration space. The vhost-user slave's payload
  size MUST match the master's request. The vhost-user slave uses a
  zero-length payload to indicate an error to the vhost-user
  master. The vhost-user master may cache the contents to avoid
  repeated ``VHOST_USER_GET_CONFIG`` calls.
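
The reply rules for ``VHOST_USER_GET_CONFIG`` give the master three cases
to distinguish: an error signalled by a zero-length payload, a reply whose
size does not match the request, and a valid reply. A minimal sketch of
that check on the master side follows. The enum and function names are
hypothetical, and the sizes are assumed to be the payload sizes of the
request and of the reply.

.. code:: c

  #include <stdint.h>

  enum vu_get_config_result {
      VU_GET_CONFIG_OK,             /* reply size matches the request */
      VU_GET_CONFIG_SLAVE_ERROR,    /* slave replied with zero length */
      VU_GET_CONFIG_SIZE_MISMATCH   /* protocol violation by the slave */
  };

  /* Classify a GET_CONFIG reply by comparing payload sizes. */
  static enum vu_get_config_result
  vu_check_get_config_reply(uint32_t request_size, uint32_t reply_size)
  {
      if (reply_size == 0) {
          return VU_GET_CONFIG_SLAVE_ERROR;
      }
      if (reply_size != request_size) {
          return VU_GET_CONFIG_SIZE_MISMATCH;
      }
      return VU_GET_CONFIG_OK;
  }

Only a ``VU_GET_CONFIG_OK`` result would make the returned contents safe
to cache.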

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :master payload: virtio device config space
  :slave payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user master when the guest changes the virtio
  device configuration space. It can also be used for live migration
  on the destination host. The vhost-user slave must check the flags
  field, and slaves MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :master payload: crypto session description
  :slave payload: crypto session description

  Create a session for crypto operation. The server side must return
  the session id, 0 or positive for success, negative for failure.
  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :master payload: ``u64``

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :master payload: N/A
  :slave payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the master
  advises the slave that a migration with postcopy enabled is underway;
  the slave must open a userfaultfd for later use. Note that at this
  stage the migration is still in precopy mode.

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :master payload: N/A

  Master advises slave that a transition to postcopy mode has
  happened. The slave must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :slave payload: ``u64``

  Master advises that postcopy migration has now completed. The slave
  must disable the userfaultfd. The response is an acknowledgement
  only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :master payload: inflight description

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the master
  to get a shared buffer from the slave. The shared buffer will be used
  to track inflight I/O by the slave. QEMU should retrieve a new one
  when the VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :master payload: inflight description

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the master
  to send the shared inflight buffer back to the slave so that the slave
  can get inflight I/O after a crash or restart.

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :master payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data. The GPU protocol is used to inform the master of
  rendering state and updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :master payload: N/A
  :slave payload: N/A

  Ask the vhost-user backend to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The backend retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the backend.

``VHOST_USER_VRING_KICK``
  :id: 35
  :equivalent ioctl: N/A
  :master payload: vring state description
  :slave payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the master to indicate that a buffer was added to
  the vring instead of signalling it using the vring's kick file
  descriptor or having the slave rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_GET_MAX_MEM_SLOTS``
  :id: 36
  :equivalent ioctl: N/A
  :slave payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The slave should return the message with
  a ``u64`` payload containing the maximum number of memory slots for
  QEMU to expose to the guest. The value returned by the backend
  will be capped at the maximum number of RAM slots which can be
  supported by the target platform.
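
The capping described for ``VHOST_USER_GET_MAX_MEM_SLOTS`` amounts to
taking the smaller of two limits. As a small sketch, with a hypothetical
function name and with the platform limit passed in as a parameter because
this document does not say where it comes from:

.. code:: c

  #include <stdint.h>

  /*
   * Number of memory slots the master may expose to the guest: the
   * backend's reported maximum, capped at the platform's own limit.
   */
  static uint64_t vu_effective_mem_slots(uint64_t backend_max,
                                         uint64_t platform_max)
  {
      return backend_max < platform_max ? backend_max : platform_max;
  }

For example, a backend reporting 1024 slots on a platform limited to 509
would yield 509, while a backend reporting 32 would yield 32. These
numbers are purely illustrative.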

``VHOST_USER_ADD_MEM_REG``
  :id: 37
  :equivalent ioctl: N/A
  :slave payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the slave device must map in. Together with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the slave device.

``VHOST_USER_REM_MEM_REG``
  :id: 38
  :equivalent ioctl: N/A
  :slave payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the master to the slave. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the slave device must unmap. Together with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the slave device.

``VHOST_USER_SET_STATUS``
  :id: 39
  :equivalent ioctl: ``VHOST_VDPA_SET_STATUS``
  :slave payload: N/A
  :master payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the master to
  notify the backend with updated device status as defined in the Virtio
  specification.
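
The ``u64`` carried by the status messages holds the device status field
from the Virtio specification. The sketch below shows how a backend might
interpret it. The bit values are the ones the Virtio specification assigns
to the device status field; the ``VU_STATUS_*`` macro names and the helper
function are hypothetical and chosen only for this example.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Device status bits as defined in the Virtio specification. */
  #define VU_STATUS_ACKNOWLEDGE   0x01
  #define VU_STATUS_DRIVER        0x02
  #define VU_STATUS_DRIVER_OK     0x04
  #define VU_STATUS_FEATURES_OK   0x08
  #define VU_STATUS_NEEDS_RESET   0x40
  #define VU_STATUS_FAILED        0x80

  /*
   * Hypothetical helper: true if the driver has finished setting up
   * the device and no failure condition is flagged.
   */
  static bool vu_status_driver_ready(uint64_t status)
  {
      if (status & (VU_STATUS_FAILED | VU_STATUS_NEEDS_RESET)) {
          return false;
      }
      return (status & VU_STATUS_DRIVER_OK) != 0;
  }

A backend could use such a check to decide when to start processing the
rings after receiving a status update.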

``VHOST_USER_GET_STATUS``
  :id: 40
  :equivalent ioctl: ``VHOST_VDPA_GET_STATUS``
  :slave payload: ``u64``
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the master to
  query the backend for its device status as defined in the Virtio
  specification.


Slave message types
-------------------

``VHOST_USER_SLAVE_IOTLB_MSG``
  :id: 1
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :slave payload: ``struct vhost_iotlb_msg``
  :master payload: N/A

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
  Slave sends such requests to notify of an IOTLB miss, or an IOTLB
  access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and the slave set the ``VHOST_USER_NEED_REPLY`` flag, the
  master must respond with zero when the operation is successfully
  completed, or non-zero otherwise. This request should be sent only
  when ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.

``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``
  :id: 2
  :equivalent ioctl: N/A
  :slave payload: N/A
  :master payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  slave sends such messages to notify that the virtio device's
  configuration space has changed. For host devices that support
  this feature, the host driver can send a ``VHOST_USER_GET_CONFIG``
  message to the slave to get the latest content. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the slave set
  the ``VHOST_USER_NEED_REPLY`` flag, the master must respond with zero
  when the operation is successfully completed, or non-zero otherwise.

``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``
  :id: 3
  :equivalent ioctl: N/A
  :slave payload: vring area description
  :master payload: N/A

  Sets host notifier for a specified queue. The queue index is
  contained in the ``u64`` field of the vring area description. The
  host notifier is described by the file descriptor (typically a
  VFIO device fd) which is passed as ancillary data and the size
  (which is the mmap size and should be the same as the host page size)
  and offset (which is the mmap offset) carried in the vring area
  description. QEMU can mmap the file descriptor based on the size and
  offset to get a memory range. Registering a host notifier means
  mapping this memory range to the VM as the specified queue's notify
  MMIO region. The slave sends this request to tell QEMU to de-register
  the existing notifier, if any, and to register the new notifier if
  the request is sent with a file descriptor.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
  successfully negotiated.

``VHOST_USER_SLAVE_VRING_CALL``
  :id: 4
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the slave to indicate that a buffer was used from
  the vring instead of signalling this using the vring's call file
  descriptor or having the master rely on polling.

  The state.num field is currently reserved and must be set to 0.
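
To illustrate how an in-band notification is framed, the sketch below
fills in a ``VHOST_USER_SLAVE_VRING_CALL`` message using the header layout
from the Message Specification section (32-bit request, flags and size,
followed by the payload) and a vring state description as payload. The
struct and function names are assumptions made for this example and are
not taken from any particular implementation.

.. code:: c

  #include <stdint.h>
  #include <string.h>

  #define VU_SLAVE_VRING_CALL_ID  4u          /* id listed above */
  #define VU_FLAG_VERSION         0x1u        /* lower 2 bits of flags */
  #define VU_FLAG_NEED_REPLY      (1u << 3)   /* bit 3: need_reply */

  /* Vring state description: 32-bit index and 32-bit num. */
  struct vu_vring_state {
      uint32_t index;
      uint32_t num;
  };

  /* Header fields followed by the payload, in native byte order. */
  struct vu_vring_msg {
      uint32_t request;
      uint32_t flags;
      uint32_t size;
      struct vu_vring_state state;
  };

  /* Build a SLAVE_VRING_CALL message for the given vring. */
  static struct vu_vring_msg vu_make_slave_vring_call(uint32_t vring_index,
                                                      int need_reply)
  {
      struct vu_vring_msg msg;

      memset(&msg, 0, sizeof(msg));
      msg.request = VU_SLAVE_VRING_CALL_ID;
      msg.flags = VU_FLAG_VERSION | (need_reply ? VU_FLAG_NEED_REPLY : 0);
      msg.size = sizeof(msg.state);
      msg.state.index = vring_index;
      msg.state.num = 0;   /* reserved, must be 0 */
      return msg;
  }

Setting ``need_reply`` is only meaningful when
``VHOST_USER_PROTOCOL_F_REPLY_ACK`` has been negotiated, which the in-band
notifications feature requires in any case.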

``VHOST_USER_SLAVE_VRING_ERR``
  :id: 5
  :equivalent ioctl: N/A
  :slave payload: vring state description
  :master payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the slave to indicate that an error occurred on the
  specific vring, instead of signalling the error file descriptor
  set by the master via ``VHOST_USER_SET_VRING_ERR``.

  The state.num field is currently reserved and must be set to 0.

.. _reply_ack:

VHOST_USER_PROTOCOL_F_REPLY_ACK
-------------------------------

The original vhost-user specification only demands replies for certain
commands. This differs from the vhost protocol implementation where
commands are sent over an ``ioctl()`` call and block until the client
has completed.

With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
client MUST respond with a Payload ``VhostUserMsg`` indicating success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.

The response payload gives QEMU a deterministic indication of the result
of the command. Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to
be more resilient for selective requests.

For the message types that already solicit a reply from the client,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or the need_reply
bit being set brings no behavioural change. (See the Communication_
section for details.)

.. _backend_conventions:

Backend program conventions
===========================

vhost-user backends can provide various devices & services and may
need to be configured manually depending on the use case. However, it
is a good idea to follow the conventions listed here when
possible. Users, QEMU or libvirt, can then rely on some common
behaviour to avoid heterogeneous configuration and management of the
backend programs and facilitate interoperability.

Each backend installed on a host system should come with at least one
JSON file that conforms to the vhost-user.json schema. Each file
informs the management applications about the backend type and binary
location. In addition, it defines rules for management apps for
picking the highest priority backend when multiple match the search
criteria (see ``@VhostUserBackend`` documentation in the schema file).

If the backend is not capable of enabling a requested feature on the
host (such as 3D acceleration with virgl), or the initialization
failed, the backend should fail to start early and exit with a status
!= 0. It may also print a message to stderr for further details.

The backend program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.

File descriptors 0, 1 and 2 will exist, and have regular
stdin/stdout/stderr usage (they may have been redirected to /dev/null
by the management layer, or to a log handler).

The backend program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. It may eventually be sent SIGKILL by
the management layer after a few seconds.

The following command line options have an expected behaviour. They
are mandatory, unless explicitly said differently:

--socket-path=PATH

  This option specifies the location of the vhost-user Unix domain socket.
  It is incompatible with --fd.

--fd=FDNUM

  When this argument is given, the backend program is started with the
  vhost-user socket as file descriptor FDNUM. It is incompatible with
  --socket-path.

--print-capabilities

  Output to stdout the backend capabilities in JSON format, and then
  exit successfully. Other options and arguments should be ignored, and
  the backend program should not perform its normal function. The
  capabilities can be reported dynamically depending on the host
  capabilities.

The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``. Example:

.. code:: json

  {
    "type": "foo",
    "features": [
      "feature-a",
      "feature-b"
    ]
  }

vhost-user-input
----------------

Command line options:

--evdev-path=PATH

  Specify the Linux input device.

  (optional)

--no-grab

  Do not request exclusive access to the input device.

  (optional)

vhost-user-gpu
--------------

Command line options:

--render-node=PATH

  Specify the GPU DRM render node.

  (optional)

--virgl

  Enable virgl rendering support.

  (optional)

vhost-user-blk
--------------

Command line options:

--blk-file=PATH

  Specify block device or file path.

  (optional)

--read-only

  Enable read-only.

  (optional)
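
As an illustration of the common options listed under "Backend program
conventions", the following sketch parses ``--socket-path``, ``--fd`` and
``--print-capabilities`` with ``getopt_long()`` and rejects the
incompatible combination. It is one possible approach under stated
assumptions, not how any particular backend does it: the
``backend_opts`` struct and ``parse_backend_opts`` function are
hypothetical, and device-specific options are left out.

.. code:: c

  #include <getopt.h>
  #include <limits.h>
  #include <stdlib.h>

  /* Hypothetical container for the common backend options. */
  struct backend_opts {
      const char *socket_path;   /* --socket-path=PATH, or NULL */
      int fd;                    /* --fd=FDNUM, or -1 if not given */
      int print_capabilities;    /* non-zero if --print-capabilities */
  };

  /* Returns 0 on success, -1 on a usage error. */
  static int parse_backend_opts(int argc, char **argv,
                                struct backend_opts *o)
  {
      static const struct option longopts[] = {
          { "socket-path",        required_argument, NULL, 's' },
          { "fd",                 required_argument, NULL, 'f' },
          { "print-capabilities", no_argument,       NULL, 'c' },
          { NULL, 0, NULL, 0 }
      };
      int c, bad = 0;

      o->socket_path = NULL;
      o->fd = -1;
      o->print_capabilities = 0;
      optind = 1;   /* restart scanning in case of repeated calls */

      while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
          char *end;
          long v;

          switch (c) {
          case 's':
              o->socket_path = optarg;
              break;
          case 'f':
              v = strtol(optarg, &end, 10);
              if (*optarg == '\0' || *end != '\0' || v < 0 || v > INT_MAX) {
                  bad = 1;
              } else {
                  o->fd = (int)v;
              }
              break;
          case 'c':
              o->print_capabilities = 1;
              break;
          default:
              bad = 1;
              break;
          }
      }

      if (o->print_capabilities) {
          return 0;   /* other options and arguments are ignored */
      }
      if (bad || (o->socket_path != NULL && o->fd >= 0)) {
          return -1;  /* --socket-path and --fd are incompatible */
      }
      return 0;
  }

A backend's ``main()`` would call this first, print its JSON capabilities
and exit successfully if ``print_capabilities`` is set, and otherwise exit
with a non-zero status on a usage error.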