.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines 2 sides of the communication, *front-end* and
*back-end*. The *front-end* is the application that shares its virtqueues, in
our case QEMU. The *back-end* is the consumer of the virtqueues.

In the current implementation QEMU is the *front-end*, and the *back-end*
is the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device back-end processing read & write to a virtual
disk. In order to facilitate interoperability between various back-end
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

The *front-end* and *back-end* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Support for platforms other than Linux
--------------------------------------

While vhost-user was initially developed targeting Linux, nowadays it
is supported on any platform that provides the following features:

- A way for requesting shared memory represented by a file descriptor
  so it can be passed over a UNIX domain socket and then mapped by the
  other process.

- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
  exchange messages through it, including ancillary data when needed.

- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor will be used for receiving or
  sending events by reading or writing (respectively) an 8-byte value
  to the corresponding file descriptor. The 8-byte value itself has no
  meaning and should not be interpreted.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the back-end
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring descriptor index for split virtqueues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------------------+
| vring index | index in avail ring |
+-------------+---------------------+

:vring index: 32-bit index of the respective virtqueue

:index in avail ring: 32-bit value, of which currently only the lower 16
  bits are used:

  - Bits 0–15: Index of the next *Available Ring* descriptor that the
    back-end will process. This is a free-running index that is not
    wrapped by the ring size.
  - Bits 16–31: Reserved (set to zero)

Vring descriptor indices for packed virtqueues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+--------------------+
| vring index | descriptor indices |
+-------------+--------------------+

:vring index: 32-bit index of the respective virtqueue

:descriptor indices: 32-bit value:

  - Bits 0–14: Index of the next *Available Ring* descriptor that the
    back-end will process. This is a free-running index that is not
    wrapped by the ring size.
  - Bit 15: Driver (Available) Ring Wrap Counter
  - Bits 16–30: Index of the entry in the *Used Ring* where the back-end
    will place the next descriptor. This is a free-running index that
    is not wrapped by the ring size.
  - Bit 31: Device (Used) Ring Wrap Counter

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------+------------+------+-----------+-----+
| index | flags | size | descriptor | used | available | log |
+-------+-------+------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: a 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

Memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: 64-bit offset where region starts in the mapped memory

When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
successfully negotiated, the memory region description contains two extra
fields at the end.

+---------------+------+--------------+-------------+----------------+-------+
| guest address | size | user address | mmap offset | xen mmap flags | domid |
+---------------+------+--------------+-------------+----------------+-------+

:xen mmap flags: 32-bit bit field

- Bit 0 is set for Xen foreign memory mapping.
- Bit 1 is set for Xen grant memory mapping.
- Bit 8 is set if the memory region can not be mapped in advance, and memory
  areas within this region must be mapped / unmapped only when required by the
  back-end.
  The back-end shouldn't try to map the entire region at once, as the
  front-end may not allow it. The back-end should rather map only the required
  amount of memory at once and unmap it after it is used.

:domid: a 32-bit Xen hypervisor specific domain id.

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+--------+
| padding | region |
+---------+--------+

:padding: 64-bit

A region is represented by Memory region description.

Multiple Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

A region is represented by Memory region description.

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: size of area used for logging

:log offset: offset from start of supplied file descriptor where
  logging starts (i.e.
  where guest address 0 would be
  logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:

  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:

  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:

  - 0: Vhost front-end messages used for writable fields
  - 1: Vhost front-end messages used for live migration

:payload: Size bytes array holding the contents of the virtio
  device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer containing the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
  supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
  of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

VhostUserShared
^^^^^^^^^^^^^^^

+------+
| UUID |
+------+

:UUID: 16 bytes UUID, whose first three components (a 32-bit value, then
  two 16-bit values) are stored in big endian.

Device state transfer parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------------------+-----------------+
| transfer direction | migration phase |
+--------------------+-----------------+

:transfer direction: a 32-bit enum, describing the direction in which
  the state is transferred:

  - 0: Save: Transfer the state from the back-end to the front-end,
    which happens on the source side of migration
  - 1: Load: Transfer the state from the front-end to the back-end,
    which happens on the destination side of migration

:migration phase: a 32-bit enum, describing the state in which the VM
  guest and devices are:

  - 0: Stopped (in the period after the transfer of memory-mapped
    regions before switch-over to the destination): The VM guest is
    stopped, and the vhost-user device is suspended (see
    :ref:`Suspended device state <suspended_device_state>`).

  In the future, additional phases might be added e.g. to allow
  iterative migration while the device is running.

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of the *front-end* sending message requests and
the *back-end* sending message replies. Most of the requests don't require
replies. Here is a list of the ones that do:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.
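
The flag bits from the Header section can be handled with a few small
helpers. The following is only a sketch: the ``EX_`` and ``ex_`` names are
hypothetical and are not taken from QEMU; only the bit positions come from
the Header section.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical names; bit layout as described in the Header section. */
  #define EX_VERSION_MASK  0x3u        /* lower 2 bits hold the version */
  #define EX_VERSION       0x1u        /* current version */
  #define EX_REPLY         (1u << 2)   /* bit 2: reply flag */
  #define EX_NEED_REPLY    (1u << 3)   /* bit 3: need_reply flag */

  /* Flags a back-end would put on a reply: version plus the reply flag. */
  static inline uint32_t ex_reply_flags(void)
  {
      return EX_VERSION | EX_REPLY;
  }

  /* True if the version bits match the current version. */
  static inline bool ex_version_ok(uint32_t flags)
  {
      return (flags & EX_VERSION_MASK) == EX_VERSION;
  }

  /* True if the sender asked for an acknowledgement (see REPLY_ACK). */
  static inline bool ex_need_reply(uint32_t flags)
  {
      return (flags & EX_NEED_REPLY) != 0;
  }

A receiver could call ``ex_version_ok()`` on every incoming header before
looking at the request type, and ``ex_need_reply()`` to decide whether an
acknowledgement is expected.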

There are several messages that the front-end sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_ADD_MEM_REG``
* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
* ``VHOST_USER_SET_DEVICE_STATE_FD``

If the *front-end* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
can be implemented.

If the *back-end* detects some error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both front-end and back-end. As
older back-ends don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature
bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits
<https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
VIRTIO devices do not advertise this feature bit and therefore VIRTIO
drivers cannot negotiate it.

This reserved feature bit was reused by the vhost-user protocol to add
vhost-user protocol feature negotiation in a backwards compatible
fashion. Old vhost-user front-end and back-end implementations continue to
work even though they are not aware of vhost-user protocol feature
negotiation.
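
Checking for the dedicated feature bit is a single mask test. A minimal
sketch, assuming the feature set is held in a 64-bit integer such as the
``u64`` payload; the helper name is hypothetical:

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

  /* Hypothetical helper: true if the feature set advertises support for
   * vhost-user protocol feature negotiation. */
  static inline bool ex_has_protocol_features(uint64_t features)
  {
      return (features & (UINT64_C(1) << VHOST_USER_F_PROTOCOL_FEATURES)) != 0;
  }

An implementation would typically attempt protocol feature negotiation only
when this test succeeds on the feature set reported by its peer.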

Ring states
-----------

Rings have two independent states: started/stopped, and enabled/disabled.

* While a ring is stopped, the back-end must not process the ring at
  all, regardless of whether it is enabled or disabled. The
  enabled/disabled state should still be tracked, though, so it can come
  into effect once the ring is started.

* started and disabled: The back-end must process the ring without
  causing any side effects. For example, for a networking device,
  in the disabled state the back-end must not supply any new RX packets,
  but must process and discard any TX packets.

* started and enabled: The back-end must process the ring normally, i.e.
  process all requests and execute them.

Each ring is initialized in a stopped and disabled state. The back-end
must start a ring upon receiving a kick (that is, detecting that the file
descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and stop a ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.

In addition, upon receiving a ``VHOST_USER_SET_FEATURES`` message from
the front-end without ``VHOST_USER_F_PROTOCOL_FEATURES`` set, the
back-end must enable all rings immediately.

While processing the rings (whether they are enabled or not), the back-end
must support changing some configuration aspects on the fly.

.. _suspended_device_state:

Suspended device state
^^^^^^^^^^^^^^^^^^^^^^

While all vrings are stopped, the device is *suspended*.
In addition to
not processing any vring (because they are stopped), the device must:

* not write to any guest memory regions,
* not send any notifications to the guest,
* not send any messages to the front-end,
* still process and reply to messages from the front-end.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the front-end
already knows the number of available virtqueues without communicating with the
back-end.

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the back-end. The number can depend on host
resource availability or back-end implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the back-end to advertise the maximum number of
queues. This is treated as a protocol extension, hence the back-end has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The maximum number of queues the back-end supports can be queried with the
message ``VHOST_USER_GET_QUEUE_NUM``. The front-end should stop when the number
of requested queues is bigger than that.

As all queues share one connection, the front-end uses a unique index for each
queue in the sent message to identify a specified queue.

The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues.
Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the front-end may need to track the modifications
the back-end makes to the memory mapped regions. The back-end should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, the front-end may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed to by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
``VHOST_USER_SET_LOG_BASE`` message when the back-end has
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` which
should be large enough to cover all known guest addresses. Log starts
at the supplied offset in the supplied file descriptor. The log
covers from address 0 to the maximum of guest regions. In pseudo-code,
to mark page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e.
does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the front-end that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source (:ref:`Suspended device state <suspended_device_state>`). No
further update must be done before rings are restarted.

In postcopy migration the back-end is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The back-end opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the front-end. The front-end services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
back-end. The front-end indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

.. _migrating_backend_state:

Migrating back-end state
^^^^^^^^^^^^^^^^^^^^^^^^

Migrating device state involves transferring the state from one
back-end, called the source, to another back-end, called the
destination. After migration, the destination transparently resumes
operation without requiring the driver to re-initialize the device at
the VIRTIO level. If the migration fails, then the source can
transparently resume operation until another migration attempt is made.

Generally, the front-end is connected to a virtual machine guest (which
contains the driver), which has its own state to transfer between source
and destination, and therefore will have an implementation-specific
mechanism to do so.
The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature
provides functionality to have the front-end include the back-end's
state in this transfer operation so the back-end does not need to
implement its own mechanism, and so the virtual machine may have its
complete state, including vhost-user devices' states, contained within a
single stream of data.

To do this, the back-end state is transferred from back-end to front-end
on the source side, and vice versa on the destination side. This
transfer happens over a channel that is negotiated using the
``VHOST_USER_SET_DEVICE_STATE_FD`` message. This message has two
parameters:

* Direction of transfer: On the source, the data is saved, transferring
  it from the back-end to the front-end. On the destination, the data
  is loaded, transferring it from the front-end to the back-end.

* Migration phase: Currently, the only supported phase is the period
  after the transfer of memory-mapped regions before switch-over to the
  destination, when both the source and destination devices are
  suspended (:ref:`Suspended device state <suspended_device_state>`).
  In the future, additional phases might be supported to allow iterative
  migration while the device is running.

The nature of the channel is implementation-defined, but it must
generally behave like a pipe: The writing end will write all the data it
has into it, signalling the end of data by closing its end. The reading
end must read all of this data (until encountering the end of file) and
process it.

* When saving, the writing end is the source back-end, and the reading
  end is the source front-end. After reading the state data from the
  channel, the source front-end must transfer it to the destination
  front-end through an implementation-defined mechanism.

* When loading, the writing end is the destination front-end, and the
  reading end is the destination back-end.
  After reading the state data
  from the channel, the destination back-end must deserialize its
  internal state from that data and set itself up to allow the driver to
  seamlessly resume operation on the VIRTIO level.

Seamlessly resuming operation means that the migration must be
transparent to the guest driver, which operates on the VIRTIO level.
This driver will not perform any re-initialization steps, but continue
to use the device as if no migration had occurred. The vhost-user
front-end, however, will re-initialize the vhost state on the
destination, following the usual protocol for establishing a connection
to a vhost-user back-end: This includes, for example, setting up memory
mappings and kick and call FDs as necessary, negotiating protocol
features, or setting the initial vring base indices (to the same value
as on the source side, so that operation can resume).

Both on the source and on the destination side, after the respective
front-end has seen all data transferred (when the transfer FD has been
closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to
verify that data transfer was successful in the back-end, too. The
back-end responds once it knows whether the transfer and processing was
successful or not.

Memory access
-------------

The front-end sends a list of vhost memory regions to the back-end using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.
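
For the case without ``VIRTIO_F_IOMMU_PLATFORM``, the lookup can be sketched
as a linear search over the region table. The struct and function names are
hypothetical; the fields mirror the memory region description, and the
result is the user address at the same offset within the matching region.

.. code:: c

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical in-memory copy of one memory region description. */
  struct ex_region {
      uint64_t guest_addr;
      uint64_t size;
      uint64_t user_addr;
      uint64_t mmap_offset;
  };

  /* Translate a guest address to the matching user address.
   * Returns false if no region contains the guest address. */
  static bool ex_guest_to_user(const struct ex_region *regions, size_t n,
                               uint64_t guest_addr, uint64_t *user_addr)
  {
      for (size_t i = 0; i < n; i++) {
          const struct ex_region *r = &regions[i];

          /* Compare offsets so that guest_addr + size cannot overflow. */
          if (guest_addr >= r->guest_addr &&
              guest_addr - r->guest_addr < r->size) {
              *user_addr = r->user_addr + (guest_addr - r->guest_addr);
              return true;
          }
      }
      return false;
  }

A back-end would normally go one step further and turn the offset into a
pointer inside its own mapping of the region's file descriptor; that step
depends on how the mapping was created and is left out of this sketch.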

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
front-end sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the back-end with a
``struct vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the back-end is
expected to reply with a zero payload, non-zero otherwise.

The back-end relies on the back-end communication channel (see :ref:`Back-end
communication <backend_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_BACKEND_IOTLB_MSG``
requests to the front-end with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the iotlb payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags. For access failure events, the iotlb payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the back-end may
rely on the reply-ack feature, so the front-end may send a reply when
operation is completed if the reply-ack feature is negotiated and
the back-end requests a reply.
For miss events, a completed operation means that
either the front-end sent an update message containing the IOTLB entry
for the requested address and permission, or the front-end sent nothing
because the IOTLB miss message is invalid (invalid IOVA or permission).

The front-end isn't expected to take the initiative to send IOTLB update
messages, as the back-end sends IOTLB miss messages for the guest virtual
memory areas it needs to access.

.. _backend_communication:

Back-end communication
----------------------

An optional communication channel is provided if the back-end declares
``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` protocol feature, to allow the
back-end to make requests to the front-end.

The fd is provided via ``VHOST_USER_SET_BACKEND_REQ_FD`` ancillary data.

A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end
using this fd communication channel.

If ``VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD`` protocol feature is
negotiated, back-end can send file descriptors (at most 8 descriptors in
each message) to front-end via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after restart or crash, the back-end may need to
resubmit inflight I/Os. If the virtqueue is processed in order, we can
easily achieve that by getting the inflight descriptors from the
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue). However, it can't work when we process descriptors
out-of-order because some entries which store the information of
inflight descriptors in the available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, the back-end needs to allocate an extra buffer to store this
information of inflight descriptors and share it with the front-end for
persistence.
``VHOST_USER_GET_INFLIGHT_FD`` and 751``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer 752between front-end and back-end. And the format of this buffer is described 753below: 754 755+---------------+---------------+-----+---------------+ 756| queue0 region | queue1 region | ... | queueN region | 757+---------------+---------------+-----+---------------+ 758 759N is the number of available virtqueues. The back-end could get it from num 760queues field of ``VhostUserInflight``. 761 762For split virtqueue, queue region can be implemented as: 763 764.. code:: c 765 766 typedef struct DescStateSplit { 767 /* Indicate whether this descriptor is inflight or not. 768 * Only available for head-descriptor. */ 769 uint8_t inflight; 770 771 /* Padding */ 772 uint8_t padding[5]; 773 774 /* Maintain a list for the last batch of used descriptors. 775 * Only available when batching is used for submitting */ 776 uint16_t next; 777 778 /* Used to preserve the order of fetching available descriptors. 779 * Only available for head-descriptor. */ 780 uint64_t counter; 781 } DescStateSplit; 782 783 typedef struct QueueRegionSplit { 784 /* The feature flags of this region. Now it's initialized to 0. */ 785 uint64_t features; 786 787 /* The version of this region. It's 1 currently. 788 * Zero value indicates an uninitialized buffer */ 789 uint16_t version; 790 791 /* The size of DescStateSplit array. It's equal to the virtqueue size. 792 * The back-end could get it from queue size field of VhostUserInflight. */ 793 uint16_t desc_num; 794 795 /* The head of list that track the last batch of used descriptors. 
*/ 796 uint16_t last_batch_head; 797 798 /* Store the idx value of used ring */ 799 uint16_t used_idx; 800 801 /* Used to track the state of each descriptor in descriptor table */ 802 DescStateSplit desc[]; 803 } QueueRegionSplit; 804 805To track inflight I/O, the queue region should be processed as follows: 806 807When receiving available buffers from the driver: 808 809#. Get the next available head-descriptor index from available ring, ``i`` 810 811#. Set ``desc[i].counter`` to the value of global counter 812 813#. Increase global counter by 1 814 815#. Set ``desc[i].inflight`` to 1 816 817When supplying used buffers to the driver: 818 8191. Get corresponding used head-descriptor index, i 820 8212. Set ``desc[i].next`` to ``last_batch_head`` 822 8233. Set ``last_batch_head`` to ``i`` 824 825#. Steps 1,2,3 may be performed repeatedly if batching is possible 826 827#. Increase the ``idx`` value of used ring by the size of the batch 828 829#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0 830 831#. Set ``used_idx`` to the ``idx`` value of used ring 832 833When reconnecting: 834 835#. If the value of ``used_idx`` does not match the ``idx`` value of 836 used ring (means the inflight field of ``DescStateSplit`` entries in 837 last batch may be incorrect), 838 839 a. Subtract the value of ``used_idx`` from the ``idx`` value of 840 used ring to get last batch size of ``DescStateSplit`` entries 841 842 #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch 843 list which starts from ``last_batch_head`` 844 845 #. Set ``used_idx`` to the ``idx`` value of used ring 846 847#. Resubmit inflight ``DescStateSplit`` entries in order of their 848 counter value 849 850For packed virtqueue, queue region can be implemented as: 851 852.. code:: c 853 854 typedef struct DescStatePacked { 855 /* Indicate whether this descriptor is inflight or not. 856 * Only available for head-descriptor. 
       */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#.
   Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1,2,3,4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (means the
   ``inflight`` field of ``DescStatePacked`` entries in last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #.
      If ``d.flags`` is not equal to the calculated flags value (which
      means the back-end submitted the buffer to the guest driver before
      the crash, so it has to commit the in-progress update), set
      ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter`` to
      ``free_head``, ``used_idx``, ``used_wrap_counter``

#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
   ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   (roll back any in-progress update)

#. Set the ``inflight`` field of each ``DescStatePacked`` entry in
   free list to 0

#. Resubmit inflight ``DescStatePacked`` entries in order of their
   counter value

In-band notifications
---------------------

In some limited situations (e.g. for simulation) it is desirable to
have the kick, call and error (if used) signals done via in-band
messages instead of asynchronous eventfd notifications. This can be
done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
protocol feature.

Because too many messages on the sockets can cause the sending
application(s) to block, using this feature is not advised unless
absolutely necessary. It is also considered an error to negotiate this
feature without also negotiating ``VHOST_USER_PROTOCOL_F_BACKEND_REQ``
and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``: the former is necessary for
getting a message channel from the back-end to the front-end, while
the latter needs to be used with the in-band notification messages to
block until they are processed, both to avoid blocking later and for
proper processing (at least in the simulation use case). As it has no
other way of signalling this error, the back-end should close the
connection as a response to a ``VHOST_USER_SET_PROTOCOL_FEATURES``
message that sets the in-band notifications feature flag without the
other two.
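
The dependency between these three protocol features can be checked
with a simple bit test. The following is a minimal sketch, not taken
from any implementation: ``inband_features_valid`` is a hypothetical
helper name, and the bit numbers are the ones listed in the protocol
features section of this document. A back-end could call such a helper
on the payload of ``VHOST_USER_SET_PROTOCOL_FEATURES`` and close the
connection when it returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as listed in the protocol features section. */
#define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
#define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
#define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

/*
 * Hypothetical helper: returns true if the feature set is acceptable
 * with respect to in-band notifications, i.e. the in-band bit is either
 * absent or accompanied by both BACKEND_REQ and REPLY_ACK.
 */
bool inband_features_valid(uint64_t features)
{
    const uint64_t inband =
        UINT64_C(1) << VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS;
    const uint64_t required =
        (UINT64_C(1) << VHOST_USER_PROTOCOL_F_BACKEND_REQ) |
        (UINT64_C(1) << VHOST_USER_PROTOCOL_F_REPLY_ACK);

    if ((features & inband) == 0) {
        return true; /* in-band notifications were not requested */
    }
    return (features & required) == required;
}
```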

Protocol features
-----------------

.. code:: c

  #define VHOST_USER_PROTOCOL_F_MQ                    0
  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
  #define VHOST_USER_PROTOCOL_F_RARP                  2
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_MTU                   4
  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
  #define VHOST_USER_PROTOCOL_F_CONFIG                9
  #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10
  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
  #define VHOST_USER_PROTOCOL_F_STATUS               16
  #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT        18
  #define VHOST_USER_PROTOCOL_F_DEVICE_STATE         19

Front-end message types
-----------------------

``VHOST_USER_GET_FEATURES``
  :id: 1
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :request payload: N/A
  :reply payload: ``u64``

  Get from the underlying vhost implementation the features bitmask.
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :id: 2
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :request payload: ``u64``
  :reply payload: N/A

  Enable features in the underlying vhost implementation using a
  bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.
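
As an illustration of how the feature bitmask gates the protocol
feature messages, a front-end might test the
``VHOST_USER_GET_FEATURES`` reply before sending
``VHOST_USER_GET_PROTOCOL_FEATURES``. This is a sketch under stated
assumptions: the helper names are hypothetical, and the value 30 for
``VHOST_USER_F_PROTOCOL_FEATURES`` is not given in this section and
should be checked against the headers actually in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit number; not defined in this section. */
#define VHOST_USER_F_PROTOCOL_FEATURES 30

/* Hypothetical helper: test a single bit of a 64-bit feature mask. */
bool feature_bit_set(uint64_t mask, unsigned int bit)
{
    return bit < 64 && ((mask >> bit) & 1) != 0;
}

/*
 * Hypothetical helper: true if the back-end advertised support for
 * VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES
 * in its VHOST_USER_GET_FEATURES reply.
 */
bool may_use_protocol_features(uint64_t backend_features)
{
    return feature_bit_set(backend_features, VHOST_USER_F_PROTOCOL_FEATURES);
}
```

The same ``feature_bit_set`` test applies to the protocol feature
bitmask, using the bit numbers from the protocol features list.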

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :id: 15
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :request payload: N/A
  :reply payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation. Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``. It does not need to be acknowledged by
  ``VHOST_USER_SET_FEATURES``.

.. Note::
   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
   support this message even before ``VHOST_USER_SET_FEATURES`` was
   called.

``VHOST_USER_SET_PROTOCOL_FEATURES``
  :id: 16
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :request payload: ``u64``
  :reply payload: N/A

  Enable protocol features in the underlying vhost implementation.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``. It does not need to be acknowledged by
  ``VHOST_USER_SET_FEATURES``.

.. Note::
   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
   this message even before ``VHOST_USER_SET_FEATURES`` was called.

``VHOST_USER_SET_OWNER``
  :id: 3
  :equivalent ioctl: ``VHOST_SET_OWNER``
  :request payload: N/A
  :reply payload: N/A

  Issued when a new connection is established. It marks the sender
  as the front-end that owns the session. This can be used on the *back-end*
  as a "session start" flag.

``VHOST_USER_RESET_OWNER``
  :id: 4
  :request payload: N/A
  :reply payload: N/A

.. admonition:: Deprecated

   This is no longer used. Used to be sent to request disabling all
   rings, but some back-ends interpreted it to also discard connection
   state (this interpretation would lead to bugs). It is recommended
   that back-ends either ignore this message, or use it to disable all
   rings.

``VHOST_USER_SET_MEM_TABLE``
  :id: 5
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :request payload: multiple memory regions description
  :reply payload: (postcopy only) multiple memory regions description

  Sets the memory map regions on the back-end so it can translate the
  vring addresses. In the ancillary data there is an array of file
  descriptors for each memory mapped region. The size and ordering of
  the fds matches the number and ordering of memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the front-end. The back-end must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault
  event.

.. Note::
   ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
   reply back to the list of mappings with an empty
   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
   reception of this message may the guest start accessing the memory
   and generating faults.

``VHOST_USER_SET_LOG_BASE``
  :id: 6
  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :request payload: u64
  :reply payload: N/A

  Sets logging shared memory space.

  When the back-end has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
  feature, the log memory fd is provided in the ancillary data of the
  ``VHOST_USER_SET_LOG_BASE`` message; the size and offset of the shared
  memory area are provided in the message.

``VHOST_USER_SET_LOG_FD``
  :id: 7
  :equivalent ioctl: ``VHOST_SET_LOG_FD``
  :request payload: N/A
  :reply payload: N/A

  Sets the logging file descriptor, which is passed as ancillary data.

``VHOST_USER_SET_VRING_NUM``
  :id: 8
  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
  :request payload: vring state description
  :reply payload: N/A

  Set the size of the queue.
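
One thing a back-end typically does with the regions received in
``VHOST_USER_SET_MEM_TABLE`` is translate guest addresses into pointers
inside its own mappings. The sketch below illustrates that lookup only.
``MemRegion``, its fields and ``translate_guest_addr`` are hypothetical
names, and ``local_base`` is assumed to already point at the first byte
of the region in the back-end's address space (that is, after
``mmap()`` and after applying the region's mmap offset). Addresses
expressed in another address space would need the same lookup against
the corresponding base field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical back-end bookkeeping for one received memory region. */
typedef struct MemRegion {
    uint64_t guest_addr;  /* guest address of the region start */
    uint64_t size;        /* region size in bytes */
    uint8_t *local_base;  /* assumed: local pointer to the region start */
} MemRegion;

/*
 * Translate a guest address into a local pointer, or return NULL when
 * the address is not covered by any region.
 */
void *translate_guest_addr(const MemRegion *regions, size_t n, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        const MemRegion *r = &regions[i];

        /* Written this way to avoid overflow in guest_addr + size. */
        if (gpa >= r->guest_addr && gpa - r->guest_addr < r->size) {
            return r->local_base + (gpa - r->guest_addr);
        }
    }
    return NULL;
}
```

A real back-end would also check that the whole accessed range, not
just its first byte, lies within a single region.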

``VHOST_USER_SET_VRING_ADDR``
  :id: 9
  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
  :request payload: vring address description
  :reply payload: N/A

  Sets the addresses of the different aspects of the vring.

``VHOST_USER_SET_VRING_BASE``
  :id: 10
  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
  :request payload: vring descriptor index/indices
  :reply payload: N/A

  Sets the next index to use for descriptors in this vring:

  * For a split virtqueue, sets only the next descriptor index to
    process in the *Available Ring*. The device is supposed to read the
    next index in the *Used Ring* from the respective vring structure in
    guest memory.

  * For a packed virtqueue, both indices are supplied, as they are not
    explicitly available in memory.

  Consequently, the payload type is specific to the type of virt queue
  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
  indices for packed virtqueues*).

``VHOST_USER_GET_VRING_BASE``
  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
  :request payload: vring state description
  :reply payload: vring descriptor index/indices

  Stops the vring and returns the current descriptor index or indices:

  * For a split virtqueue, returns only the 16-bit next descriptor
    index to process in the *Available Ring*. Note that this may
    differ from the available ring index in the vring structure in
    memory, which points to where the driver will put new available
    descriptors. For the *Used Ring*, the device only needs the next
    descriptor index at which to put new descriptors, which is the
    value in the vring structure in memory, so this value is not
    covered by this message.

  * For a packed virtqueue, neither index is explicitly available to
    read from memory, so both indices (as maintained by the device) are
    returned.

  Consequently, the payload type is specific to the type of virt queue
  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
  indices for packed virtqueues*).

  When and as long as all of a device’s vrings are stopped, it is
  *suspended*, see :ref:`Suspended device state
  <suspended_device_state>`.

  The request payload’s *num* field is currently reserved and must be
  set to 0.

``VHOST_USER_SET_VRING_KICK``
  :id: 12
  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor for adding buffers to the vring. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick. Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated
  this message isn't necessary, as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message; it may, however, still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call.
  Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message
  isn't necessary, as the ``VHOST_USER_BACKEND_VRING_CALL`` message can be
  used; it may, however, still be used to set an event file descriptor
  or to enable polling.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message
  isn't necessary, as the ``VHOST_USER_BACKEND_VRING_ERR`` message can be
  used; it may, however, still be used to set an event file descriptor
  (which will be preferred over the message).

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: u64

  Query how many queues the back-end supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in queried protocol features by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  Signal the back-end to enable or disable the corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
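
The ``u64`` payload shared by ``VHOST_USER_SET_VRING_KICK``,
``VHOST_USER_SET_VRING_CALL`` and ``VHOST_USER_SET_VRING_ERR`` packs
the vring index into bits 0-7 and the invalid FD flag into bit 8. A
receiver could unpack it as sketched below. The macro, struct and
function names are chosen for illustration and are not defined by this
document.

```c
#include <stdbool.h>
#include <stdint.h>

#define VRING_IDX_MASK  UINT64_C(0xff)      /* bits 0-7: vring index */
#define VRING_NOFD_MASK (UINT64_C(1) << 8)  /* bit 8: invalid FD flag */

/* Hypothetical decoded form of the u64 payload. */
typedef struct VringFdPayload {
    uint8_t index;  /* vring the message applies to */
    bool has_fd;    /* false when the invalid FD flag is set */
} VringFdPayload;

VringFdPayload decode_vring_fd_payload(uint64_t payload)
{
    VringFdPayload p;

    p.index = (uint8_t)(payload & VRING_IDX_MASK);
    p.has_fd = (payload & VRING_NOFD_MASK) == 0;
    return p;
}
```

When ``has_fd`` is false, no file descriptor should be expected in the
ancillary data of the message.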

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Ask the vhost-user back-end to broadcast a fake RARP to announce that
  the migration has terminated, for guests that do not support
  GUEST_ANNOUNCE.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  back-end to construct and broadcast the fake RARP.

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Set host MTU value exposed to the guest.

  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
  respond with zero in case the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
  :id: 21
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Set the socket file descriptor for back-end initiated requests. It is passed
  in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.
  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
  respond with zero for success, non-zero otherwise.

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :request payload: ``struct vhost_iotlb_msg``
  :reply payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  The front-end sends such requests to update and invalidate entries in the
  device IOTLB. The back-end has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when the ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :request payload: vring state description
  :reply payload: N/A

  Set the endianness of a VQ for legacy devices. Little-endian is
  indicated with state.num set to 0 and big-endian is indicated with
  state.num set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Back-ends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the front-end starts the VQ).

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :request payload: virtio device config space
  :reply payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end to fetch the contents of the
  virtio device configuration space. The vhost-user back-end's payload
  size MUST match the front-end's request; the back-end uses a
  zero-length payload to indicate an error to the vhost-user front-end.
  The vhost-user front-end may cache the contents to avoid repeated
  ``VHOST_USER_GET_CONFIG`` calls.

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :request payload: virtio device config space
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end when the guest changes the virtio
  device configuration space. It can also be used for live migration
  on the destination host. The vhost-user back-end must check the flags
  field, and back-ends MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :request payload: crypto session description
  :reply payload: crypto session description

  Create a session for crypto operation. The back-end must return
  the session id, 0 or positive for success, negative for failure.
  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :request payload: N/A
  :reply payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
  advises the back-end that a migration with postcopy enabled is
  underway; the back-end must open a userfaultfd for later use.
  Note that at this stage the migration is still in precopy mode.

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :request payload: N/A
  :reply payload: N/A

  The front-end advises the back-end that a transition to postcopy mode
  has happened. The back-end must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :request payload: N/A
  :reply payload: ``u64``

  The front-end advises that postcopy migration has now completed. The back-end
  must disable the userfaultfd. The reply is an acknowledgement
  only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :request payload: inflight description
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  get a shared buffer from the back-end. The shared buffer will be used to
  track inflight I/O by the back-end. QEMU should retrieve a new one when
  the VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :request payload: inflight description
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  send the shared inflight buffer back to the back-end so that the back-end
  can get inflight I/O after a crash or restart.
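
To illustrate what a back-end might do with the buffer it receives
through ``VHOST_USER_SET_INFLIGHT_FD``, the sketch below follows the
"When reconnecting" steps given for split virtqueues in the inflight
I/O tracking section. The struct layouts are copied from that section.
The function names and the ``ring_used_idx`` parameter (the ``idx``
value read from the used ring in guest memory) are hypothetical, and
the actual resubmission of requests is left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct DescStateSplit {
    uint8_t inflight;
    uint8_t padding[5];
    uint16_t next;
    uint64_t counter;
} DescStateSplit;

typedef struct QueueRegionSplit {
    uint64_t features;
    uint16_t version;
    uint16_t desc_num;
    uint16_t last_batch_head;
    uint16_t used_idx;
    DescStateSplit desc[];
} QueueRegionSplit;

/* Step 1: repair the last batch if the crash interrupted its update. */
void inflight_split_fixup(QueueRegionSplit *q, uint16_t ring_used_idx)
{
    if (q->used_idx != ring_used_idx) {
        /* 16-bit subtraction handles wrap-around of the ring index. */
        uint16_t batch = (uint16_t)(ring_used_idx - q->used_idx);
        uint16_t i = q->last_batch_head;

        while (batch-- > 0 && i < q->desc_num) {
            q->desc[i].inflight = 0;
            i = q->desc[i].next;
        }
        q->used_idx = ring_used_idx;
    }
}

/*
 * Step 2: collect the indices of inflight head descriptors into out[],
 * ordered by ascending counter (insertion sort). out must have room
 * for desc_num entries. Returns the number of entries written.
 */
size_t inflight_split_collect(const QueueRegionSplit *q, uint16_t *out)
{
    size_t n = 0;

    for (uint16_t i = 0; i < q->desc_num; i++) {
        size_t j;

        if (!q->desc[i].inflight) {
            continue;
        }
        j = n++;
        while (j > 0 && q->desc[out[j - 1]].counter > q->desc[i].counter) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = i;
    }
    return n;
}
```

The caller would then resubmit the descriptors in the order returned in
``out``.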

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data. The GPU protocol is used to inform the front-end of
  rendering state and updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Ask the vhost user back-end to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The back-end retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the back-end.

``VHOST_USER_VRING_KICK``
  :id: 35
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the front-end to indicate that a buffer was added to
  the vring instead of signalling it using the vring's kick file
  descriptor or having the back-end rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_GET_MAX_MEM_SLOTS``
  :id: 36
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: u64

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The back-end should return the message with a
  u64 payload containing the maximum number of memory slots for
  QEMU to expose to the guest. The value returned by the back-end
  will be capped at the maximum number of ram slots which can be
  supported by the target platform.

``VHOST_USER_ADD_MEM_REG``
  :id: 37
  :equivalent ioctl: N/A
  :request payload: single memory region description
  :reply payload: (postcopy only) single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must map in. Along with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.

  Exactly one file descriptor from which the memory is mapped is
  passed in the ancillary data.

  In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
  replies with the bases of the memory mapped region to the front-end.
  For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
  They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.

``VHOST_USER_REM_MEM_REG``
  :id: 38
  :equivalent ioctl: N/A
  :request payload: single memory region description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must unmap. Along with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.

  The memory region to be removed is identified by its guest address,
  user address and size.
  The mmap offset is ignored.

  No file descriptors SHOULD be passed in the ancillary data. For
  compatibility with existing incorrect implementations, the back-end MAY
  accept messages with one file descriptor. If a file descriptor is
  passed, the back-end MUST close it without using it otherwise.

``VHOST_USER_SET_STATUS``
  :id: 39
  :equivalent ioctl: VHOST_VDPA_SET_STATUS
  :request payload: ``u64``
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the front-end to
  notify the back-end with updated device status as defined in the Virtio
  specification.

``VHOST_USER_GET_STATUS``
  :id: 40
  :equivalent ioctl: VHOST_VDPA_GET_STATUS
  :request payload: N/A
  :reply payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the front-end to
  query the back-end for its device status as defined in the Virtio
  specification.

``VHOST_USER_GET_SHARED_OBJECT``
  :id: 41
  :equivalent ioctl: N/A
  :request payload: ``struct VhostUserShared``
  :reply payload: dmabuf fd

  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, and the UUID is found
  in the exporters cache, this message is submitted by the front-end
  to retrieve a given dma-buf fd from a given back-end, determined by
  the requested UUID. The back-end will reply passing the fd when the
  operation is successful, or no fd otherwise.

``VHOST_USER_SET_DEVICE_STATE_FD``
  :id: 42
  :equivalent ioctl: N/A
  :request payload: device state transfer parameters
  :reply payload: ``u64``

  Front-end and back-end negotiate a channel over which to transfer the
  back-end’s internal state during migration.
  Either side (front-end or back-end) may create the channel. The
  nature of this channel is not restricted or defined in this document,
  but whichever side creates it must create a file descriptor that is
  provided to the respective other side, allowing access to the
  channel. This FD must behave as follows:

  * For the writing end, it must allow writing the whole back-end state
    sequentially. Closing the file descriptor signals the end of
    transfer.

  * For the reading end, it must allow reading the whole back-end state
    sequentially. The end of file signals the end of the transfer.

  For example, the channel may be a pipe, in which case the two ends of
  the pipe fulfill these requirements respectively.

  Initially, the front-end creates a channel along with such an FD. It
  passes the FD to the back-end as ancillary data of a
  ``VHOST_USER_SET_DEVICE_STATE_FD`` message. The back-end may create a
  different transfer channel, passing the respective FD back to the
  front-end as ancillary data of the reply. If so, the front-end must
  then discard its channel and use the one provided by the back-end.

  Whether the back-end should use its own channel is decided based on
  efficiency: If the channel is a pipe, both ends will most likely need
  to copy data into and out of it. Any channel that allows for more
  efficient processing on at least one end, e.g. through zero-copy, is
  considered more efficient and thus preferred. If the back-end can
  provide such a channel, it should decide to use it.

  The request payload contains parameters for the subsequent data
  transfer, as described in the :ref:`Migrating back-end state
  <migrating_backend_state>` section.
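
The FD behaviour required above can be illustrated with the pipe case
mentioned in the text. In this minimal sketch the writing end sends the
whole state and then closes its FD, and the reading end reads until end
of file. ``state_write_all`` and ``state_read_all`` are hypothetical
helper names, the state is treated as an opaque byte buffer, and error
handling is deliberately minimal (for instance, ``EINTR`` is not
retried).

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Writing end: write the whole state sequentially, then close the FD to
 * signal the end of the transfer. Returns 0 on success, -1 on error.
 */
int state_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int ret = 0;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            ret = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }
    if (close(fd) < 0) {
        ret = -1;
    }
    return ret;
}

/*
 * Reading end: read sequentially until end of file. Returns the number
 * of bytes read, or -1 on error or if the state exceeds cap bytes.
 */
ssize_t state_read_all(int fd, void *buf, size_t cap)
{
    char *p = buf;
    size_t total = 0;

    for (;;) {
        char extra;
        /* Once the buffer is full, probe one byte to detect EOF. */
        void *dst = total < cap ? (void *)(p + total) : (void *)&extra;
        size_t want = total < cap ? cap - total : 1;
        ssize_t n = read(fd, dst, want);

        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            return (ssize_t)total; /* end of file: transfer complete */
        }
        if (total == cap) {
            return -1; /* more state than the buffer can hold */
        }
        total += (size_t)n;
    }
}
```

In a real deployment the two helpers would run in different processes,
with one end of the pipe handed over as ancillary data.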

  The value returned indicates both success and whether a
  file descriptor for a back-end-provided channel is returned: Bits 0–7
  are 0 on success, and non-zero on error. Bit 8 is the invalid FD
  flag; this flag is set when there is no file descriptor returned.
  When this flag is not set, the front-end must use the returned file
  descriptor as its end of the transfer channel. The back-end must not
  both indicate an error and return a file descriptor.

  Using this function requires prior negotiation of the
  ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.

``VHOST_USER_CHECK_DEVICE_STATE``
  :id: 43
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: ``u64``

  After transferring the back-end’s internal state during migration (see
  the :ref:`Migrating back-end state <migrating_backend_state>`
  section), check whether the back-end was able to fully process the
  state.

  The value returned indicates success or error; 0 is success, any
  non-zero value is an error.

  Using this function requires prior negotiation of the
  ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.

Back-end message types
----------------------

For this type of message, the request is sent by the back-end and the reply
is sent by the front-end.

``VHOST_USER_BACKEND_IOTLB_MSG`` (previous name ``VHOST_USER_SLAVE_IOTLB_MSG``)
  :id: 1
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :request payload: ``struct vhost_iotlb_msg``
  :reply payload: N/A

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
  The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
  access failure.
  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the
  front-end must respond with zero when the operation is successfully
  completed, or non-zero otherwise. This request should be sent only when
  the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.

``VHOST_USER_BACKEND_CONFIG_CHANGE_MSG`` (previous name ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``)
  :id: 2
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  back-end sends such messages to notify that the virtio device's
  configuration space has changed. For host devices that support this
  feature, the host driver can send a ``VHOST_USER_GET_CONFIG``
  message to the back-end to get the latest content. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
  ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
  the operation is successfully completed, or non-zero otherwise.

``VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG`` (previous name ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``)
  :id: 3
  :equivalent ioctl: N/A
  :request payload: vring area description
  :reply payload: N/A

  Sets host notifier for a specified queue. The queue index is
  contained in the ``u64`` field of the vring area description. The
  host notifier is described by the file descriptor (typically it's a
  VFIO device fd) which is passed as ancillary data and the size
  (which is mmap size and should be the same as host page size) and
  offset (which is mmap offset) carried in the vring area
  description. QEMU can mmap the file descriptor based on the size and
  offset to get a memory range. Registering a host notifier means
  mapping this memory range to the VM as the specified queue's notify
  MMIO region.
  The back-end sends this request to tell QEMU to de-register
  the existing notifier if any and register the new notifier if the
  request is sent with a file descriptor.

  This request should be sent only when the
  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
  successfully negotiated.

``VHOST_USER_BACKEND_VRING_CALL`` (previous name ``VHOST_USER_SLAVE_VRING_CALL``)
  :id: 4
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the back-end to indicate that a buffer was used from
  the vring instead of signalling this using the vring's call file
  descriptor or having the front-end rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_BACKEND_VRING_ERR`` (previous name ``VHOST_USER_SLAVE_VRING_ERR``)
  :id: 5
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the back-end to indicate that an error occurred on the
  specific vring, instead of signalling the error file descriptor
  set by the front-end via ``VHOST_USER_SET_VRING_ERR``.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_BACKEND_SHARED_OBJECT_ADD``
  :id: 6
  :equivalent ioctl: N/A
  :request payload: ``struct VhostUserShared``
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, this message can be submitted
  by the back-end to add itself as an exporter to the virtio shared lookup
  table.
  The back-end device gets associated with a UUID in the shared table.
  The back-end is responsible for keeping its own table with exported
  dma-buf fds. When another back-end tries to import the resource
  associated with the UUID, it will send a message to the front-end, which
  will act as a proxy to the exporter back-end. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets
  the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero
  when the operation is successfully completed, or non-zero otherwise.

``VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE``
  :id: 7
  :equivalent ioctl: N/A
  :request payload: ``struct VhostUserShared``
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, this message can be submitted
  by the back-end to remove itself from the virtio-dmabuf shared
  table. The shared table will remove the back-end device associated with
  the UUID. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the
  back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond
  with zero when the operation is successfully completed, or non-zero
  otherwise.

``VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP``
  :id: 8
  :equivalent ioctl: N/A
  :request payload: ``struct VhostUserShared``
  :reply payload: dmabuf fd and ``u64``

  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, this message can be submitted
  by the back-end to retrieve a given dma-buf fd from the virtio-dmabuf
  shared table given a UUID. The front-end replies by passing the fd and a
  zero when the operation is successful, or non-zero otherwise. Note that if
  the operation fails, no fd is sent to the back-end.

.. _reply_ack:

VHOST_USER_PROTOCOL_F_REPLY_ACK
-------------------------------

The original vhost-user specification only demands replies for certain
commands. This differs from the vhost protocol implementation where
commands are sent over an ``ioctl()`` call and block until the back-end
has completed.

With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
back-end MUST respond with a Payload ``VhostUserMsg`` indicating success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.

The reply payload gives QEMU a deterministic indication of the result
of the command. Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to be
more resilient for selected requests.

For the message types that already solicit a reply from the back-end,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or the ``need_reply``
bit being set brings no behavioural change. (See the Communication_
section for details.)

.. _backend_conventions:

Backend program conventions
===========================

vhost-user back-ends can provide various devices & services and may
need to be configured manually depending on the use case. However, it
is a good idea to follow the conventions listed here when
possible. Users, QEMU or libvirt, can then rely on some common
behaviour to avoid heterogeneous configuration and management of the
back-end programs and facilitate interoperability.

Each back-end installed on a host system should come with at least one
JSON file that conforms to the vhost-user.json schema. Each file
informs the management applications about the back-end type, and binary
location.
In addition, it defines rules for management apps for
picking the highest priority back-end when multiple match the search
criteria (see ``@VhostUserBackend`` documentation in the schema file).

If the back-end is not capable of enabling a requested feature on the
host (such as 3D acceleration with virgl), or the initialization
failed, the back-end should fail to start early and exit with a status
!= 0. It may also print a message to stderr for further details.

The back-end program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.

File descriptors 0, 1 and 2 will exist, and have regular
stdin/stdout/stderr usage (they may have been redirected to /dev/null
by the management layer, or to a log handler).

The back-end program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. It may eventually receive SIGKILL from
the management layer after a few seconds.

The following command line options have an expected behaviour. They
are mandatory, unless explicitly stated otherwise:

--socket-path=PATH

  This option specifies the location of the vhost-user Unix domain socket.
  It is incompatible with --fd.

--fd=FDNUM

  When this argument is given, the back-end program is started with the
  vhost-user socket as file descriptor FDNUM. It is incompatible with
  --socket-path.

--print-capabilities

  Output to stdout the back-end capabilities in JSON format, and then
  exit successfully. Other options and arguments should be ignored, and
  the back-end program should not perform its normal function. The
  capabilities can be reported dynamically depending on the host
  capabilities.

The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``. Example:

.. code:: json

  {
    "type": "foo",
    "features": [
      "feature-a",
      "feature-b"
    ]
  }

vhost-user-input
----------------

Command line options:

--evdev-path=PATH

  Specify the Linux input device.

  (optional)

--no-grab

  Do not request exclusive access to the input device.

  (optional)

vhost-user-gpu
--------------

Command line options:

--render-node=PATH

  Specify the GPU DRM render node.

  (optional)

--virgl

  Enable virgl rendering support.

  (optional)

vhost-user-blk
--------------

Command line options:

--blk-file=PATH

  Specify block device or file path.

  (optional)

--read-only

  Enable read-only.

  (optional)
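
The generic options from the conventions above, together with the
``vhost-user-blk`` options, could be handled along the following lines.
This is only a sketch: the program name, the function names and the
capability values are assumptions for illustration, and real output must
be checked against the ``vhost-user.json`` schema. It does not open the
socket or serve any requests.

.. code:: python

  import argparse
  import json
  import sys


  def build_parser() -> argparse.ArgumentParser:
      parser = argparse.ArgumentParser(prog="example-vhost-user-blk")
      # --socket-path and --fd are incompatible with each other.
      location = parser.add_mutually_exclusive_group()
      location.add_argument("--socket-path", metavar="PATH")
      location.add_argument("--fd", type=int, metavar="FDNUM")
      parser.add_argument("--print-capabilities", action="store_true")
      parser.add_argument("--blk-file", metavar="PATH")
      parser.add_argument("--read-only", action="store_true")
      return parser


  def capabilities() -> dict:
      # Placeholder values shaped like the JSON example above (assumption).
      return {"type": "block", "features": ["read-only", "blk-file"]}


  def main(argv: list) -> int:
      # Checked before full parsing so that other options are ignored.
      if "--print-capabilities" in argv:
          json.dump(capabilities(), sys.stdout, indent=2)
          sys.stdout.write("\n")
          return 0
      args = build_parser().parse_args(argv)
      if args.socket_path is None and args.fd is None:
          print("one of --socket-path or --fd is required", file=sys.stderr)
          return 1  # Fail early with a non-zero status.
      # A real back-end would now start serving vhost-user requests.
      return 0


  exit_code = main(["--print-capabilities"])

Checking for ``--print-capabilities`` before the full parse keeps an
unrelated or conflicting option from turning a capability query into an
error.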