.. _vhost_user_proto:

===================
Vhost-user Protocol
===================

..
  Copyright 2014 Virtual Open Systems Sarl.
  Copyright 2019 Intel Corporation
  Licence: This work is licensed under the terms of the GNU GPL,
           version 2 or later. See the COPYING file in the top-level
           directory.

.. contents:: Table of Contents

Introduction
============

This protocol aims to complement the ``ioctl`` interface used to
control the vhost implementation in the Linux kernel. It implements
the control plane needed to establish virtqueue sharing with a user
space process on the same host. It uses communication over a Unix
domain socket to share file descriptors in the ancillary data of the
message.

The protocol defines 2 sides of the communication, *front-end* and
*back-end*. The *front-end* is the application that shares its virtqueues, in
our case QEMU. The *back-end* is the consumer of the virtqueues.

In the current implementation QEMU is the *front-end*, and the *back-end*
is the external process consuming the virtio queues, for example a
software Ethernet switch running in user space, such as Snabbswitch,
or a block device back-end processing read & write to a virtual
disk. In order to facilitate interoperability between various back-end
implementations, it is recommended to follow the :ref:`Backend program
conventions <backend_conventions>`.

The *front-end* and *back-end* can be either a client (i.e. connecting) or
server (listening) in the socket communication.

Support for platforms other than Linux
--------------------------------------

While vhost-user was initially developed targeting Linux, nowadays it
is supported on any platform that provides the following features:

- A way for requesting shared memory represented by a file descriptor
  so it can be passed over a UNIX domain socket and then mapped by the
  other process.

- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
  exchange messages through them, including ancillary data when needed.

- Either eventfd or pipe/pipe2. On platforms where eventfd is not
  available, QEMU will automatically fall back to pipe2 or, as a last
  resort, pipe. Each file descriptor will be used for receiving or
  sending events by reading or writing (respectively) an 8-byte value
  on the corresponding descriptor. The 8-byte value itself has no
  meaning and should not be interpreted.

Message Specification
=====================

.. Note:: All numbers are in the machine native byte order.

A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

Header
------

:request: 32-bit type of the request

:flags: 32-bit bit field

- Lower 2 bits are the version (currently 0x01)
- Bit 2 is the reply flag - needs to be sent on each reply from the back-end
- Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
  details.

:size: 32-bit size of the payload

Payload
-------

Depending on the request type, **payload** can be:

A single 64-bit integer
^^^^^^^^^^^^^^^^^^^^^^^

+-----+
| u64 |
+-----+

:u64: a 64-bit unsigned integer

A vring state description
^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-----+
| index | num |
+-------+-----+

:index: a 32-bit index

:num: a 32-bit number

A vring descriptor index for split virtqueues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------------------+
| vring index | index in avail ring |
+-------------+---------------------+

:vring index: 32-bit index of the respective virtqueue

:index in avail ring: 32-bit value, of which currently only the lower 16
  bits are used:

  - Bits 0–15: Index of the next *Available Ring* descriptor that the
    back-end will process. This is a free-running index that is not
    wrapped by the ring size.
  - Bits 16–31: Reserved (set to zero)

Vring descriptor indices for packed virtqueues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+--------------------+
| vring index | descriptor indices |
+-------------+--------------------+

:vring index: 32-bit index of the respective virtqueue

:descriptor indices: 32-bit value:

  - Bits 0–14: Index of the next *Available Ring* descriptor that the
    back-end will process. This is a free-running index that is not
    wrapped by the ring size.
  - Bit 15: Driver (Available) Ring Wrap Counter
  - Bits 16–30: Index of the entry in the *Used Ring* where the back-end
    will place the next descriptor. This is a free-running index that
    is not wrapped by the ring size.
  - Bit 31: Device (Used) Ring Wrap Counter

A vring address description
^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------+-------+------------+------+-----------+-----+
| index | flags | descriptor | used | available | log |
+-------+-------+------------+------+-----------+-----+

:index: a 32-bit vring index

:flags: a 32-bit vring flags

:descriptor: a 64-bit ring address of the vring descriptor table

:used: a 64-bit ring address of the vring used ring

:available: a 64-bit ring address of the vring available ring

:log: a 64-bit guest address for logging

Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

.. _memory_region_description:

Memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^

+---------------+------+--------------+-------------+
| guest address | size | user address | mmap offset |
+---------------+------+--------------+-------------+

:guest address: a 64-bit guest address of the region

:size: a 64-bit size

:user address: a 64-bit user address

:mmap offset: a 64-bit offset where region starts in the mapped memory

When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
successfully negotiated, the memory region description contains two extra
fields at the end.

+---------------+------+--------------+-------------+----------------+-------+
| guest address | size | user address | mmap offset | xen mmap flags | domid |
+---------------+------+--------------+-------------+----------------+-------+

:xen mmap flags: a 32-bit bit field

- Bit 0 is set for Xen foreign memory mapping.
- Bit 1 is set for Xen grant memory mapping.
- Bit 8 is set if the memory region cannot be mapped in advance, and memory
  areas within this region must be mapped / unmapped only when required by the
  back-end. The back-end shouldn't try to map the entire region at once, as the
  front-end may not allow it. The back-end should rather map only the required
  amount of memory at once and unmap it after it is used.

:domid: a 32-bit Xen hypervisor specific domain id.

Single memory region description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------+--------+
| padding | region |
+---------+--------+

:padding: 64-bit

:region: region is represented by :ref:`Memory region description <memory_region_description>`.

Multiple Memory regions description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-------------+---------+---------+-----+---------+
| num regions | padding | region0 | ... | region7 |
+-------------+---------+---------+-----+---------+

:num regions: a 32-bit number of regions

:padding: 32-bit

:regions: regions field contains 8 regions of type :ref:`Memory region description <memory_region_description>`.

Log description
^^^^^^^^^^^^^^^

+----------+------------+
| log size | log offset |
+----------+------------+

:log size: a 64-bit size of area used for logging

:log offset: a 64-bit offset from start of supplied file descriptor where
  logging starts (i.e. where guest address 0 would be
  logged)

An IOTLB message
^^^^^^^^^^^^^^^^

+------+------+--------------+-------------------+------+
| iova | size | user address | permissions flags | type |
+------+------+--------------+-------------------+------+

:iova: a 64-bit I/O virtual address programmed by the guest

:size: a 64-bit size

:user address: a 64-bit user address

:permissions flags: an 8-bit value:

  - 0: No access
  - 1: Read access
  - 2: Write access
  - 3: Read/Write access

:type: an 8-bit IOTLB message type:

  - 1: IOTLB miss
  - 2: IOTLB update
  - 3: IOTLB invalidate
  - 4: IOTLB access fail

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

:offset: a 32-bit offset of virtio device's configuration space

:size: a 32-bit configuration space access size in bytes

:flags: a 32-bit value:

  - 0: Vhost front-end messages used for writable fields
  - 1: Vhost front-end messages used for live migration

:payload: Size bytes array holding the contents of the virtio
  device's configuration space

Vring area description
^^^^^^^^^^^^^^^^^^^^^^

+-----+------+--------+
| u64 | size | offset |
+-----+------+--------+

:u64: a 64-bit integer that contains the vring index and flags

:size: a 64-bit size of this area

:offset: a 64-bit offset of this area from the start of the
  supplied file descriptor

Inflight description
^^^^^^^^^^^^^^^^^^^^

+-----------+-------------+------------+------------+
| mmap size | mmap offset | num queues | queue size |
+-----------+-------------+------------+------------+

:mmap size: a 64-bit size of area to track inflight I/O

:mmap offset: a 64-bit offset of this area from the start
  of the supplied file descriptor

:num queues: a 16-bit number of virtqueues

:queue size: a 16-bit size of virtqueues

VhostUserShared
^^^^^^^^^^^^^^^

+------+
| UUID |
+------+

:UUID: 16 bytes UUID, whose first three components (a 32-bit value, then
  two 16-bit values) are stored in big endian.

Device state transfer parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+--------------------+-----------------+
| transfer direction | migration phase |
+--------------------+-----------------+

:transfer direction: a 32-bit enum, describing the direction in which
  the state is transferred:

  - 0: Save: Transfer the state from the back-end to the front-end,
    which happens on the source side of migration
  - 1: Load: Transfer the state from the front-end to the back-end,
    which happens on the destination side of migration

:migration phase: a 32-bit enum, describing the state in which the VM
  guest and devices are:

  - 0: Stopped (in the period after the transfer of memory-mapped
    regions before switch-over to the destination): The VM guest is
    stopped, and the vhost-user device is suspended (see
    :ref:`Suspended device state <suspended_device_state>`).

  In the future, additional phases might be added e.g. to allow
  iterative migration while the device is running.

C structure
-----------

In QEMU the vhost-user message is implemented with the following struct:

.. code:: c

  typedef struct VhostUserMsg {
      VhostUserRequest request;
      uint32_t flags;
      uint32_t size;
      union {
          uint64_t u64;
          struct vhost_vring_state state;
          struct vhost_vring_addr addr;
          VhostUserMemory memory;
          VhostUserLog log;
          struct vhost_iotlb_msg iotlb;
          VhostUserConfig config;
          VhostUserVringArea area;
          VhostUserInflight inflight;
      };
  } QEMU_PACKED VhostUserMsg;

Communication
=============

The protocol for vhost-user is based on the existing implementation of
vhost for the Linux Kernel. Most messages that can be sent via the
Unix domain socket implementing vhost-user have an equivalent ioctl to
the kernel implementation.

The communication consists of the *front-end* sending message requests and
the *back-end* sending message replies. Most of the requests don't require
replies, except for the following requests:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
* ``VHOST_USER_GET_VRING_BASE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)

.. seealso::

   :ref:`REPLY_ACK <reply_ack>`
       The section on ``REPLY_ACK`` protocol extension.
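
The ``flags`` header field that accompanies every request and reply can be
exercised with a small sketch. The bit positions below come from the Header
section (version in the lower 2 bits, reply flag in bit 2, need_reply flag in
bit 3); all ``EX_*`` and ``ex_*`` names are hypothetical and exist only for
this illustration, they are not taken from QEMU.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical names; only the bit layout follows the Header section. */
  #define EX_VERSION_MASK 0x3u       /* lower 2 bits: protocol version */
  #define EX_VERSION      0x1u       /* current version */
  #define EX_REPLY_FLAG   (1u << 2)  /* set on each reply from the back-end */
  #define EX_NEED_REPLY   (1u << 3)  /* need_reply flag (REPLY_ACK) */

  /* Flags word for a request; need_reply asks the peer to acknowledge. */
  static uint32_t ex_request_flags(bool need_reply)
  {
      return EX_VERSION | (need_reply ? EX_NEED_REPLY : 0u);
  }

  /* Flags word for a reply sent by the back-end. */
  static uint32_t ex_reply_flags(void)
  {
      return EX_VERSION | EX_REPLY_FLAG;
  }

  /* A reply is well formed if it has the current version and the reply flag. */
  static bool ex_is_valid_reply(uint32_t flags)
  {
      return (flags & EX_VERSION_MASK) == EX_VERSION &&
             (flags & EX_REPLY_FLAG) != 0;
  }

  /* Small self-check showing how the helpers fit together. */
  static void ex_flags_self_check(void)
  {
      assert(ex_is_valid_reply(ex_reply_flags()));
      assert(!ex_is_valid_reply(ex_request_flags(true)));
  }

A front-end following this sketch would reject, and per the rules in this
section close the connection on, any reply for which ``ex_is_valid_reply()``
returns false.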

There are several messages that the front-end sends with file descriptors passed
in the ancillary data:

* ``VHOST_USER_ADD_MEM_REG``
* ``VHOST_USER_SET_MEM_TABLE``
* ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
* ``VHOST_USER_SET_LOG_FD``
* ``VHOST_USER_SET_VRING_KICK``
* ``VHOST_USER_SET_VRING_CALL``
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
* ``VHOST_USER_SET_DEVICE_STATE_FD``

If the *front-end* is unable to send the full message or receives a wrong
reply, it will close the connection. An optional reconnection mechanism
can be implemented.

If the *back-end* detects an error such as incompatible features, it may also
close the connection. This should only happen in exceptional circumstances.

Any protocol extensions are gated by protocol feature bits, which
allows full backwards compatibility on both front-end and back-end. As
older back-ends don't support negotiating protocol features, a feature
bit was dedicated for this purpose::

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature
bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits
<https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
VIRTIO devices do not advertise this feature bit and therefore VIRTIO
drivers cannot negotiate it.

This reserved feature bit was reused by the vhost-user protocol to add
vhost-user protocol feature negotiation in a backwards compatible
fashion. Old vhost-user front-end and back-end implementations continue to
work even though they are not aware of vhost-user protocol feature
negotiation.
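
As a minimal sketch of how this gating might look in code, the helper below
tests bit 30 of a 64-bit feature word, such as the one returned for
``VHOST_USER_GET_FEATURES``. The ``ex_*`` function names are hypothetical, and
the idea that a front-end acknowledges only the bits both sides support is an
assumption made for illustration, not a description of QEMU's code.

.. code:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_F_PROTOCOL_FEATURES 30

  /* True if the feature word offers vhost-user protocol feature negotiation. */
  static bool ex_has_protocol_features(uint64_t features)
  {
      return ((features >> VHOST_USER_F_PROTOCOL_FEATURES) & 1) != 0;
  }

  /* Assumed policy: acknowledge only the bits known to both sides. */
  static uint64_t ex_ack_features(uint64_t offered, uint64_t supported)
  {
      return offered & supported;
  }

  /* Self-check: an old back-end that never sets bit 30 keeps working. */
  static void ex_features_self_check(void)
  {
      uint64_t old_backend = 0x3;  /* hypothetical feature word */

      assert(!ex_has_protocol_features(old_backend));
      assert(ex_ack_features(old_backend, 0x1) == 0x1);
  }

With an old back-end, ``ex_has_protocol_features()`` returns false and a
front-end written this way would simply skip the protocol feature messages.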

Ring states
-----------

Rings have two independent states: started/stopped, and enabled/disabled.

* While a ring is stopped, the back-end must not process the ring at
  all, regardless of whether it is enabled or disabled. The
  enabled/disabled state should still be tracked, though, so it can come
  into effect once the ring is started.

* started and disabled: The back-end must process the ring without
  causing any side effects. For example, for a networking device,
  in the disabled state the back-end must not supply any new RX packets,
  but must process and discard any TX packets.

* started and enabled: The back-end must process the ring normally, i.e.
  process all requests and execute them.

Each ring is initialized in a stopped and disabled state. The back-end
must start a ring upon receiving a kick (that is, detecting that the file
descriptor is readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message
``VHOST_USER_VRING_KICK`` if negotiated, and stop a ring upon receiving
``VHOST_USER_GET_VRING_BASE``.

Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.

In addition, upon receiving a ``VHOST_USER_SET_FEATURES`` message from
the front-end without ``VHOST_USER_F_PROTOCOL_FEATURES`` set, the
back-end must enable all rings immediately.

While processing the rings (whether they are enabled or not), the back-end
must support changing some configuration aspects on the fly.

.. _suspended_device_state:

Suspended device state
^^^^^^^^^^^^^^^^^^^^^^

While all vrings are stopped, the device is *suspended*.
In addition to not processing any vring (because they are stopped), the
device must:

* not write to any guest memory regions,
* not send any notifications to the guest,
* not send any messages to the front-end,
* still process and reply to messages from the front-end.

Multiple queue support
----------------------

Many devices have a fixed number of virtqueues. In this case the front-end
already knows the number of available virtqueues without communicating with the
back-end.

Some devices do not have a fixed number of virtqueues. Instead the maximum
number of virtqueues is chosen by the back-end. The number can depend on host
resource availability or back-end implementation details. Such devices are called
multiple queue devices.

Multiple queue support allows the back-end to advertise the maximum number of
queues. This is treated as a protocol extension, hence the back-end has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.

The maximum number of queues the back-end supports can be queried with message
``VHOST_USER_GET_QUEUE_NUM``. The front-end should stop when the number of
requested queues is bigger than that.

As all queues share one connection, the front-end uses a unique index for each
queue in the sent message to identify a specified queue.

The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
vhost-user-net has historically automatically enabled the first queue pair.

Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
feature, even for devices with a fixed number of virtqueues, since it is simple
to implement and offers a degree of introspection.

Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
devices with a fixed number of virtqueues.
Only true multiqueue devices
require this protocol feature.

Migration
---------

During live migration, the front-end may need to track the modifications
the back-end makes to the memory mapped regions. The back-end should mark
the dirty pages in a log. Once it complies with this logging, it may
declare the ``VHOST_F_LOG_ALL`` vhost feature.

To start/stop logging of data/used ring writes, the front-end may send
messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
flags set to 1/0, respectively.

All the modifications to memory pointed by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
``VHOST_VRING_F_LOG`` is part of ring's flags.

Dirty pages are of size::

  #define VHOST_LOG_PAGE 0x1000

The log memory fd is provided in the ancillary data of
``VHOST_USER_SET_LOG_BASE`` message when the back-end has
``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.

The size of the log is supplied as part of ``VhostUserMsg`` which
should be large enough to cover all known guest addresses. Log starts
at the supplied offset in the supplied file descriptor. The log
covers from address 0 to the maximum of guest regions. In pseudo-code,
to mark page at ``addr`` as dirty::

  page = addr / VHOST_LOG_PAGE
  log[page / 8] |= 1 << (page % 8)

Where ``addr`` is the guest physical address.

Use atomic operations, as the log may be concurrently manipulated.

Note that when logging modifications to the used ring (when
``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
be used to calculate the log offset: the write to first byte of the
used ring is logged at this offset from log start. Also note that this
value might be outside the legal guest physical address range
(i.e.
does not have to be covered by the ``VhostUserMemory`` table), but
the bit offset of the last byte of the ring must fall within the size
supplied by ``VhostUserLog``.

``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
ancillary data; it may be used to inform the front-end that the log has
been modified.

Once the source has finished migration, rings will be stopped by the
source (:ref:`Suspended device state <suspended_device_state>`). No
further updates may be made before rings are restarted.

In postcopy migration the back-end is started before all the memory has
been received from the source host, and care must be taken to avoid
accessing pages that have yet to be received. The back-end opens a
'userfault'-fd and registers the memory with it; this fd is then
passed back over to the front-end. The front-end services requests on the
userfaultfd for pages that are accessed and when the page is available
it performs WAKE ioctls on the userfaultfd to wake the stalled
back-end. The front-end indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.

.. _migrating_backend_state:

Migrating back-end state
^^^^^^^^^^^^^^^^^^^^^^^^

Migrating device state involves transferring the state from one
back-end, called the source, to another back-end, called the
destination. After migration, the destination transparently resumes
operation without requiring the driver to re-initialize the device at
the VIRTIO level. If the migration fails, then the source can
transparently resume operation until another migration attempt is made.

Generally, the front-end is connected to a virtual machine guest (which
contains the driver), which has its own state to transfer between source
and destination, and therefore will have an implementation-specific
mechanism to do so.
The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature
provides functionality to have the front-end include the back-end's
state in this transfer operation so the back-end does not need to
implement its own mechanism, and so the virtual machine may have its
complete state, including vhost-user devices' states, contained within a
single stream of data.

To do this, the back-end state is transferred from back-end to front-end
on the source side, and vice versa on the destination side. This
transfer happens over a channel that is negotiated using the
``VHOST_USER_SET_DEVICE_STATE_FD`` message. This message has two
parameters:

* Direction of transfer: On the source, the data is saved, transferring
  it from the back-end to the front-end. On the destination, the data
  is loaded, transferring it from the front-end to the back-end.

* Migration phase: Currently, the only supported phase is the period
  after the transfer of memory-mapped regions before switch-over to the
  destination, when both the source and destination devices are
  suspended (:ref:`Suspended device state <suspended_device_state>`).
  In the future, additional phases might be supported to allow iterative
  migration while the device is running.

The nature of the channel is implementation-defined, but it must
generally behave like a pipe: The writing end will write all the data it
has into it, signalling the end of data by closing its end. The reading
end must read all of this data (until encountering the end of file) and
process it.

* When saving, the writing end is the source back-end, and the reading
  end is the source front-end. After reading the state data from the
  channel, the source front-end must transfer it to the destination
  front-end through an implementation-defined mechanism.

* When loading, the writing end is the destination front-end, and the
  reading end is the destination back-end. After reading the state data
  from the channel, the destination back-end must deserialize its
  internal state from that data and set itself up to allow the driver to
  seamlessly resume operation on the VIRTIO level.

Seamlessly resuming operation means that the migration must be
transparent to the guest driver, which operates on the VIRTIO level.
This driver will not perform any re-initialization steps, but continue
to use the device as if no migration had occurred. The vhost-user
front-end, however, will re-initialize the vhost state on the
destination, following the usual protocol for establishing a connection
to a vhost-user back-end: This includes, for example, setting up memory
mappings and kick and call FDs as necessary, negotiating protocol
features, or setting the initial vring base indices (to the same value
as on the source side, so that operation can resume).

Both on the source and on the destination side, after the respective
front-end has seen all data transferred (when the transfer FD has been
closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to
verify that data transfer was successful in the back-end, too. The
back-end responds once it knows whether the transfer and processing was
successful or not.

Memory access
-------------

The front-end sends a list of vhost memory regions to the back-end using the
``VHOST_USER_SET_MEM_TABLE`` message. Each region has two base
addresses: a guest address and a user address.

Messages contain guest addresses and/or user addresses to reference locations
within the shared memory. The mapping of these addresses works as follows.

User addresses map to the vhost memory region containing that user address.

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:

* Guest addresses map to the vhost memory region containing that guest
  address.
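
For the case without ``VIRTIO_F_IOMMU_PLATFORM``, the guest address lookup
described above might be sketched as follows. ``ExampleRegion``,
``mmap_base`` and ``ex_guest_to_local()`` are hypothetical names. The sketch
also assumes that the back-end has mapped each region's file descriptor from
offset 0, so that the region's data starts at ``mmap_base + mmap_offset``;
other mapping strategies are possible.

.. code:: c

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical back-end view of one memory region description. */
  typedef struct ExampleRegion {
      uint64_t guest_addr;   /* guest address of the region */
      uint64_t size;         /* size of the region in bytes */
      uint64_t user_addr;    /* front-end user address of the region */
      uint64_t mmap_offset;  /* offset of the region in the mapped memory */
      uint8_t *mmap_base;    /* assumption: local mapping of the fd from offset 0 */
  } ExampleRegion;

  /* Return a local pointer for guest_addr, or NULL if no region contains it. */
  static void *ex_guest_to_local(const ExampleRegion *regions, size_t n,
                                 uint64_t guest_addr)
  {
      for (size_t i = 0; i < n; i++) {
          const ExampleRegion *r = &regions[i];

          /* The subtraction form avoids overflow in guest_addr + size. */
          if (guest_addr >= r->guest_addr &&
              guest_addr - r->guest_addr < r->size) {
              return r->mmap_base + r->mmap_offset +
                     (guest_addr - r->guest_addr);
          }
      }
      return NULL;
  }

  /* Self-check with one 32-byte region backed by a local buffer. */
  static void ex_lookup_self_check(void)
  {
      static uint8_t backing[64];
      ExampleRegion r = { 0x1000, 32, 0, 16, backing };

      assert(ex_guest_to_local(&r, 1, 0x1000) == backing + 16);
      assert(ex_guest_to_local(&r, 1, 0x2000) == NULL);
  }

A lookup by user address would follow the same pattern, comparing against
``user_addr`` instead of ``guest_addr``.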

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:

* Guest addresses are also called I/O virtual addresses (IOVAs). They are
  translated to user addresses via the IOTLB.

* The vhost memory region guest address is not used.

IOMMU support
-------------

When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
front-end sends IOTLB entry updates and invalidations by sending
``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct
vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
has to be filled with the update message type (2), the I/O virtual
address, the size, the user virtual address, and the permissions
flags. Addresses and size must be within vhost memory regions set via
the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
``iotlb`` payload has to be filled with the invalidation message type
(3), the I/O virtual address and the size. On success, the back-end is
expected to reply with a zero payload, non-zero otherwise.

The back-end relies on the back-end communication channel (see :ref:`Back-end
communication <backend_communication>` section below) to send IOTLB miss
and access failure events, by sending ``VHOST_USER_BACKEND_IOTLB_MSG``
requests to the front-end with a ``struct vhost_iotlb_msg`` as
payload. For miss events, the ``iotlb`` payload has to be filled with the
miss message type (1), the I/O virtual address and the permissions
flags. For access failure events, the ``iotlb`` payload has to be filled
with the access failure message type (4), the I/O virtual address and
the permissions flags. For synchronization purposes, the back-end may
rely on the reply-ack feature, so the front-end may send a reply when
the operation is completed if the reply-ack feature is negotiated and
the back-end requests a reply.
For miss events, a completed operation means
that either the front-end sent an update message containing the IOTLB entry
for the requested address and permission, or the front-end sent nothing
because the IOTLB miss message is invalid (invalid IOVA or permission).

The front-end isn't expected to take the initiative to send IOTLB update
messages, as the back-end sends IOTLB miss messages for the guest virtual
memory areas it needs to access.

.. _backend_communication:

Back-end communication
----------------------

An optional communication channel is provided if the back-end declares
the ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` protocol feature, to allow the
back-end to make requests to the front-end.

The fd is provided via ``VHOST_USER_SET_BACKEND_REQ_FD`` ancillary data.

A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end
using this fd communication channel.

If the ``VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD`` protocol feature is
negotiated, the back-end can send file descriptors (at most 8 descriptors in
each message) to the front-end via ancillary data using this fd communication
channel.

Inflight I/O tracking
---------------------

To support reconnecting after a restart or crash, the back-end may need to
resubmit inflight I/Os. If the virtqueue is processed in order, this is
easily achieved by getting the inflight descriptors from the
descriptor table (split virtqueue) or descriptor ring (packed
virtqueue). However, this does not work when descriptors are processed
out-of-order, because some entries which store the information of
inflight descriptors in the available ring (split virtqueue) or descriptor
ring (packed virtqueue) might be overwritten by new entries. To solve
this problem, the back-end needs to allocate an extra buffer to store this
information of inflight descriptors and share it with the front-end for
persistence.
``VHOST_USER_GET_INFLIGHT_FD`` and
``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
between front-end and back-end. The format of this buffer is described
below:

+---------------+---------------+-----+---------------+
| queue0 region | queue1 region | ... | queueN region |
+---------------+---------------+-----+---------------+

N is the number of available virtqueues. The back-end could get it from num
queues field of ``VhostUserInflight``.

For split virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStateSplit {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding[5];

      /* Maintain a list for the last batch of used descriptors.
       * Only available when batching is used for submitting */
      uint16_t next;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;
  } DescStateSplit;

  typedef struct QueueRegionSplit {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStateSplit array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of list that tracks the last batch of used descriptors. */
      uint16_t last_batch_head;

      /* Store the idx value of used ring */
      uint16_t used_idx;

      /* Used to track the state of each descriptor in descriptor table */
      DescStateSplit desc[];
  } QueueRegionSplit;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available head-descriptor index from available ring, ``i``

#. Set ``desc[i].counter`` to the value of global counter

#. Increase global counter by 1

#. Set ``desc[i].inflight`` to 1

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor index, ``i``

2. Set ``desc[i].next`` to ``last_batch_head``

3. Set ``last_batch_head`` to ``i``

#. Steps 1,2,3 may be performed repeatedly if batching is possible

#. Increase the ``idx`` value of used ring by the size of the batch

#. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0

#. Set ``used_idx`` to the ``idx`` value of used ring

When reconnecting:

#. If the value of ``used_idx`` does not match the ``idx`` value of
   used ring (means the inflight field of ``DescStateSplit`` entries in
   last batch may be incorrect),

   a. Subtract the value of ``used_idx`` from the ``idx`` value of
      used ring to get last batch size of ``DescStateSplit`` entries

   #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
      list which starts from ``last_batch_head``

   #. Set ``used_idx`` to the ``idx`` value of used ring

#. Resubmit inflight ``DescStateSplit`` entries in order of their
   counter value

For packed virtqueue, queue region can be implemented as:

.. code:: c

  typedef struct DescStatePacked {
      /* Indicate whether this descriptor is inflight or not.
       * Only available for head-descriptor. */
      uint8_t inflight;

      /* Padding */
      uint8_t padding;

      /* Link to the next free entry */
      uint16_t next;

      /* Link to the last entry of descriptor list.
       * Only available for head-descriptor. */
      uint16_t last;

      /* The length of descriptor list.
       * Only available for head-descriptor. */
      uint16_t num;

      /* Used to preserve the order of fetching available descriptors.
       * Only available for head-descriptor. */
      uint64_t counter;

      /* The buffer id */
      uint16_t id;

      /* The descriptor flags */
      uint16_t flags;

      /* The buffer length */
      uint32_t len;

      /* The buffer address */
      uint64_t addr;
  } DescStatePacked;

  typedef struct QueueRegionPacked {
      /* The feature flags of this region. Now it's initialized to 0. */
      uint64_t features;

      /* The version of this region. It's 1 currently.
       * Zero value indicates an uninitialized buffer */
      uint16_t version;

      /* The size of DescStatePacked array. It's equal to the virtqueue size.
       * The back-end could get it from queue size field of VhostUserInflight. */
      uint16_t desc_num;

      /* The head of free DescStatePacked entry list */
      uint16_t free_head;

      /* The old head of free DescStatePacked entry list */
      uint16_t old_free_head;

      /* The used index of descriptor ring */
      uint16_t used_idx;

      /* The old used index of descriptor ring */
      uint16_t old_used_idx;

      /* Device ring wrap counter */
      uint8_t used_wrap_counter;

      /* The old device ring wrap counter */
      uint8_t old_used_wrap_counter;

      /* Padding */
      uint8_t padding[7];

      /* Used to track the state of each descriptor fetched from descriptor ring */
      DescStatePacked desc[];
  } QueueRegionPacked;

To track inflight I/O, the queue region should be processed as follows:

When receiving available buffers from the driver:

#. Get the next available descriptor entry from descriptor ring, ``d``

#. If ``d`` is head descriptor,

   a. Set ``desc[old_free_head].num`` to 0

   #. Set ``desc[old_free_head].counter`` to the value of global counter

   #. Increase global counter by 1

   #. Set ``desc[old_free_head].inflight`` to 1

#. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
   ``free_head``

#. Increase ``desc[old_free_head].num`` by 1

#. Set ``desc[free_head].addr``, ``desc[free_head].len``,
   ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
   ``d.len``, ``d.flags``, ``d.id``

#. Set ``free_head`` to ``desc[free_head].next``

#. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``

When supplying used buffers to the driver:

1. Get corresponding used head-descriptor entry from descriptor ring,
   ``d``

2. Get corresponding ``DescStatePacked`` entry, ``e``

3. Set ``desc[e.last].next`` to ``free_head``

4. Set ``free_head`` to the index of ``e``

#. Steps 1,2,3,4 may be performed repeatedly if batching is possible

#. Increase ``used_idx`` by the size of the batch and update
   ``used_wrap_counter`` if needed

#. Update ``d.flags``

#. Set the ``inflight`` field of each head ``DescStatePacked`` entry
   in the batch to 0

#. Set ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
   to ``free_head``, ``used_idx``, ``used_wrap_counter``

When reconnecting:

#. If ``used_idx`` does not match ``old_used_idx`` (means the
   ``inflight`` field of ``DescStatePacked`` entries in last batch may
   be incorrect),

   a. Get the next descriptor ring entry through ``old_used_idx``, ``d``

   #. Use ``old_used_wrap_counter`` to calculate the available flags

   #.
If ``d.flags`` is not equal to the calculated flags value (means 993 back-end has submitted the buffer to guest driver before crash, so 994 it has to commit the in-progress update), set ``old_free_head``, 995 ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``, 996 ``used_idx``, ``used_wrap_counter`` 997 998#. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to 999 ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter`` 1000 (roll back any in-progress update) 1001 1002#. Set the ``inflight`` field of each ``DescStatePacked`` entry in 1003 free list to 0 1004 1005#. Resubmit inflight ``DescStatePacked`` entries in order of their 1006 counter value 1007 1008In-band notifications 1009--------------------- 1010 1011In some limited situations (e.g. for simulation) it is desirable to 1012have the kick, call and error (if used) signals done via in-band 1013messages instead of asynchronous eventfd notifications. This can be 1014done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` 1015protocol feature. 1016 1017Note that due to the fact that too many messages on the sockets can 1018cause the sending application(s) to block, it is not advised to use 1019this feature unless absolutely necessary. It is also considered an 1020error to negotiate this feature without also negotiating 1021``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``, 1022the former is necessary for getting a message channel from the back-end 1023to the front-end, while the latter needs to be used with the in-band 1024notification messages to block until they are processed, both to avoid 1025blocking later and for proper processing (at least in the simulation 1026use case.) As it has no other way of signalling this error, the back-end 1027should close the connection as a response to a 1028``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band 1029notifications feature flag without the other two. 
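
As a minimal sketch of the dependency rule for in-band notifications, a
back-end might validate the mask it receives in
``VHOST_USER_SET_PROTOCOL_FEATURES`` as shown below. The bit numbers are
taken from the protocol feature list in this document; the helper name
``inband_features_valid`` is hypothetical and not part of the protocol.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14

  /* Hypothetical helper: returns false when the in-band notifications
   * feature is set without both features it depends on. In that case
   * the back-end is expected to close the connection. */
  static bool inband_features_valid(uint64_t features)
  {
      const uint64_t inband =
          1ULL << VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS;
      const uint64_t deps =
          (1ULL << VHOST_USER_PROTOCOL_F_BACKEND_REQ) |
          (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);

      if (!(features & inband)) {
          return true; /* feature not requested, nothing to check */
      }
      return (features & deps) == deps;
  }
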

Protocol features
-----------------

.. code:: c

  #define VHOST_USER_PROTOCOL_F_MQ                    0
  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
  #define VHOST_USER_PROTOCOL_F_RARP                  2
  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
  #define VHOST_USER_PROTOCOL_F_MTU                   4
  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
  #define VHOST_USER_PROTOCOL_F_CONFIG                9
  #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10
  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
  #define VHOST_USER_PROTOCOL_F_STATUS               16
  #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT        18
  #define VHOST_USER_PROTOCOL_F_DEVICE_STATE         19

Front-end message types
-----------------------

``VHOST_USER_GET_FEATURES``
  :id: 1
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :request payload: N/A
  :reply payload: ``u64``

  Get from the underlying vhost implementation the features bitmask.
  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
  for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_FEATURES``
  :id: 2
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :request payload: ``u64``
  :reply payload: N/A

  Enable features in the underlying vhost implementation using a
  bitmask. Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
  back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
  ``VHOST_USER_SET_PROTOCOL_FEATURES``.
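
A front-end has to find ``VHOST_USER_F_PROTOCOL_FEATURES`` in the
``VHOST_USER_GET_FEATURES`` reply before it may send the protocol
feature messages. The following is a minimal sketch of that check. The
bit position 30 is an assumption here, because this section does not
state it, and both helper names are hypothetical.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Assumption: bit position of VHOST_USER_F_PROTOCOL_FEATURES. */
  #define VHOST_USER_F_PROTOCOL_FEATURES 30

  /* Hypothetical helper: test one bit (0..63) of a feature mask. */
  static bool has_feature(uint64_t mask, unsigned int bit)
  {
      return (mask >> bit) & 1;
  }

  /* True if GET/SET_PROTOCOL_FEATURES may be sent to this back-end. */
  static bool may_query_protocol_features(uint64_t get_features_reply)
  {
      return has_feature(get_features_reply,
                         VHOST_USER_F_PROTOCOL_FEATURES);
  }
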

``VHOST_USER_GET_PROTOCOL_FEATURES``
  :id: 15
  :equivalent ioctl: ``VHOST_GET_FEATURES``
  :request payload: N/A
  :reply payload: ``u64``

  Get the protocol feature bitmask from the underlying vhost
  implementation. Only legal if feature bit
  ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``. It does not need to be acknowledged by
  ``VHOST_USER_SET_FEATURES``.

.. Note::
   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
   support this message even before ``VHOST_USER_SET_FEATURES`` was
   called.

``VHOST_USER_SET_PROTOCOL_FEATURES``
  :id: 16
  :equivalent ioctl: ``VHOST_SET_FEATURES``
  :request payload: ``u64``
  :reply payload: N/A

  Enable protocol features in the underlying vhost implementation.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
  ``VHOST_USER_GET_FEATURES``. It does not need to be acknowledged by
  ``VHOST_USER_SET_FEATURES``.

.. Note::
   Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
   this message even before ``VHOST_USER_SET_FEATURES`` was called.

``VHOST_USER_SET_OWNER``
  :id: 3
  :equivalent ioctl: ``VHOST_SET_OWNER``
  :request payload: N/A
  :reply payload: N/A

  Issued when a new connection is established. It marks the sender
  as the front-end that owns the session. This can be used on the *back-end*
  as a "session start" flag.

``VHOST_USER_RESET_OWNER``
  :id: 4
  :request payload: N/A
  :reply payload: N/A

.. admonition:: Deprecated

   This is no longer used. Used to be sent to request disabling all
   rings, but some back-ends interpreted it to also discard connection
   state (this interpretation would lead to bugs). It is recommended
   that back-ends either ignore this message, or use it to disable all
   rings.

``VHOST_USER_SET_MEM_TABLE``
  :id: 5
  :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
  :request payload: multiple memory regions description
  :reply payload: (postcopy only) multiple memory regions description

  Sets the memory map regions on the back-end so it can translate the
  vring addresses. In the ancillary data there is an array of file
  descriptors for each memory mapped region. The size and ordering of
  the fds matches the number and ordering of memory regions.

  When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
  ``SET_MEM_TABLE`` replies with the bases of the memory mapped
  regions to the front-end. The back-end must have mmap'd the regions but
  not yet accessed them and should not yet generate a userfault
  event.

.. Note::
   ``NEED_REPLY_MASK`` is not set in this case. QEMU will then
   reply back to the list of mappings with an empty
   ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
   reception of this message may the guest start accessing the memory
   and generating faults.

``VHOST_USER_SET_LOG_BASE``
  :id: 6
  :equivalent ioctl: ``VHOST_SET_LOG_BASE``
  :request payload: u64
  :reply payload: N/A

  Sets logging shared memory space.

  When the back-end has the ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol
  feature, the log memory fd is provided in the ancillary data of the
  ``VHOST_USER_SET_LOG_BASE`` message; the size and offset of the shared
  memory area are provided in the message.

``VHOST_USER_SET_LOG_FD``
  :id: 7
  :equivalent ioctl: ``VHOST_SET_LOG_FD``
  :request payload: N/A
  :reply payload: N/A

  Sets the logging file descriptor, which is passed as ancillary data.

``VHOST_USER_SET_VRING_NUM``
  :id: 8
  :equivalent ioctl: ``VHOST_SET_VRING_NUM``
  :request payload: vring state description
  :reply payload: N/A

  Set the size of the queue.

``VHOST_USER_SET_VRING_ADDR``
  :id: 9
  :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
  :request payload: vring address description
  :reply payload: N/A

  Sets the addresses of the different aspects of the vring.

``VHOST_USER_SET_VRING_BASE``
  :id: 10
  :equivalent ioctl: ``VHOST_SET_VRING_BASE``
  :request payload: vring descriptor index/indices
  :reply payload: N/A

  Sets the next index to use for descriptors in this vring:

  * For a split virtqueue, sets only the next descriptor index to
    process in the *Available Ring*. The device is supposed to read the
    next index in the *Used Ring* from the respective vring structure in
    guest memory.

  * For a packed virtqueue, both indices are supplied, as they are not
    explicitly available in memory.

  Consequently, the payload type is specific to the type of virt queue
  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
  indices for packed virtqueues*).

``VHOST_USER_GET_VRING_BASE``
  :id: 11
  :equivalent ioctl: ``VHOST_GET_VRING_BASE``
  :request payload: vring state description
  :reply payload: vring descriptor index/indices

  Stops the vring and returns the current descriptor index or indices:

  * For a split virtqueue, returns only the 16-bit next descriptor
    index to process in the *Available Ring*. Note that this may
    differ from the available ring index in the vring structure in
    memory, which points to where the driver will put new available
    descriptors. For the *Used Ring*, the device only needs the next
    descriptor index at which to put new descriptors, which is the
    value in the vring structure in memory, so this value is not
    covered by this message.

  * For a packed virtqueue, neither index is explicitly available to
    read from memory, so both indices (as maintained by the device) are
    returned.

  Consequently, the payload type is specific to the type of virt queue
  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
  indices for packed virtqueues*).

  When and as long as all of a device's vrings are stopped, it is
  *suspended*, see :ref:`Suspended device state
  <suspended_device_state>`.

  The request payload's *num* field is currently reserved and must be
  set to 0.

``VHOST_USER_SET_VRING_KICK``
  :id: 12
  :equivalent ioctl: ``VHOST_SET_VRING_KICK``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor for adding buffers to the vring. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling should be used
  instead of waiting for the kick. Note that if the protocol feature
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated
  this message isn't necessary as the ring is also started on the
  ``VHOST_USER_VRING_KICK`` message; it may, however, still be used to
  set an event file descriptor (which will be preferred over the
  message) or to enable polling.

``VHOST_USER_SET_VRING_CALL``
  :id: 13
  :equivalent ioctl: ``VHOST_SET_VRING_CALL``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor to signal when buffers are used. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. This signals that polling will be used
  instead of waiting for the call.
  Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message
  isn't necessary as the ``VHOST_USER_BACKEND_VRING_CALL`` message can be
  used; it may, however, still be used to set an event file descriptor
  or to enable polling.

``VHOST_USER_SET_VRING_ERR``
  :id: 14
  :equivalent ioctl: ``VHOST_SET_VRING_ERR``
  :request payload: ``u64``
  :reply payload: N/A

  Set the event file descriptor to signal when an error occurs. It is
  passed in the ancillary data.

  Bits (0-7) of the payload contain the vring index. Bit 8 is the
  invalid FD flag. This flag is set when there is no file descriptor
  in the ancillary data. Note that if the protocol features
  ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
  ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message
  isn't necessary as the ``VHOST_USER_BACKEND_VRING_ERR`` message can be
  used; it may, however, still be used to set an event file descriptor
  (which will be preferred over the message).

``VHOST_USER_GET_QUEUE_NUM``
  :id: 17
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: u64

  Query how many queues the back-end supports.

  This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
  is set in queried protocol features by
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

``VHOST_USER_SET_VRING_ENABLE``
  :id: 18
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  Signal the back-end to enable or disable corresponding vring.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
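
The ``u64`` payload shared by ``VHOST_USER_SET_VRING_KICK``,
``VHOST_USER_SET_VRING_CALL`` and ``VHOST_USER_SET_VRING_ERR`` could be
decoded roughly as sketched below. Only the bit layout (bits 0-7 for the
vring index, bit 8 for the invalid FD flag) comes from the message
descriptions; the macro and function names are hypothetical.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical names for the documented bit layout. */
  #define VRING_PAYLOAD_IDX_MASK  0xffULL       /* bits 0-7: vring index */
  #define VRING_PAYLOAD_NOFD_MASK (1ULL << 8)   /* bit 8: invalid FD flag */

  /* Extract the vring index from the payload. */
  static unsigned int vring_payload_index(uint64_t payload)
  {
      return (unsigned int)(payload & VRING_PAYLOAD_IDX_MASK);
  }

  /* True if a file descriptor is expected in the ancillary data. */
  static bool vring_payload_has_fd(uint64_t payload)
  {
      return !(payload & VRING_PAYLOAD_NOFD_MASK);
  }

A receiver would typically call ``vring_payload_has_fd()`` first and only
then look for a descriptor in the ancillary data.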

``VHOST_USER_SEND_RARP``
  :id: 19
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Ask the vhost-user back-end to broadcast a fake RARP to notify that the
  migration has terminated, for guests that do not support GUEST_ANNOUNCE.

  Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
  present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_RARP`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``. The first 6 bytes of the
  payload contain the MAC address of the guest to allow the vhost-user
  back-end to construct and broadcast the fake RARP.

``VHOST_USER_NET_SET_MTU``
  :id: 20
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Set host MTU value exposed to the guest.

  This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
  has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
  is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
  ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.

  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
  respond with zero in case the specified MTU is valid, or non-zero
  otherwise.

``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
  :id: 21
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Set the socket file descriptor for back-end initiated requests. It is passed
  in the ancillary data.

  This request should be sent only when
  ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
  feature bit ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` is present in
  ``VHOST_USER_GET_PROTOCOL_FEATURES``.
  If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
  respond with zero for success, non-zero otherwise.

``VHOST_USER_IOTLB_MSG``
  :id: 22
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :request payload: ``struct vhost_iotlb_msg``
  :reply payload: ``u64``

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.

  The front-end sends such requests to update and invalidate entries in the
  device IOTLB. The back-end has to acknowledge the request by sending
  zero as ``u64`` payload for success, non-zero otherwise.

  This request should be sent only when ``VIRTIO_F_IOMMU_PLATFORM``
  feature has been successfully negotiated.

``VHOST_USER_SET_VRING_ENDIAN``
  :id: 23
  :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
  :request payload: vring state description
  :reply payload: N/A

  Set the endianness of a VQ for legacy devices. Little-endian is
  indicated with state.num set to 0 and big-endian is indicated with
  state.num set to 1. Other values are invalid.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
  Backends that negotiated this feature should handle both
  endiannesses and expect this message once (per VQ) during device
  configuration (i.e. before the front-end starts the VQ).

``VHOST_USER_GET_CONFIG``
  :id: 24
  :equivalent ioctl: N/A
  :request payload: virtio device config space
  :reply payload: virtio device config space

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end to fetch the contents of the
  virtio device configuration space. The vhost-user back-end's payload
  size MUST match the front-end's request. The vhost-user back-end uses a
  zero-length payload to indicate an error to the vhost-user front-end.
  The vhost-user
  front-end may cache the contents to avoid repeated
  ``VHOST_USER_GET_CONFIG`` calls.

``VHOST_USER_SET_CONFIG``
  :id: 25
  :equivalent ioctl: N/A
  :request payload: virtio device config space
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
  submitted by the vhost-user front-end when the Guest changes the virtio
  device configuration space and also can be used for live migration
  on the destination host. The vhost-user back-end must check the flags
  field, and back-ends MUST NOT accept SET_CONFIG for read-only
  configuration space fields unless the live migration bit is set.

``VHOST_USER_CREATE_CRYPTO_SESSION``
  :id: 26
  :equivalent ioctl: N/A
  :request payload: crypto session description
  :reply payload: crypto session description

  Create a session for crypto operation. The back-end must return
  the session id, 0 or positive for success, negative for failure.
  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_CLOSE_CRYPTO_SESSION``
  :id: 27
  :equivalent ioctl: N/A
  :request payload: ``u64``
  :reply payload: N/A

  Close a session for crypto operation which was previously
  created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.

  This request should be sent only when
  ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
  successfully negotiated. It's a required feature for crypto
  devices.

``VHOST_USER_POSTCOPY_ADVISE``
  :id: 28
  :request payload: N/A
  :reply payload: userfault fd

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
  advises the back-end that a migration with postcopy enabled is underway;
  the back-end must open a userfaultfd for later use.
  Note that at this
  stage the migration is still in precopy mode.

``VHOST_USER_POSTCOPY_LISTEN``
  :id: 29
  :request payload: N/A
  :reply payload: N/A

  The front-end advises the back-end that a transition to postcopy mode has
  happened. The back-end must ensure that shared memory is registered
  with userfaultfd to cause faulting of non-present pages.

  This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
  and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.

``VHOST_USER_POSTCOPY_END``
  :id: 30
  :request payload: N/A
  :reply payload: ``u64``

  The front-end advises that postcopy migration has now completed. The back-end
  must disable the userfaultfd. The reply is an acknowledgement
  only.

  When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
  is sent at the end of the migration, after
  ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.

  The value returned is an error indication; 0 is success.

``VHOST_USER_GET_INFLIGHT_FD``
  :id: 31
  :equivalent ioctl: N/A
  :request payload: inflight description
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  get a shared buffer from the back-end. The shared buffer will be used by
  the back-end to track inflight I/O. QEMU should retrieve a new one when
  the VM is reset.

``VHOST_USER_SET_INFLIGHT_FD``
  :id: 32
  :equivalent ioctl: N/A
  :request payload: inflight description
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
  been successfully negotiated, this message is submitted by the front-end to
  send the shared inflight buffer back to the back-end so that the back-end
  can recover inflight I/O after a crash or restart.
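
After receiving the buffer through ``VHOST_USER_SET_INFLIGHT_FD``, a
restarted back-end resubmits the entries still marked inflight in order
of their ``counter`` value. The sketch below illustrates only that final
step for a split virtqueue, assuming ``used_idx`` has already been
reconciled with the used ring. It reuses the ``DescStateSplit`` layout
from the inflight I/O tracking description; ``InflightEntry``,
``collect_inflight`` and the use of ``qsort`` are illustrative choices,
not part of the protocol.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>
  #include <stdlib.h>

  typedef struct DescStateSplit {
      uint8_t inflight;
      uint8_t padding[5];
      uint16_t next;
      uint64_t counter;
  } DescStateSplit;

  /* Hypothetical helper type: one inflight head descriptor. */
  typedef struct InflightEntry {
      uint16_t index;    /* head-descriptor index */
      uint64_t counter;  /* fetch order recorded by the back-end */
  } InflightEntry;

  static int cmp_counter(const void *a, const void *b)
  {
      const InflightEntry *x = a;
      const InflightEntry *y = b;

      return (x->counter > y->counter) - (x->counter < y->counter);
  }

  /* Collect inflight head descriptors into out[], oldest first.
   * out[] must have room for desc_num entries. Returns the count. */
  static size_t collect_inflight(const DescStateSplit *desc,
                                 uint16_t desc_num, InflightEntry *out)
  {
      size_t n = 0;

      for (size_t i = 0; i < desc_num; i++) {
          if (desc[i].inflight) {
              out[n].index = (uint16_t)i;
              out[n].counter = desc[i].counter;
              n++;
          }
      }
      qsort(out, n, sizeof(out[0]), cmp_counter);
      return n;
  }
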

``VHOST_USER_GPU_SET_SOCKET``
  :id: 33
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Sets the GPU protocol socket file descriptor, which is passed as
  ancillary data. The GPU protocol is used to inform the front-end of
  rendering state and updates. See vhost-user-gpu.rst for details.

``VHOST_USER_RESET_DEVICE``
  :id: 34
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  Ask the vhost user back-end to disable all rings and reset all
  internal device state to the initial state, ready to be
  reinitialized. The back-end retains ownership of the device
  throughout the reset operation.

  Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
  feature is set by the back-end.

``VHOST_USER_VRING_KICK``
  :id: 35
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the front-end to indicate that a buffer was added to
  the vring instead of signalling it using the vring's kick file
  descriptor or having the back-end rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_GET_MAX_MEM_SLOTS``
  :id: 36
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: u64

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The back-end should return the message with a
  u64 payload containing the maximum number of memory slots for
  QEMU to expose to the guest. The value returned by the back-end
  will be capped at the maximum number of ram slots which can be
  supported by the target platform.

``VHOST_USER_ADD_MEM_REG``
  :id: 37
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must map in. When the
  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
  been successfully negotiated, along with the
  ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.

  Exactly one file descriptor from which the memory is mapped is
  passed in the ancillary data.

  In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
  replies with the bases of the memory mapped region to the front-end.
  For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
  They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.

``VHOST_USER_REM_MEM_REG``
  :id: 38
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: single memory region description

  When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
  feature has been successfully negotiated, this message is submitted
  by the front-end to the back-end. The message payload contains a memory
  region descriptor struct, describing a region of guest memory which
  the back-end device must unmap. When the
  ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
  been successfully negotiated, along with the
  ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
  update the memory tables of the back-end device.

  The memory region to be removed is identified by its guest address,
  user address and size.
  The mmap offset is ignored.

  No file descriptors SHOULD be passed in the ancillary data. For
  compatibility with existing incorrect implementations, the back-end MAY
  accept messages with one file descriptor. If a file descriptor is
  passed, the back-end MUST close it without using it otherwise.

``VHOST_USER_SET_STATUS``
  :id: 39
  :equivalent ioctl: VHOST_VDPA_SET_STATUS
  :request payload: ``u64``
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the front-end to
  notify the back-end with updated device status as defined in the Virtio
  specification.

``VHOST_USER_GET_STATUS``
  :id: 40
  :equivalent ioctl: VHOST_VDPA_GET_STATUS
  :request payload: N/A
  :reply payload: ``u64``

  When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
  successfully negotiated, this message is submitted by the front-end to
  query the back-end for its device status as defined in the Virtio
  specification.

``VHOST_USER_GET_SHARED_OBJECT``
  :id: 41
  :equivalent ioctl: N/A
  :request payload: ``struct VhostUserShared``
  :reply payload: dmabuf fd

  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, and the UUID is found
  in the exporters cache, this message is submitted by the front-end
  to retrieve a given dma-buf fd from a given back-end, determined by
  the requested UUID. Back-end will reply passing the fd when the operation
  is successful, or no fd otherwise.

``VHOST_USER_SET_DEVICE_STATE_FD``
  :id: 42
  :equivalent ioctl: N/A
  :request payload: device state transfer parameters
  :reply payload: ``u64``

  Front-end and back-end negotiate a channel over which to transfer the
  back-end's internal state during migration.
  Either side (front-end or
  back-end) may create the channel. The nature of this channel is not
  restricted or defined in this document, but whichever side creates it
  must create a file descriptor that is provided to the other side,
  allowing access to the channel. This FD must behave as follows:

  * For the writing end, it must allow writing the whole back-end state
    sequentially. Closing the file descriptor signals the end of
    transfer.

  * For the reading end, it must allow reading the whole back-end state
    sequentially. The end of file signals the end of the transfer.

  For example, the channel may be a pipe, in which case the two ends of
  the pipe fulfill these requirements respectively.

  Initially, the front-end creates a channel along with such an FD. It
  passes the FD to the back-end as ancillary data of a
  ``VHOST_USER_SET_DEVICE_STATE_FD`` message. The back-end may create a
  different transfer channel, passing the respective FD back to the
  front-end as ancillary data of the reply. If so, the front-end must
  then discard its channel and use the one provided by the back-end.

  Whether the back-end should decide to use its own channel is decided
  based on efficiency: If the channel is a pipe, both ends will most
  likely need to copy data into and out of it. Any channel that allows
  for more efficient processing on at least one end, e.g. through
  zero-copy, is considered more efficient and thus preferred. If the
  back-end can provide such a channel, it should decide to use it.

  The request payload contains parameters for the subsequent data
  transfer, as described in the :ref:`Migrating back-end state
  <migrating_backend_state>` section.

  The value returned is both an indication for success, and whether a
  file descriptor for a back-end-provided channel is returned: Bits 0–7
  are 0 on success, and non-zero on error. Bit 8 is the invalid FD
  flag; this flag is set when there is no file descriptor returned.
  When this flag is not set, the front-end must use the returned file
  descriptor as its end of the transfer channel. The back-end must not
  both indicate an error and return a file descriptor.

  Using this function requires prior negotiation of the
  ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.

``VHOST_USER_CHECK_DEVICE_STATE``
  :id: 43
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: ``u64``

  After transferring the back-end's internal state during migration (see
  the :ref:`Migrating back-end state <migrating_backend_state>`
  section), check whether the back-end was able to successfully fully
  process the state.

  The value returned indicates success or error; 0 is success, any
  non-zero value is an error.

  Using this function requires prior negotiation of the
  ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.

Back-end message types
----------------------

For this type of message, the request is sent by the back-end and the reply
is sent by the front-end.

``VHOST_USER_BACKEND_IOTLB_MSG`` (previous name ``VHOST_USER_SLAVE_IOTLB_MSG``)
  :id: 1
  :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
  :request payload: ``struct vhost_iotlb_msg``
  :reply payload: N/A

  Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
  The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
  access failure.
  If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
  negotiated, and the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the
  front-end must respond with zero when the operation is successfully
  completed, or non-zero otherwise. This request should be sent only when
  the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
  negotiated.

``VHOST_USER_BACKEND_CONFIG_CHANGE_MSG`` (previous name ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``)
  :id: 2
  :equivalent ioctl: N/A
  :request payload: N/A
  :reply payload: N/A

  When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, the vhost-user
  back-end sends such messages to notify that the virtio device's
  configuration space has changed. For host devices that support this
  feature, the host driver can send a ``VHOST_USER_GET_CONFIG``
  message to the back-end to get the latest content. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
  ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
  the operation is successfully completed, or non-zero otherwise.

``VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG`` (previous name ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``)
  :id: 3
  :equivalent ioctl: N/A
  :request payload: vring area description
  :reply payload: N/A

  Sets host notifier for a specified queue. The queue index is
  contained in the ``u64`` field of the vring area description. The
  host notifier is described by the file descriptor (typically it's a
  VFIO device fd) which is passed as ancillary data and the size
  (which is mmap size and should be the same as host page size) and
  offset (which is mmap offset) carried in the vring area
  description. QEMU can mmap the file descriptor based on the size and
  offset to get a memory range. Registering a host notifier means
  mapping this memory range to the VM as the specified queue's notify
  MMIO region.
  The back-end sends this request to tell QEMU to de-register
  the existing notifier, if any, and to register the new notifier if the
  request is sent with a file descriptor.

  This request should be sent only when the
  ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
  successfully negotiated.

``VHOST_USER_BACKEND_VRING_CALL`` (previous name ``VHOST_USER_SLAVE_VRING_CALL``)
  :id: 4
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the back-end to indicate that a buffer was used from
  the vring instead of signalling this using the vring's call file
  descriptor or having the front-end rely on polling.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_BACKEND_VRING_ERR`` (previous name ``VHOST_USER_SLAVE_VRING_ERR``)
  :id: 5
  :equivalent ioctl: N/A
  :request payload: vring state description
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
  feature has been successfully negotiated, this message may be
  submitted by the back-end to indicate that an error occurred on the
  specific vring, instead of signalling the error file descriptor
  set by the front-end via ``VHOST_USER_SET_VRING_ERR``.

  The state.num field is currently reserved and must be set to 0.

``VHOST_USER_BACKEND_SHARED_OBJECT_ADD``
  :id: 6
  :equivalent ioctl: N/A
  :request payload: ``struct VhostUserShared``
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, this message can be submitted
  by back-ends to add themselves as exporters to the virtio shared lookup
  table.
  The back-end device gets associated with a UUID in the shared table.
  The back-end is responsible for keeping its own table with exported
  dma-buf fds. When another back-end tries to import the resource
  associated with the UUID, it will send a message to the front-end, which
  will act as a proxy to the exporter back-end. If
  ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets
  the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero
  when the operation is successfully completed, or non-zero otherwise.

``VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE``
  :id: 7
  :equivalent ioctl: N/A
  :request payload: ``struct VhostUserShared``
  :reply payload: N/A

  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, this message can be submitted
  by a back-end to remove itself from the virtio-dmabuf shared
  table. Only the back-end owning the entry (i.e., the one that first added
  it) will have permission to remove it. Otherwise, the message is ignored.
  The shared table will remove the back-end device associated with
  the UUID. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the
  back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond
  with zero when the operation is successfully completed, or non-zero
  otherwise.

``VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP``
  :id: 8
  :equivalent ioctl: N/A
  :request payload: ``struct VhostUserShared``
  :reply payload: dmabuf fd and ``u64``

  When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol
  feature has been successfully negotiated, this message can be submitted
  by back-ends to retrieve a given dma-buf fd from the virtio-dmabuf
  shared table given a UUID. The front-end will reply passing the fd and a
  zero when the operation is successful, or non-zero otherwise.
  Note that if the
  operation fails, no fd is sent to the back-end.

.. _reply_ack:

VHOST_USER_PROTOCOL_F_REPLY_ACK
-------------------------------

The original vhost-user specification only demands replies for certain
commands. This differs from the vhost protocol implementation where
commands are sent over an ``ioctl()`` call and block until the back-end
has completed.

With this protocol extension negotiated, the sender (QEMU) can set the
``need_reply`` [Bit 3] flag on any command. This indicates that the
back-end MUST respond with a Payload ``VhostUserMsg`` indicating success
or failure. The payload should be set to zero on success or non-zero
on failure, unless the message already has an explicit reply body.

The reply payload gives QEMU a deterministic indication of the result
of the command. Today, QEMU is expected to terminate the main vhost-user
loop upon receiving such errors. In the future, QEMU could be taught to be
more resilient for selective requests.

For the message types that already solicit a reply from the back-end,
the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or the ``need_reply``
bit being set brings no behavioural change. (See the Communication_
section for details.)

.. _backend_conventions:

Backend program conventions
===========================

vhost-user back-ends can provide various devices & services and may
need to be configured manually depending on the use case. However, it
is a good idea to follow the conventions listed here when
possible. Users, QEMU or libvirt, can then rely on some common
behaviour to avoid heterogeneous configuration and management of the
back-end programs and facilitate interoperability.

Each back-end installed on a host system should come with at least one
JSON file that conforms to the vhost-user.json schema.
Each file
informs the management applications about the back-end type, and binary
location. In addition, it defines rules for management apps for
picking the highest priority back-end when multiple match the search
criteria (see ``@VhostUserBackend`` documentation in the schema file).

If the back-end is not capable of enabling a requested feature on the
host (such as 3D acceleration with virgl), or the initialization
failed, the back-end should fail to start early and exit with a status
!= 0. It may also print a message to stderr for further details.

The back-end program must not daemonize itself, but it may be
daemonized by the management layer. It may also have restricted
access to the system.

File descriptors 0, 1 and 2 will exist, and have regular
stdin/stdout/stderr usage (they may have been redirected to /dev/null
by the management layer, or to a log handler).

The back-end program must end (as quickly and cleanly as possible) when
the SIGTERM signal is received. It may eventually be sent SIGKILL by
the management layer after a few seconds.

The following command line options have an expected behaviour. They
are mandatory, unless explicitly said differently:

--socket-path=PATH

  This option specifies the location of the vhost-user Unix domain socket.
  It is incompatible with --fd.

--fd=FDNUM

  When this argument is given, the back-end program is started with the
  vhost-user socket as file descriptor FDNUM. It is incompatible with
  --socket-path.

--print-capabilities

  Output to stdout the back-end capabilities in JSON format, and then
  exit successfully. Other options and arguments should be ignored, and
  the back-end program should not perform its normal function. The
  capabilities can be reported dynamically depending on the host
  capabilities.

The JSON output is described in the ``vhost-user.json`` schema, by
``@VHostUserBackendCapabilities``. Example:

.. code:: json

  {
    "type": "foo",
    "features": [
      "feature-a",
      "feature-b"
    ]
  }

vhost-user-input
----------------

Command line options:

--evdev-path=PATH

  Specify the Linux input device.

  (optional)

--no-grab

  Do not request exclusive access to the input device.

  (optional)

vhost-user-gpu
--------------

Command line options:

--render-node=PATH

  Specify the GPU DRM render node.

  (optional)

--virgl

  Enable virgl rendering support.

  (optional)

vhost-user-blk
--------------

Command line options:

--blk-file=PATH

  Specify block device or file path.

  (optional)

--read-only

  Enable read-only.

  (optional)
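
To make the option conventions more concrete, here is a minimal C sketch
of how a block back-end might parse the common options together with the
``vhost-user-blk`` ones using ``getopt_long()``. The names ``struct
backend_opts`` and ``parse_backend_args()`` are hypothetical and do not
come from any existing back-end. The sketch also assumes that exactly one
of ``--socket-path`` and ``--fd`` must be given for normal operation,
which is stricter than what the text above states.

.. code:: c

  #include <getopt.h>
  #include <limits.h>
  #include <stdbool.h>
  #include <stdlib.h>

  /* Hypothetical result of command line parsing. */
  struct backend_opts {
      const char *socket_path;  /* --socket-path=PATH, or NULL */
      int fd;                   /* --fd=FDNUM, or -1 */
      bool print_capabilities;  /* --print-capabilities */
      const char *blk_file;     /* --blk-file=PATH, or NULL */
      bool read_only;           /* --read-only */
  };

  /* Returns 0 on success, -1 if the command line is not acceptable. */
  static int parse_backend_args(int argc, char **argv,
                                struct backend_opts *o)
  {
      static const struct option longopts[] = {
          { "socket-path",        required_argument, NULL, 's' },
          { "fd",                 required_argument, NULL, 'f' },
          { "print-capabilities", no_argument,       NULL, 'c' },
          { "blk-file",           required_argument, NULL, 'b' },
          { "read-only",          no_argument,       NULL, 'r' },
          { NULL, 0, NULL, 0 }
      };
      bool bad = false;
      int c;

      *o = (struct backend_opts){ .fd = -1 };
      optind = 1;  /* restart scanning in case of repeated calls */
      opterr = 0;  /* error reporting is left to the caller */

      while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
          switch (c) {
          case 's':
              o->socket_path = optarg;
              break;
          case 'f': {
              char *end;
              long v = strtol(optarg, &end, 10);

              if (*optarg == '\0' || *end != '\0' || v < 0 || v > INT_MAX) {
                  bad = true;
              } else {
                  o->fd = (int)v;
              }
              break;
          }
          case 'c':
              o->print_capabilities = true;
              break;
          case 'b':
              o->blk_file = optarg;
              break;
          case 'r':
              o->read_only = true;
              break;
          default:
              bad = true;  /* unknown option or missing argument */
              break;
          }
      }

      /* With --print-capabilities, all other options are ignored. */
      if (o->print_capabilities) {
          return 0;
      }
      if (bad) {
          return -1;
      }
      /* --socket-path and --fd are incompatible; one is assumed required. */
      if ((o->socket_path != NULL) == (o->fd >= 0)) {
          return -1;
      }
      return 0;
  }

A caller would typically print the capabilities JSON and exit with status
0 when ``print_capabilities`` is set, and exit with a non-zero status
when the function returns -1.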