======================================================
Device Specification for Inter-VM shared memory device
======================================================

The Inter-VM shared memory device (ivshmem) is designed to share a
memory region between multiple QEMU processes running different guests
and the host. In order for all guests to be able to pick up the
shared memory area, it is modeled by QEMU as a PCI device exposing
said memory to the guest as a PCI BAR.

The device can use a shared memory object on the host directly, or it
can obtain one from an ivshmem server.

In the latter case, the device can additionally interrupt its peers, and
get interrupted by its peers.

For information on configuring the ivshmem device on the QEMU
command line, see :doc:`../system/devices/ivshmem`.

The ivshmem PCI device's guest interface
========================================

The device has vendor ID 1af4, device ID 1110, revision 1. Before
QEMU 2.6.0, it had revision 0.

PCI BARs
--------

The ivshmem PCI device has two or three BARs:

- BAR0 holds device registers (256-byte MMIO)
- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
- BAR2 maps the shared memory object

There are two ways to use this device:

- If you only need the shared memory part, BAR2 suffices. This way,
  you have access to the shared memory in the guest and can use it as
  you see fit.

- If you additionally need the capability for peers to interrupt each
  other, you need BAR0 and BAR1. You will most likely want to write a
  kernel driver to handle interrupts. This requires the device to be
  configured for interrupts.

Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
configured for interrupts. It becomes safely accessible only after
the ivshmem server provided the shared memory. These devices have PCI
revision 0 rather than 1.
Guest software should wait for the
IVPosition register (described below) to become non-negative before
accessing BAR2.

Revision 0 of the device cannot tell guest software whether
it is configured for interrupts.

PCI device registers
--------------------

BAR0 contains the following registers:

::

  Offset  Size  Access      On reset  Function
      0     4   read/write        0   Interrupt Mask
                                      bit 0: peer interrupt (rev 0)
                                             reserved (rev 1)
                                      bit 1..31: reserved
      4     4   read/write        0   Interrupt Status
                                      bit 0: peer interrupt (rev 0)
                                             reserved (rev 1)
                                      bit 1..31: reserved
      8     4   read-only   0 or ID   IVPosition
     12     4   write-only      N/A   Doorbell
                                      bit 0..15: vector
                                      bit 16..31: peer ID
     16   240   none            N/A   reserved

Software should only access the registers as specified in column
"Access". Reserved bits should be ignored on read, and preserved on
write.

In revision 0 of the device, the Interrupt Status and Interrupt Mask
registers together control the legacy INTx interrupt when the device
has no MSI-X capability: INTx is asserted when the bit-wise AND of
Status and Mask is non-zero. Interrupt Status Register bit 0 becomes 1
when an interrupt request from a peer is received. Reading the
register clears it.

IVPosition Register: if the device is not configured for interrupts,
this is zero. Otherwise, it is the device's ID (between 0 and 65535).

Before QEMU 2.6.0, the register may read -1 for a short while after
reset. These devices have PCI revision 0 rather than 1.

There is no good way for software to find out whether the device is
configured for interrupts. A positive IVPosition means interrupts,
but zero could be either.

Doorbell Register: writing this register requests to interrupt a peer.
The written value's high 16 bits are the ID of the peer to interrupt,
and its low 16 bits select an interrupt vector.
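
As a minimal sketch of the Doorbell encoding described above, the
following Python helpers compose and split the 32-bit value. The
function names are hypothetical and not part of QEMU; a real guest
driver would write the composed value to offset 12 of BAR0.

.. code-block:: python

   def doorbell_value(peer_id: int, vector: int) -> int:
       """Compose a Doorbell value: peer ID in bits 16..31, vector in bits 0..15."""
       if not 0 <= peer_id <= 0xFFFF:
           raise ValueError("peer ID must fit in 16 bits")
       if not 0 <= vector <= 0xFFFF:
           raise ValueError("vector must fit in 16 bits")
       return (peer_id << 16) | vector


   def split_doorbell(value: int):
       """Split a Doorbell value back into (peer_id, vector)."""
       return (value >> 16) & 0xFFFF, value & 0xFFFF

For example, interrupting peer 2 on vector 1 corresponds to the value
``0x00020001``.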

If the device is not configured for interrupts, the write is ignored.

If the interrupt hasn't completed setup, the write is ignored. The
device cannot tell guest software whether setup is
complete. Interrupts can regress to this state on migration.

If the peer with the requested ID isn't connected, or it has fewer
interrupt vectors connected, the write is ignored. The device cannot
tell guest software what peers are connected, or how many
interrupt vectors are connected.

Otherwise, the peer's interrupt for this vector becomes pending. There
is no way for software to clear the pending bit, and a polling mode of
operation is therefore impossible.

If the peer is a revision 0 device without MSI-X capability, its
Interrupt Status register is set to 1. This asserts INTx unless
masked by the Interrupt Mask register. In that case, the device cannot
communicate the interrupt vector to guest software.

With multiple MSI-X vectors, different vectors can be used to indicate
that different events have occurred. The semantics of interrupt vectors
are left to the application.

Interrupt infrastructure
========================

When configured for interrupts, the peers share eventfd objects in
addition to shared memory. The shared resources are managed by an
ivshmem server.

The ivshmem server
------------------

The server listens on a UNIX domain socket.

For each new client that connects to the server, the server

- picks an ID,
- creates eventfd file descriptors for the interrupt vectors,
- sends the ID and the file descriptor for the shared memory to the
  new client,
- sends connect notifications for the new client to the other clients
  (these contain file descriptors for sending interrupts),
- sends connect notifications for the other clients to the new client,
  and
- sends interrupt setup messages to the new client (these contain file
  descriptors for receiving interrupts).

The first client to connect to the server receives ID zero.

When a client disconnects from the server, the server sends disconnect
notifications to the other clients.

The next section describes the protocol in detail.

If the server terminates without sending disconnect notifications for
its connected clients, the clients can elect to continue. They can
communicate with each other normally, but won't receive disconnect
notifications when a peer disconnects, and no new clients can connect.
There is no way for the clients to connect to a restarted server. The
device cannot tell guest software whether the server is still up.

Example server code is in contrib/ivshmem-server/. It is not to be
used in production. It assumes all clients use the same number of
interrupt vectors.

A standalone client is in contrib/ivshmem-client/. It can be useful
for debugging.

The ivshmem Client-Server Protocol
----------------------------------

An ivshmem device configured for interrupts connects to an ivshmem
server. This section details the protocol between the two.

The connection is one-way: the server sends messages to the client.
Each message consists of a single 8 byte little-endian signed number,
and may be accompanied by a file descriptor via SCM_RIGHTS. Both
client and server close the connection on error.
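
The message format above can be illustrated with a short Python
sketch. It assumes Python 3.9 or later on a UNIX host (for
``socket.recv_fds``); the names ``decode_message`` and
``recv_message`` are hypothetical and not taken from QEMU.

.. code-block:: python

   import socket
   import struct

   MSG_SIZE = 8  # one 8-byte little-endian signed number per message


   def decode_message(raw: bytes) -> int:
       """Decode the payload of a single ivshmem server message."""
       if len(raw) != MSG_SIZE:
           raise ValueError("ivshmem message must be exactly 8 bytes")
       return struct.unpack("<q", raw)[0]


   def recv_message(sock):
       """Receive one message and return (value, fd).

       fd is None when no descriptor came along via SCM_RIGHTS.
       A short read raises ValueError here; a robust client would
       keep reading until it has all 8 bytes.
       """
       raw, fds, _flags, _addr = socket.recv_fds(sock, MSG_SIZE, 1)
       if not raw:
           raise ConnectionError("server closed the connection")
       return decode_message(raw), (fds[0] if fds else None)

The ``"<q"`` format string selects a little-endian signed 64-bit
integer, which matches the wire format regardless of the host's byte
order.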

Note: QEMU currently doesn't close the connection right on error, but
only when the character device is destroyed.

On connect, the server sends the following messages in order:

1. The protocol version number, currently zero. The client should
   close the connection on receipt of versions it can't handle.

2. The client's ID. This is unique among all clients of this server.
   IDs must be between 0 and 65535, because the Doorbell register
   provides only 16 bits for them.

3. The number -1, accompanied by the file descriptor for the shared
   memory.

4. Connect notifications for existing other clients, if any. This is
   a peer ID (number between 0 and 65535 other than the client's ID),
   repeated N times. Each repetition is accompanied by one file
   descriptor. These are for interrupting the peer with that ID using
   vector 0,..,N-1, in order. If the client is configured for fewer
   vectors, it closes the extra file descriptors. If it is configured
   for more, the extra vectors remain unconnected.

5. Interrupt setup. This is the client's own ID, repeated N times.
   Each repetition is accompanied by one file descriptor. These are
   for receiving interrupts from peers using vector 0,..,N-1, in
   order. If the client is configured for fewer vectors, it closes
   the extra file descriptors. If it is configured for more, the
   extra vectors remain unconnected.

From then on, the server sends these kinds of messages:

6. Connection / disconnection notification. This is a peer ID.

   - If the number comes with a file descriptor, it's a connection
     notification, exactly like in step 4.

   - Else, it's a disconnection notification for the peer with that ID.

Known bugs:

* The protocol changed incompatibly in QEMU 2.5. Before, messages
  were native endian long, and there was no version number.

* The protocol is poorly designed.
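
The message sequence of this section can be sketched as a small
classifier that walks a list of already received ``(value, fd)``
pairs. Everything here (the name ``classify_messages``, the event
tuples, the descriptor numbers in the usage note) is hypothetical and
only illustrates the ordering rules; it is not QEMU's implementation.

.. code-block:: python

   def classify_messages(messages):
       """Map (value, fd_or_None) pairs to (own_id, list of events)."""
       it = iter(messages)

       version, _ = next(it)                 # step 1: protocol version
       if version != 0:
           raise ValueError("unsupported protocol version")

       own_id, _ = next(it)                  # step 2: the client's ID
       if not 0 <= own_id <= 65535:
           raise ValueError("client ID out of range")

       marker, shm_fd = next(it)             # step 3: -1 plus shared memory fd
       if marker != -1 or shm_fd is None:
           raise ValueError("expected -1 with the shared memory descriptor")

       events = [("shared-memory", shm_fd)]
       next_vector = {}                      # ID -> next vector index to assign
       for value, fd in it:                  # steps 4 to 6
           if not 0 <= value <= 65535:
               raise ValueError("ID out of range")
           if fd is None:
               # No descriptor attached: disconnection notification.
               next_vector.pop(value, None)
               events.append(("peer-disconnected", value))
               continue
           vector = next_vector.get(value, 0)
           next_vector[value] = vector + 1
           if value == own_id:
               # Own ID: descriptor for receiving interrupts on this vector.
               events.append(("own-vector", vector, fd))
           else:
               # Peer ID: descriptor for interrupting that peer on this vector.
               events.append(("peer-vector", value, vector, fd))
       return own_id, events

Feeding it ``[(0, None), (7, None), (-1, 10), (3, 11), (7, 12)]``
yields client ID 7, the shared memory descriptor 10, vector 0 of
peer 3 on descriptor 11, and the client's own vector 0 on
descriptor 12.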

The ivshmem Client-Client Protocol
----------------------------------

An ivshmem device configured for interrupts receives eventfd file
descriptors for interrupting peers and getting interrupted by peers
from the server, as explained in the previous section.

To interrupt a peer, the device writes the 8-byte integer 1 in native
byte order to the respective file descriptor.

To receive an interrupt, the device reads and discards as many 8-byte
integers as it can.
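
A minimal sketch of these two operations follows. The helper names
``kick`` and ``drain`` are hypothetical. The helpers accept any
non-blocking file descriptor; returning the sum of the values read is
an addition for illustration only, since the device simply discards
them.

.. code-block:: python

   import os
   import struct


   def kick(fd: int) -> None:
       """Interrupt a peer: write the 8-byte integer 1 in native byte order."""
       os.write(fd, struct.pack("=Q", 1))


   def drain(fd: int) -> int:
       """Read 8-byte integers until the non-blocking descriptor is empty.

       Returns the sum of the values read (0 if nothing was pending).
       """
       total = 0
       while True:
           try:
               raw = os.read(fd, 8)
           except BlockingIOError:
               return total      # nothing left to read
           if len(raw) != 8:
               return total      # end of file or truncated read
           total += struct.unpack("=Q", raw)[0]

With a real eventfd (for example one created with
``os.eventfd(0, os.EFD_NONBLOCK)`` on Linux and Python 3.10 or later),
a single read returns the accumulated counter and resets it, so
``drain`` typically completes after one successful read.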