======================================================
Device Specification for Inter-VM shared memory device
======================================================

The Inter-VM shared memory device (ivshmem) is designed to share a
memory region between multiple QEMU processes running different guests
and the host.  In order for all guests to be able to pick up the
shared memory area, it is modeled by QEMU as a PCI device exposing
said memory to the guest as a PCI BAR.

The device can use a shared memory object on the host directly, or it
can obtain one from an ivshmem server.

In the latter case, the device can additionally interrupt its peers, and
get interrupted by its peers.

For information on configuring the ivshmem device on the QEMU
command line, see :doc:`../system/devices/ivshmem`.

The ivshmem PCI device's guest interface
========================================

The device has vendor ID 1af4, device ID 1110, revision 1.  Before
QEMU 2.6.0, it had revision 0.

PCI BARs
--------

The ivshmem PCI device has two or three BARs:

- BAR0 holds device registers (256 Byte MMIO)
- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
- BAR2 maps the shared memory object

There are two ways to use this device:

- If you only need the shared memory part, BAR2 suffices.  This way,
  you have access to the shared memory in the guest and can use it as
  you see fit.

- If you additionally need the capability for peers to interrupt each
  other, you need BAR0 and BAR1.  You will most likely want to write a
  kernel driver to handle interrupts.  This requires the device to be
  configured for interrupts.

Before QEMU 2.6.0, BAR2 could initially be invalid if the device was
configured for interrupts.  It became safely accessible only after
the ivshmem server had provided the shared memory.  These devices have
PCI revision 0 rather than 1.  Guest software should wait for the
IVPosition register (described below) to become non-negative before
accessing BAR2.

Revision 0 of the device cannot tell guest software whether it is
configured for interrupts.
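
The shared-memory-only use of BAR2 described above can be sketched from
guest user space as follows.  This is a minimal illustration, not part
of the device specification: it assumes a Linux guest that exposes PCI
BARs as ``resourceN`` files in sysfs, and the helper names
``bar_path`` and ``map_region`` as well as the example PCI address are
hypothetical.

```python
import mmap
import os


def bar_path(bdf: str, bar: int) -> str:
    """Return the sysfs file for one BAR of a PCI device (assumes a Linux guest)."""
    return f"/sys/bus/pci/devices/{bdf}/resource{bar}"


def map_region(path: str) -> mmap.mmap:
    """Map a whole file read/write and shared, as one would map BAR2."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        # On Unix, mmap.mmap() defaults to a shared read/write mapping and
        # keeps its own duplicate of the descriptor, so fd can be closed.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)


# Hypothetical usage inside a guest; the PCI address is only an example:
#   shmem = map_region(bar_path("0000:00:04.0", 2))
#   shmem[0:5] = b"hello"
```

On a revision 0 device configured for interrupts, such a mapping should
only be touched once IVPosition has become non-negative, as noted above.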

PCI device registers
--------------------

BAR 0 contains the following registers:

::

  Offset  Size  Access      On reset  Function
      0     4   read/write        0   Interrupt Mask
                                      bit 0: peer interrupt (rev 0)
                                             reserved (rev 1)
                                      bit 1..31: reserved
      4     4   read/write        0   Interrupt Status
                                      bit 0: peer interrupt (rev 0)
                                             reserved (rev 1)
                                      bit 1..31: reserved
      8     4   read-only   0 or ID   IVPosition
     12     4   write-only      N/A   Doorbell
                                      bit 0..15: vector
                                      bit 16..31: peer ID
     16   240   none            N/A   reserved

Software should only access the registers as specified in column
"Access".  Reserved bits should be ignored on read, and preserved on
write.

In revision 0 of the device, the Interrupt Status and Interrupt Mask
registers together control the legacy INTx interrupt when the device
has no MSI-X capability: INTx is asserted when the bit-wise AND of
Status and Mask is non-zero.  Interrupt Status Register bit 0 becomes 1
when an interrupt request from a peer is received.  Reading the
register clears it.

IVPosition Register: if the device is not configured for interrupts,
this is zero.
Else, it is the device's ID (between 0 and 65535).

Before QEMU 2.6.0, the register may read -1 for a short while after
reset.  These devices have PCI revision 0 rather than 1.

There is no good way for software to find out whether the device is
configured for interrupts.  A positive IVPosition means interrupts,
but zero could be either.

Doorbell Register: writing this register requests that a peer be
interrupted.  The written value's high 16 bits are the ID of the peer
to interrupt, and its low 16 bits select an interrupt vector.

If the device is not configured for interrupts, the write is ignored.

If interrupt setup hasn't completed, the write is ignored.  The
device cannot tell guest software whether setup is complete.
Interrupts can regress to this state on migration.

If the peer with the requested ID isn't connected, or it has fewer
interrupt vectors connected, the write is ignored.  The device cannot
tell guest software what peers are connected, or how many interrupt
vectors are connected.

Otherwise, the peer's interrupt for this vector becomes pending.
There is no way for software to clear the pending bit, and a polling
mode of operation is therefore impossible.
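
The Doorbell encoding and the revision 0 INTx rule described above can
be sketched as follows.  This is only an illustration of the bit layout
given in the register table; the function names are hypothetical and
nothing here touches real hardware.

```python
DOORBELL_OFFSET = 12  # byte offset of the Doorbell register in BAR0


def doorbell_value(peer_id: int, vector: int) -> int:
    """Build the 32-bit value to write to the Doorbell register."""
    if not 0 <= peer_id <= 0xFFFF:
        raise ValueError("peer ID must fit in 16 bits")
    if not 0 <= vector <= 0xFFFF:
        raise ValueError("vector must fit in 16 bits")
    # High 16 bits: ID of the peer to interrupt; low 16 bits: vector.
    return (peer_id << 16) | vector


def split_doorbell(value: int) -> tuple:
    """Inverse of doorbell_value(): return (peer_id, vector)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def intx_asserted(status: int, mask: int, has_msix: bool) -> bool:
    """Revision 0 rule: INTx follows Status AND Mask, only without MSI-X."""
    return not has_msix and (status & mask) != 0
```

For example, interrupting peer 2 on vector 1 means writing
``doorbell_value(2, 1)``, that is ``0x00020001``, at offset 12 of BAR0.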

If the peer is a revision 0 device without MSI-X capability, its
Interrupt Status register is set to 1.  This asserts INTx unless
masked by the Interrupt Mask register.  In that case, the device
cannot communicate the interrupt vector to guest software.

With multiple MSI-X vectors, different vectors can be used to indicate
that different events have occurred.  The semantics of interrupt
vectors are left to the application.

Interrupt infrastructure
========================

When configured for interrupts, the peers share eventfd objects in
addition to shared memory.  The shared resources are managed by an
ivshmem server.

The ivshmem server
------------------

The server listens on a UNIX domain socket.

For each new client that connects to the server, the server

- picks an ID,
- creates eventfd file descriptors for the interrupt vectors,
- sends the ID and the file descriptor for the shared memory to the
  new client,
- sends connect notifications for the new client to the other clients
  (these contain file descriptors for sending interrupts),
- sends connect notifications for the other clients to the new client,
  and
- sends interrupt setup messages to the new client (these contain file
  descriptors for receiving interrupts).

The first client to connect to the server receives ID zero.

When a client disconnects from the server, the server sends disconnect
notifications to the other clients.

The next section describes the protocol in detail.

If the server terminates without sending disconnect notifications for
its connected clients, the clients can elect to continue.  They can
communicate with each other normally, but won't receive disconnect
notifications when a peer disconnects, and no new clients can connect.
There is no way for the clients to connect to a restarted server.  The
device cannot tell guest software whether the server is still up.

Example server code is in contrib/ivshmem-server/.
It is not meant to be used in production.  It assumes all clients use
the same number of interrupt vectors.

A standalone client is in contrib/ivshmem-client/.  It can be useful
for debugging.

The ivshmem Client-Server Protocol
----------------------------------

An ivshmem device configured for interrupts connects to an ivshmem
server.  This section details the protocol between the two.

The connection is one-way: the server sends messages to the client.
Each message consists of a single 8 byte little-endian signed number,
and may be accompanied by a file descriptor via SCM_RIGHTS.  Both
client and server close the connection on error.

Note: QEMU currently doesn't close the connection right on error, but
only when the character device is destroyed.

On connect, the server sends the following messages in order:

1. The protocol version number, currently zero.  The client should
   close the connection on receipt of versions it can't handle.

2. The client's ID.  This is unique among all clients of this server.
   IDs must be between 0 and 65535, because the Doorbell register
   provides only 16 bits for them.

3. The number -1, accompanied by the file descriptor for the shared
   memory.

4. Connect notifications for existing other clients, if any.  This is
   a peer ID (number between 0 and 65535 other than the client's ID),
   repeated N times.  Each repetition is accompanied by one file
   descriptor.  These are for interrupting the peer with that ID using
   vector 0,..,N-1, in order.  If the client is configured for fewer
   vectors, it closes the extra file descriptors.  If it is configured
   for more, the extra vectors remain unconnected.

5. Interrupt setup.  This is the client's own ID, repeated N times.
   Each repetition is accompanied by one file descriptor.  These are
   for receiving interrupts from peers using vector 0,..,N-1, in
   order.  If the client is configured for fewer vectors, it closes
   the extra file descriptors.  If it is configured for more, the
   extra vectors remain unconnected.

From then on, the server sends these kinds of messages:

6. Connection / disconnection notification.  This is a peer ID.

   - If the number comes with a file descriptor, it's a connection
     notification, exactly like in step 4.

   - Else, it's a disconnection notification for the peer with that ID.

Known bugs:

* The protocol changed incompatibly in QEMU 2.5.  Before that, messages
  were native endian long, and there was no version number.

* The protocol is poorly designed.

The ivshmem Client-Client Protocol
----------------------------------

An ivshmem device configured for interrupts receives eventfd file
descriptors for interrupting peers and getting interrupted by peers
from the server, as explained in the previous section.

To interrupt a peer, the device writes the 8-byte integer 1 in native
byte order to the respective file descriptor.

To receive an interrupt, the device reads and discards as many 8-byte
integers as it can.
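
The message framing of the client-server protocol and the eventfd
signalling of the client-client protocol can be sketched as follows.
This is a minimal illustration under stated assumptions, not QEMU's
implementation: all function names are hypothetical, receiving the
accompanying file descriptor via SCM_RIGHTS is left out, and
``classify`` only covers messages that arrive after the initial
version, ID and shared memory messages.

```python
import os
import struct

MESSAGE = struct.Struct("<q")  # server message: 8-byte little-endian signed
KICK = struct.Struct("=Q")     # eventfd value: 8-byte native byte order


def decode_message(raw: bytes) -> int:
    """Decode one server message into its signed number."""
    if len(raw) != MESSAGE.size:
        raise ValueError("a message is exactly 8 bytes")
    return MESSAGE.unpack(raw)[0]


def classify(value: int, has_fd: bool, own_id: int) -> str:
    """Classify a message received after the first three messages."""
    if not 0 <= value <= 65535:
        raise ValueError("IDs must be between 0 and 65535")
    if not has_fd:
        return "peer-disconnected"
    # With a file descriptor: our own ID means interrupt setup,
    # any other ID is a connect notification for that peer.
    return "interrupt-setup" if value == own_id else "peer-connected"


def interrupt_peer(fd: int) -> None:
    """Interrupt a peer: write the 8-byte integer 1 in native byte order."""
    os.write(fd, KICK.pack(1))


def drain_interrupts(fd: int) -> int:
    """Read and discard 8-byte integers from a non-blocking descriptor.

    Returns how many integers were read.
    """
    count = 0
    while True:
        try:
            chunk = os.read(fd, KICK.size)
        except BlockingIOError:
            return count
        if len(chunk) < KICK.size:
            return count
        count += 1
```

With a real eventfd, a single read returns the accumulated counter, so
one read may cover several writes by peers.  The functions above work
on any file descriptor, which makes it possible to try them out with a
non-blocking pipe as a stand-in.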