.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment. The most
  important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each offering
  a view with a different privilege. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB) with
  a pair of even/odd pages which provide commands to manage the
  source: to trigger it, to EOI it, or to turn it off, for instance.

  3. Device pass-through

  When a device is passed-through into the guest, the source
  interrupts are from a different HW controller (PHB4) and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate.
  The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed-through or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Sync all the sources and queues and mark the EQ pages dirty. This
    is to make sure that a consistent memory state is captured when
    migrating the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e., highest possible vcpu id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_ID.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level | type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
    values:   |    eisn       | mask |  server | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
    values:   |    unused     |  server | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
	__u32 flags;
	__u32 qshift;
	__u64 qaddr;
	__u32 qtoggle;
	__u32 qindex;
	__u8  pad[40];
    };

  - flags: queue flags
      KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
	forces notification without using the coalescing mechanism
	provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronize the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
  flush any in-flight event notification and to stabilize the EQs. At
  this stage, the EQ pages are marked dirty to make sure they are
  transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQ configuration
  and the state of thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run
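
  The 64-bit values used by the KVM_DEV_XIVE_GRP_SOURCE,
  KVM_DEV_XIVE_GRP_SOURCE_CONFIG and KVM_DEV_XIVE_GRP_EQ_CONFIG groups
  are plain bit fields. As an illustration only, the C sketch below
  packs them according to the layouts given in the Groups section. The
  helper names are hypothetical and are not part of the KVM uapi; a
  VMM would typically hand the resulting values to the device through
  the KVM_SET_DEVICE_ATTR ioctl.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, named for illustration only (not kernel uapi).
 * The field positions follow the bit layouts documented above.
 */

/* KVM_DEV_XIVE_GRP_SOURCE value: bit 0 = type (0:MSI 1:LSI), bit 1 = level. */
static inline uint64_t xive_source_value(int is_lsi, int level)
{
	return (uint64_t)(is_lsi ? 1 : 0) |
	       ((uint64_t)(level ? 1 : 0) << 1);
}

/*
 * KVM_DEV_XIVE_GRP_SOURCE_CONFIG value:
 * bits 2..0 priority, 31..3 server, 32 mask, 63..33 eisn.
 * Out-of-range inputs are truncated to the field width here; the
 * device itself may instead reject them (see the Errors tables).
 */
static inline uint64_t xive_source_config_value(uint32_t priority,
						uint32_t server,
						int masked, uint32_t eisn)
{
	return (uint64_t)(priority & 0x7) |
	       ((uint64_t)(server & 0x1fffffff) << 3) |
	       ((uint64_t)(masked ? 1 : 0) << 32) |
	       ((uint64_t)(eisn & 0x7fffffff) << 33);
}

/* EQ descriptor identifier: bits 2..0 priority, 31..3 server. */
static inline uint64_t xive_eq_id(uint32_t server, uint32_t priority)
{
	return (uint64_t)(priority & 0x7) |
	       ((uint64_t)(server & 0x1fffffff) << 3);
}
```

  For instance, targeting server 4 at priority 6 with EISN 0x10 gives a
  SOURCE_CONFIG value of 0x2000000026, and the matching EQ descriptor
  identifier is 0x26.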