.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers.
  These registers let the thread handle priority management and
  interrupt acknowledgment. The most important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each offering
  a view with a different privilege. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB) with a
  pair of even/odd pages which provide commands to manage the source,
  for instance to trigger it, to EOI it or to turn it off.

  3. Device pass-through

  When a device is passed-through into the guest, the source
  interrupts are from a different HW controller (PHB4) and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate
  them. The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed-through or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Sync all the sources and queues and mark the EQ pages dirty.
    This is to make sure that a consistent memory state is captured
    when migrating the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e., highest possible vcpu id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_IDS.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level | type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.
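
  As an illustration of the KVM_DEV_XIVE_GRP_SOURCE bit layout above,
  the following minimal sketch builds the __u64 value from the two
  fields. The macro and function names are assumptions made for this
  example and are not defined by this document; only the bit positions
  come from the layout. The sketch also assumes that the level bit is
  left clear for an MSI, since the text describes it for LSIs only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the layout above. */
#define XIVE_SRC_TYPE_LSI  (UINT64_C(1) << 0)  /* type:  0 = MSI, 1 = LSI */
#define XIVE_SRC_LEVEL     (UINT64_C(1) << 1)  /* level: asserted (LSI)   */

/* Build the 64-bit value that kvm_device_attr.addr points to. */
static inline uint64_t xive_source_value(bool is_lsi, bool asserted)
{
    uint64_t val = 0;

    if (is_lsi) {
        val |= XIVE_SRC_TYPE_LSI;
        /* Assumption of this sketch: level only matters for an LSI. */
        if (asserted)
            val |= XIVE_SRC_LEVEL;
    }
    return val;
}
```

  With this helper, an MSI source yields 0, a deasserted LSI yields 1
  and an asserted LSI yields 3.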

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
    values:   |    eisn       | mask |  server | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
    values:   |    unused     |  server | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
        __u32 flags;
        __u32 qshift;
        __u64 qaddr;
        __u32 qtoggle;
        __u32 qindex;
        __u8  pad[40];
    };

  - flags: queue flags
      KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
        forces notification without using the coalescing mechanism
        provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronize the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
  flush any in-flight event notification and to stabilize the EQs. At
  this stage, the EQ pages are marked dirty to make sure they are
  transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQs configuration
  and the state of thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run
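
  Capturing the source targeting (step 3 of the save sequence) and
  replaying it (step 2 of the restore sequence) means handling the
  64-bit values described for KVM_DEV_XIVE_GRP_SOURCE_CONFIG and for
  the EQ descriptor identifier of KVM_DEV_XIVE_GRP_EQ_CONFIG. The
  sketch below shows one way to pack and unpack them with shifts and
  masks. It is not kernel code: the struct, macro and function names
  are hypothetical and only the bit positions are taken from the
  layouts in this document.

```c
#include <stdint.h>

/* Hypothetical names; positions follow the SOURCE_CONFIG layout. */
#define SRC_CFG_PRIORITY_MASK  UINT64_C(0x7)         /* bits  2..0  */
#define SRC_CFG_SERVER_SHIFT   3
#define SRC_CFG_SERVER_MASK    UINT64_C(0x1fffffff)  /* bits 31..3  */
#define SRC_CFG_MASKED_SHIFT   32                    /* bit  32     */
#define SRC_CFG_EISN_SHIFT     33
#define SRC_CFG_EISN_MASK      UINT64_C(0x7fffffff)  /* bits 63..33 */

struct src_cfg {
    uint32_t eisn;      /* Effective Interrupt Source Number */
    uint32_t server;    /* CPU number handling the interrupt */
    uint8_t  priority;  /* 0-7 */
    uint8_t  masked;    /* mask flag (documented as unused) */
};

/* Pack the fields into the __u64 passed through kvm_device_attr.addr. */
static inline uint64_t src_cfg_pack(const struct src_cfg *c)
{
    return ((c->eisn & SRC_CFG_EISN_MASK) << SRC_CFG_EISN_SHIFT) |
           ((uint64_t)(c->masked & 1) << SRC_CFG_MASKED_SHIFT) |
           ((c->server & SRC_CFG_SERVER_MASK) << SRC_CFG_SERVER_SHIFT) |
           (c->priority & SRC_CFG_PRIORITY_MASK);
}

/* Reverse operation, useful when inspecting a saved value. */
static inline struct src_cfg src_cfg_unpack(uint64_t v)
{
    struct src_cfg c;

    c.eisn = (uint32_t)((v >> SRC_CFG_EISN_SHIFT) & SRC_CFG_EISN_MASK);
    c.masked = (uint8_t)((v >> SRC_CFG_MASKED_SHIFT) & 1);
    c.server = (uint32_t)((v >> SRC_CFG_SERVER_SHIFT) & SRC_CFG_SERVER_MASK);
    c.priority = (uint8_t)(v & SRC_CFG_PRIORITY_MASK);
    return c;
}

/* EQ descriptor identifier: (server, priority), upper 32 bits unused. */
static inline uint64_t eq_id_pack(uint32_t server, uint8_t priority)
{
    return ((server & SRC_CFG_SERVER_MASK) << SRC_CFG_SERVER_SHIFT) |
           (priority & SRC_CFG_PRIORITY_MASK);
}
```

  Keeping the pack and unpack helpers symmetric makes it easy to check
  that a value captured on the source host decodes to the same
  targeting before it is written back on the destination.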