XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  the hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  the hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

Which interrupt modes can be used by the machine is negotiated with
the guest O/S during the Client Architecture Support negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in the device tree
property ``ibm,arch-vec-5-platform-support`` in byte 23 and the OS
Selection for XIVE is indicated in the ``ibm,architecture-vec-5``
property byte 23.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode`` which can be set on the command line. It can take the
following values: ``xics``, ``xive``, and ``dual`` which is the
default mode. ``dual`` means that both modes XICS **and** XIVE are
supported and if the guest OS supports XIVE, XIVE will be
selected.

The chosen interrupt mode is activated after a reconfiguration done
in a machine reset.

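
The outcome of the negotiation described above can be summarized as a
small decision function. The following is only an illustrative sketch,
not QEMU code: the function name ``select_interrupt_mode`` and its
arguments are hypothetical, and it ignores the KVM and
``kernel_irqchip`` considerations covered in the next section. It
assumes, as the result tables further down indicate, that a legacy
guest on a machine restricted to ``ic-mode=xive`` fails at CAS.

```python
def select_interrupt_mode(ic_mode: str, guest_supports_xive: bool) -> str:
    """Return the interrupt mode ("xics" or "xive") resulting from CAS.

    Hypothetical helper for illustration only; it mirrors the rules
    stated in this document, not the actual QEMU implementation.
    """
    if ic_mode not in ("xics", "xive", "dual"):
        raise ValueError(f"unknown ic-mode: {ic_mode}")

    if ic_mode == "dual":
        # Both modes are advertised; a XIVE-capable guest selects XIVE.
        return "xive" if guest_supports_xive else "xics"

    if ic_mode == "xive" and not guest_supports_xive:
        # A legacy guest requests XICS, which this machine does not offer.
        raise RuntimeError("guest requested unavailable interrupt mode (XICS)")

    # ic-mode=xics always yields XICS; ic-mode=xive with a capable guest
    # yields XIVE.
    return ic_mode
```

For example, ``select_interrupt_mode("dual", False)`` falls back to
``"xics"`` for a legacy guest, while the same guest on ``"xive"``
raises an error.
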

KVM negotiation
---------------

When the guest starts under KVM, the capabilities of the host kernel
and QEMU are also negotiated. Depending on the version of the host
kernel, KVM will advertise the XIVE capability to QEMU or not.

Nevertheless, the available interrupt modes in the machine should not
depend on the XIVE KVM capability of the host. On older kernels
without XIVE KVM support, QEMU will use the emulated XIVE device as a
fallback and on newer kernels (>=5.2), the KVM XIVE device.

XIVE native exploitation mode is not supported for KVM nested guests,
i.e. VMs running under an L1 hypervisor (KVM on pSeries). In that case,
the hypervisor will not advertise the KVM capability and QEMU will use
the emulated XIVE device, same as for older versions of KVM.

As a final refinement, the user can also switch the use of the KVM
device with the machine option ``kernel_irqchip``.


XIVE support in KVM
~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels with XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                        kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE KVM       XIVE emul.     XIVE KVM
xive            XIVE KVM       XIVE emul.     XIVE KVM
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                        kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XICS KVM       XICS emul.     XICS KVM
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``


No XIVE support in KVM
~~~~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels without XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                        kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE emul.(1)  XIVE emul.     QEMU error (2)
xive            XIVE emul.(1)  XIVE emul.     QEMU error (2)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================


(1) QEMU warns with ``warning: kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``
    In some cases (old host kernels or KVM nested guests), one may hit a
    QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
    with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``
(2) QEMU fails with ``kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``


For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                        kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  QEMU error(4)  XICS emul.     QEMU error(4)
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``
(4) QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
    with ``KVM is incompatible with ic-mode=dual,kernel-irqchip=on``


XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the size of the event queues. One cell per size supported, contains
  log2 of size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the IRQ interrupt number ranges assigned to the guest for the IPIs.

The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the
same for both interrupt modes. The different ranges are defined as
follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1002 .. 0x10FF`` unused
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32x4 LSIs for PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs (dynamically allocated)

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

  (qemu) info pic
  CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
  CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
  CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
  CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
  CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
  ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line which is set to the VP identifier.

Then comes the routing information which aggregates the EAS and the
END configuration:

::

  ...
  LISN         PQ    EISN     CPU/PRIO EQ
  00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
  00000004 MSI -Q  M 00000000
  00000005 MSI -Q  M 00000000
  00000006 MSI -Q  M 00000000
  00000007 MSI -Q  M 00000000
  00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001101 MSI -Q  M 00000000
  00001200 LSI -Q  M 00000000
  00001201 LSI -Q  M 00000000
  00001202 LSI -Q  M 00000000
  00001203 LSI -Q  M 00000000
  00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  an ``M`` indicates that the source is *MASKED* at the EAS level.

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the max number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.
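
To make the column layout concrete, here is a minimal Python sketch
that splits one routing line of the ``info pic`` output into the fields
described above. It is an assumption-laden illustration: the function
``parse_routing_line`` and the dictionary keys are hypothetical names,
and the parsing is inferred only from the sample output shown in this
document, which is not a stable interface.

```python
# Meaning of the PQ bits, as listed in the column description.
PQ_STATES = {
    "--": "ready to take events",
    "P-": "event sent, EOI pending",
    "PQ": "event queued",
    "-Q": "off",
}


def parse_routing_line(line: str) -> dict:
    """Split one routing line of ``info pic`` into named fields.

    Hypothetical helper: field names are chosen for illustration and
    the layout is assumed from the sample output only.
    """
    fields = line.split()
    entry = {
        "lisn": int(fields[0], 16),      # interrupt source number
        "type": fields[1],               # "MSI" or "LSI"
        "pq": fields[2],
        "pq_state": PQ_STATES[fields[2]],
        "masked": False,
    }

    rest = fields[3:]
    if rest and rest[0] == "M":          # source masked at the EAS level
        entry["masked"] = True
        rest = rest[1:]

    entry["eisn"] = int(rest[0], 16)     # event data queued for the O/S

    if len(rest) > 1:                    # targeting information is present
        cpu, prio = rest[1].split("/")
        index, size = rest[2].split("/")
        entry.update(
            cpu=int(cpu),
            prio=int(prio),
            eq_index=int(index),
            eq_size=int(size),
            eq_addr=int(rest[3].lstrip("@"), 16),
            toggle=int(rest[4].lstrip("^")),
        )
    return entry
```

Masked sources such as ``00000004 MSI -Q M 00000000`` carry no
targeting columns, so the sketch only fills in the EQ fields when they
are present.
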