XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE, for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  the hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  the hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

The interrupt mode used by the machine is negotiated with the guest
O/S during the Client Architecture Support (CAS) negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in byte 23 of the device
tree property ``ibm,arch-vec-5-platform-support``, and the OS selection
for XIVE is indicated in byte 23 of the ``ibm,architecture-vec-5``
property.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode`` which can be set on the command line. It can take the
following values: ``xics``, ``xive``, and ``dual`` which is the
default mode. ``dual`` means that both modes XICS **and** XIVE are
supported and if the guest OS supports XIVE, this mode will be
selected.

The chosen interrupt mode is activated after a reconfiguration done
in a machine reset.
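
The outcome of the negotiation described above can be sketched as a
small decision function. This is only an illustration of the rules
stated in this document, not QEMU code: the function name and its
parameters are hypothetical, and a POWER9 CPU is assumed.

.. code-block:: python

    def select_interrupt_mode(ic_mode: str, guest_supports_xive: bool) -> str:
        """Return the interrupt mode ("xics" or "xive") resulting from CAS.

        Hypothetical helper: it models the documented behaviour of the
        ``ic-mode`` machine property and assumes a POWER9 CPU.
        """
        if ic_mode not in ("xics", "xive", "dual"):
            raise ValueError(f"unknown ic-mode: {ic_mode!r}")
        if ic_mode == "xics":
            # Only XICS is advertised, whatever the guest supports.
            return "xics"
        if guest_supports_xive:
            # "xive" and "dual" both end up in XIVE for a XIVE-aware guest.
            return "xive"
        if ic_mode == "dual":
            # A legacy guest falls back to XICS when both modes are offered.
            return "xics"
        # ic-mode=xive with a legacy guest: the guest asks for a mode
        # that the machine does not offer, so the negotiation fails.
        raise ValueError("guest requested XICS but ic-mode=xive was set")

For example, ``select_interrupt_mode("dual", True)`` returns ``"xive"``,
while ``select_interrupt_mode("dual", False)`` returns ``"xics"``.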

KVM negotiation
---------------

When the guest starts under KVM, the capabilities of the host kernel
and QEMU are also negotiated. Depending on the version of the host
kernel, KVM will advertise the XIVE capability to QEMU or not.

Nevertheless, the available interrupt modes in the machine should not
depend on the XIVE KVM capability of the host. On older kernels
without XIVE KVM support, QEMU falls back to the emulated XIVE device;
on newer kernels (>=5.2), it uses the KVM XIVE device.

As a final refinement, the user can also switch the use of the KVM
device with the machine option ``kernel_irqchip``.


XIVE support in KVM
~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels with XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                         kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE KVM       XIVE emul.     XIVE KVM
xive            XIVE KVM       XIVE emul.     XIVE KVM
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                         kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XICS KVM       XICS emul.     XICS KVM
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``


No XIVE support in KVM
~~~~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels without XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                         kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE emul.(1)  XIVE emul.     QEMU error(2)
xive            XIVE emul.(1)  XIVE emul.     QEMU error(2)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================


(1) QEMU warns with ``warning: kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``
(2) QEMU fails with ``kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``


For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                         kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  QEMU error(4)  XICS emul.     QEMU error(4)
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``
(4) QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
    with ``KVM is too old to support ic-mode=dual,kernel-irqchip=on``


XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the size of the event queues. One cell per size supported, contains
  log2 of size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the IRQ interrupt number ranges assigned to the guest for the IPIs.

The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the
same for both interrupt modes. The different ranges are defined as
follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1002 .. 0x10FF`` unused
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32x4 LSIs for PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs (dynamically allocated)

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

    (qemu) info pic
    CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
    CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
    CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
    CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
    CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
    ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line which is set to the VP identifier.

Then comes the routing information which aggregates the EAS and the
END configuration:

::

    ...
    LISN         PQ    EISN     CPU/PRIO EQ
    00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
    00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
    00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
    00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
    00000004 MSI -Q  M 00000000
    00000005 MSI -Q  M 00000000
    00000006 MSI -Q  M 00000000
    00000007 MSI -Q  M 00000000
    00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
    00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
    00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
    00001101 MSI -Q  M 00000000
    00001200 LSI -Q  M 00000000
    00001201 LSI -Q  M 00000000
    00001202 LSI -Q  M 00000000
    00001203 LSI -Q  M 00000000
    00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
    00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
    00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the max number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.
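
When reading a routing dump, it can help to decode each line
mechanically. The sketch below is a hypothetical helper, not part of
QEMU: it assumes the column layout of the sample output shown above
(which may differ between QEMU versions), maps the ``LISN`` back to the
IRQ number space ranges of the ``pseries`` machine, and translates the
``PQ`` bits. All names are illustrative.

.. code-block:: python

    # IRQ number space of the pseries machine, as listed in this document.
    IRQ_RANGES = [
        (0x0000, 0x0FFF, "CPU IPI"),
        (0x1000, 0x1000, "EPOW"),
        (0x1001, 0x1001, "HOTPLUG"),
        (0x1002, 0x10FF, "unused"),
        (0x1100, 0x11FF, "VIO device"),
        (0x1200, 0x127F, "PHB LSI"),
        (0x1280, 0x12FF, "unused"),
        (0x1300, 0x1FFF, "PHB MSI"),
    ]

    # Meaning of the PQ column, as described above.
    PQ_STATES = {"--": "ready", "P-": "pending", "PQ": "queued", "-Q": "off"}


    def classify_lisn(lisn: int) -> str:
        """Return the name of the IRQ range that contains ``lisn``."""
        for first, last, name in IRQ_RANGES:
            if first <= lisn <= last:
                return name
        raise ValueError(f"LISN {lisn:#x} is outside the 8K IRQ number space")


    def parse_routing_line(line: str) -> dict:
        """Decode one routing line of the ``info pic`` output.

        Assumed layout: LISN, type, PQ bits, optional "M" flag, EISN and,
        for unmasked sources, the CPU/PRIO tuple followed by EQ details.
        """
        fields = line.split()
        lisn = int(fields[0], 16)
        entry = {
            "lisn": lisn,
            "type": fields[1],
            "range": classify_lisn(lisn),
            "state": PQ_STATES[fields[2]],
        }
        rest = fields[3:]
        entry["masked"] = bool(rest) and rest[0] == "M"
        if entry["masked"]:
            rest = rest[1:]
        entry["eisn"] = int(rest[0], 16)
        if len(rest) > 1:
            # Unmasked sources also report their target as "CPU/PRIO".
            cpu, prio = rest[1].split("/")
            entry["cpu"], entry["prio"] = int(cpu), int(prio)
        return entry

Applied to the line starting with ``00001300`` in the sample above, the
helper reports a ready ``PHB MSI`` source targeting CPU 1 at priority 6.
The EQ fields are left unparsed on purpose, since their exact layout is
not specified in this document.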