XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  the hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  the hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

The interrupt modes that the machine can use are negotiated with
the guest O/S during the Client Architecture Support (CAS) negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in byte 23 of the
device tree property "ibm,arch-vec-5-platform-support", and the OS
selection for XIVE is indicated in byte 23 of the
"ibm,architecture-vec-5" property.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode``, which can be set on the command line. It can take the
following values: ``xics``, ``xive`` and ``dual``. Currently, ``xics``
is the default, but this may change in the future.

The chosen interrupt mode is activated after a reconfiguration done
in a machine reset.
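
The dependency between ``ic-mode``, the CPU type and the guest choice
can be sketched as a small model. This is only an illustration of the
rules stated above, not QEMU code: the function names
``advertised_modes`` and ``negotiate`` are hypothetical, and the
handling of unsupported combinations (dropping XIVE without POWER9,
rejecting a guest choice that was not offered) is an assumption of the
sketch.

.. code-block:: python

    # Illustrative model only; none of these names exist in QEMU.
    IC_MODES = {
        "xics": {"xics"},
        "xive": {"xive"},
        "dual": {"xics", "xive"},
    }


    def advertised_modes(ic_mode, cpu_is_power9):
        """Return the set of interrupt modes the machine could offer."""
        if ic_mode not in IC_MODES:
            raise ValueError("unknown ic-mode: %s" % ic_mode)
        modes = set(IC_MODES[ic_mode])
        if not cpu_is_power9:
            # POWER9 is required for XIVE. Dropping the mode silently is
            # an assumption of this sketch, not documented behaviour.
            modes.discard("xive")
        return modes


    def negotiate(ic_mode, cpu_is_power9, guest_choice):
        """Return the single mode that would be active after the reset."""
        offered = advertised_modes(ic_mode, cpu_is_power9)
        if guest_choice not in offered:
            # Assumption: the sketch simply rejects the request.
            raise ValueError("mode %s was not offered" % guest_choice)
        # The two modes are mutually exclusive: exactly one is returned.
        return guest_choice

For example, ``negotiate("dual", True, "xive")`` returns ``"xive"`` in
this model, while ``advertised_modes("dual", False)`` only contains
``"xics"``.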

XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the size of the event queues. One cell per size supported, contains
  log2 of size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the interrupt number ranges assigned to the guest for the IPIs.

The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the
same for both interrupt modes. The different ranges are defined as
follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32 PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs

Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

  (qemu) info pic
  CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
  CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
  CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
  CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
  CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
  ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line, which is set to the VP identifier.

Then comes the routing information, which aggregates the EAS and the
END configuration:

::

  ...
  LISN         PQ    EISN     CPU/PRIO EQ
  00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
  00000004 MSI -Q  M 00000000
  00000005 MSI -Q  M 00000000
  00000006 MSI -Q  M 00000000
  00000007 MSI -Q  M 00000000
  00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
  00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001101 MSI -Q  M 00000000
  00001200 LSI -Q  M 00000000
  00001201 LSI -Q  M 00000000
  00001202 LSI -Q  M 00000000
  00001203 LSI -Q  M 00000000
  00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
  00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
  00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the maximum number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.
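
The column layout described above, together with the ranges from the
"IRQ number space" section, can be turned into a small helper for
reading routing lines. The following is a minimal sketch, not part of
QEMU: the names ``parse_routing_line`` and ``classify_lisn`` are
hypothetical, and the sketch assumes that ``LISN``, ``EISN`` and the
event queue address are printed in hexadecimal while the CPU number,
the priority and the queue index and size are decimal.

.. code-block:: python

    # Ranges copied from the "IRQ number space" section.
    IRQ_RANGES = [
        (0x0000, 0x0FFF, "CPU IPI"),
        (0x1000, 0x1000, "EPOW"),
        (0x1001, 0x1001, "HOTPLUG"),
        (0x1100, 0x11FF, "VIO"),
        (0x1200, 0x127F, "PHB"),
        (0x1280, 0x12FF, "unused"),
        (0x1300, 0x1FFF, "PHB MSI"),
    ]

    PQ_STATES = {
        "--": "ready",
        "P-": "pending",
        "PQ": "queued",
        "-Q": "off",
    }


    def classify_lisn(lisn):
        """Return the range label for an IRQ number, or None if unlisted."""
        for first, last, label in IRQ_RANGES:
            if first <= lisn <= last:
                return label
        return None


    def parse_routing_line(line):
        """Parse one routing line of the ``info pic`` output into a dict."""
        tokens = line.split()
        entry = {
            "lisn": int(tokens[0], 16),
            "type": tokens[1],                 # MSI or LSI
            "state": PQ_STATES[tokens[2]],
        }
        rest = tokens[3:]
        entry["masked"] = bool(rest) and rest[0] == "M"
        if entry["masked"]:
            rest = rest[1:]
        entry["eisn"] = int(rest[0], 16)
        # Targeting columns are only present for configured sources.
        if len(rest) >= 5:
            cpu, prio = rest[1].split("/")
            index, size = rest[2].split("/")
            entry["cpu"] = int(cpu)
            entry["prio"] = int(prio)
            entry["eq_index"] = int(index)
            entry["eq_size"] = int(size)
            entry["eq_addr"] = int(rest[3].lstrip("@"), 16)
            entry["toggle"] = int(rest[4].lstrip("^"))
        return entry

Applied to the line starting with ``00001300`` in the dump above, the
sketch yields an entry whose ``lisn`` falls in the PHB MSI range,
targeted at CPU 1 with priority 6. Masked lines such as ``00000004``
only carry the source state and a zero ``EISN``.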