XIVE for sPAPR (pseries machines)
=================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine". It supports a larger number of interrupt sources and offers
virtualization features which enable the HW to deliver interrupts
directly to virtual processors without hypervisor assistance.

A QEMU ``pseries`` machine (which is PAPR compliant) using POWER9
processors can run under two interrupt modes:

- *Legacy Compatibility Mode*

  the hypervisor provides identical interfaces and similar
  functionality to PAPR+ Version 2.7. This is the default mode.

  It is also referred to as *XICS* in QEMU.

- *XIVE native exploitation mode*

  the hypervisor provides new interfaces to manage the XIVE control
  structures, and provides direct control for interrupt management
  through MMIO pages.

Which interrupt modes can be used by the machine is negotiated with
the guest O/S during the Client Architecture Support negotiation
sequence. The two modes are mutually exclusive.

Both interrupt modes share the same IRQ number space. See below for the
layout.

CAS Negotiation
---------------

QEMU advertises the supported interrupt modes in the device tree
property ``ibm,arch-vec-5-platform-support`` in byte 23 and the OS
Selection for XIVE is indicated in the ``ibm,architecture-vec-5``
property byte 23.

The interrupt modes supported by the machine depend on the CPU type
(POWER9 is required for XIVE) but also on the machine property
``ic-mode`` which can be set on the command line. It can take the
following values: ``xics``, ``xive``, and ``dual`` which is the
default mode. ``dual`` means that both modes XICS **and** XIVE are
supported and if the guest OS supports XIVE, this mode will be
selected.

The chosen interrupt mode is activated after a reconfiguration done
in a machine reset.

KVM negotiation
---------------

When the guest starts under KVM, the capabilities of the host kernel
and QEMU are also negotiated. Depending on the version of the host
kernel, KVM will advertise the XIVE capability to QEMU or not.

Nevertheless, the available interrupt modes in the machine should not
depend on the XIVE KVM capability of the host. On older kernels
without XIVE KVM support, QEMU will use the emulated XIVE device as a
fallback and on newer kernels (>=5.2), the KVM XIVE device.

As a final refinement, the user can also switch the use of the KVM
device with the machine option ``kernel_irqchip``.


XIVE support in KVM
~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels with XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                            kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE KVM       XIVE emul.     XIVE KVM
xive            XIVE KVM       XIVE emul.     XIVE KVM
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                            kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XICS KVM       XICS emul.     XICS KVM
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``


No XIVE support in KVM
~~~~~~~~~~~~~~~~~~~~~~

For guest OSes supporting XIVE, the resulting interrupt modes on host
kernels without XIVE KVM support are the following:

==============  =============  =============  ================
ic-mode                            kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  XIVE emul.(1)  XIVE emul.     QEMU error (2)
xive            XIVE emul.(1)  XIVE emul.     QEMU error (2)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================


(1) QEMU warns with ``warning: kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``
(2) QEMU fails with ``kernel_irqchip requested but unavailable:
    IRQ_XIVE capability must be present for KVM``


For legacy guest OSes without XIVE support, the resulting interrupt
modes are the following:

==============  =============  =============  ================
ic-mode                            kernel_irqchip
--------------  ----------------------------------------------
/               allowed        off            on
                (default)
==============  =============  =============  ================
dual (default)  QEMU error(4)  XICS emul.     QEMU error(4)
xive            QEMU error(3)  QEMU error(3)  QEMU error(3)
xics            XICS KVM       XICS emul.     XICS KVM
==============  =============  =============  ================

(3) QEMU fails at CAS with ``Guest requested unavailable interrupt
    mode (XICS), either don't set the ic-mode machine property or try
    ic-mode=xics or ic-mode=dual``
(4) QEMU/KVM incompatibility due to device destruction in reset. QEMU fails
    with ``KVM is too old to support ic-mode=dual,kernel-irqchip=on``


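The four tables of the KVM negotiation section can be condensed into a
single decision function. The following is a minimal Python sketch that
only mirrors those tables; it is not QEMU code. The function name
``resolve_irqchip`` and its argument names are hypothetical, and the
returned pair is the table cell together with its footnote number (or
``None`` when the cell has no footnote).

```python
def resolve_irqchip(ic_mode, kernel_irqchip, kvm_has_xive, guest_has_xive):
    """Return (result, footnote) as listed in the tables of this section.

    Hypothetical helper for illustration only: it mirrors the tables and
    is not taken from the QEMU sources.
    """
    if ic_mode not in ("dual", "xive", "xics"):
        raise ValueError(f"unknown ic-mode: {ic_mode!r}")
    if kernel_irqchip not in ("allowed", "off", "on"):
        raise ValueError(f"unknown kernel_irqchip: {kernel_irqchip!r}")

    # ic-mode=xive with a legacy guest fails at CAS in every column, (3).
    if ic_mode == "xive" and not guest_has_xive:
        return ("QEMU error", 3)

    # Mode negotiated at CAS: XIVE only if machine and guest both allow it.
    mode = "XIVE" if ic_mode != "xics" and guest_has_xive else "XICS"

    # kernel_irqchip=off always selects the emulated device.
    if kernel_irqchip == "off":
        return (f"{mode} emul.", None)

    # Host kernels with XIVE KVM support can back either mode.
    if kvm_has_xive:
        return (f"{mode} KVM", None)

    # From here on, the host kernel has no XIVE KVM support.
    if mode == "XIVE":
        if kernel_irqchip == "on":
            return ("QEMU error", 2)
        return ("XIVE emul.", 1)  # fallback to emulation, with a warning
    if ic_mode == "dual":
        return ("QEMU error", 4)
    return ("XICS KVM", None)
```

For instance, ``resolve_irqchip("dual", "allowed", False, True)`` yields
``("XIVE emul.", 1)``, which is the first cell of the first table of the
"No XIVE support in KVM" subsection.
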
XIVE Device tree properties
---------------------------

The properties for the PAPR interrupt controller node when the *XIVE
native exploitation mode* is selected should contain:

- ``device_type``

  value should be "power-ivpe".

- ``compatible``

  value should be "ibm,power-ivpe".

- ``reg``

  contains the base address and size of the thread interrupt
  management areas (TIMA), for the User level and for the Guest OS
  level. Only the Guest OS level is taken into account today.

- ``ibm,xive-eq-sizes``

  the size of the event queues. One cell per size supported, contains
  log2 of size, in ascending order.

- ``ibm,xive-lisn-ranges``

  the IRQ interrupt number ranges assigned to the guest for the IPIs.

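Since ``ibm,xive-eq-sizes`` stores log2 values, a consumer has to shift
to recover the actual queue sizes. The short Python sketch below shows
the conversion. The helper name ``eq_sizes`` and the sample cell values
are assumptions made for illustration, and so is the reading of the
resulting sizes as bytes.

```python
def eq_sizes(cells):
    """Convert the log2 cells of ibm,xive-eq-sizes into plain sizes.

    Hypothetical helper: the property format (one log2 cell per size,
    ascending order) comes from the text, the rest is illustrative.
    """
    cells = list(cells)
    if cells != sorted(cells):
        raise ValueError("cells are expected in ascending order")
    return [1 << log2_size for log2_size in cells]


# Sample cell values, chosen for illustration only.
print(eq_sizes([12, 16, 21, 24]))  # [4096, 65536, 2097152, 16777216]
```
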
The root node also exports:

- ``ibm,plat-res-int-priorities``

  contains a list of priorities that the hypervisor has reserved for
  its own use.

IRQ number space
----------------

The IRQ number space of the ``pseries`` machine is 8K wide and is the same
for both interrupt modes. The different ranges are defined as follows:

- ``0x0000 .. 0x0FFF`` 4K CPU IPIs (only used under XIVE)
- ``0x1000 .. 0x1000`` 1 EPOW
- ``0x1001 .. 0x1001`` 1 HOTPLUG
- ``0x1002 .. 0x10FF`` unused
- ``0x1100 .. 0x11FF`` 256 VIO devices
- ``0x1200 .. 0x127F`` 32x4 LSIs for PHB devices
- ``0x1280 .. 0x12FF`` unused
- ``0x1300 .. 0x1FFF`` PHB MSIs (dynamically allocated)

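The layout above can be written down as a lookup table. The following
Python sketch maps an IRQ number to the range it belongs to; the names
``IRQ_RANGES`` and ``classify_irq`` and the short range labels are
hypothetical and only serve the illustration.

```python
# (first, last, label) for each range of the 8K IRQ number space.
IRQ_RANGES = [
    (0x0000, 0x0FFF, "CPU IPIs"),
    (0x1000, 0x1000, "EPOW"),
    (0x1001, 0x1001, "HOTPLUG"),
    (0x1002, 0x10FF, "unused"),
    (0x1100, 0x11FF, "VIO devices"),
    (0x1200, 0x127F, "PHB LSIs"),
    (0x1280, 0x12FF, "unused"),
    (0x1300, 0x1FFF, "PHB MSIs"),
]


def classify_irq(irq):
    """Return the label of the range containing the IRQ number."""
    for first, last, label in IRQ_RANGES:
        if first <= irq <= last:
            return label
    raise ValueError(f"IRQ {irq:#x} is outside the 8K number space")


print(classify_irq(0x1203))  # PHB LSIs
```

The ranges are contiguous, so their sizes add up to ``0x2000`` entries,
which matches the 8K width stated above.
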
Monitoring XIVE
---------------

The state of the XIVE interrupt controller can be queried through the
monitor command ``info pic``. The output comes in two parts.

First, the state of the thread interrupt context registers is dumped
for each CPU:

::

   (qemu) info pic
   CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
   CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
   CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400
   CPU[0000]: POOL    00   00  00    00   00  00  00   00  00000000
   CPU[0000]: PHYS    00   00  00    00   00  00  00   ff  00000000
   ...

In the case of a ``pseries`` machine, QEMU acts as the hypervisor and only
the O/S and USER register rings make sense. ``W2`` contains the vCPU CAM
line which is set to the VP identifier.

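When this first part of the output has to be post-processed, each ring
row can be split on whitespace. The Python sketch below does so for the
sample shown above. The function name ``parse_ring`` is hypothetical,
and reading the CPU index and all register values as hexadecimal is an
assumption based on the sample, not a documented output format.

```python
import re

# Column names as they appear in the header row of the dump.
FIELDS = ("NSR", "CPPR", "IPB", "LSMFB", "ACK#", "INC", "AGE", "PIPR", "W2")
CPU_TAG = re.compile(r"CPU\[([0-9a-fA-F]+)\]:")


def parse_ring(line):
    """Parse one ring row of the dump, or return None for other lines."""
    tokens = line.split()
    if not tokens:
        return None
    match = CPU_TAG.fullmatch(tokens[0])
    # Skip the prompt, the header row (ring column "QW") and "..." lines.
    if match is None or len(tokens) != len(FIELDS) + 2 or tokens[1] == "QW":
        return None
    row = {"cpu": int(match.group(1), 16), "ring": tokens[1]}
    for name, value in zip(FIELDS, tokens[2:]):
        row[name] = int(value, 16)  # assumption: values are hexadecimal
    return row


row = parse_ring("   CPU[0000]:   OS    00   ff  00    00   ff  00  ff   ff  80000400")
print(row["ring"], hex(row["W2"]))
```
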
Then comes the routing information which aggregates the EAS and the
END configuration:

::

   ...
   LISN         PQ    EISN     CPU/PRIO EQ
   00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00000001 MSI --    00000010   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00000002 MSI --    00000010   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
   00000003 MSI --    00000010   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]
   00000004 MSI -Q  M 00000000
   00000005 MSI -Q  M 00000000
   00000006 MSI -Q  M 00000000
   00000007 MSI -Q  M 00000000
   00001000 MSI --    00000012   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00001001 MSI --    00000013   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]
   00001100 MSI --    00000100   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00001101 MSI -Q  M 00000000
   00001200 LSI -Q  M 00000000
   00001201 LSI -Q  M 00000000
   00001202 LSI -Q  M 00000000
   00001203 LSI -Q  M 00000000
   00001300 MSI --    00000102   1/6    305/16384 @1fc230000 ^1 [ 80000010 ... ]
   00001301 MSI --    00000103   2/6    220/16384 @1fc2f0000 ^1 [ 80000010 ... ]
   00001302 MSI --    00000104   3/6    201/16384 @1fc390000 ^1 [ 80000010 ... ]

The source information and configuration:

- The ``LISN`` column outputs the interrupt number of the source in
  range ``[ 0x0 ... 0x1FFF ]`` and its type: ``MSI`` or ``LSI``
- The ``PQ`` column reflects the state of the PQ bits of the source:

  - ``--`` source is ready to take events
  - ``P-`` an event was sent and an EOI is PENDING
  - ``PQ`` an event was QUEUED
  - ``-Q`` source is OFF

  An ``M`` indicates that the source is *MASKED* at the EAS level.

The targeting configuration:

- The ``EISN`` column is the event data that will be queued in the event
  queue of the O/S.
- The ``CPU/PRIO`` column is the tuple defining the CPU number and
  priority queue serving the source.
- The ``EQ`` column outputs:

  - the current index of the event queue / the max number of entries
  - the O/S event queue address
  - the toggle bit
  - the last entries that were pushed in the event queue.
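
The column descriptions above translate into a small parser for the
routing rows. This is a minimal Python sketch written against the sample
output only. The names ``PQ_STATES`` and ``parse_route`` are
hypothetical, and the number bases are assumptions: ``LISN``, ``EISN``
and the queue address are read as hexadecimal, while the CPU, priority,
queue index and queue size are read as decimal.

```python
# Meaning of the PQ column, as listed above.
PQ_STATES = {"--": "ready", "P-": "pending", "PQ": "queued", "-Q": "off"}


def parse_route(line):
    """Parse one routing row of the dump into a dictionary."""
    tokens = line.split()
    entry = {
        "lisn": int(tokens[0], 16),
        "type": tokens[1],          # MSI or LSI
        "state": PQ_STATES[tokens[2]],
    }
    rest = tokens[3:]
    # An optional "M" flags a source masked at the EAS level.
    entry["masked"] = bool(rest) and rest[0] == "M"
    if entry["masked"]:
        rest = rest[1:]
    entry["eisn"] = int(rest[0], 16)
    if len(rest) >= 5:
        # Targeting columns are only present for routed sources.
        cpu, prio = rest[1].split("/")
        index, size = rest[2].split("/")
        entry.update(
            cpu=int(cpu),
            prio=int(prio),
            eq_index=int(index),
            eq_size=int(size),
            eq_addr=int(rest[3].lstrip("@"), 16),
            toggle=int(rest[4].lstrip("^")),
        )
    return entry


sample = "00000000 MSI --    00000010   0/6    380/16384 @1fe3e0000 ^1 [ 80000010 ... ]"
print(parse_route(sample))
```

Masked rows such as ``00000004 MSI -Q  M 00000000`` carry no targeting
columns, so the sketch returns only the source fields for them.
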
275