.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment. The most
  important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each offering
  a view with a different privilege level. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB), a pair
  of even/odd pages which provide commands to manage the source: for
  instance to trigger it, to EOI it, or to turn it off.

  3. Device pass-through

  When a device is passed through to the guest, the source
  interrupts are from a different HW controller (PHB4) and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate
  them. The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed through or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Syncs all the sources and queues and marks the EQ pages dirty. This
    ensures that a consistent memory state is captured when migrating
    the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e., highest possible vcpu id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_IDS.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level | type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================
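
  As an illustration of the __u64 layout for KVM_DEV_XIVE_GRP_SOURCE,
  the value could be built as sketched here. The helper name
  xive_source_value() is hypothetical and not part of any kernel
  header; only the bit positions come from the layout above. Clearing
  the level bit for an MSI is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * bit 0: type  (0 = MSI, 1 = LSI)
 * bit 1: level (assertion level, meaningful for an LSI only)
 */
static inline uint64_t xive_source_value(unsigned int type, unsigned int level)
{
	assert(type <= 1 && level <= 1);

	/* The level bit is left clear for an MSI in this sketch. */
	if (type == 0)
		return 0;

	return (uint64_t)type | ((uint64_t)level << 1);
}
```

  An asserted LSI would then be described by xive_source_value(1, 1),
  and the address of the resulting value passed in
  kvm_device_attr.addr.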

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
    values:   |    eisn       | mask |  server | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Source number not initialized
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================
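
  The source targeting word can be assembled with plain shifts, as in
  this minimal sketch. The helper name xive_source_config() and its
  argument types are assumptions made for illustration; the field
  positions are the ones given in the layout above, and the unused
  mask bit is left clear.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * priority: bits  2..0  (3 bits)
 * server:   bits 31..3  (29 bits)
 * mask:     bit  32     (documented as unused, left clear)
 * eisn:     bits 63..33 (31 bits)
 */
static inline uint64_t xive_source_config(uint32_t eisn, uint32_t server,
					  uint8_t priority)
{
	assert(priority <= 7);
	assert(server < (1u << 29));
	assert(eisn < (1u << 31));

	return ((uint64_t)eisn << 33) | ((uint64_t)server << 3) | priority;
}
```

  For example, targeting server 2 at priority 5 with an EISN of 0x10
  gives 0x2000000015.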

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
    values:   |    unused     |  server | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
	__u32 flags;
	__u32 qshift;
	__u64 qaddr;
	__u32 qtoggle;
	__u32 qindex;
	__u8  pad[40];
    };

  - flags: queue flags
      KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
	forces notification without using the coalescing mechanism
	provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================
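
  A small sketch of the EQ descriptor identifier and of the structure
  size follows. The names xive_eq_example, xive_eq_id() and
  xive_eq_size_bytes() are hypothetical; the structure is a local
  mirror of the one documented above, and reading qshift as the log2
  of the queue size in bytes is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local mirror of the documented structure, for illustration only.
 * Real code would use the definition from the kernel UAPI headers.
 */
struct xive_eq_example {
	uint32_t flags;
	uint32_t qshift;
	uint64_t qaddr;
	uint32_t qtoggle;
	uint32_t qindex;
	uint8_t  pad[40];
};

/* 4 + 4 + 8 + 4 + 4 + 40 bytes, with no implicit padding. */
_Static_assert(sizeof(struct xive_eq_example) == 64, "unexpected EQ layout");

/* Hypothetical helper: server in bits 31..3, priority in bits 2..0. */
static inline uint64_t xive_eq_id(uint32_t server, uint8_t priority)
{
	assert(priority <= 7);
	assert(server < (1u << 29));

	return ((uint64_t)server << 3) | priority;
}

/* Hypothetical helper: assumes qshift is log2 of the size in bytes. */
static inline uint64_t xive_eq_size_bytes(uint32_t qshift)
{
	assert(qshift < 64);

	return 1ULL << qshift;
}
```

  With these helpers, the queue of server 4 at priority 6 has the
  identifier 38, and a qshift of 16 corresponds to a 64 KiB queue.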

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronizes the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Source number not initialized
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |
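
  The first 64-bit value of KVM_REG_PPC_VP_STATE can be split into the
  two TIMA words as sketched here. The helper names are hypothetical
  and only the bit positions are taken from the layout above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* TIMA word0 occupies bits 63..32, TIMA word1 bits 31..0. */
static inline uint64_t xive_vp_state_pack(uint32_t word0, uint32_t word1)
{
	return ((uint64_t)word0 << 32) | word1;
}

static inline uint32_t xive_vp_state_word0(uint64_t state)
{
	return (uint32_t)(state >> 32);
}

static inline uint32_t xive_vp_state_word1(uint64_t state)
{
	return (uint32_t)state;
}

/* Round-trip check: unpacking a packed value returns both words. */
static inline void xive_vp_state_check(uint32_t word0, uint32_t word1)
{
	uint64_t state = xive_vp_state_pack(word0, word1);

	assert(xive_vp_state_word0(state) == word0);
	assert(xive_vp_state_word1(state) == word1);
}
```

  The second 64-bit value is documented as unused, so the sketch does
  not touch it.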

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
  flush any in-flight event notification and to stabilize the EQs. At
  this stage, the EQ pages are marked dirty to make sure they are
  transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQ configuration
  and the state of the thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration first, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run
248