.. SPDX-License-Identifier: GPL-2.0

======================
Generic vcpu interface
======================

The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR,
KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct
kvm_device_attr as other devices, but targets VCPU-wide settings and controls.

The groups and attributes per virtual cpu, if any, are architecture specific.
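
As a rough illustration of how these ioctls are issued on a VCPU file
descriptor, the sketch below wraps KVM_SET_DEVICE_ATTR in a small helper. The
helper name vcpu_set_attr and its "0 or negative errno" return convention are
assumptions made for this example and are not part of the KVM API. The group
and attribute values would be taken from the architecture-specific sections of
this document.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper (not provided by KVM): set one attribute on a VCPU fd.
 * 'addr' is whatever the attribute expects in kvm_device_attr.addr, usually
 * a pointer to the value. Returns 0 on success or a negative errno value.
 */
int vcpu_set_attr(int vcpu_fd, uint32_t group, uint64_t attr, void *addr)
{
	struct kvm_device_attr dev_attr = {
		.flags = 0,
		.group = group,
		.attr  = attr,
		.addr  = (uint64_t)(uintptr_t)addr,
	};

	if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &dev_attr) < 0)
		return -errno;	/* e.g. -ENXIO, -EBUSY as listed per attribute */
	return 0;
}
```

Returning a negative errno mirrors the way the error tables in this document
are written. A caller would typically probe with KVM_HAS_DEVICE_ATTR first,
using the same structure.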

1. GROUP: KVM_ARM_VCPU_PMU_V3_CTRL
==================================

:Architectures: ARM64

1.1. ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_IRQ
---------------------------------------

:Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
	     pointer to an int

Returns:

	 =======  ========================================================
	 -EBUSY   The PMU overflow interrupt is already set
	 -EFAULT  Error reading interrupt number
	 -ENXIO   PMUv3 not supported or the overflow interrupt not set
		  when attempting to get it
	 -ENODEV  KVM_ARM_VCPU_PMU_V3 feature missing from VCPU
	 -EINVAL  Invalid PMU overflow interrupt number supplied or
		  trying to set the IRQ number without using an in-kernel
		  irqchip.
	 =======  ========================================================

A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
type must be the same for each vcpu. As a PPI, the interrupt number is the same
for all vcpus, while as an SPI it must be a separate number per vcpu.

1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
---------------------------------------

:Parameters: no additional parameter in kvm_device_attr.addr

Returns:

	 =======  ======================================================
	 -EEXIST  Interrupt number already used
	 -ENODEV  PMUv3 not supported or GIC not initialized
	 -ENXIO   PMUv3 not supported, missing VCPU feature or interrupt
		  number not set
	 -EBUSY   PMUv3 already initialized
	 =======  ======================================================

Request the initialization of the PMUv3.  If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.

1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
-----------------------------------------

:Parameters: in kvm_device_attr.addr the address for a PMU event filter is a
             pointer to a struct kvm_pmu_event_filter

:Returns:

	 =======  ======================================================
	 -ENODEV  PMUv3 not supported or GIC not initialized
	 -ENXIO   PMUv3 not properly configured or in-kernel irqchip not
		  configured as required prior to calling this attribute
	 -EBUSY   PMUv3 already initialized or a VCPU has already run
	 -EINVAL  Invalid filter range
	 =======  ======================================================

Request the installation of a PMU event filter described as follows::

    struct kvm_pmu_event_filter {
	    __u16	base_event;
	    __u16	nevents;

    #define KVM_PMU_EVENT_ALLOW	0
    #define KVM_PMU_EVENT_DENY	1

	    __u8	action;
	    __u8	pad[3];
    };

A filter range is defined as the range [@base_event, @base_event + @nevents),
together with an @action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
first registered range defines the global policy (global ALLOW if the first
@action is DENY, global DENY if the first @action is ALLOW). Multiple ranges
can be programmed, and must fit within the event space defined by the PMU
architecture (10 bits on ARMv8.0, 16 bits from ARMv8.1 onwards).

Note: "Cancelling" a filter by registering the opposite action for the same
range doesn't change the default action. For example, installing an ALLOW
filter for event range [0:10) as the first filter and then applying a DENY
action for the same range will leave the whole range as disabled.
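
The policy described above can be modelled in userspace to check how a
sequence of filters will combine. The sketch below is only a model of the
documented semantics, not KVM's implementation: struct event_filter, the
EVENT_* constants and event_allowed() are hypothetical names that mirror the
UAPI definitions, and the special handling of SW_INCR and CHAIN is not
modelled. It also assumes that no event is filtered when no filter has been
registered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_ALLOW	0	/* mirrors KVM_PMU_EVENT_ALLOW */
#define EVENT_DENY	1	/* mirrors KVM_PMU_EVENT_DENY */

/* Local stand-in for struct kvm_pmu_event_filter (padding omitted). */
struct event_filter {
	uint16_t base_event;
	uint16_t nevents;
	uint8_t action;
};

/*
 * Replay 'n' filters in registration order and report whether 'event'
 * ends up allowed.
 */
bool event_allowed(const struct event_filter *f, size_t n, uint16_t event)
{
	bool allowed;
	size_t i;

	if (n == 0)
		return true;	/* assumption: no filter means no restriction */

	/* The first filter fixes the global default: the opposite of its action. */
	allowed = (f[0].action == EVENT_DENY);

	/* Later ranges only override events inside [base, base + nevents). */
	for (i = 0; i < n; i++) {
		uint32_t start = f[i].base_event;
		uint32_t end = start + f[i].nevents;

		if (event >= start && event < end)
			allowed = (f[i].action == EVENT_ALLOW);
	}
	return allowed;
}
```

Replaying ALLOW [0:10) followed by DENY [0:10) through this model reports
every event as denied, which matches the note above: the second filter
overrides the range but the global DENY default set by the first one stays.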

Restrictions: Event 0 (SW_INCR) is never filtered, as it doesn't count a
hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
isn't strictly speaking an event. Filtering the cycle counter is possible
using event 0x11 (CPU_CYCLES).

1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
------------------------------------------

:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
             identifier.

:Returns:

	 =======  ====================================================
	 -EBUSY   PMUv3 already initialized, a VCPU has already run or
		  an event filter has already been set
	 -EFAULT  Error accessing the PMU identifier
	 -ENXIO   PMU not found
	 -ENODEV  PMUv3 not supported or GIC not initialized
	 -ENOMEM  Could not allocate memory
	 =======  ====================================================

Request that the VCPU uses the specified hardware PMU when creating guest events
for the purpose of PMU emulation. The PMU identifier can be read from the "type"
file for the desired PMU instance under /sys/devices (or, equivalently,
/sys/bus/event_source/devices). This attribute is particularly useful on
heterogeneous systems where there are at least two CPU PMUs on the system. The
PMU that is set for one VCPU will be used by all the other VCPUs. It isn't
possible to set a PMU if a PMU event filter is already present.

Note that KVM will not make any attempts to run the VCPU on the physical CPUs
associated with the PMU specified by this attribute. This is entirely left to
userspace. However, attempting to run the VCPU on a physical CPU not supported
by the PMU will fail: KVM_RUN will return with
exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting
the hardware_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED
and the cpu field to the processor id.

2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================

:Architectures: ARM64

2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_VTIMER, KVM_ARM_VCPU_TIMER_IRQ_PTIMER
-----------------------------------------------------------------------------

:Parameters: in kvm_device_attr.addr the address for the timer interrupt is a
	     pointer to an int

Returns:

	 =======  =================================
	 -EINVAL  Invalid timer interrupt number
	 -EBUSY   One or more VCPUs has already run
	 =======  =================================

A value describing the architected timer interrupt number when connected to an
in-kernel virtual GIC.  The interrupt must be a PPI (16 <= intid < 32).  Setting
the attribute overrides the default values (see below).

=============================  ==========================================
KVM_ARM_VCPU_TIMER_IRQ_VTIMER  The EL1 virtual timer intid (default: 27)
KVM_ARM_VCPU_TIMER_IRQ_PTIMER  The EL1 physical timer intid (default: 30)
=============================  ==========================================

Setting the same PPI for different timers will prevent the VCPUs from running.
Setting the interrupt number on a VCPU configures all VCPUs created at that
time to use the number provided for a given timer, overwriting any previously
configured values on other VCPUs.  Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.
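
Because a bad choice of interrupt numbers only shows up when the VCPUs fail to
run, a VMM may want to check its configuration before setting the attributes.
The sketch below encodes the two constraints stated above (each intid is a PPI,
and the two timers do not share one). The function names are hypothetical and
the check is a userspace convenience, not something KVM provides.

```c
#include <stdbool.h>

/* PPIs occupy interrupt IDs 16..31, as stated above. */
bool is_ppi(int intid)
{
	return intid >= 16 && intid < 32;
}

/* True if the pair of timer interrupt numbers satisfies both constraints. */
bool timer_irqs_usable(int vtimer_intid, int ptimer_intid)
{
	return is_ppi(vtimer_intid) && is_ppi(ptimer_intid) &&
	       vtimer_intid != ptimer_intid;
}
```

The defaults from the table above (27 and 30) pass this check.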

.. _kvm_arm_vcpu_pvtime_ctrl:

3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
==================================

:Architectures: ARM64

3.1 ATTRIBUTE: KVM_ARM_VCPU_PVTIME_IPA
--------------------------------------

:Parameters: 64-bit base address

Returns:

	 =======  ======================================
	 -ENXIO   Stolen time not implemented
	 -EEXIST  Base address already set for this VCPU
	 -EINVAL  Base address not 64 byte aligned
	 =======  ======================================

Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64 byte aligned and exist within a valid guest memory
region. See Documentation/virt/kvm/arm/pvtime.rst for more information
including the layout of the stolen time structure.

4. GROUP: KVM_VCPU_TSC_CTRL
===========================

:Architectures: x86

4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
----------------------------------

:Parameters: 64-bit unsigned TSC offset

Returns:

	 ======= ======================================
	 -EFAULT Error reading/writing the provided
		 parameter address.
	 -ENXIO  Attribute not supported
	 ======= ======================================

Specifies the guest's TSC offset relative to the host's TSC. The guest's
TSC is then derived by the following equation:

  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET

This attribute is useful to adjust the guest's TSC on live migration,
so that the TSC counts the time during which the VM was paused. The
following describes a possible algorithm to use for this purpose.

From the source VMM process:

1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
   kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
   (host_src).

2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
   guest TSC offset (ofs_src[i]).

3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
   guest's TSC (freq).

From the destination VMM process:

4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
   kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
   fields.  Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
   structure.

   KVM will advance the VM's kvmclock to account for elapsed time since
   recording the clock values.  Note that this will cause problems in
   the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
   between the source and destination, and a reasonably short time passes
   between the source pausing the VMs and the destination executing
   steps 4-7.

5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and
   kvmclock nanoseconds (guest_dest).

6. Adjust the guest TSC offsets for every vCPU to account for (1) time
   elapsed since recording state and (2) difference in TSCs between the
   source and destination machine:

   ofs_dst[i] = ofs_src[i] -
     (guest_src - guest_dest) * freq +
     (tsc_src - tsc_dest)

   ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to
   a time of 0 in kvmclock.  The above formula ensures that it is the
   same on the destination as it was on the source).

7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
   respective value derived in the previous step.
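
The arithmetic of step 6 can be sketched as a pure function. This is an
illustration under stated assumptions, not a reference implementation: the
function name is hypothetical, and the formula above leaves units implicit, so
the sketch assumes that the kvmclock values are in nanoseconds and that freq is
in kHz as returned by KVM_GET_TSC_KHZ, which makes the conversion from
nanoseconds to TSC cycles a multiplication by freq_khz / 1000000.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the destination TSC offset for one vCPU.
 * Offsets and TSC values wrap modulo 2^64, so unsigned arithmetic is used
 * for them throughout.
 */
uint64_t tsc_offset_for_dest(uint64_t ofs_src,
			     uint64_t guest_src, uint64_t guest_dest,
			     uint64_t tsc_src, uint64_t tsc_dest,
			     uint64_t freq_khz)
{
	/* Signed kvmclock delta in ns; negative once time has advanced. */
	int64_t delta_ns = (int64_t)(guest_src - guest_dest);

	/*
	 * ns * kHz / 10^6 = TSC cycles (assumed units, see above). The
	 * 64-bit product limits this sketch to pauses well under an hour
	 * at multi-GHz rates; real code would want a wider intermediate.
	 */
	int64_t delta_cycles = delta_ns * (int64_t)freq_khz / 1000000;

	return ofs_src - (uint64_t)delta_cycles + (tsc_src - tsc_dest);
}
```

For example, with a 1 GHz guest TSC (freq_khz = 1000000), a source offset of
1000, kvmclock advancing from 0 to 1000000 ns, and host TSC readings of 5000 on
the source and 7000 on the destination, the function yields 999000. The guest
TSC at the destination reading is then 7000 + 999000 = 1006000, which is the
source's guest TSC of 6000 plus the 1000000 cycles that elapsed.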