#
f7035ce9 |
| 08-Oct-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Move interrupt delivery on guest entry to C code
This is based on a patch by Suraj Jitindar Singh.
This moves the code in book3s_hv_rmhandlers.S that generates an external, dec
KVM: PPC: Book3S HV: Move interrupt delivery on guest entry to C code
This is based on a patch by Suraj Jitindar Singh.
This moves the code in book3s_hv_rmhandlers.S that generates an external, decrementer or privileged doorbell interrupt just before entering the guest to C code in book3s_hv_builtin.c. This is to make future maintenance and modification easier. The algorithm expressed in the C code is almost identical to the previous algorithm.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: v4.18.9 |
|
#
aa227864 |
| 11-Sep-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Provide mode where all vCPUs on a core must be the same VM
This adds a mode where the vcore scheduling logic in HV KVM limits itself to scheduling only virtual cores from the sa
KVM: PPC: Book3S HV: Provide mode where all vCPUs on a core must be the same VM
This adds a mode where the vcore scheduling logic in HV KVM limits itself to scheduling only virtual cores from the same VM on any given physical core. This is enabled via a new module parameter on the kvm-hv module called "one_vm_per_core". For this to work on POWER9, it is necessary to set indep_threads_mode=N. (On POWER8, hardware limitations mean that KVM is never in independent threads mode, regardless of the indep_threads_mode setting.)
Thus the settings needed for this to work are:
1. The host is in SMT1 mode. 2. On POWER8, the host is not in 2-way or 4-way static split-core mode. 3. On POWER9, the indep_threads_mode parameter is N. 4. The one_vm_per_core parameter is Y.
With these settings, KVM can run up to 4 vcpus on a core at the same time on POWER9, or up to 8 vcpus on POWER8 (depending on the guest threading mode), and will ensure that all of the vcpus belong to the same VM.
This is intended for use in security-conscious settings where users are concerned about possible side-channel attacks between threads which could perhaps enable one VM to attack another VM on the same core, or the host.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
Revision tags: v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16 |
|
#
d6ee76d3 |
| 16-Aug-2018 |
Luke Dashjr <luke@dashjr.org> |
powerpc64/ftrace: Include ftrace.h needed for enable/disable calls
this_cpu_disable_ftrace and this_cpu_enable_ftrace are inlines in ftrace.h Without it included, the build fails.
Fixes: a4bc64d305
powerpc64/ftrace: Include ftrace.h needed for enable/disable calls
this_cpu_disable_ftrace and this_cpu_enable_ftrace are inlines in ftrace.h Without it included, the build fails.
Fixes: a4bc64d305af ("powerpc64/ftrace: Disable ftrace during kvm entry/exit") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org> Acked-by: Naveen N. Rao <naveen.n.rao at linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5 |
|
#
45ef5992 |
| 05-Jul-2018 |
Christophe Leroy <christophe.leroy@c-s.fr> |
powerpc: remove unnecessary inclusion of asm/tlbflush.h
asm/tlbflush.h is only needed for: - using functions xxx_flush_tlb_xxx() - using MMU_NO_CONTEXT - including asm-generic/pgtable.h
Signed-off-
powerpc: remove unnecessary inclusion of asm/tlbflush.h
asm/tlbflush.h is only needed for: - using functions xxx_flush_tlb_xxx() - using MMU_NO_CONTEXT - including asm-generic/pgtable.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
b5c6f760 |
| 26-Jul-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Read kvm->arch.emul_smt_mode under kvm->lock
Commit 1e175d2 ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space", 2018-07-25) added code that uses kvm->arch.emul_
KVM: PPC: Book3S HV: Read kvm->arch.emul_smt_mode under kvm->lock
Commit 1e175d2 ("KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space", 2018-07-25) added code that uses kvm->arch.emul_smt_mode before any VCPUs are created. However, userspace can change kvm->arch.emul_smt_mode at any time up until the first VCPU is created. Hence it is (theoretically) possible for the check in kvmppc_core_vcpu_create_hv() to race with another userspace thread changing kvm->arch.emul_smt_mode.
This fixes it by moving the test that uses kvm->arch.emul_smt_mode into the block where kvm->lock is held.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
1e175d2e |
| 25-Jul-2018 |
Sam Bobroff <sam.bobroff@au1.ibm.com> |
KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space
It is not currently possible to create the full number of possible VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fe
KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space
It is not currently possible to create the full number of possible VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer threads per core than its core stride (or "VSMT mode"). This is because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS even though the VCPU ID is less than KVM_MAX_VCPU_ID.
To address this, "pack" the VCORE ID and XIVE offsets by using knowledge of the way the VCPU IDs will be used when there are fewer guest threads per core than the core stride. The primary thread of each core will always be used first. Then, if the guest uses more than one thread per core, these secondary threads will sequentially follow the primary in each core.
So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the VCPUs are being spaced apart, so at least half of each core is empty, and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped into the second half of each core (4..7, in an 8-thread core).
Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of each core is being left empty, and we can map down into the second and third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary threads are being used and 7/8 of the core is empty, allowing use of the 1, 5, 3 and 7 thread slots.
(Strides less than 8 are handled similarly.)
This allows the VCORE ID or offset to be calculated quickly from the VCPU ID or XIVE server numbers, without access to the VCPU structure.
[paulus@ozlabs.org - tidied up comment a little, changed some WARN_ONCE to pr_devel, wrapped line, fixed id check.]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
0abb75b7 |
| 07-Jul-2018 |
Nicholas Mc Guire <hofrat@osadl.org> |
KVM: PPC: Book3S HV: Fix constant size warning
The constants are 64bit but not explicitly declared UL resulting in sparse warnings. Fix this by declaring the constants UL.
Signed-off-by: Nicholas M
KVM: PPC: Book3S HV: Fix constant size warning
The constants are 64bit but not explicitly declared UL resulting in sparse warnings. Fix this by declaring the constants UL.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
51eaa08f |
| 07-Jul-2018 |
Nicholas Mc Guire <hofrat@osadl.org> |
KVM: PPC: Book3S HV: Add of_node_put() in success path
The call to of_find_compatible_node() is returning a pointer with incremented refcount so it must be explicitly decremented after the last use.
KVM: PPC: Book3S HV: Add of_node_put() in success path
The call to of_find_compatible_node() is returning a pointer with incremented refcount so it must be explicitly decremented after the last use. As here it is only being used for checking of node presence but the result is not actually used in the success path it can be dropped immediately.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Fixes: commit f725758b899f ("KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
2bf1071a |
| 05-Jul-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Remove POWER9 DD1 support
POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing.
Signed-of
powerpc/64s: Remove POWER9 DD1 support
POWER9 DD1 was never a product. It is no longer supported by upstream firmware, and it is not effectively supported in Linux due to lack of testing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [mpe: Remove arch_make_huge_pte() entirely] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: v4.17.4, v4.17.3, v4.17.2 |
|
#
b3dae109 |
| 12-Jun-2018 |
Peter Zijlstra <peterz@infradead.org> |
sched/swait: Rename to exclusive
Since swait basically implemented exclusive waits only, make sure the API reflects that.
$ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]*" -e "\<prepar
sched/swait: Rename to exclusive
Since swait basically implemented exclusive waits only, make sure the API reflects that.
$ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]*" -e "\<prepare_to_swait\>" | while read file; do sed -i -e 's/\<swake_up\>/&_one/g' -e 's/\<swait_event[^ (]*/&_exclusive/g' -e 's/\<prepare_to_swait\>/&_exclusive/g' $file; done
With a few manual touch-ups.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
show more ...
|
#
fad953ce |
| 12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: Use array_size() in vzalloc()
The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of:
vza
treewide: Use array_size() in vzalloc()
The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of:
vzalloc(a * b)
with: vzalloc(array_size(a, b))
as well as handling cases of:
vzalloc(a * b * c)
with:
vzalloc(array3_size(a, b, c))
This does, however, attempt to ignore constant size factors like:
vzalloc(4 * 1024)
though any constants defined via macros get caught up in the conversion.
Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant.
The Coccinelle script used for this was:
// Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@
( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) )
// Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@
( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(char) * COUNT + COUNT , ...) | vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) )
// 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@
( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) )
// 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@
vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...)
// 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@
( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) )
// 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@
( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) )
// 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@
( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) )
// Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@
( vzalloc(C1 * C2 * C3, ...) | vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) )
// And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@
( vzalloc(C1 * C2, ...) | vzalloc( - E1 * E2 + array_size(E1, E2) , ...) )
Signed-off-by: Kees Cook <keescook@chromium.org>
show more ...
|
Revision tags: v4.17.1, v4.17 |
|
#
929f45e3 |
| 29-May-2018 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
kvm: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic shou
kvm: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this.
This cleans up the error handling a lot, as this code will never get hit.
Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim KrÄmář" <rkrcmar@redhat.com> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: kvm-ppc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: kvm@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
#
9a4506e1 |
| 17-May-2018 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on
The radix guest code can has fewer restrictions about what context it can run in, so move this flushing out
KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on
The radix guest code can has fewer restrictions about what context it can run in, so move this flushing out of assembly and have it use the Linux TLB flush implementations introduced previously.
This allows powerpc:tlbie trace events to be used.
This changes the tlbiel sequence to only execute RIC=2 flush once on the first set flushed, then RIC=0 for the rest of the sets. The end result of the flush should be unchanged. This matches the local PID flush pattern that was introduced in a5998fcb92 ("powerpc/mm/radix: Optimise tlbiel flush all case").
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
173c520a |
| 07-May-2018 |
Simon Guo <wei.guo.simon@gmail.com> |
KVM: PPC: Move nip/ctr/lr/xer registers to pt_regs in kvm_vcpu_arch
This patch moves nip/ctr/lr/xer registers from scattered places in kvm_vcpu_arch to pt_regs structure.
cr register is "unsigned l
KVM: PPC: Move nip/ctr/lr/xer registers to pt_regs in kvm_vcpu_arch
This patch moves nip/ctr/lr/xer registers from scattered places in kvm_vcpu_arch to pt_regs structure.
cr register is "unsigned long" in pt_regs and u32 in vcpu->arch. It will need more consideration and may move in later patches.
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
7aa15842 |
| 20-Apr-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Set RWMR on POWER8 so PURR/SPURR count correctly
Although Linux doesn't use PURR and SPURR ((Scaled) Processor Utilization of Resources Register), other OSes depend on them. On
KVM: PPC: Book3S HV: Set RWMR on POWER8 so PURR/SPURR count correctly
Although Linux doesn't use PURR and SPURR ((Scaled) Processor Utilization of Resources Register), other OSes depend on them. On POWER8 they count at a rate depending on whether the VCPU is idle or running, the activity of the VCPU, and the value in the RWMR (Region-Weighting Mode Register). Hardware expects the hypervisor to update the RWMR when a core is dispatched to reflect the number of online VCPUs in the vcore.
This adds code to maintain a count in the vcore struct indicating how many VCPUs are online. In kvmppc_run_core we use that count to set the RWMR register on POWER8. If the core is split because of a static or dynamic micro-threading mode, we use the value for 8 threads. The RWMR value is not relevant when the host is executing because Linux does not use the PURR or SPURR register, so we don't bother saving and restoring the host value.
For the sake of old userspace which does not set the KVM_REG_PPC_ONLINE register, we set online to 1 if it was 0 at the time of a KVM_RUN ioctl.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
a1f15826 |
| 20-Apr-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Add 'online' register to ONE_REG interface
This adds a new KVM_REG_PPC_ONLINE register which userspace can set to 0 or 1 via the GET/SET_ONE_REG interface to indicate whether it
KVM: PPC: Book3S HV: Add 'online' register to ONE_REG interface
This adds a new KVM_REG_PPC_ONLINE register which userspace can set to 0 or 1 via the GET/SET_ONE_REG interface to indicate whether it considers the VCPU to be offline (0), that is, not currently running, or online (1). This will be used in a later patch to configure the register which controls PURR and SPURR accumulation.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
57b8daa7 |
| 20-Apr-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
Currently, the HV KVM guest entry/exit code adds the timebase offset from the vcore struct to the timebase on guest entry, and subtracts
KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
Currently, the HV KVM guest entry/exit code adds the timebase offset from the vcore struct to the timebase on guest entry, and subtracts it on guest exit. Which is fine, except that it is possible for userspace to change the offset using the SET_ONE_REG interface while the vcore is running, as there is only one timebase offset per vcore but potentially multiple VCPUs in the vcore. If that were to happen, KVM would subtract a different offset on guest exit from that which it had added on guest entry, leading to the timebase being out of sync between cores in the host, which then leads to bad things happening such as hangs and spurious watchdog timeouts.
To fix this, we add a new field 'tb_offset_applied' to the vcore struct which stores the offset that is currently applied to the timebase. This value is set from the vcore tb_offset field on guest entry, and is what is subtracted from the timebase on guest exit. Since it is zero when the timebase offset is not applied, we can simplify the logic in kvmhv_start_timing and kvmhv_accumulate_time.
In addition, we had secondary threads reading the timebase while running concurrently with code on the primary thread which would eventually add or subtract the timebase offset from the timebase. This occurred while saving or restoring the DEC register value on the secondary threads. Although no specific incorrect behaviour has been observed, this is a race which should be fixed. To fix it, we move the DEC saving code to just before we call kvmhv_commence_exit, and the DEC restoring code to after the point where we have waited for the primary thread to switch the MMU context and add the timebase offset. That way we are sure that the timebase contains the guest timebase value in both cases.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
a4bc64d3 |
| 19-Apr-2018 |
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> |
powerpc64/ftrace: Disable ftrace during kvm entry/exit
During guest entry/exit, we switch over to/from the guest MMU context and we cannot take exceptions in the hypervisor code.
Since ftrace may b
powerpc64/ftrace: Disable ftrace during kvm entry/exit
During guest entry/exit, we switch over to/from the guest MMU context and we cannot take exceptions in the hypervisor code.
Since ftrace may be enabled and since it can result in us taking a trap, disable ftrace by setting paca->ftrace_enabled to zero. There are two paths through which we enter/exit a guest: 1. If we are the vcore runner, then we enter the guest via __kvmppc_vcore_entry() and we disable ftrace around this. This is always the case for Power9, and for the primary thread on Power8. 2. If we are a secondary thread in Power8, then we would be in nap due to SMT being disabled. We are woken up by an IPI to enter the guest. In this scenario, we enter the guest through kvm_start_guest(). We disable ftrace at this point. In this scenario, ftrace would only get re-enabled on the secondary thread when SMT is re-enabled (via start_secondary()).
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: v4.16 |
|
#
e303c087 |
| 01-Apr-2018 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Fix ppc_breakpoint_available compile error
arch/powerpc/kvm/book3s_hv.c: In function ‘kvmppc_h_set_mode’: arch/powerpc/kvm/book3s_hv.c:745:8: error: implicit declaration of func
KVM: PPC: Book3S HV: Fix ppc_breakpoint_available compile error
arch/powerpc/kvm/book3s_hv.c: In function ‘kvmppc_h_set_mode’: arch/powerpc/kvm/book3s_hv.c:745:8: error: implicit declaration of function ‘ppc_breakpoint_available’ if (!ppc_breakpoint_available()) ^~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 398e712c007f ("KVM: PPC: Book3S HV: Return error from h_set_mode(SET_DAWR) on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
499dcd41 |
| 13-Feb-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: Allocate LPPACAs individually
We no longer allocate lppacas in an array, so this patch removes the 1kB static alignment for the structure, and enforces the PAPR alignment requirements a
powerpc/64s: Allocate LPPACAs individually
We no longer allocate lppacas in an array, so this patch removes the 1kB static alignment for the structure, and enforces the PAPR alignment requirements at allocation time. We can not reduce the 1kB allocation size however, due to existing KVM hypervisors.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
d2e60075 |
| 13-Feb-2018 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64: Use array of paca pointers and allocate pacas individually
Change the paca array into an array of pointers to pacas. Allocate pacas individually.
This allows flexibility in where the PA
powerpc/64: Use array of paca pointers and allocate pacas individually
Change the paca array into an array of pointers to pacas. Allocate pacas individually.
This allows flexibility in where the PACAs are allocated. Future work will allocate them node-local. Platforms that don't have address limits on PACAs would be able to defer PACA allocations until later in boot rather than allocate all possible ones up-front then freeing unused.
This is slightly more overhead (one additional indirection) for cross CPU paca references, but those aren't too common.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
398e712c |
| 26-Mar-2018 |
Michael Neuling <mikey@neuling.org> |
KVM: PPC: Book3S HV: Return error from h_set_mode(SET_DAWR) on POWER9
Return H_P2 on a h_set_mode(SET_DAWR) on POWER9 where the DAWR is disabled.
Current Linux guests ignore this error, so they wil
KVM: PPC: Book3S HV: Return error from h_set_mode(SET_DAWR) on POWER9
Return H_P2 on a h_set_mode(SET_DAWR) on POWER9 where the DAWR is disabled.
Current Linux guests ignore this error, so they will silently not get the DAWR (sigh). The same error code is being used by POWERVM in this case.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
4bb3c7a0 |
| 21-Mar-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specificall
KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation.
The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt.
On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0.
On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint.
Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts.
The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path.
With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
39c983ea |
| 21-Feb-2018 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Remove unused kvm_unmap_hva callback
Since commit fb1522e099f0 ("KVM: update to new mmu_notifier semantic v2", 2017-08-31), the MMU notifier code in KVM no longer calls the kvm_unmap_hva c
KVM: PPC: Remove unused kvm_unmap_hva callback
Since commit fb1522e099f0 ("KVM: update to new mmu_notifier semantic v2", 2017-08-31), the MMU notifier code in KVM no longer calls the kvm_unmap_hva callback. This removes the PPC implementations of kvm_unmap_hva().
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|
#
61bd0f66 |
| 02-Mar-2018 |
Laurent Vivier <lvivier@redhat.com> |
KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN
Since commit 8b24e69fc47e ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry"), if CONFIG_VIRT_CPU_
KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN
Since commit 8b24e69fc47e ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry"), if CONFIG_VIRT_CPU_ACCOUNTING_GEN is set, the guest time is not accounted to guest time and user time, but instead to system time.
This is because guest_enter()/guest_exit() are called while interrupts are disabled and the tick counter cannot be updated between them.
To fix that, move guest_exit() after local_irq_enable(), and as guest_enter() is called with IRQ disabled, call guest_enter_irqoff() instead.
Fixes: 8b24e69fc47e ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
show more ...
|