Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4 |
|
#
01f0876e |
| 01-Dec-2023 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Handle pending exceptions on guest entry with MSR_EE
[ Upstream commit ecd10702baae5c16a91d139bde7eff84ce55daee ]
Commit 026728dc5d41 ("KVM: PPC: Book3S HV P9: Inject pending x
KVM: PPC: Book3S HV: Handle pending exceptions on guest entry with MSR_EE
[ Upstream commit ecd10702baae5c16a91d139bde7eff84ce55daee ]
Commit 026728dc5d41 ("KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry") changed guest entry so that if external interrupts are enabled, BOOK3S_IRQPRIO_EXTERNAL is not tested for. Test for this regardless of MSR_EE.
For an L1 host, do not inject an interrupt, but always use LPCR_MER. If the L0 desires it can inject an interrupt.
Fixes: 026728dc5d41 ("KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [jpn: use kvmpcc_get_msr(), write commit message] Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231201132618.555031-7-vaibhav@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4 |
|
#
267980ea |
| 13-Sep-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Book3S HV: Introduce low level MSR accessor
[ Upstream commit 6de2e837babb411cfb3cdb570581c3a65576ddaf ]
kvmppc_get_msr() and kvmppc_set_msr_fast() serve as accessors for the MSR. However
KVM: PPC: Book3S HV: Introduce low level MSR accessor
[ Upstream commit 6de2e837babb411cfb3cdb570581c3a65576ddaf ]
kvmppc_get_msr() and kvmppc_set_msr_fast() serve as accessors for the MSR. However because the MSR is kept in the shared regs they include a conditional check for kvmppc_shared_big_endian() and endian conversion.
Within the Book3S HV specific code there are direct reads and writes of shregs::msr. In preparation for Nested APIv2 these accesses need to be replaced with accessor functions so it is possible to extend their behavior. However, using the kvmppc_get_msr() and kvmppc_set_msr_fast() functions is undesirable because it would introduce a conditional branch and endian conversion that is not currently present.
kvmppc_set_msr_hv() already exists, it is used for the kvmppc_ops::set_msr callback.
Introduce a low level accessor __kvmppc_{s,g}et_msr_hv() that simply gets and sets shregs::msr. This will be extend for Nested APIv2 support.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-8-jniethe5@gmail.com Stable-dep-of: ecd10702baae ("KVM: PPC: Book3S HV: Handle pending exceptions on guest entry with MSR_EE") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
abcaadd4 |
| 13-Sep-2023 |
Jordan Niethe <jniethe5@gmail.com> |
KVM: PPC: Book3S HV: Use accessors for VCPU registers
[ Upstream commit ebc88ea7a6ad0ea349df9c765357d3aa4e662aa9 ]
Introduce accessor generator macros for Book3S HV VCPU registers. Use the accessor
KVM: PPC: Book3S HV: Use accessors for VCPU registers
[ Upstream commit ebc88ea7a6ad0ea349df9c765357d3aa4e662aa9 ]
Introduce accessor generator macros for Book3S HV VCPU registers. Use the accessor functions to replace direct accesses to this registers.
This will be important later for Nested APIv2 support which requires additional functionality for accessing and modifying VCPU state.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230914030600.16993-7-jniethe5@gmail.com Stable-dep-of: ecd10702baae ("KVM: PPC: Book3S HV: Handle pending exceptions on guest entry with MSR_EE") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16 |
|
#
a3800ef9 |
| 08-Mar-2023 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Enable prefixed instructions for HV KVM and disable for PR KVM
Now that we can read prefixed instructions from a HV KVM guest and emulate prefixed load/store instructions to emulated MMIO
KVM: PPC: Enable prefixed instructions for HV KVM and disable for PR KVM
Now that we can read prefixed instructions from a HV KVM guest and emulate prefixed load/store instructions to emulated MMIO locations, we can add HFSCR_PREFIXED into the set of bits that are set in the HFSCR for a HV KVM guest on POWER10, allowing the guest to use prefixed instructions.
PR KVM has not yet been extended to handle prefixed instructions in all situations where we might need to emulate them, so prevent the guest from enabling prefixed instructions in the FSCR for now.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/ZAgs25dCmLrVkBdU@cleo
show more ...
|
#
953e3739 |
| 08-Mar-2023 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Fetch prefixed instructions from the guest
In order to handle emulation of prefixed instructions in the guest, this first makes vcpu->arch.last_inst be an unsigned long, i.e. 64 bits on 64
KVM: PPC: Fetch prefixed instructions from the guest
In order to handle emulation of prefixed instructions in the guest, this first makes vcpu->arch.last_inst be an unsigned long, i.e. 64 bits on 64-bit platforms. For prefixed instructions, the upper 32 bits are used for the prefix and the lower 32 bits for the suffix, and both halves are byte-swapped if the guest endianness differs from the host.
Next, vcpu->arch.emul_inst is now 64 bits wide, to match the HEIR register on POWER10. Like HEIR, for a prefixed instruction it is defined to have the prefix is in the top 32 bits and the suffix in the bottom 32 bits, with both halves in the correct byte order.
kvmppc_get_last_inst is extended on 64-bit machines to put the prefix and suffix in the right places in the ppc_inst_t being returned.
kvmppc_load_last_inst now returns the instruction in an unsigned long in the same format as vcpu->arch.last_inst. It makes the decision about whether to fetch a suffix based on the SRR1_PREFIXED bit in the MSR image stored in the vcpu struct, which generally comes from SRR1 or HSRR1 on an interrupt. This bit is defined in Power ISA v3.1B to be set if the interrupt occurred due to a prefixed instruction and cleared otherwise for all interrupts except for instruction storage interrupt, which does not come to the hypervisor. It is set to zero for asynchronous interrupts such as external interrupts. In previous ISA versions it was always set to 0 for all interrupts except instruction storage interrupt.
The code in book3s_hv_rmhandlers.S that loads the faulting instruction on a HDSI is only used on POWER8 and therefore doesn't ever need to load a suffix.
[npiggin@gmail.com - check that the is-prefixed bit in SRR1 matches the type of instruction that was fetched.]
Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/ZAgsq9h1CCzouQuV@cleo
show more ...
|
#
acf17878 |
| 08-Mar-2023 |
Paul Mackerras <paulus@ozlabs.org> |
KVM: PPC: Make kvmppc_get_last_inst() produce a ppc_inst_t
This changes kvmppc_get_last_inst() so that the instruction it fetches is returned in a ppc_inst_t variable rather than a u32. This will a
KVM: PPC: Make kvmppc_get_last_inst() produce a ppc_inst_t
This changes kvmppc_get_last_inst() so that the instruction it fetches is returned in a ppc_inst_t variable rather than a u32. This will allow us to return a 64-bit prefixed instruction on those 64-bit machines that implement Power ISA v3.1 or later, such as POWER10. On 32-bit platforms, ppc_inst_t is 32 bits wide, and is turned back into a u32 by ppc_inst_val, which is an identity operation on those platforms.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/ZAgsiPlL9O7KnlZZ@cleo
show more ...
|
#
6cd5c1db |
| 30-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Set SRR1[PREFIX] bit on injected interrupts
Pass the hypervisor (H)SRR1[PREFIX] indication through to synchronous interrupts injected into the guest.
Signed-off-by: Nicholas Pi
KVM: PPC: Book3S HV: Set SRR1[PREFIX] bit on injected interrupts
Pass the hypervisor (H)SRR1[PREFIX] indication through to synchronous interrupts injected into the guest.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230330103224.3589928-3-npiggin@gmail.com
show more ...
|
#
460ba21d |
| 30-Mar-2023 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Permit SRR1 flags in more injected interrupt types
The prefix architecture in ISA v3.1 introduces a prefixed bit in SRR1 for many types of synchronous interrupts which is set when the inte
KVM: PPC: Permit SRR1 flags in more injected interrupt types
The prefix architecture in ISA v3.1 introduces a prefixed bit in SRR1 for many types of synchronous interrupts which is set when the interrupt is caused by a prefixed instruction.
This requires KVM to be able to set this bit when injecting interrupts into a guest. Plumb through the SRR1 "flags" argument to the core_queue APIs where it's missing for this. For now they are set to 0, which is no change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup kvmppc_core_queue_alignment() in booke.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230330103224.3589928-2-npiggin@gmail.com
show more ...
|
#
4c8c3c7f |
| 07-Mar-2023 |
Valentin Schneider <vschneid@redhat.com> |
treewide: Trace IPIs sent via smp_send_reschedule()
To be able to trace invocations of smp_send_reschedule(), rename the arch-specific definitions of it to arch_smp_send_reschedule() and wrap it int
treewide: Trace IPIs sent via smp_send_reschedule()
To be able to trace invocations of smp_send_reschedule(), rename the arch-specific definitions of it to arch_smp_send_reschedule() and wrap it into an smp_send_reschedule() that contains a tracepoint.
Changes to include the declaration of the tracepoint were driven by the following coccinelle script:
@func_use@ @@ smp_send_reschedule(...);
@include@ @@ #include <trace/events/ipi.h>
@no_include depends on func_use && !include@ @@ #include <...> + + #include <trace/events/ipi.h>
[csky bits] [riscv bits] Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20230307143558.294354-6-vschneid@redhat.com
show more ...
|
Revision tags: v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11 |
|
#
67c48662 |
| 08-Feb-2023 |
Thomas Huth <thuth@redhat.com> |
KVM: PPC: Standardize on "int" return types in the powerpc KVM code
Most functions that are related to kvm_arch_vm_ioctl() already use "int" as return type to pass error values back to the caller. S
KVM: PPC: Standardize on "int" return types in the powerpc KVM code
Most functions that are related to kvm_arch_vm_ioctl() already use "int" as return type to pass error values back to the caller. Some outlier functions use "long" instead for no good reason (they do not really require long values here). Let's standardize on "int" here to avoid casting the values back and forth between the two types.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-2-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
Revision tags: v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68 |
|
#
e4335f53 |
| 08-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Implement scheduling wait interval counters in the VPA
PAPR specifies accumulated virtual processor wait intervals that relate to partition scheduling interval times. Implement
KVM: PPC: Book3S HV: Implement scheduling wait interval counters in the VPA
PAPR specifies accumulated virtual processor wait intervals that relate to partition scheduling interval times. Implement these counters in the same way as they are repoted by dtl.
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-5-npiggin@gmail.com
show more ...
|
#
1a5486b3 |
| 08-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Restore stolen time logging in dtl
Stolen time logging in dtl was removed from the P9 path, so guests had no stolen time accounting. Add it back in a simpler way that still a
KVM: PPC: Book3S HV P9: Restore stolen time logging in dtl
Stolen time logging in dtl was removed from the P9 path, so guests had no stolen time accounting. Add it back in a simpler way that still avoids locks and per-core accounting code.
Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-4-npiggin@gmail.com
show more ...
|
#
b31bc24a |
| 08-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV: Update guest state entry/exit accounting to new API
Update the guest state and timing entry/exit accounting to use the new API, which was introduced following issues found[1]. K
KVM: PPC: Book3S HV: Update guest state entry/exit accounting to new API
Update the guest state and timing entry/exit accounting to use the new API, which was introduced following issues found[1]. KVM HV does possibly call instrumented code inside the guest context, and it does call srcu inside the guest context which is fragile at best.
Switch to the new API, moving the guest context inside the srcu_read_lock/unlock region.
[1] https://lore.kernel.org/lkml/20220201132926.3301912-1-mark.rutland@arm.com/
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-3-npiggin@gmail.com
show more ...
|
#
c953f750 |
| 08-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Fix irq disabling in tick accounting
kvmhv_run_single_vcpu() disables PMIs as well as Linux irqs, however the tick time accounting code enables and disables irqs and not PMIs
KVM: PPC: Book3S HV P9: Fix irq disabling in tick accounting
kvmhv_run_single_vcpu() disables PMIs as well as Linux irqs, however the tick time accounting code enables and disables irqs and not PMIs within this region. By chance this might not actually cause a bug, but it is clearly an incorrect use of the APIs.
Fixes: 2251fbe76395e ("KVM: PPC: Book3S HV P9: Improve mtmsrd scheduling by delaying MSR[EE] disable") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-2-npiggin@gmail.com
show more ...
|
#
bc91c04b |
| 08-Sep-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV P9: Clear vcpu cpu fields before enabling host irqs
On guest entry, vcpu->cpu and vcpu->arch.thread_cpu are set after disabling host irqs. On guest exit there is a window whre ti
KVM: PPC: Book3S HV P9: Clear vcpu cpu fields before enabling host irqs
On guest entry, vcpu->cpu and vcpu->arch.thread_cpu are set after disabling host irqs. On guest exit there is a window whre tick time accounting briefly enables irqs before these fields are cleared.
Move them up to ensure they are cleared before host irqs are run. This is possibly not a problem, but is more symmetric and makes the fields less surprising.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-1-npiggin@gmail.com
show more ...
|
Revision tags: v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61 |
|
#
0a5bfb82 |
| 16-Aug-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV: Fix decrementer migration
We used to have a workaround[1] for a hang during migration that was made ineffective when we converted the decrementer expiry to be relative to guest
KVM: PPC: Book3S HV: Fix decrementer migration
We used to have a workaround[1] for a hang during migration that was made ineffective when we converted the decrementer expiry to be relative to guest timebase.
The point of the workaround was that in the absence of an explicit decrementer expiry value provided by userspace during migration, KVM needs to initialize dec_expires to a value that will result in an expired decrementer after subtracting the current guest timebase. That stops the vcpu from hanging after migration due to a decrementer that's too large.
If the dec_expires is now relative to guest timebase, its initialization needs to be guest timebase-relative as well, otherwise we end up with a decrementer expiry that is still larger than the guest timebase.
1- https://git.kernel.org/torvalds/c/5855564c8ab2
Fixes: 3c1a4322bba7 ("KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220816222517.1916391-1-farosas@linux.ibm.com
show more ...
|
Revision tags: v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56 |
|
#
d349ab99 |
| 17-Jul-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
random: handle archrandom with multiple longs
The archrandom interface was originally designed for x86, which supplies RDRAND/RDSEED for receiving random words into registers, resulting in one funct
random: handle archrandom with multiple longs
The archrandom interface was originally designed for x86, which supplies RDRAND/RDSEED for receiving random words into registers, resulting in one function to generate an int and another to generate a long. However, other architectures don't follow this.
On arm64, the SMCCC TRNG interface can return between one and three longs. On s390, the CPACF TRNG interface can return arbitrary amounts, with four longs having the same cost as one. On UML, the os_getrandom() interface can return arbitrary amounts.
So change the api signature to take a "max_longs" parameter designating the maximum number of longs requested, and then return the number of longs generated.
Since callers need to check this return value and loop anyway, each arch implementation does not bother implementing its own loop to try again to fill the maximum number of longs. Additionally, all existing callers pass in a constant max_longs parameter. Taken together, these two things mean that the codegen doesn't really change much for one-word-at-a-time platforms, while performance is greatly improved on platforms such as s390.
Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
show more ...
|
#
2b461880 |
| 18-Jul-2022 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc: Fix all occurences of duplicate words
Since commit 87c78b612f4f ("powerpc: Fix all occurences of "the the"") fixed "the the", there's now a steady stream of patches fixing other duplicate w
powerpc: Fix all occurences of duplicate words
Since commit 87c78b612f4f ("powerpc: Fix all occurences of "the the"") fixed "the the", there's now a steady stream of patches fixing other duplicate words.
Just fix them all at once, to save the overhead of dealing with individual patches for each case.
This leaves a few cases of "that that", which in some contexts is correct.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220718095158.326606-1-mpe@ellerman.id.au
show more ...
|
Revision tags: v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44 |
|
#
b44bb1b7 |
| 25-May-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV: Provide more detailed timings for P9 entry path
Alter the data collection points for the debug timing code in the P9 path to be more in line with what the code does. The points
KVM: PPC: Book3S HV: Provide more detailed timings for P9 entry path
Alter the data collection points for the debug timing code in the P9 path to be more in line with what the code does. The points where we accumulate time are now the following:
vcpu_entry: From vcpu_run_hv entry until the start of the inner loop;
guest_entry: From the start of the inner loop until the guest entry in asm;
in_guest: From the guest entry in asm until the return to KVM C code;
guest_exit: From the return into KVM C code until the corresponding hypercall/page fault handling or re-entry into the guest;
hypercall: Time spent handling hcalls in the kernel (hcalls can go to QEMU, not accounted here);
page_fault: Time spent handling page faults;
vcpu_exit: vcpu_run_hv exit (almost no code here currently).
Like before, these are exposed in debugfs in a file called "timings". There are four values:
- number of occurrences of the accumulation point; - total time the vcpu spent in the phase in ns; - shortest time the vcpu spent in the phase in ns; - longest time the vcpu spent in the phase in ns;
=== Before:
rm_entry: 53132 16793518 256 4060 rm_intr: 53132 2125914 22 340 rm_exit: 53132 24108344 374 2180 guest: 53132 40980507996 404 9997650 cede: 0 0 0 0
After:
vcpu_entry: 34637 7716108 178 4416 guest_entry: 52414 49365608 324 747542 in_guest: 52411 40828715840 258 9997480 guest_exit: 52410 19681717182 826 102496674 vcpu_exit: 34636 1744462 38 182 hypercall: 45712 22878288 38 1307962 page_fault: 992 111104034 568 168688
With just one instruction (hcall):
vcpu_entry: 1 942 942 942 guest_entry: 1 4044 4044 4044 in_guest: 1 1540 1540 1540 guest_exit: 1 3542 3542 3542 vcpu_exit: 1 80 80 80 hypercall: 0 0 0 0 page_fault: 0 0 0 0 ===
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525130554.2614394-6-farosas@linux.ibm.com
show more ...
|
#
c3fa64c9 |
| 25-May-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV: Decouple the debug timing from the P8 entry path
We are currently doing the timing for debug purposes of the P9 entry path using the accumulators and terminology defined by the
KVM: PPC: Book3S HV: Decouple the debug timing from the P8 entry path
We are currently doing the timing for debug purposes of the P9 entry path using the accumulators and terminology defined by the old entry path for P8 machines.
Not only the "real-mode" and "napping" mentions are out of place for the P9 Radix entry path but also we cannot change them because the timing code is coupled to the structures defined in struct kvm_vcpu_arch.
Add a new CONFIG_KVM_BOOK3S_HV_P9_TIMING to enable the timing code for the P9 entry path. For now, just add the new CONFIG and duplicate the structures. A subsequent patch will add the P9 changes.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525130554.2614394-4-farosas@linux.ibm.com
show more ...
|
Revision tags: v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33 |
|
#
ad55bae7 |
| 28-Mar-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint
We removed most of the vcore logic from the P9 path but there's still a tracepoint that tried to dereference vc->runner.
Fixes: ecb6a7207f92 ("KVM:
KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint
We removed most of the vcore logic from the P9 path but there's still a tracepoint that tried to dereference vc->runner.
Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220328215831.320409-1-farosas@linux.ibm.com
show more ...
|
#
cad32d9d |
| 06-May-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
LoPAPR defines guest visible IOMMU with hypercalls to use it - H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap in the
KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
LoPAPR defines guest visible IOMMU with hypercalls to use it - H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap in the KVM in the real mode (with MMU off). The problem with the real mode is some memory is not available and some API usage crashed the host but enabling MMU was an expensive operation.
The problems with the real mode handlers are: 1. Occasionally these cannot complete the request so the code is copied+modified to work in the virtual mode, very little is shared; 2. The real mode handlers have to be linked into vmlinux to work; 3. An exception in real mode immediately reboots the machine.
If the small DMA window is used, the real mode handlers bring better performance. However since POWER8, there has always been a bigger DMA window which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE) which virtual mode handler is even closer to its real mode version.
On POWER9 hypercalls trap straight to the virtual mode so the real mode handlers never execute on POWER9 and later CPUs.
So with the current use of the DMA windows and MMU improvements in POWER9 and later, there is no point in duplicating the code. The 32bit passed through devices may slow down but we do not have many of these in practice. For example, with this applied, a 1Gbit ethernet adapter still demostrates above 800Mbit/s of actual throughput.
This removes the real mode handlers from KVM and related code from the powernv platform.
This updates the list of implemented hcalls in KVM-HV as the realmode handlers are removed.
This changes ABI - kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is static now.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
show more ...
|
#
1d1cd0f1 |
| 25-Apr-2022 |
Fabiano Rosas <farosas@linux.ibm.com> |
KVM: PPC: Book3S HV: Initialize AMOR in nested entry
The hypervisor always sets AMOR to ~0, but let's ensure we're not passing stale values around.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.c
KVM: PPC: Book3S HV: Initialize AMOR in nested entry
The hypervisor always sets AMOR to ~0, but let's ensure we're not passing stale values around.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220425142151.1495142-1-farosas@linux.ibm.com
show more ...
|
Revision tags: v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27 |
|
#
2852ebfa |
| 02-Mar-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
The L1 should not be able to adjust LPES mode for the L2. Setting LPES if the L0 needs it clear would cause external interrupts to
KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
The L1 should not be able to adjust LPES mode for the L2. Setting LPES if the L0 needs it clear would cause external interrupts to be sent to L2 and missed by the L0.
Clearing LPES when it may be set, as typically happens with XIVE enabled could cause a performance issue despite having no native XIVE support in the guest, because it will cause mediated interrupts for the L2 to be taken in HV mode, which then have to be injected.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-7-npiggin@gmail.com
show more ...
|
#
11681b79 |
| 02-Mar-2022 |
Nicholas Piggin <npiggin@gmail.com> |
KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context
The PowerNV L0 currently pushes the OS xive context when running a vCPU, regardless of whether it is running a nested guest. The prob
KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context
The PowerNV L0 currently pushes the OS xive context when running a vCPU, regardless of whether it is running a nested guest. The problem is that xive OS ring interrupts will be delivered while the L2 is running.
At the moment, by default, the L2 guest runs with LPCR[LPES]=0, which actually makes external interrupts go to the L0. That causes the L2 to exit and the interrupt taken or injected into the L1, so in some respects this behaves like an escalation. It's not clear if this was deliberate or not, there's no comment about it and the L1 is actually allowed to clear LPES in the L2, so it's confusing at best.
When the L2 is running, the L1 is essentially in a ceded state with respect to external interrupts (it can't respond to them directly and won't get scheduled again absent some additional event). So the natural way to solve this is when the L0 handles a H_ENTER_NESTED hypercall to run the L2, have it arm the escalation interrupt and don't push the L1 context while running the L2.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-6-npiggin@gmail.com
show more ...
|