#
65b2514e |
| 23-Apr-2024 |
Jacob Pan <jacob.jun.pan@linux.intel.com> |
KVM: VMX: Move posted interrupt descriptor out of VMX code
[ Upstream commit 699f67512f04cbaee965fad872702c06eaf440f6 ]
To prepare native usage of posted interrupts, move the PID declarations out of VMX code such that they can be shared.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20240423174114.526704-2-jacob.jun.pan@linux.intel.com Stable-dep-of: d83c36d822be ("KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vector") Signed-off-by: Sasha Levin <sashal@kernel.org>
|
#
ebfed7be |
| 05-Dec-2023 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: VMX: Split off vmx_onhyperv.{ch} from hyperv.{ch}
[ Upstream commit 50a82b0eb88c108d1ebc73a97f5b81df0d5918e0 ]
hyperv.{ch} is currently a mix of stuff which is needed by both Hyper-V on KVM and KVM on Hyper-V. As a preparation to making Hyper-V emulation optional, put KVM-on-Hyper-V specific code into dedicated files.
No functional change intended.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231205103630.1391318-4-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: d83c36d822be ("KVM: nVMX: Add a helper to get highest pending from Posted Interrupt vector") Signed-off-by: Sasha Levin <sashal@kernel.org>
|
#
e06f46fd |
| 07-Jun-2024 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Split out the non-virtualization part of vmx_interrupt_blocked()
commit 322a569c4b4188a0da2812f9e952780ce09b74ba upstream.
Move the non-VMX chunk of the "interrupt blocked" checks to a separate helper so that KVM can reuse the code to detect if interrupts are blocked for L2, e.g. to determine if a virtual interrupt _for L2_ is a valid wake event. If L1 disables HLT-exiting for L2, nested APICv is enabled, and L2 HLTs, then L2 virtual interrupts are valid wake events, but if and only if interrupts are unblocked for L2.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240607172609.3205077-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
037e48ce |
| 06-Mar-2024 |
Sean Christopherson <seanjc@google.com> |
KVM: x86/pmu: Disable support for adaptive PEBS
commit 9e985cbf2942a1bb8fcef9adc2a17d90fd7ca8ee upstream.
Drop support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken without an obvious/easy path forward, and because exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak host kernel addresses to the guest.
Bug #1 is that KVM doesn't account for the upper 32 bits of IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g. fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() stores local variables as u8s and truncates the upper bits too, etc.
Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value for PEBS events, perf will _always_ generate an adaptive record, even if the guest requested a basic record. Note, KVM will also enable adaptive PEBS in individual *counters*, even if adaptive PEBS isn't exposed to the guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero, i.e. the guest will only ever see Basic records.
Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what KVM requests.
Bug #4 is that adaptive PEBS *might* effectively bypass event filters set by the host, as "Updated Memory Access Info Group" records information that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER.
Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least zeros) when entering a vCPU with adaptive PEBS, which allows the guest to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries" records.
Disable adaptive PEBS support as an immediate fix due to the severity of the LBR leak in particular, and because fixing all of the bugs will be non-trivial, e.g. not suitable for backporting to stable kernels.
Note! This will break live migration, but trying to make KVM play nice with live migration would be quite complicated, wouldn't be guaranteed to work (i.e. KVM might still kill/confuse the guest), and it's not clear that there are any publicly available VMMs that support adaptive PEBS, let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't support PEBS in any capacity.
Link: https://lore.kernel.org/all/20240306230153.786365-1-seanjc@google.com Link: https://lore.kernel.org/all/ZeepGjHCeSfadANM@google.com Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Cc: stable@vger.kernel.org Cc: Like Xu <like.xu.linux@gmail.com> Cc: Mingwei Zhang <mizhang@google.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhang Xiong <xiong.y.zhang@intel.com> Cc: Lv Zhiyuan <zhiyuan.lv@intel.com> Cc: Dapeng Mi <dapeng1.mi@intel.com> Cc: Jim Mattson <jmattson@google.com> Acked-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20240307005833.827147-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
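The truncation described in Bug #1 above can be illustrated with a small standalone sketch. This is not KVM code: the `toy_` names are hypothetical, and the layout (a 4-bit control field per fixed counter in the low half, plus a per-counter adaptive bit starting at bit 32) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout, for illustration only: each fixed counter has a 4-bit
 * control field in the low half of the MSR, and a per-counter "adaptive"
 * bit starting at bit 32, also spaced 4 bits apart.
 */
#define TOY_FIXED_FIELD_MASK    0xfULL
#define TOY_FIXED_ADAPTIVE(idx) (1ULL << (32 + 4 * (idx)))

/* Lossy: a u8 return value can only ever carry the low 4-bit field. */
static uint8_t toy_fixed_field_u8(uint64_t ctrl, int idx)
{
	return (uint8_t)((ctrl >> (4 * idx)) & TOY_FIXED_FIELD_MASK);
}

/* Lossless: stay 64 bits wide and keep the adaptive bit in the mask. */
static uint64_t toy_fixed_field_u64(uint64_t ctrl, int idx)
{
	uint64_t mask = (TOY_FIXED_FIELD_MASK << (4 * idx)) |
			TOY_FIXED_ADAPTIVE(idx);

	return ctrl & mask;
}

static void toy_demo(void)
{
	/* Counter 0 enabled for OS+USR (0x3, arbitrary) with adaptive set. */
	uint64_t ctrl = 0x3ULL | TOY_FIXED_ADAPTIVE(0);

	/* The narrow helper silently loses the adaptive request. */
	assert(toy_fixed_field_u8(ctrl, 0) == 0x3);
	/* The wide helper preserves it. */
	assert(toy_fixed_field_u64(ctrl, 0) == ctrl);
}
```

Calling `toy_demo()` exercises both helpers; the point is only that any path which narrows the value to 8 bits cannot see or clear bits at position 32 and above.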
|
#
e81742f6 |
| 03-Mar-2024 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
commit 43fb862de8f628c5db5e96831c915b9aebf62d33 upstream.
During VMentry, VERW is executed to mitigate MDS. After VERW, any memory access, such as a register push onto the stack, may put host data in MDS-affected CPU buffers. A guest can then use MDS to sample host data.
Although the likelihood of secrets surviving in registers at the current VERW callsite is low, it can't be ruled out. Harden the MDS mitigation by moving the VERW mitigation late in the VMentry path.
Note that VERW for MMIO Stale Data mitigation is unchanged because of the complexity of per-guest conditional VERW which is not easy to handle that late in asm with no GPRs available. If the CPU is also affected by MDS, VERW is unconditionally executed late in asm regardless of guest having MMIO access.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-6-a6216d83edb7%40linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
7a62647e |
| 03-Mar-2024 |
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
commit 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 upstream.
The VERW mitigation at exit-to-user is enabled via a static branch mds_user_clear. This static branch is never toggled after boot, and can be safely replaced with an ALTERNATIVE() which is convenient to use in asm.
Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user path. Also remove the now redundant VERW in exc_nmi() and arch_exit_to_user_mode().
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
aaff74d8 |
| 09-Feb-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
work around gcc bugs with 'asm goto' with outputs
commit 4356e9f841f7fbb945521cef3577ba394c65f3fc upstream.
We've had issues with gcc and 'asm goto' before, and we created a 'asm_volatile_goto()' macro for that in the past: see commits 3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation bug") and a9f180345f53 ("compiler/gcc4: Make quirk for asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit 43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR 58670") because we no longer supported building the kernel with the affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar problem, which is fixed by re-applying that ancient workaround. But the problem in question is limited to only the 'asm goto with outputs' cases, so instead of re-introducing the old workaround as-is, let's rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a barrier, as in the original workaround.
but the problem Sean sees may be a third thing since it involves bad code generation (not an ICE) even with the manually added 'volatile'.
but the same old workaround works for this case, even if this feels a bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/ Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
7de33b0f |
| 12-Sep-2023 |
Haitao Shan <hshan@google.com> |
KVM: x86: Fix lapic timer interrupt lost after loading a snapshot.
commit 9cfec6d097c607e36199cf0cfbb8cf5acbd8e9b2 upstream.
When running the Android emulator (which is based on QEMU 2.12) on certain Intel hosts with kernel version 6.3-rc1 or above, the guest freezes after loading a snapshot. This is almost 100% reproducible. By default, the Android emulator uses a snapshot to speed up the next launch of the same Android guest, so this breaks the Android emulator badly.
I tested QEMU 8.0.4 from Debian 12 with an Ubuntu 22.04 guest by running the command "loadvm" after "savevm". The same issue is observed. At the same time, none of our AMD platforms is impacted. More experiments show that loading the KVM module with "enable_apicv=false" can work around it.
The issue started to show up after commit 8e6ed96cdd50 ("KVM: x86: fire timer when it is migrated and expired, and in oneshot mode"). However, as pointed out by Sean Christopherson, it was introduced by commit 967235d32032 ("KVM: vmx: clear pending interrupts on KVM_SET_LAPIC"). Commit 8e6ed96cdd50 ("KVM: x86: fire timer when it is migrated and expired, and in oneshot mode") just makes it easier to hit the issue.
With both commits, the oneshot lapic timer gets fired immediately inside the KVM_SET_LAPIC call when loading the snapshot. On Intel platforms with APIC virtualization and posted interrupt processing, this eventually leads to setting the corresponding PIR bit. However, all PIR bits get cleared later in the same KVM_SET_LAPIC call by apicv_post_state_restore, so the timer interrupt is lost.
The fix is to move vmx_apicv_post_state_restore to the beginning of the KVM_SET_LAPIC call and rename it to vmx_apicv_pre_state_restore. What vmx_apicv_post_state_restore actually does is clear any former apicv state, and this is more suitable to carry out at the beginning.
Fixes: 967235d32032 ("KVM: vmx: clear pending interrupts on KVM_SET_LAPIC") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Haitao Shan <hshan@google.com> Link: https://lore.kernel.org/r/20230913000215.478387-1-hshan@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
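The ordering problem described in this commit can be modelled in a few lines. The following is a toy sketch, not KVM code: a single 64-bit word stands in for the PIR, and every `toy_` name is hypothetical. It only shows why clearing after the restore loses the interrupt while clearing before the restore keeps it.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in: one 64-bit word models the posted-interrupt request bits. */
struct toy_vcpu {
	uint64_t pir;
};

/* Models the hook that wipes stale APICv state. */
static void toy_clear_apicv_state(struct toy_vcpu *v)
{
	v->pir = 0;
}

/* Models restoring an already-expired oneshot timer: it fires at once. */
static void toy_restore_expired_timer(struct toy_vcpu *v, unsigned int vector)
{
	v->pir |= 1ULL << (vector % 64);
}

/* Old ordering: restore first, clear afterwards. Returns the final PIR. */
static uint64_t toy_set_lapic_clear_last(uint64_t stale_pir, unsigned int vector)
{
	struct toy_vcpu v = { .pir = stale_pir };

	toy_restore_expired_timer(&v, vector);
	toy_clear_apicv_state(&v);
	return v.pir;
}

/* Fixed ordering: clear first, then restore. Returns the final PIR. */
static uint64_t toy_set_lapic_clear_first(uint64_t stale_pir, unsigned int vector)
{
	struct toy_vcpu v = { .pir = stale_pir };

	toy_clear_apicv_state(&v);
	toy_restore_expired_timer(&v, vector);
	return v.pir;
}

static void toy_demo(void)
{
	/* Vector 5 and the stale value 0xf0 are arbitrary examples. */
	assert(toy_set_lapic_clear_last(0xf0, 5) == 0);            /* timer lost */
	assert(toy_set_lapic_clear_first(0xf0, 5) == (1ULL << 5)); /* timer kept */
}
```

In both orderings the stale bits are gone at the end; only the clear-first ordering leaves the freshly posted timer vector pending.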
|
#
50011c2a |
| 24-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Refresh available regs and IDT vectoring info before NMI handling
Reset the mask of available "registers" and refresh the IDT vectoring info snapshot in vmx_vcpu_enter_exit(), before KVM potentially handles an NMI VM-Exit. One of the "registers" that KVM VMX lazily loads is the vmcs.VM_EXIT_INTR_INFO field, which holds the vector+type on "exception or NMI" VM-Exits, i.e. is needed to identify NMIs. Clearing the available registers bitmask after handling NMIs results in KVM querying info from the last VM-Exit that read vmcs.VM_EXIT_INTR_INFO, and leads to both missed NMIs and spurious NMIs in the host.
Opportunistically grab vmcs.IDT_VECTORING_INFO_FIELD early in the VM-Exit path too, e.g. to guard against similar consumption of stale data. The field is read on every "normal" VM-Exit, and there's no point in delaying the inevitable.
Reported-by: Like Xu <like.xu.linux@gmail.com> Fixes: 11df586d774f ("KVM: VMX: Handle NMI VM-Exits in noinstr region") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230825014532.2846714-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
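The stale-cache hazard described above can be sketched with a lazily filled field guarded by an "available" bitmask. This is a simplified model under stated assumptions, not the KVM implementation: the structure, the bit index, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_REG_EXIT_INTR_INFO 0   /* hypothetical bit index in the mask */

struct toy_vmx {
	uint32_t regs_avail;        /* bit set: the cached copy is current */
	uint32_t cached_intr_info;  /* lazily filled cache */
	uint32_t hw_intr_info;      /* stands in for the VMCS field */
};

/* Lazy read: only touch "hardware" if the cached copy is not marked valid. */
static uint32_t toy_get_intr_info(struct toy_vmx *vmx)
{
	if (!(vmx->regs_avail & (1u << TOY_REG_EXIT_INTR_INFO))) {
		vmx->cached_intr_info = vmx->hw_intr_info;
		vmx->regs_avail |= 1u << TOY_REG_EXIT_INTR_INFO;
	}
	return vmx->cached_intr_info;
}

/*
 * Simulate two VM-Exits and return what the NMI check sees on the second.
 * reset_before_nmi_check selects whether the mask is reset early (the fix)
 * or only after the NMI check has already run (the bug).
 */
static uint32_t toy_second_exit_view(int reset_before_nmi_check)
{
	struct toy_vmx vmx = { 0 };
	uint32_t seen;

	/* First VM-Exit: something reads the field, which fills the cache. */
	vmx.hw_intr_info = 0x11;
	(void)toy_get_intr_info(&vmx);

	/* Second VM-Exit: hardware now reports different info. */
	vmx.hw_intr_info = 0x22;
	if (reset_before_nmi_check)
		vmx.regs_avail = 0;

	seen = toy_get_intr_info(&vmx);  /* what the NMI check observes */
	vmx.regs_avail = 0;              /* the late reset, too late to help */
	return seen;
}

static void toy_demo(void)
{
	assert(toy_second_exit_view(0) == 0x11);  /* stale info from exit 1 */
	assert(toy_second_exit_view(1) == 0x22);  /* fresh info from exit 2 */
}
```

The values 0x11 and 0x22 are arbitrary markers for "info from the first exit" and "info from the second exit".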
|
#
9ca0c1a1 |
| 15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Delete ancient pr_warn() about KVM_SET_TSS_ADDR not being set
Delete KVM's printk about KVM_SET_TSS_ADDR not being called. When the printk was added by commit 776e58ea3d37 ("KVM: unbreak userspace that does not sets tss address"), KVM also stuffed a "hopefully safe" value, i.e. the message wasn't purely informational. For reasons unknown, ostensibly to try and help people running outdated qemu-kvm versions, the message got left behind when KVM's stuffing was removed by commit 4918c6ca6838 ("KVM: VMX: Require KVM_SET_TSS_ADDR being called prior to running a VCPU").
Today, the message is completely nonsensical, as it has been over a decade since KVM supported userspace running a Real Mode guest, on a CPU without unrestricted guest support, without doing KVM_SET_TSS_ADDR before KVM_RUN. I.e. KVM's ABI has required KVM_SET_TSS_ADDR for 10+ years.
To make matters worse, the message is prone to false positives as it triggers when simply *creating* a vCPU due to RESET putting vCPUs into Real Mode, even when the user has no intention of ever *running* the vCPU in Real Mode. E.g. KVM selftests stuff 64-bit mode and never touch Real Mode, but trigger the message even though they run just fine without doing KVM_SET_TSS_ADDR. Creating "dummy" vCPUs, e.g. to probe features, can also trigger the message. In both scenarios, the message confuses users and falsely implies that they've done something wrong.
Reported-by: Thorsten Glaser <t.glaser@tarent.de> Closes: https://lkml.kernel.org/r/f1afa6c0-cde2-ab8b-ea71-bfa62a45b956%40tarent.de Link: https://lore.kernel.org/r/20230815174215.433222-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
1c18efda |
| 15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nVMX: Use KVM-governed feature framework to track "nested VMX enabled"
Track "VMX exposed to L1" via a governed feature flag instead of using a dedicated helper to provide the same functionality. The main goal is to drive convergence between VMX and SVM with respect to querying features that are controllable via module param (SVM likes to cache nested features); avoiding the guest CPUID lookups at runtime is just a bonus and unlikely to provide any meaningful performance benefits.
Note, X86_FEATURE_VMX is set in kvm_cpu_caps if and only if "nested" is true, and the CPU obviously supports VMX if KVM+VMX is running. I.e. the check on "nested" is now implicitly done by the kvm_cpu_cap_has() check in kvm_governed_feature_check_and_set().
No functional change intended.
Reviewed-by: Yuan Yao <yuan.yao@intel.com> Reviwed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
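The check-and-set idea described in this commit can be sketched as follows. This is a rough model, not the KVM framework: the real helpers named in the message are not reproduced here, and every `toy_` name and field is a hypothetical stand-in. The assumption being illustrated is that the flag is computed once, when guest CPUID is set, as "KVM advertises the feature AND the guest CPUID has it", and that later queries just read the cached flag.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vcpu {
	bool guest_cpuid_has_vmx;  /* what userspace put in guest CPUID */
	bool governed_vmx;         /* cached answer, set on CPUID update */
};

/* Models the capability table: VMX is advertised only if "nested" is true. */
static bool toy_kvm_cap_has_vmx(bool nested)
{
	return nested;
}

/* Check-and-set: run once per guest CPUID update, not on every query. */
static void toy_governed_check_and_set_vmx(struct toy_vcpu *v, bool nested)
{
	v->governed_vmx = toy_kvm_cap_has_vmx(nested) && v->guest_cpuid_has_vmx;
}

/* Hot-path query: a plain flag read, no CPUID lookup, no "nested" check. */
static bool toy_guest_can_use_vmx(const struct toy_vcpu *v)
{
	return v->governed_vmx;
}

/* Convenience wrapper used to exercise the sketch. */
static bool toy_nested_vmx_enabled(bool nested, bool guest_cpuid_has_vmx)
{
	struct toy_vcpu v = { .guest_cpuid_has_vmx = guest_cpuid_has_vmx };

	toy_governed_check_and_set_vmx(&v, nested);
	return toy_guest_can_use_vmx(&v);
}

static void toy_demo(void)
{
	assert(toy_nested_vmx_enabled(true, true));
	assert(!toy_nested_vmx_enabled(false, true));  /* module param off */
	assert(!toy_nested_vmx_enabled(true, false));  /* not in guest CPUID */
}
```

The second assertion is the case the note in the message is about: with "nested" off, the capability is never advertised, so the explicit check on "nested" becomes redundant.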
|
#
fe60e8f6 |
| 15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Use KVM-governed feature framework to track "XSAVES enabled"
Use the governed feature framework to track if XSAVES is "enabled", i.e. if XSAVES can be used by the guest. Add a comment in the SVM code to explain the very unintuitive logic of deliberately NOT checking if XSAVES is enumerated in the guest CPUID model.
No functional change intended.
Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
662f6815 |
| 15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Rename XSAVES control to follow KVM's preferred "ENABLE_XYZ"
Rename the XSAVES secondary execution control to follow KVM's preferred style so that XSAVES related logic can use common macros that depend on KVM's preferred style.
No functional change intended.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230815203653.519297-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
0497d2ac |
| 15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Check KVM CPU caps, not just VMX MSR support, for XSAVE enabling
Check KVM CPU capabilities instead of raw VMX support for XSAVES when determining whether or not XSAVES can/should be exposed to the guest. Practically speaking, it's nonsensical/impossible for a CPU to support "enable XSAVES" without XSAVES being supported natively. The real motivation for checking kvm_cpu_cap_has() is to allow using the governed feature's standard check-and-set logic.
Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
1143c0b8 |
| 15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Recompute "XSAVES enabled" only after CPUID update
Recompute whether or not XSAVES is enabled for the guest only if the guest's CPUID model changes instead of redoing the computation every time KVM generates vmcs01's secondary execution controls. The boot_cpu_has() and cpu_has_vmx_xsaves() checks should never change after KVM is loaded, and if they do the kernel/KVM is hosed.
Opportunistically add a comment explaining _why_ XSAVES is effectively exposed to the guest if and only if XSAVE is also exposed to the guest.
Practically speaking, no functional change intended (KVM will do fewer computations, but should still see the same xsaves_enabled value whenever KVM looks at it).
Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
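The change in where the computation happens can be sketched as a cached flag that is refreshed only on a CPUID update. This is a simplified illustration, not KVM code: all `toy_` names are hypothetical, the control bit value is illustrative, and the host-side checks are collapsed into a single boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ENABLE_XSAVES (1u << 20)  /* illustrative control bit */

struct toy_vcpu {
	bool guest_has_xsave;
	bool guest_has_xsaves;
	bool xsaves_enabled;   /* cached result */
	int recompute_count;   /* instrumentation for this sketch only */
};

/* Runs only when the guest CPUID model changes. */
static void toy_after_set_cpuid(struct toy_vcpu *v, bool host_has_xsaves)
{
	/* XSAVES is exposed only if XSAVE is exposed as well. */
	v->xsaves_enabled = host_has_xsaves &&
			    v->guest_has_xsave && v->guest_has_xsaves;
	v->recompute_count++;
}

/* Runs every time the controls are generated: reads the cached flag. */
static uint32_t toy_secondary_exec_controls(const struct toy_vcpu *v)
{
	return v->xsaves_enabled ? TOY_ENABLE_XSAVES : 0;
}

/* Build the controls n times after one CPUID update; count recomputes. */
static int toy_recomputes_for(int control_builds)
{
	struct toy_vcpu v = { .guest_has_xsave = true, .guest_has_xsaves = true };
	int i;

	toy_after_set_cpuid(&v, true);
	for (i = 0; i < control_builds; i++)
		assert(toy_secondary_exec_controls(&v) == TOY_ENABLE_XSAVES);
	return v.recompute_count;
}
```

In this model, `toy_recomputes_for(100)` performs the same single recomputation as `toy_recomputes_for(0)`, while every generated control value still reflects the same cached flag.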
|
#
7d18eef1 |
| 10-Aug-2023 |
Shiyuan Gao <gaoshiyuan@baidu.com> |
KVM: VMX: Rename vmx_get_max_tdp_level() to vmx_get_max_ept_level()
In VMX, ept_level looks better than tdp_level and is consistent with SVM's get_npt_level().
Signed-off-by: Shiyuan Gao <gaoshiyuan@baidu.com> Link: https://lore.kernel.org/r/20230810113853.98114-1-gaoshiyuan@baidu.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
28b82352 |
| 09-Aug-2023 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/apic: Wrap IPI calls into helper functions
Move them to one place so the static call conversion gets simpler.
No functional change.
[ dhansen: merge against recent x86/apic changes ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
|
#
2d636990 |
| 28-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Always write vCPU's current TSC offset/ratio in vendor hooks
Drop the @offset and @multiplier params from the kvm_x86_ops hooks for propagating TSC offsets/multipliers into hardware, and instead have the vendor implementations pull the information directly from the vCPU structure. The respective vCPU fields _must_ be written at the same time in order to maintain consistent state, i.e. it's not random luck that the value passed in by all callers is grabbed from the vCPU.
Explicitly grabbing the value from the vCPU field in SVM's implementation in particular will allow for additional cleanup without introducing even more subtle dependencies. Specifically, SVM can skip the WRMSR if guest state isn't loaded, i.e. svm_prepare_switch_to_guest() will load the correct value for the vCPU prior to entering the guest.
This also reconciles KVM's handling of related values that are stored in the vCPU, as svm_write_tsc_offset() already assumes/requires the caller to have updated l1_tsc_offset.
Link: https://lore.kernel.org/r/20230729011608.1065019-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
a788fbb7 |
| 21-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Skip VMCLEAR logic during emergency reboots if CR4.VMXE=0
Bail from vmx_emergency_disable() without processing the list of loaded VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON. It should be impossible for the list to have entries if VMX is already disabled, and even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e. processing the list is pointless even if it somehow isn't empty.
Assuming no existing KVM bugs, this should be a glorified nop. The primary motivation for the change is to avoid having code that looks like it does VMCLEAR, but then skips VMXON, which is nonsensical.
Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230721201859.2307736-20-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
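The early bail-out can be sketched in isolation. This is a toy model, not the kernel function: the CR4 value is passed in as a parameter instead of being read from the CPU, the VMCS list is reduced to a count, and the `toy_` names are hypothetical. The only hardware fact relied on is that CR4.VMXE is bit 13.

```c
#include <assert.h>

#define TOY_CR4_VMXE (1UL << 13)  /* CR4.VMXE is bit 13 on x86 */

/*
 * Returns how many (simulated) VMCLEARs were performed.
 * cr4 and loaded_vmcs_count are inputs here; real code would read CR4
 * and walk a per-CPU list instead.
 */
static int toy_emergency_disable(unsigned long cr4, int loaded_vmcs_count)
{
	int cleared = 0;
	int i;

	/* If VMXE is clear the CPU cannot be post-VMXON: nothing to do. */
	if (!(cr4 & TOY_CR4_VMXE))
		return 0;

	for (i = 0; i < loaded_vmcs_count; i++)
		cleared++;  /* stands in for VMCLEAR on each loaded VMCS */

	/* VMXOFF would follow here. */
	return cleared;
}

static void toy_demo(void)
{
	/* VMXE clear: the list is skipped even if it is (wrongly) non-empty. */
	assert(toy_emergency_disable(0, 3) == 0);
	/* VMXE set: every loaded VMCS is cleared before VMXOFF. */
	assert(toy_emergency_disable(TOY_CR4_VMXE, 3) == 3);
}
```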
|
#
6ae44e01 |
| 21-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Force kvm_rebooting=true during emergency reboot/crash
Set kvm_rebooting when virtualization is disabled in an emergency so that KVM eats faults on virtualization instructions even if kvm_reboot() isn't reached.
Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230721201859.2307736-18-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
f9a88660 |
| 21-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: VMX: Ensure CPU is stable when probing basic VMX support
Disable migration when probing VMX support during module load to ensure the CPU is stable, mostly to match similar SVM logic, where allowing migration effectively requires deliberately writing buggy code. As a bonus, KVM won't report the wrong CPU to userspace if VMX is unsupported, but in practice that is a very, very minor bonus as the only way that reporting the wrong CPU would actually matter is if hardware is broken or if the system is misconfigured, i.e. if KVM gets migrated from a CPU that _does_ support VMX to a CPU that does _not_ support VMX.
Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230721201859.2307736-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
22e420e1 |
| 21-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
x86/virt: KVM: Move VMXOFF helpers into KVM VMX
Now that VMX is disabled in emergencies via the virt callbacks, move the VMXOFF helpers into KVM, the only remaining user.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230721201859.2307736-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
b6a6af0d |
| 21-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
x86/virt: KVM: Open code cpu_has_vmx() in KVM VMX
Fold the raw CPUID check for VMX into kvm_is_vmx_supported(), its sole user. Keep the check even though KVM also checks X86_FEATURE_VMX, as the intent is to provide a unique error message if VMX is unsupported by hardware, whereas X86_FEATURE_VMX may be clear due to firmware and/or kernel actions.
No functional change intended.
Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230721201859.2307736-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
119b5cb4 |
| 21-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
x86/reboot: KVM: Handle VMXOFF in KVM's reboot callback
Use KVM VMX's reboot/crash callback to do VMXOFF in an emergency instead of manually and blindly doing VMXOFF. There's no need to attempt VMXOFF if a hypervisor, i.e. KVM, isn't loaded/active, i.e. if the CPU can't possibly be post-VMXON.
Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230721201859.2307736-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
5e408396 |
| 21-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
x86/reboot: Harden virtualization hooks for emergency reboot
Provide dedicated helpers to (un)register virt hooks used during an emergency crash/reboot, and WARN if there is an attempt to overwrite the registered callback, or an attempt to do an unpaired unregister.
Opportunistically use rcu_assign_pointer() instead of RCU_INIT_POINTER(), mainly so that the set/unset paths are more symmetrical, but also because any performance gains from using RCU_INIT_POINTER() are meaningless for this code.
Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20230721201859.2307736-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
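The hardening described in this commit (refuse to overwrite a registered callback, refuse an unpaired unregister, and warn in both cases) can be sketched in plain userspace C. This is an illustration under assumptions, not the kernel helpers: there is no RCU here, the warning is reduced to a counter, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_virt_cb)(void);

static toy_virt_cb toy_callback;  /* the single registered hook, or NULL */
static int toy_warn_count;        /* stands in for a kernel WARN */
static int toy_a_calls, toy_b_calls;

static void toy_cb_a(void) { toy_a_calls++; }
static void toy_cb_b(void) { toy_b_calls++; }

/* Register: warn and bail if a callback is already installed. */
static void toy_register_virt_callback(toy_virt_cb cb)
{
	if (toy_callback != NULL) {
		toy_warn_count++;
		return;
	}
	toy_callback = cb;
}

/* Unregister: warn and bail unless cb is the one currently installed. */
static void toy_unregister_virt_callback(toy_virt_cb cb)
{
	if (toy_callback != cb) {
		toy_warn_count++;
		return;
	}
	toy_callback = NULL;
}

static void toy_demo(void)
{
	toy_register_virt_callback(toy_cb_a);
	assert(toy_warn_count == 0);

	toy_register_virt_callback(toy_cb_b);    /* overwrite attempt */
	assert(toy_warn_count == 1 && toy_callback == toy_cb_a);

	toy_unregister_virt_callback(toy_cb_b);  /* unpaired unregister */
	assert(toy_warn_count == 2 && toy_callback == toy_cb_a);

	toy_callback();                          /* the hook is still cb_a */
	assert(toy_a_calls == 1 && toy_b_calls == 0);

	toy_unregister_virt_callback(toy_cb_a);  /* proper pairing */
	assert(toy_warn_count == 2 && toy_callback == NULL);
}
```

Both misuse paths leave the installed callback untouched, so a buggy second user cannot silently replace or remove the first user's hook.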
|