Revision tags: v5.15.54 |
|
#
a63f7778 |
| 08-Jul-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.19-rc5' into next
Merge with mainline to bring up the latest definition from MFD subsystem needed for Mediatek keypad driver.
|
Revision tags: v5.15.53 |
|
#
dd84cfff |
| 04-Jul-2022 |
Takashi Iwai <tiwai@suse.de> |
Merge tag 'asoc-fix-v5.19-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.19
A collection of fixes for v5.19, quite large but nothing major -
Merge tag 'asoc-fix-v5.19-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.19
A collection of fixes for v5.19, quite large but nothing major - a good chunk of it is more stuff that was identified by mixer-test regarding event generation.
show more ...
|
Revision tags: v5.15.52, v5.15.51 |
|
#
ee56c3e8 |
| 27-Jun-2022 |
akpm <akpm@linux-foundation.org> |
Merge branch 'master' into mm-nonmm-stable
|
#
46a3b112 |
| 27-Jun-2022 |
akpm <akpm@linux-foundation.org> |
Merge branch 'master' into mm-stable
|
Revision tags: v5.15.50 |
|
#
93817be8 |
| 23-Jun-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
Revision tags: v5.15.49 |
|
#
2b1333b8 |
| 20-Jun-2022 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging to get new regmap APIs of v5.19-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
22fe2b36 |
| 20-Jun-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge v5.19-rc3 into usb-next
We need the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
df36f3e3 |
| 20-Jun-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge tag 'v5.19-rc3' into tty-next
We need the tty/serial fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
e8f4118f |
| 20-Jun-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 5.19-rc3 into staging-next
This resolves the merge issue with: drivers/staging/r8188eu/os_dep/ioctl_linux.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Har
Merge 5.19-rc3 into staging-next
This resolves the merge issue with: drivers/staging/r8188eu/os_dep/ioctl_linux.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
05c6ca85 |
| 19-Jun-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Make RESERVE_BRK() work again with older binutils. The recent '
Merge tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Make RESERVE_BRK() work again with older binutils. The recent 'simplification' broke that.
- Make early #VE handling increment RIP when successful.
- Make the #VE code consistent vs. the RIP adjustments and add comments.
- Handle load_unaligned_zeropad() across page boundaries correctly in #VE when the second page is shared.
* tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page x86/tdx: Clarify RIP adjustments in #VE handler x86/tdx: Fix early #VE handling x86/mm: Fix RESERVE_BRK() for older binutils
show more ...
|
Revision tags: v5.15.48, v5.15.47 |
|
#
1e776965 |
| 14-Jun-2022 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
load_unaligned_zeropad() can lead to unwanted loads across page boundaries. The unwanted loads are typically harmless. But, they
x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
load_unaligned_zeropad() can lead to unwanted loads across page boundaries. The unwanted loads are typically harmless. But, they might be made to totally unrelated or even unmapped memory. load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now #VE) to recover from these unwanted loads.
In TDX guests, the second page can be shared page and a VMM may configure it to trigger #VE.
The kernel assumes that #VE on a shared page is an MMIO access and tries to decode instruction to handle it. In case of load_unaligned_zeropad() it may result in confusion as it is not MMIO access.
Fix it by detecting split page MMIO accesses and failing them. load_unaligned_zeropad() will recover using exception fixups.
The issue was discovered by analysis and reproduced artificially. It was not triggered during testing.
[ dhansen: fix up changelogs and comments for grammar and clarity, plus incorporate Kirill's off-by-one fix]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220614120135.14812-4-kirill.shutemov@linux.intel.com
show more ...
|
#
cdd85786 |
| 14-Jun-2022 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Clarify RIP adjustments in #VE handler
After successful #VE handling, tdx_handle_virt_exception() has to move RIP to the next instruction. The handler needs to know the length of the instru
x86/tdx: Clarify RIP adjustments in #VE handler
After successful #VE handling, tdx_handle_virt_exception() has to move RIP to the next instruction. The handler needs to know the length of the instruction.
If the #VE happened due to instruction execution, the GET_VEINFO TDX module call provides info on the instruction in R10, including its length.
For #VE due to EPT violation, the info in R10 is not populand and the kernel must decode the instruction manually to find out its length.
Restructure the code to make it explicit that the instruction length depends on the type of #VE. Make individual #VE handlers return the instruction length on success or -errno on failure.
[ dhansen: fix up changelog and comments ]
Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220614120135.14812-3-kirill.shutemov@linux.intel.com
show more ...
|
#
60428d8b |
| 14-Jun-2022 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Fix early #VE handling
tdx_early_handle_ve() does not increment RIP after successfully handling the exception. That leads to infinite loop of exceptions.
Move RIP when exceptions are succ
x86/tdx: Fix early #VE handling
tdx_early_handle_ve() does not increment RIP after successfully handling the exception. That leads to infinite loop of exceptions.
Move RIP when exceptions are successfully handled.
[ dhansen: make problem statement more clear ]
Fixes: 32e72854fa5f ("x86/tdx: Port I/O: Add early boot support") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lkml.kernel.org/r/20220614120135.14812-2-kirill.shutemov@linux.intel.com
show more ...
|
#
f777316e |
| 15-Jun-2022 |
Takashi Iwai <tiwai@suse.de> |
Merge branch 'topic/ctl-enhancements' into for-next
Pull ALSA control enhancement patches. One is the faster lookup of control elements, and another is to introduce the input data validation.
Signe
Merge branch 'topic/ctl-enhancements' into for-next
Pull ALSA control enhancement patches. One is the faster lookup of control elements, and another is to introduce the input data validation.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
show more ...
|
#
66da6500 |
| 09-Jun-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 5.19, take #1
- Typo fix in arch/riscv/kvm/vmid.c
- Remove broken reference pattern from MAIN
Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 5.19, take #1
- Typo fix in arch/riscv/kvm/vmid.c
- Remove broken reference pattern from MAINTAINERS entry
show more ...
|
Revision tags: v5.15.46 |
|
#
6e2b347d |
| 08-Jun-2022 |
Maxime Ripard <maxime@cerno.tech> |
Merge v5.19-rc1 into drm-misc-fixes
Let's kick-off the start of the 5.19 fix cycle
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
#
073350da |
| 07-Jun-2022 |
Mark Brown <broonie@kernel.org> |
Merge tag 'v5.19-rc1' into asoc-5.19
Linux 5.19-rc1
|
Revision tags: v5.15.45, v5.15.44, v5.15.43, v5.15.42 |
|
#
3a755ebc |
| 23-May-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel TDX support from Borislav Petkov: "Intel Trust Domain Extensions (TDX) support.
This is the
Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel TDX support from Borislav Petkov: "Intel Trust Domain Extensions (TDX) support.
This is the Intel version of a confidential computing solution called Trust Domain Extensions (TDX). This series adds support to run the kernel as part of a TDX guest. It provides similar guest protections to AMD's SEV-SNP like guest memory and register state encryption, memory integrity protection and a lot more.
Design-wise, it differs from AMD's solution considerably: it uses a software module which runs in a special CPU mode called (Secure Arbitration Mode) SEAM. As the name suggests, this module serves as sort of an arbiter which the confidential guest calls for services it needs during its lifetime.
Just like AMD's SNP set, this series reworks and streamlines certain parts of x86 arch code so that this feature can be properly accomodated"
* tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/tdx: Fix RETs in TDX asm x86/tdx: Annotate a noreturn function x86/mm: Fix spacing within memory encryption features message x86/kaslr: Fix build warning in KASLR code in boot stub Documentation/x86: Document TDX kernel architecture ACPICA: Avoid cache flush inside virtual machines x86/tdx/ioapic: Add shared bit for IOAPIC base address x86/mm: Make DMA memory shared for TD guest x86/mm/cpa: Add support for TDX shared memory x86/tdx: Make pages shared in ioremap() x86/topology: Disable CPU online/offline control for TDX guests x86/boot: Avoid #VE during boot for TDX platforms x86/boot: Set CR0.NE early and keep it set during the boot x86/acpi/x86/boot: Add multiprocessor wake-up support x86/boot: Add a trampoline for booting APs via firmware handoff x86/tdx: Wire up KVM hypercalls x86/tdx: Port I/O: Add early boot support x86/tdx: Port I/O: Add runtime hypercalls x86/boot: Port I/O: Add decompression-time support for TDX x86/boot: Port I/O: Allow to hook up alternative helpers ...
show more ...
|
Revision tags: v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33 |
|
#
7dbde763 |
| 05-Apr-2022 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm/cpa: Add support for TDX shared memory
Intel TDX protects guest memory from VMM access. Any memory that is required for communication with the VMM must be explicitly shared.
It is a two-step
x86/mm/cpa: Add support for TDX shared memory
Intel TDX protects guest memory from VMM access. Any memory that is required for communication with the VMM must be explicitly shared.
It is a two-step process: the guest sets the shared bit in the page table entry and notifies VMM about the change. The notification happens using MapGPA hypercall.
Conversion back to private memory requires clearing the shared bit, notifying VMM with MapGPA hypercall following with accepting the memory with AcceptPage hypercall.
Provide a TDX version of x86_platform.guest.* callbacks. It makes __set_memory_enc_pgtable() work right in TDX guest.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-27-kirill.shutemov@linux.intel.com
show more ...
|
#
cfb8ec7a |
| 05-Apr-2022 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
x86/tdx: Wire up KVM hypercalls
KVM hypercalls use the VMCALL or VMMCALL instructions. Although the ABI is similar, those instructions no longer function for TDX guests.
Make vendor-specific TDVMCA
x86/tdx: Wire up KVM hypercalls
KVM hypercalls use the VMCALL or VMMCALL instructions. Although the ABI is similar, those instructions no longer function for TDX guests.
Make vendor-specific TDVMCALLs instead of VMCALL. This enables TDX guests to run with KVM acting as the hypervisor.
Among other things, KVM hypercall is used to send IPIs.
Since the KVM driver can be built as a kernel module, export tdx_kvm_hypercall() to make the symbols visible to kvm.ko.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-20-kirill.shutemov@linux.intel.com
show more ...
|
#
32e72854 |
| 05-Apr-2022 |
Andi Kleen <ak@linux.intel.com> |
x86/tdx: Port I/O: Add early boot support
TDX guests cannot do port I/O directly. The TDX module triggers a #VE exception to let the guest kernel emulate port I/O by converting them into TDCALLs to
x86/tdx: Port I/O: Add early boot support
TDX guests cannot do port I/O directly. The TDX module triggers a #VE exception to let the guest kernel emulate port I/O by converting them into TDCALLs to call the host.
But before IDT handlers are set up, port I/O cannot be emulated using normal kernel #VE handlers. To support the #VE-based emulation during this boot window, add a minimal early #VE handler support in early exception handlers. This is similar to what AMD SEV does. This is mainly to support earlyprintk's serial driver, as well as potentially the VGA driver.
The early handler only supports I/O-related #VE exceptions. Unhandled or failed exceptions will be handled via early_fixup_exceptions() (like normal exception failures). At runtime I/O-related #VE exceptions (along with other types) handled by virt_exception_kernel().
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220405232939.73860-19-kirill.shutemov@linux.intel.com
show more ...
|
#
03149948 |
| 05-Apr-2022 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
x86/tdx: Port I/O: Add runtime hypercalls
TDX hypervisors cannot emulate instructions directly. This includes port I/O which is normally emulated in the hypervisor. All port I/O instructions inside
x86/tdx: Port I/O: Add runtime hypercalls
TDX hypervisors cannot emulate instructions directly. This includes port I/O which is normally emulated in the hypervisor. All port I/O instructions inside TDX trigger the #VE exception in the guest and would be normally emulated there.
Use a hypercall to emulate port I/O. Extend the tdx_handle_virt_exception() and add support to handle the #VE due to port I/O instructions.
String I/O operations are not supported in TDX. Unroll them by declaring CC_ATTR_GUEST_UNROLL_STRING_IO confidential computing attribute.
== Userspace Implications ==
The ioperm() facility allows userspace access to I/O instructions like inb/outb. Among other things, this allows writing userspace device drivers.
This series has no special handling for ioperm(). Users will be able to successfully request I/O permissions but will induce a #VE on their first I/O instruction which leads SIGSEGV. If this is undesirable users can enable kernel lockdown feature with 'lockdown=integrity' kernel command line option. It makes ioperm() fail.
More robust handling of this situation (denying ioperm() in all TDX guests) will be addressed in follow-on work.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-18-kirill.shutemov@linux.intel.com
show more ...
|
#
31d58c4e |
| 05-Apr-2022 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Handle in-kernel MMIO
In non-TDX VMs, MMIO is implemented by providing the guest a mapping which will cause a VMEXIT on access and then the VMM emulating the instruction that caused the VME
x86/tdx: Handle in-kernel MMIO
In non-TDX VMs, MMIO is implemented by providing the guest a mapping which will cause a VMEXIT on access and then the VMM emulating the instruction that caused the VMEXIT. That's not possible for TDX VM.
To emulate an instruction an emulator needs two things:
- R/W access to the register file to read/modify instruction arguments and see RIP of the faulted instruction.
- Read access to memory where instruction is placed to see what to emulate. In this case it is guest kernel text.
Both of them are not available to VMM in TDX environment:
- Register file is never exposed to VMM. When a TD exits to the module, it saves registers into the state-save area allocated for that TD. The module then scrubs these registers before returning execution control to the VMM, to help prevent leakage of TD state.
- TDX does not allow guests to execute from shared memory. All executed instructions are in TD-private memory. Being private to the TD, VMMs have no way to access TD-private memory and no way to read the instruction to decode and emulate it.
In TDX the MMIO regions are instead configured by VMM to trigger a #VE exception in the guest.
Add #VE handling that emulates the MMIO instruction inside the guest and converts it into a controlled hypercall to the host.
This approach is bad for performance. But, it has (virtually) no impact on the size of the kernel image and will work for a wide variety of drivers. This allows TDX deployments to use arbitrary devices and device drivers, including virtio. TDX customers have asked for the capability to use random devices in their deployments.
In other words, even if all of the work was done to paravirtualize all x86 MMIO users and virtio, this approach would still be needed. There is essentially no way to get rid of this code.
This approach is functional for all in-kernel MMIO users current and future and does so with a minimal amount of code and kernel image bloat.
MMIO addresses can be used with any CPU instruction that accesses memory. Address only MMIO accesses done via io.h helpers, such as 'readl()' or 'writeq()'.
Any CPU instruction that accesses memory can also be used to access MMIO. However, by convention, MMIO access are typically performed via io.h helpers such as 'readl()' or 'writeq()'.
The io.h helpers intentionally use a limited set of instructions when accessing MMIO. This known, limited set of instructions makes MMIO instruction decoding and emulation feasible in KVM hosts and SEV guests today.
MMIO accesses performed without the io.h helpers are at the mercy of the compiler. Compilers can and will generate a much more broad set of instructions which can not practically be decoded and emulated. TDX guests will oops if they encounter one of these decoding failures.
This means that TDX guests *must* use the io.h helpers to access MMIO.
This requirement is not new. Both KVM hosts and AMD SEV guests have the same limitations on MMIO access.
=== Potential alternative approaches ===
== Paravirtualizing all MMIO ==
An alternative to letting MMIO induce a #VE exception is to avoid the #VE in the first place. Similar to the port I/O case, it is theoretically possible to paravirtualize MMIO accesses.
Like the exception-based approach offered here, a fully paravirtualized approach would be limited to MMIO users that leverage common infrastructure like the io.h macros.
However, any paravirtual approach would be patching approximately 120k call sites. Any paravirtual approach would need to replace a bare memory access instruction with (at least) a function call. With a conservative overhead estimation of 5 bytes per call site (CALL instruction), it leads to bloating code by 600k.
Many drivers will never be used in the TDX environment and the bloat cannot be justified.
== Patching TDX drivers ==
Rather than touching the entire kernel, it might also be possible to just go after drivers that use MMIO in TDX guests *and* are performance critical to justify the effrort. Right now, that's limited only to virtio.
All virtio MMIO appears to be done through a single function, which makes virtio eminently easy to patch.
This approach will be adopted in the future, removing the bulk of MMIO #VEs. The #VE-based MMIO will remain serving non-virtio use cases.
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-12-kirill.shutemov@linux.intel.com
show more ...
|
#
c141fa2c |
| 05-Apr-2022 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Handle CPUID via #VE
In TDX guests, most CPUID leaf/sub-leaf combinations are virtualized by the TDX module while some trigger #VE.
Implement the #VE handling for EXIT_REASON_CPUID by hand
x86/tdx: Handle CPUID via #VE
In TDX guests, most CPUID leaf/sub-leaf combinations are virtualized by the TDX module while some trigger #VE.
Implement the #VE handling for EXIT_REASON_CPUID by handing it through the hypercall, which in turn lets the TDX module handle it by invoking the host VMM.
More details on CPUID Virtualization can be found in the TDX module specification, the section titled "CPUID Virtualization".
Note that VMM that handles the hypercall is not trusted. It can return data that may steer the guest kernel in wrong direct. Only allow VMM to control range reserved for hypervisor communication.
Return all-zeros for any CPUID outside the hypervisor range. It matches CPU behaviour for non-supported leaf.
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-11-kirill.shutemov@linux.intel.com
show more ...
|
#
ae87f609 |
| 05-Apr-2022 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Add MSR support for TDX guests
Use hypercall to emulate MSR read/write for the TDX platform.
There are two viable approaches for doing MSRs in a TD guest:
1. Execute the RDMSR/WRMSR instr
x86/tdx: Add MSR support for TDX guests
Use hypercall to emulate MSR read/write for the TDX platform.
There are two viable approaches for doing MSRs in a TD guest:
1. Execute the RDMSR/WRMSR instructions like most VMs and bare metal do. Some will succeed, others will cause a #VE. All of those that cause a #VE will be handled with a TDCALL. 2. Use paravirt infrastructure. The paravirt hook has to keep a list of which MSRs would cause a #VE and use a TDCALL. All other MSRs execute RDMSR/WRMSR instructions directly.
The second option can be ruled out because the list of MSRs was challenging to maintain. That leaves option #1 as the only viable solution for the minimal TDX support.
Kernel relies on the exception fixup machinery to handle MSR access errors. #VE handler uses the same exception fixup code as #GP. It covers MSR accesses along with other types of fixups.
For performance-critical MSR writes (like TSC_DEADLINE), future patches will replace the WRMSR/#VE sequence with the direct TDCALL.
RDMSR and WRMSR specification details can be found in Guest-Host-Communication Interface (GHCI) for Intel Trust Domain Extensions (Intel TDX) specification, sec titled "TDG.VP. VMCALL<Instruction.RDMSR>" and "TDG.VP.VMCALL<Instruction.WRMSR>".
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-10-kirill.shutemov@linux.intel.com
show more ...
|