239bff01 | 04-Dec-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Allow 32-bit emulation by default
[ upstream commit f4116bfc44621882556bbf70f5284fbf429a5cf6 ]
32-bit emulation was disabled on TDX to prevent a possible attack by a VMM injecting an interrupt on vector 0x80.
Now that int80_emulation() has a check for external interrupts, the limitation can be lifted.
To distinguish software interrupts from external ones, int80_emulation() checks the APIC ISR bit relevant to the 0x80 vector. For software interrupts, this bit will be 0.
On TDX, the VAPIC state (including ISR) is protected and cannot be manipulated by the VMM. The ISR bit is set by the microcode flow during the handling of posted interrupts.
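The ISR check described above amounts to a single bit test. The sketch below is a userspace model over a snapshot of the eight 32-bit ISR registers, not the kernel's int80_emulation() (which reads the real APIC); the helper names `isr_vector_in_service()` and `int80_is_external()` are hypothetical. It assumes the standard local APIC layout where vector v lives in ISR register v / 32, bit v % 32, so vector 0x80 maps to register 4, bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 256-bit ISR is exposed as eight 32-bit registers (ISR0..ISR7). */
#define APIC_ISR_NR_REGS 8

/* Hypothetical helper: isr[] is a snapshot of the eight ISR registers. */
static bool isr_vector_in_service(const uint32_t isr[APIC_ISR_NR_REGS],
                                  unsigned int vector)
{
    assert(vector < 32 * APIC_ISR_NR_REGS);
    return (isr[vector / 32] >> (vector % 32)) & 1u;
}

/* A software INT 0x80 never sets the ISR bit; an interrupt delivered
 * through the (V)APIC on vector 0x80 does. */
static bool int80_is_external(const uint32_t isr[APIC_ISR_NR_REGS])
{
    return isr_vector_in_service(isr, 0x80);
}
```

On TDX the point is that the VMM cannot forge this bit, because the VAPIC state is protected and the ISR bit is only set by the microcode's posted-interrupt flow.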
[ dhansen: more changelog tweaks ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org> # v6.0+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
195edce0 | 06-Jun-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()
tl;dr: There is a race in the TDX private<=>shared conversion code which could kill the TDX guest. Fix it by changing the conversion ordering to eliminate the window.
TDX hardware maintains metadata to track which pages are private and shared. Additionally, TDX guests use the guest x86 page tables to specify whether a given mapping is intended to be private or shared. Bad things happen when the intent and metadata do not match.
So there are two things in play:
1. "the page" -- the physical TDX page metadata
2. "the mapping" -- the guest-controlled x86 page table intent
For instance, an unrecoverable exit to VMM occurs if a guest touches a private mapping that points to a shared physical page.
In summary:
* Private mapping => Private Page == OK (obviously)
* Shared mapping => Shared Page == OK (obviously)
* Private mapping => Shared Page == BIG BOOM!
* Shared mapping => Private Page == OK-ish (It will generate a recoverable #VE via handle_mmio())
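The four cases above form a small truth table over two bits: the mapping's intent and the page's metadata. This sketch encodes that table directly; the enum and function names are illustrative only and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes from the summary above (illustrative names, not kernel enums). */
enum tdx_access_result {
    ACCESS_OK,          /* mapping intent matches page metadata */
    ACCESS_RECOVERABLE, /* shared mapping => private page: #VE, handled */
    ACCESS_FATAL,       /* private mapping => shared page: unrecoverable exit */
};

static enum tdx_access_result classify_access(bool mapping_private,
                                              bool page_private)
{
    if (mapping_private == page_private)
        return ACCESS_OK;
    if (mapping_private)            /* and the page is shared */
        return ACCESS_FATAL;
    return ACCESS_RECOVERABLE;      /* shared mapping, private page */
}
```

Only one of the four combinations is fatal, which is why the fix below is purely about ordering: it never lets a private mapping point at a shared page.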
Enter load_unaligned_zeropad(). It can touch memory that is adjacent but otherwise unrelated to the memory it needs to touch. It will cause one of those unrecoverable exits (aka. BIG BOOM) if it blunders into a private mapping pointing to a shared page.
This is a problem when __set_memory_enc_pgtable() converts pages from shared to private. It first changes the mapping and second modifies the TDX page metadata. It's moving from:
* Shared mapping => Shared Page == OK
to:
* Private mapping => Shared Page == BIG BOOM!
This means that there is a window with a private mapping pointing to a shared page where load_unaligned_zeropad() can strike.
Add a TDX handler for guest.enc_status_change_prepare(). This converts the page from shared to private *before* the mapping becomes private. This ensures that there is never a private mapping to a shared page.
Leave a guest.enc_status_change_finish() in place but only use it for private=>shared conversions. This will delay updating the TDX metadata marking the page shared until *after* the mapping matches the metadata. This also ensures that there is never a private mapping to a shared page.
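The ordering argument can be checked with a toy state model. This is a hypothetical illustration, not the kernel's __set_memory_enc_pgtable(): each conversion flips two bits in some order, and the helper flags any intermediate state with a private mapping over a shared page. The comments naming the prepare/finish hooks only indicate which step each hook corresponds to. It does not model the concurrent load_unaligned_zeropad() access itself, only which intermediate states each ordering exposes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one page: guest page-table intent vs. TDX metadata. */
struct page_state {
    bool mapping_private; /* guest x86 page table says "private" */
    bool page_private;    /* TDX metadata says "private" */
};

/* The fatal combination: a private mapping pointing to a shared page. */
static bool is_fatal(struct page_state s)
{
    return s.mapping_private && !s.page_private;
}

/* Shared => private, fixed ordering: convert the page first (the
 * enc_status_change_prepare() step), then flip the mapping.
 * Returns true if no intermediate state was fatal. */
static bool to_private_prepare_first(struct page_state *s)
{
    bool ok = true;

    s->page_private = true;
    ok = ok && !is_fatal(*s);
    s->mapping_private = true;
    ok = ok && !is_fatal(*s);
    return ok;
}

/* Shared => private, old ordering: mapping first, metadata second. */
static bool to_private_mapping_first(struct page_state *s)
{
    bool ok = true;

    s->mapping_private = true;
    ok = ok && !is_fatal(*s);
    s->page_private = true;
    ok = ok && !is_fatal(*s);
    return ok;
}

/* Private => shared: flip the mapping first, then update the metadata
 * (the enc_status_change_finish() step). */
static bool to_shared_finish_last(struct page_state *s)
{
    bool ok = true;

    s->mapping_private = false;
    ok = ok && !is_fatal(*s);
    s->page_private = false;
    ok = ok && !is_fatal(*s);
    return ok;
}
```

The old mapping-first ordering passes through exactly the BIG BOOM state, while both fixed orderings only ever pass through OK or OK-ish states.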
[ dhansen: rewrite changelog ]
Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/all/20230606095622.1939-3-kirill.shutemov%40linux.intel.com
|
c2b353ae | 06-Jun-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Refactor try_accept_one()
Rework try_accept_one() to return the accepted size instead of modifying 'start' inside the helper. This makes 'start' an in-only argument and streamlines the code on the caller side.
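The calling convention described above can be sketched as follows. The signature here is hypothetical (it takes a byte page size rather than whatever page-level type the kernel uses), and the stand-in always "succeeds" where a real guest would issue the accept TDCALL. The point is the shape: the helper reports how many bytes it accepted, and only the caller advances 'start'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K UINT64_C(0x1000)
#define SZ_2M UINT64_C(0x200000)
#define SZ_1G UINT64_C(0x40000000)

/* Hypothetical stand-in: 'start' is input-only; the return value is the
 * number of bytes accepted, or 0 if this page size cannot be used here.
 * This sketch assumes the underlying accept operation always succeeds. */
static uint64_t try_accept_one(uint64_t start, uint64_t len, uint64_t page_size)
{
    if (start % page_size != 0 || len < page_size)
        return 0;
    return page_size;
}

/* Caller side: try the largest page size first and advance 'start' by
 * whatever the helper reports. */
static bool accept_range(uint64_t start, uint64_t end, unsigned int *nr_accepts)
{
    static const uint64_t sizes[] = { SZ_1G, SZ_2M, SZ_4K };

    while (start < end) {
        uint64_t accepted = 0;

        for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]) && !accepted; i++)
            accepted = try_accept_one(start, end - start, sizes[i]);
        if (!accepted)
            return false; /* misaligned start or a sub-4K tail */
        start += accepted;
        if (nr_accepts)
            (*nr_accepts)++;
    }
    return true;
}
```

For example, accepting [0, 2M + 4K) takes one 2M step and one 4K step; the 1G attempt fails because the range is too short.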
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20230606142637.5171-9-kirill.shutemov@linux.intel.com
|
1e70c680 | 30-Jan-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Do not corrupt frame-pointer in __tdx_hypercall()
If compiled with CONFIG_FRAME_POINTER=y, objtool is not happy that __tdx_hypercall() messes up RBP:
If compiled with CONFIG_FRAME_POINTER=y, objtool is not happy that __tdx_hypercall() messes up RBP:
objtool: __tdx_hypercall+0x7f: return with modified stack frame
Rework the function to store TDX_HCALL_ flags on the stack instead of in RBP.
[ dhansen: minor changelog tweaks ]
Fixes: c30c4b2555ba ("x86/tdx: Refactor __tdx_hypercall() to allow pass down more arguments")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/202301290255.buUBs99R-lkp@intel.com
Link: https://lore.kernel.org/all/20230130135354.27674-1-kirill.shutemov%40linux.intel.com
|
8de62af0 | 26-Jan-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Disable NOTIFY_ENABLES
== Background ==
There is a class of side-channel attacks against SGX enclaves called "SGX Step"[1]. These attacks create lots of exceptions inside of enclaves. Basically, run an in-enclave instruction, cause an exception. Over and over.
There is a concern that a VMM could attack a TDX guest in the same way by causing lots of #VE's. The TDX architecture includes new countermeasures for these attacks. It basically counts the number of exceptions and can send another *special* exception once the number of VMM-induced #VE's hits a critical threshold[2].
== Problem ==
But, these special exceptions are independent of any action that the guest takes. They can occur anywhere that the guest executes. This includes sensitive areas like the entry code. The (non-paranoid) #VE handler is incapable of handling exceptions in these areas.
== Solution ==
Fortunately, the special exceptions can be disabled by the guest via a write to the NOTIFY_ENABLES TDCS field. NOTIFY_ENABLES is disabled by default, but might be enabled by a bootloader, firmware or an earlier kernel before the current kernel runs.
Disable the NOTIFY_ENABLES feature explicitly and unconditionally. Any NOTIFY_ENABLES-based #VE's that occur before this point will end up in the early #VE exception handler and die due to an unexpected exit reason.
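"Explicitly and unconditionally" means the guest does not read the field and decide; it simply writes the disabled value. The sketch below assumes, as a labeled assumption, that the metadata write takes a value plus a write mask (only masked bits change). The helper names are hypothetical and no real TDCALL is made.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics of a masked metadata-field write: bits set in
 * 'mask' come from 'value'; all other bits keep their old value. */
static uint64_t masked_field_write(uint64_t old, uint64_t value, uint64_t mask)
{
    return (old & ~mask) | (value & mask);
}

/* Unconditional disable (hypothetical helper): write 0 under an
 * all-ones mask, so the result is 0 regardless of what a bootloader,
 * firmware or an earlier kernel left in the field. */
static uint64_t disable_notify_enables(uint64_t old)
{
    return masked_field_write(old, 0, ~UINT64_C(0));
}
```

Writing the full field with an all-ones mask avoids depending on which individual notification bits an earlier stage may have set.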
[1] https://github.com/jovanbulck/sgx-step
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#safety-against-ve-in-kernel-code
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-8-kirill.shutemov%40linux.intel.com
|
47e67cf3 | 26-Jan-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Relax SEPT_VE_DISABLE check for debug TD
A "SEPT #VE" occurs when a TDX guest touches memory that is not properly mapped into the "secure EPT". This can be the result of hypervisor attacks or bugs, *OR* guest bugs. Most notably, buggy guests might touch unaccepted memory for lots of different memory safety bugs like buffer overflows.
TDX guests do not want to continue in the face of hypervisor attacks or hypervisor bugs. They want to terminate as fast and safely as possible. SEPT_VE_DISABLE ensures that TDX guests *can't* continue in the face of these kinds of issues.
But that causes a problem. TDX guests that can't continue can't spit out oopses or other debugging info. In essence, SEPT_VE_DISABLE=1 guests are not debuggable.
Relax the SEPT_VE_DISABLE check to a warning on debug TDs, and panic() in the #VE handler on an EPT violation on private memory. This produces a useful backtrace.
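The relaxed boot-time policy is a three-way decision on the TD attributes. In this sketch the bit positions (DEBUG at bit 0, SEPT_VE_DISABLE at bit 28) are assumptions based on my reading of the kernel's ATTR_ definitions and should be checked against the TDX module specification; the enum and function names are hypothetical. The second half of the change, panic() in the #VE handler on a private-memory EPT violation, is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed TD attribute bit positions (verify against the TDX spec). */
#define ATTR_DEBUG           (UINT64_C(1) << 0)
#define ATTR_SEPT_VE_DISABLE (UINT64_C(1) << 28)

enum sept_ve_action {
    SEPT_VE_OK,    /* SEPT_VE_DISABLE set: nothing to do */
    SEPT_VE_WARN,  /* debug TD: warn and keep booting */
    SEPT_VE_FATAL, /* non-debug TD: refuse to run */
};

static enum sept_ve_action check_sept_ve_disable(uint64_t td_attr)
{
    if (td_attr & ATTR_SEPT_VE_DISABLE)
        return SEPT_VE_OK;
    if (td_attr & ATTR_DEBUG)
        return SEPT_VE_WARN;
    return SEPT_VE_FATAL;
}
```

Production TDs keep the strict behavior; only a TD that is already marked debuggable trades the hard stop for a warning.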
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-7-kirill.shutemov%40linux.intel.com
|
71acdcd7 | 26-Jan-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Use ReportFatalError to report missing SEPT_VE_DISABLE
Linux TDX guests require that the SEPT_VE_DISABLE "attribute" be set. If it is not set, the kernel is theoretically required to handle exceptions anywhere that kernel memory is accessed, including places like NMI handlers and in the syscall entry gap.
Rather than even try to handle these exceptions, the kernel refuses to run if SEPT_VE_DISABLE is unset.
However, the SEPT_VE_DISABLE detection and refusal code happens very early in boot, even before earlyprintk runs. Calling panic() will effectively just hang the system.
Instead, call a TDX-specific panic() function. This makes a very simple TDVMCALL which gets a short error string out to the hypervisor without any console infrastructure.
Use TDG.VP.VMCALL<ReportFatalError> to report the error. The hypercall can encode a message of up to 64 bytes in eight registers.
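Encoding a 64-byte message in eight registers is plain byte packing: copy the string into a zero-filled 64-byte buffer and reinterpret it as eight 64-bit values. This sketch shows only that packing; which physical registers carry which value is defined by the GHCI specification and is not modeled here. The function name is hypothetical. Longer messages are truncated and no terminating NUL is guaranteed for a full 64-byte message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FATAL_MSG_BYTES 64
#define FATAL_MSG_REGS  (FATAL_MSG_BYTES / sizeof(uint64_t))

/* Copy up to 64 bytes of msg into eight 64-bit register values. Unused
 * bytes are zero. On a little-endian CPU such as x86, the first
 * character lands in the low byte of regs[0]. */
static void pack_fatal_message(const char *msg, uint64_t regs[FATAL_MSG_REGS])
{
    char buf[FATAL_MSG_BYTES] = {0};
    size_t len = strlen(msg);

    if (len > FATAL_MSG_BYTES)
        len = FATAL_MSG_BYTES;
    memcpy(buf, msg, len);
    memcpy(regs, buf, sizeof(buf));
}
```

Because no console infrastructure is involved, this works even before earlyprintk is available, which is the whole point of the change.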
[ dhansen: tweak comment and remove while loop brackets. ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-6-kirill.shutemov%40linux.intel.com
|
752d1330 | 26-Jan-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Expand __tdx_hypercall() to handle more arguments
So far __tdx_hypercall() only handles six arguments for VMCALL. Expanding it by six more registers would allow it to cover more use cases like ReportFatalError() and Hyper-V hypercalls.
With all preparations in place, the expansion is pretty straightforward.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-5-kirill.shutemov%40linux.intel.com
|
c30c4b25 | 26-Jan-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Refactor __tdx_hypercall() to allow pass down more arguments
RDI is the first argument to __tdx_hypercall(), which is used to pass a pointer to struct tdx_hypercall_args. RSI is the second argument, which contains flags such as TDX_HCALL_HAS_OUTPUT and TDX_HCALL_ISSUE_STI.
RDI and RSI can also be used as arguments to TDVMCALL leaves. Move RDI to RAX and RSI to RBP to free them up for the hypercall arguments.
RAX is saved on the stack during TDCALL, as TDCALL returns its status code in that register.
RBP has to be restored before returning from __tdx_hypercall(), as it is a callee-saved register.
This is a preparatory patch. No functional change.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20230126221159.8635-4-kirill.shutemov%40linux.intel.com
|