837197f4 | 04-Nov-2024 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Dynamically disable SEPT violations from causing #VEs
[ Upstream commit f65aa0ad79fca4ace921da0701644f020129043d ]
Memory access #VEs are hard for Linux to handle in contexts like the entr
x86/tdx: Dynamically disable SEPT violations from causing #VEs
[ Upstream commit f65aa0ad79fca4ace921da0701644f020129043d ]
Memory access #VEs are hard for Linux to handle in contexts like the entry code or NMIs. But other OSes need them for functionality. There's a static (pre-guest-boot) way for a VMM to choose one or the other. But VMMs don't always know which OS they are booting, so they choose to deliver those #VEs so the "other" OSes will work. That, unfortunately has left us in the lurch and exposed to these hard-to-handle #VEs.
The TDX module has introduced a new feature. Even if the static configuration is set to "send nasty #VEs", the kernel can dynamically request that they be disabled. Once they are disabled, access to private memory that is not in the Mapped state in the Secure-EPT (SEPT) will result in an exit to the VMM rather than injecting a #VE.
Check if the feature is available and disable SEPT #VE if possible.
If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE attribute is no longer reliable. It reflects the initial state of the control for the TD, but it will not be updated if someone (e.g. bootloader) changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to determine if SEPT #VEs are enabled or disabled.
[ dhansen: remove 'return' at end of function ]
Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/20241104103803.195705-4-kirill.shutemov%40linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
44cb69db | 04-Nov-2024 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Rename tdx_parse_tdinfo() to tdx_setup()
[ Upstream commit b064043d9565786b385f85e6436ca5716bbd5552 ]
Rename tdx_parse_tdinfo() to tdx_setup() and move setting NOTIFY_ENABLES there.
The f
x86/tdx: Rename tdx_parse_tdinfo() to tdx_setup()
[ Upstream commit b064043d9565786b385f85e6436ca5716bbd5552 ]
Rename tdx_parse_tdinfo() to tdx_setup() and move setting NOTIFY_ENABLES there.
The function will be extended to adjust TD configuration.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/20241104103803.195705-3-kirill.shutemov%40linux.intel.com Stable-dep-of: f65aa0ad79fc ("x86/tdx: Dynamically disable SEPT violations from causing #VEs") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
d4e39b6f | 04-Nov-2024 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Introduce wrappers to read and write TD metadata
[ Upstream commit 5081e8fadb809253c911b349b01d87c5b4e3fec5 ]
The TDG_VM_WR TDCALL is used to ask the TDX module to change some TD-specific
x86/tdx: Introduce wrappers to read and write TD metadata
[ Upstream commit 5081e8fadb809253c911b349b01d87c5b4e3fec5 ]
The TDG_VM_WR TDCALL is used to ask the TDX module to change some TD-specific VM configuration. There is currently only one user in the kernel of this TDCALL leaf. More will be added shortly.
Refactor to make way for more users of TDG_VM_WR who will need to modify other TD configuration values.
Add a wrapper for the TDG_VM_RD TDCALL that requests TD-specific metadata from the TDX module. There are currently no users for TDG_VM_RD. Mark it as __maybe_unused until the first user appears.
This is preparation for enumeration and enabling optional TD features.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/all/20241104103803.195705-2-kirill.shutemov%40linux.intel.com Stable-dep-of: f65aa0ad79fc ("x86/tdx: Dynamically disable SEPT violations from causing #VEs") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
d0f6d80d | 15-Aug-2023 |
Kai Huang <kai.huang@intel.com> |
x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
[ Upstream commit 57a420bb8186d1d0178b857e5dd5026093641654 ]
Currently, the TDX_MODULE_CALL asm macro, which handles both TDCALL
x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
[ Upstream commit 57a420bb8186d1d0178b857e5dd5026093641654 ]
Currently, the TDX_MODULE_CALL asm macro, which handles both TDCALL and SEAMCALL, takes one parameter for each input register and an optional 'struct tdx_module_output' (a collection of output registers) as output. This is different from the TDX_HYPERCALL macro which uses a single 'struct tdx_hypercall_args' to carry all input/output registers.
The newer TDX versions introduce more TDCALLs/SEAMCALLs which use more input/output registers. Also, the TDH.VP.ENTER (which isn't covered by the current TDX_MODULE_CALL macro) basically can use all registers that the TDX_HYPERCALL does. The current TDX_MODULE_CALL macro isn't extendible to cover those cases.
Similar to the TDX_HYPERCALL macro, simplify the TDX_MODULE_CALL macro to use a single structure 'struct tdx_module_args' to carry all the input/output registers. Currently, R10/R11 are only used as output register but not as input by any TDCALL/SEAMCALL. Change to also use R10/R11 as input register to make input/output registers symmetric.
Currently, the TDX_MODULE_CALL macro depends on the caller to pass a non-NULL 'struct tdx_module_output' to get additional output registers. Similar to the TDX_HYPERCALL macro, change the TDX_MODULE_CALL macro to take a new 'ret' macro argument to indicate whether to save the output registers to the 'struct tdx_module_args'. Also introduce a new __tdcall_ret() for that purpose, similar to the __tdx_hypercall_ret().
Note the tdcall(), which is a wrapper of __tdcall(), is called by three callers: tdx_parse_tdinfo(), tdx_get_ve_info() and tdx_early_init(). The former two need the additional output but the last one doesn't. For simplicity, make tdcall() always call __tdcall_ret() to avoid another "_ret()" wrapper. The last caller tdx_early_init() isn't performance critical anyway.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/483616c1762d85eb3a3c3035a7de061cfacf2f14.1692096753.git.kai.huang%40intel.com Stable-dep-of: f65aa0ad79fc ("x86/tdx: Dynamically disable SEPT violations from causing #VEs") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
a79a114f | 15-Aug-2023 |
Kai Huang <kai.huang@intel.com> |
x86/tdx: Rename __tdx_module_call() to __tdcall()
[ Upstream commit 5efb96289e581c187af1bc288ce5d26ed6181749 ]
__tdx_module_call() is only used by the TDX guest to issue TDCALL to the TDX module.
x86/tdx: Rename __tdx_module_call() to __tdcall()
[ Upstream commit 5efb96289e581c187af1bc288ce5d26ed6181749 ]
__tdx_module_call() is only used by the TDX guest to issue TDCALL to the TDX module. Rename it to __tdcall() to match its behaviour, e.g., it cannot be used to make host-side SEAMCALL.
Also rename tdx_module_call() which is a wrapper of __tdx_module_call() to tdcall().
No functional change intended.
Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/785d20d99fbcd0db8262c94da6423375422d8c75.1692096753.git.kai.huang%40intel.com Stable-dep-of: f65aa0ad79fc ("x86/tdx: Dynamically disable SEPT violations from causing #VEs") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
40d3b219 | 15-Aug-2023 |
Kai Huang <kai.huang@intel.com> |
x86/tdx: Make macros of TDCALLs consistent with the spec
[ Upstream commit f0024dbfc48d8814d915eb5bd5253496b9b8a6df ]
The TDX spec names all TDCALLs with prefix "TDG". Currently, the kernel doesn'
x86/tdx: Make macros of TDCALLs consistent with the spec
[ Upstream commit f0024dbfc48d8814d915eb5bd5253496b9b8a6df ]
The TDX spec names all TDCALLs with prefix "TDG". Currently, the kernel doesn't follow such convention for the macros of those TDCALLs but uses prefix "TDX_" for all of them. Although it's arguable whether the TDX spec names those TDCALLs properly, it's better for the kernel to follow the spec when naming those macros.
Change all macros of TDCALLs to make them consistent with the spec. As a bonus, they get distinguished easily from the host-side SEAMCALLs, which all have prefix "TDH".
No functional change intended.
Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/516dccd0bd8fb9a0b6af30d25bb2d971aa03d598.1692096753.git.kai.huang%40intel.com Stable-dep-of: f65aa0ad79fc ("x86/tdx: Dynamically disable SEPT violations from causing #VEs") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
239bff01 | 04-Dec-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Allow 32-bit emulation by default
[ upstream commit f4116bfc44621882556bbf70f5284fbf429a5cf6 ]
32-bit emulation was disabled on TDX to prevent a possible attack by a VMM injecting an inter
x86/tdx: Allow 32-bit emulation by default
[ upstream commit f4116bfc44621882556bbf70f5284fbf429a5cf6 ]
32-bit emulation was disabled on TDX to prevent a possible attack by a VMM injecting an interrupt on vector 0x80.
Now that int80_emulation() has a check for external interrupts the limitation can be lifted.
To distinguish software interrupts from external ones, int80_emulation() checks the APIC ISR bit relevant to the 0x80 vector. For software interrupts, this bit will be 0.
On TDX, the VAPIC state (including ISR) is protected and cannot be manipulated by the VMM. The ISR bit is set by the microcode flow during the handling of posted interrupts.
[ dhansen: more changelog tweaks ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> # v6.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
195edce0 | 06-Jun-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()
tl;dr: There is a race in the TDX private<=>shared conversion code which could kill the TDX guest. Fix it by cha
x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()
tl;dr: There is a race in the TDX private<=>shared conversion code which could kill the TDX guest. Fix it by changing conversion ordering to eliminate the window.
TDX hardware maintains metadata to track which pages are private and shared. Additionally, TDX guests use the guest x86 page tables to specify whether a given mapping is intended to be private or shared. Bad things happen when the intent and metadata do not match.
So there are two thing in play: 1. "the page" -- the physical TDX page metadata 2. "the mapping" -- the guest-controlled x86 page table intent
For instance, an unrecoverable exit to VMM occurs if a guest touches a private mapping that points to a shared physical page.
In summary: * Private mapping => Private Page == OK (obviously) * Shared mapping => Shared Page == OK (obviously) * Private mapping => Shared Page == BIG BOOM! * Shared mapping => Private Page == OK-ish (It will read generate a recoverable #VE via handle_mmio())
Enter load_unaligned_zeropad(). It can touch memory that is adjacent but otherwise unrelated to the memory it needs to touch. It will cause one of those unrecoverable exits (aka. BIG BOOM) if it blunders into a shared mapping pointing to a private page.
This is a problem when __set_memory_enc_pgtable() converts pages from shared to private. It first changes the mapping and second modifies the TDX page metadata. It's moving from:
* Shared mapping => Shared Page == OK to: * Private mapping => Shared Page == BIG BOOM!
This means that there is a window with a shared mapping pointing to a private page where load_unaligned_zeropad() can strike.
Add a TDX handler for guest.enc_status_change_prepare(). This converts the page from shared to private *before* the page becomes private. This ensures that there is never a private mapping to a shared page.
Leave a guest.enc_status_change_finish() in place but only use it for private=>shared conversions. This will delay updating the TDX metadata marking the page private until *after* the mapping matches the metadata. This also ensures that there is never a private mapping to a shared page.
[ dhansen: rewrite changelog ]
Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lore.kernel.org/all/20230606095622.1939-3-kirill.shutemov%40linux.intel.com
show more ...
|
c2b353ae | 06-Jun-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Refactor try_accept_one()
Rework try_accept_one() to return accepted size instead of modifying 'start' inside the helper. It makes 'start' in-only argument and streamlines code on the calle
x86/tdx: Refactor try_accept_one()
Rework try_accept_one() to return accepted size instead of modifying 'start' inside the helper. It makes 'start' in-only argument and streamlines code on the caller side.
Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20230606142637.5171-9-kirill.shutemov@linux.intel.com
show more ...
|