L1TF - L1 Terminal Fault
========================

L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data which is available in the Level 1 Data Cache
when the page table entry controlling the virtual address, which is used
for the access, has the Present bit cleared or other reserved bits set.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non Intel vendors

   - Older processor models, where the CPU family is < 6

   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
     Penwell, Pineview, Silvermont, Airmont, Merrifield)

   - The Intel XEON PHI family

   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
     by the Meltdown vulnerability either. These CPUs should become
     available by end of 2018.

Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
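
As a minimal sketch of such a check, the following Python snippet reads the
L1TF vulnerability file and reports whether the processor is listed as
affected. The path is the one given in the L1TF system information section;
the function names are illustrative only and are not part of any kernel
interface.

```python
from pathlib import Path

# Path of the L1TF status file as given in the sysfs section of this document.
L1TF_SYSFS_FILE = "/sys/devices/system/cpu/vulnerabilities/l1tf"


def read_l1tf_status(path=L1TF_SYSFS_FILE):
    """Return the content of the L1TF status file without the trailing newline."""
    return Path(path).read_text().strip()


def is_affected(status):
    """Treat every status other than 'Not affected' as an affected processor."""
    return not status.startswith("Not affected")


if __name__ == "__main__":
    try:
        status = read_l1tf_status()
    except OSError as err:
        # The file is missing on kernels without L1TF reporting and on non-Linux systems.
        print(f"cannot read L1TF status: {err}")
    else:
        print(f"{status} (affected: {is_affected(status)})")
```

A status of 'Mitigation: PTE Inversion' therefore counts as affected: the
processor has the flaw, and the kernel reports which protection is active.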

Related CVEs
------------

The following CVE entries are related to the L1TF vulnerability:

   =============  =================  ==============================
   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
   =============  =================  ==============================

Problem
-------

If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE was still present and accessible.

While this is a purely speculative mechanism and the instruction will raise
a page fault when it is retired eventually, the pure act of loading the
data and making it available to other speculative instructions opens up the
opportunity for side channel attacks to unprivileged malicious code,
similar to the Meltdown attack.
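
The condition described above can be illustrated with a small sketch that
decodes a 64-bit x86 page table entry. It assumes 4 KiB pages, the Present
bit at bit 0 and the physical frame address in bits 12 to 51. The reserved
bit case is left out for brevity, and the helper names are hypothetical. The
point is that the address bits of a non-present PTE still name a physical
address, and that address is what the speculative load uses.

```python
PTE_PRESENT = 1 << 0                      # bit 0: Present
PTE_ADDR_MASK = ((1 << 52) - 1) & ~0xFFF  # bits 12..51: physical frame address


def pte_is_present(pte):
    """Return True if the Present bit of the PTE is set."""
    return bool(pte & PTE_PRESENT)


def pte_phys_addr(pte):
    """Return the physical address encoded in the address bits of the PTE."""
    return pte & PTE_ADDR_MASK


def l1tf_candidate_address(pte):
    """Return the physical address a non-present PTE still refers to.

    Returns None for a present PTE, which is translated normally.
    """
    if pte_is_present(pte):
        return None
    return pte_phys_addr(pte)


if __name__ == "__main__":
    stale_pte = 0x12345066  # Present bit clear, address bits still populated
    print(hex(l1tf_candidate_address(stale_pte)))
```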

While Meltdown breaks the user space to kernel space protection, L1TF
allows attacks on any physical memory address in the system, and the attack
works across all protection domains. It allows an attack on SGX and also
works from inside virtual machines because the speculation bypasses the
extended page table (EPT) protection mechanism.


Attack scenarios
----------------

1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^

   Operating Systems store arbitrary information in the address bits of a
   PTE which is marked non present. This allows a malicious user space
   application to attack the physical memory to which these PTEs resolve.
   In some cases user-space can maliciously influence the information
   encoded in the address bits of the PTE, thus making attacks more
   deterministic and more practical.

   The Linux kernel contains a mitigation for this attack vector, PTE
   inversion, which is permanently enabled and has no performance
   impact. The kernel ensures that the address bits of PTEs, which are not
   marked present, never point to cacheable physical memory space.

   A system with an up to date kernel is protected against attacks from
   malicious user space applications.

2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The fact that L1TF breaks all domain protections allows malicious guest
   OSes, which can control the PTEs directly, and malicious guest user
   space applications, which run on an unprotected guest kernel lacking the
   PTE inversion mitigation for L1TF, to attack physical host memory.

   A special aspect of L1TF in the context of virtualization is
   simultaneous multithreading (SMT). The Intel implementation of SMT is
   called HyperThreading. The fact that Hyperthreads on the affected
   processors share the L1 Data Cache (L1D) is important for this. As the
   flaw only allows attacking data which is present in L1D, a malicious
   guest running on one Hyperthread can attack the data which is brought
   into the L1D by the context which runs on the sibling Hyperthread of the
   same physical core. This context can be host OS, host user space or a
   different guest.

   If the processor does not support Extended Page Tables, the attack is
   only possible when the hypervisor does not sanitize the content of the
   effective (shadow) page tables.

   While solutions exist to mitigate these attack vectors fully, these
   mitigations are not enabled by default in the Linux kernel because they
   can affect performance significantly.
   The kernel provides several
   mechanisms which can be utilized to address the problem depending on the
   deployment scenario. The mitigations, their protection scope and impact
   are described in the next sections.

   The default mitigations and the rationale for choosing them are explained
   at the end of this document. See :ref:`default_mitigations`.

.. _l1tf_sys_info:

L1TF system information
-----------------------

The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/l1tf

The possible values in this file are:

  ===========================   ===============================
  'Not affected'                The processor is not vulnerable
  'Mitigation: PTE Inversion'   The host protection is active
  ===========================   ===============================

If KVM/VMX is enabled and the processor is vulnerable then the following
information is appended to the 'Mitigation: PTE Inversion' part:

  - SMT status:

    =====================  ================
    'VMX: SMT vulnerable'  SMT is enabled
    'VMX: SMT disabled'    SMT is disabled
    =====================  ================

  - L1D Flush mode:

    ================================  ====================================
    'L1D vulnerable'                  L1D flushing is disabled

    'L1D conditional cache flushes'   L1D flush is conditionally enabled

    'L1D cache flushes'               L1D flush is unconditionally enabled
    ================================  ====================================

The resulting grade of protection is discussed in the following sections.


Host mitigation mechanism
-------------------------

The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.


Guest mitigation mechanisms
---------------------------

.. _l1d_flush:

1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^

   To make sure that a guest cannot attack data which is present in the L1D
   the hypervisor flushes the L1D before entering the guest.
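
The status strings listed in the L1TF system information section can also be
evaluated programmatically. The following sketch matches only the substrings
from the tables in that section. The exact layout of the status line may
differ between kernel versions, so the sample line in the usage part is an
assumption and the function name is illustrative.

```python
def parse_l1tf_status(status):
    """Map an L1TF status line onto the states listed in the sysfs tables."""
    result = {
        "affected": "Not affected" not in status,
        "pte_inversion": "PTE Inversion" in status,
        "smt": None,        # None: the line carries no SMT information
        "l1d_flush": None,  # None: the line carries no L1D flush information
    }

    if "SMT vulnerable" in status:
        result["smt"] = "enabled"
    elif "SMT disabled" in status:
        result["smt"] = "disabled"

    # Test the more specific string first, it also contains 'cache flushes'.
    if "conditional cache flushes" in status:
        result["l1d_flush"] = "conditional"
    elif "cache flushes" in status:
        result["l1d_flush"] = "unconditional"
    elif "L1D vulnerable" in status:
        result["l1d_flush"] = "disabled"

    return result


if __name__ == "__main__":
    # Assumed sample line, assembled from the documented substrings.
    sample = "Mitigation: PTE Inversion; VMX: SMT vulnerable, L1D conditional cache flushes"
    print(parse_l1tf_status(sample))
```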

   Flushing the L1D evicts not only the data which should not be accessed
   by a potentially malicious guest, it also flushes the guest
   data. Flushing the L1D has a performance impact as the processor has to
   bring the flushed guest data back into the L1D. Depending on the
   frequency of VMEXIT/VMENTER and the type of computations in the guest
   performance degradation in the range of 1% to 50% has been observed. For
   scenarios where guest VMEXIT/VMENTER are rare the performance impact is
   minimal. Virtio and mechanisms like posted interrupts are designed to
   confine the VMEXITs to a bare minimum, but specific configurations and
   application scenarios might still suffer from a high VMEXIT rate.

   The kernel provides two L1D flush modes:

    - conditional ('cond')
    - unconditional ('always')

   The conditional mode avoids L1D flushing after VMEXITs which execute
   only audited code paths before the corresponding VMENTER. These code
   paths have been verified not to expose secrets or other interesting
   data to an attacker, but they can leak information about the
   address space layout of the hypervisor.

   Unconditional mode flushes L1D on all VMENTER invocations and provides
   maximum protection. It has a higher overhead than the conditional
   mode.
   The overhead cannot be quantified correctly as it depends on the
   workload scenario and the resulting number of VMEXITs.

   The general recommendation is to enable L1D flush on VMENTER. The kernel
   defaults to conditional mode on affected processors.

   **Note**, that L1D flush does not prevent the SMT problem because the
   sibling thread will also bring back its data into the L1D which makes it
   attackable again.

   L1D flush can be controlled by the administrator via the kernel command
   line and sysfs control files. See :ref:`mitigation_control_command_line`
   and :ref:`mitigation_control_kvm`.

.. _guest_confinement:

2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   To address the SMT problem, it is possible to make a guest or a group of
   guests affine to one or more physical cores. The proper mechanism for
   that is to utilize exclusive cpusets to ensure that no other guest or
   host tasks can run on these cores.

   If only a single guest or related guests run on sibling SMT threads on
   the same physical core then they can only attack their own memory and
   restricted parts of the host memory.
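
Confining a guest to whole physical cores requires knowing which logical
CPUs are SMT siblings. A minimal sketch follows. It assumes the per-CPU
topology file ``thread_siblings_list`` below ``/sys/devices/system/cpu``,
which is not described in this document, and the function names are
illustrative.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8' into a sorted list of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(cpu_root="/sys/devices/system/cpu"):
    """Return the groups of logical CPUs which share a physical core."""
    groups = set()
    # Assumption: every online CPU exposes topology/thread_siblings_list.
    for path in Path(cpu_root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("core with logical CPUs:", ",".join(str(cpu) for cpu in group))
```

An exclusive cpuset for a guest would then be built from complete groups, so
that no sibling thread of a guest core is left available to other tasks.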

   Host memory is attackable, when one of the sibling SMT threads runs in
   host OS (hypervisor) context and the other in guest context. The amount
   of valuable information from the host OS context depends on the context
   which the host OS executes, i.e. interrupts, soft interrupts and kernel
   threads. The amount of valuable data from these contexts cannot be
   declared as non-interesting for an attacker without deep inspection of
   the code.

   **Note**, that assigning guests to a fixed set of physical cores affects
   the ability of the scheduler to do load balancing and might have
   negative effects on CPU utilization depending on the hosting
   scenario. Disabling SMT might be a viable alternative for particular
   scenarios.

   For further information about confining guests to a single or to a group
   of cores consult the cpusets documentation:

   https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst

.. _interrupt_isolation:

3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^

   Interrupts can be made affine to logical CPUs. This is not universally
   true because there are types of interrupts which are truly per CPU
   interrupts, e.g. the local timer interrupt.
   Aside from that, multi queue
   devices affine their interrupts to single CPUs or groups of CPUs per
   queue without allowing the administrator to control the affinities.

   Moving the interrupts, which can be affinity controlled, away from CPUs
   which run untrusted guests, reduces the attack vector space.

   Whether the interrupts which are affine to CPUs, which run untrusted
   guests, provide interesting data for an attacker depends on the system
   configuration and the scenarios which run on the system. While for some
   of the interrupts it can be assumed that they won't expose interesting
   information beyond exposing hints about the host OS memory layout, there
   is no way to make general assumptions.

   Interrupt affinity can be controlled by the administrator via the
   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
   available at:

   https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst

.. _smt_control:

4. SMT control
^^^^^^^^^^^^^^

   To prevent the SMT issues of L1TF it might be necessary to disable SMT
   completely. Disabling SMT can have a significant performance impact, but
   the impact depends on the hosting scenario and the type of workloads.
   The impact of disabling SMT also needs to be weighed against the impact
   of other mitigation solutions like confining guests to dedicated cores.

   The kernel provides a sysfs interface to retrieve the status of SMT and
   to control it. It also provides a kernel command line interface to
   control SMT.

   The kernel command line interface consists of the following options:

     =========== ==========================================================
     nosmt       Affects the bring up of the secondary CPUs during boot. The
                 kernel tries to bring all present CPUs online during the
                 boot process. "nosmt" makes sure that from each physical
                 core only one - the so called primary (hyper) thread is
                 activated. Due to a design flaw of Intel processors related
                 to Machine Check Exceptions the non primary siblings have
                 to be brought up at least partially and are then shut down
                 again. "nosmt" can be undone via the sysfs interface.

     nosmt=force Has the same effect as "nosmt" but it does not allow
                 undoing the SMT disable via the sysfs interface.
     =========== ==========================================================

   The sysfs interface provides two files:

   - /sys/devices/system/cpu/smt/control
   - /sys/devices/system/cpu/smt/active

   /sys/devices/system/cpu/smt/control:

     This file allows reading out the SMT control state and provides the
     ability to disable or (re)enable SMT. The possible states are:

        ==============  ===================================================
        on              SMT is supported by the CPU and enabled. All
                        logical CPUs can be onlined and offlined without
                        restrictions.

        off             SMT is supported by the CPU and disabled. Only
                        the so called primary SMT threads can be onlined
                        and offlined without restrictions. An attempt to
                        online a non-primary sibling is rejected.

        forceoff        Same as 'off' but the state cannot be controlled.
                        Attempts to write to the control file are rejected.

        notsupported    The processor does not support SMT. It's therefore
                        not affected by the SMT implications of L1TF.
                        Attempts to write to the control file are rejected.
        ==============  ===================================================

     The possible states which can be written into this file to control SMT
     state are:

     - on
     - off
     - forceoff

   /sys/devices/system/cpu/smt/active:

     This file reports whether SMT is enabled and active, i.e. if on any
     physical core two or more sibling threads are online.

   SMT control is also possible at boot time via the l1tf kernel command
   line parameter in combination with L1D flush control. See
   :ref:`mitigation_control_command_line`.

5. Disabling EPT
^^^^^^^^^^^^^^^^

  Disabling EPT for virtual machines provides full mitigation for L1TF even
  with SMT enabled, because the effective page tables for guests are
  managed and sanitized by the hypervisor. However, disabling EPT has a
  significant performance impact, especially when the Meltdown mitigation
  KPTI is enabled.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.

.. _mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows controlling the L1TF mitigations at boot
time with the option "l1tf=". The valid arguments for this option are:

  ============  =============================================================
  full          Provides all available mitigations for the L1TF
                vulnerability. Disables SMT and enables all mitigations in
                the hypervisors, i.e. unconditional L1D flushing

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  full,force    Same as 'full', but disables SMT and L1D flush runtime
                control. Implies the 'nosmt=force' command line option.
                (i.e. sysfs control of SMT is disabled.)

  flush         Leaves SMT enabled and enables the default hypervisor
                mitigation, i.e. conditional L1D flushing

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
                i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot. Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
                started in a potentially insecure configuration.

  off           Disables hypervisor mitigations and doesn't emit any
                warnings.
                It also drops the swap size and available RAM limit restrictions
                on both hypervisor and bare metal.

  ============  =============================================================

The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.


.. _mitigation_control_kvm:

Mitigation control for KVM - module parameter
-------------------------------------------------------------

The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.

The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:

  ============  ==============================================================
  always        L1D cache flush on every VMENTER.

  cond          Flush L1D on VMENTER only when the code between VMEXIT and
                VMENTER can leak host memory which is considered
                interesting for an attacker. This can still leak host memory
                which allows e.g. determining the host's address space layout.

  never         Disables the mitigation
  ============  ==============================================================

The parameter can be provided on the kernel command line, as a module
parameter when loading the modules, and can be modified at runtime via the
sysfs file:

/sys/module/kvm_intel/parameters/vmentry_l1d_flush

The default is 'cond'.
If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.

.. _mitigation_selection:

Mitigation selection guide
--------------------------

1. No virtualization in use
^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The system is protected by the kernel unconditionally and no further
   action is required.

2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   If the guest comes from a trusted source and the guest OS kernel is
   guaranteed to have the L1TF mitigations in place the system is fully
   protected against L1TF and no further action is required.

   To avoid the overhead of the default L1D flushing on VMENTER the
   administrator can disable the flushing via the kernel command line and
   sysfs control files. See :ref:`mitigation_control_command_line` and
   :ref:`mitigation_control_kvm`.


3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""

  If SMT is not supported by the processor or disabled in the BIOS or by
  the kernel, it's only required to enforce L1D flushing on VMENTER.

  Conditional L1D flushing is the default behaviour and can be tuned. See
  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""

  If EPT is not supported by the processor or disabled in the hypervisor,
  the system is fully protected. SMT can stay enabled and L1D flushing on
  VMENTER is not required.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""

  If SMT and EPT are supported and active then various degrees of
  mitigations can be employed:

  - L1D flushing on VMENTER:

    L1D flushing on VMENTER is the minimal protection requirement, but it
    is only potent in combination with other mitigation methods.

    Conditional L1D flushing is the default behaviour and can be tuned. See
    :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
50665fd4cb6SThomas Gleixner 50765fd4cb6SThomas Gleixner - Guest confinement: 50865fd4cb6SThomas Gleixner 50965fd4cb6SThomas Gleixner Confinement of guests to a single or a group of physical cores which 51065fd4cb6SThomas Gleixner are not running any other processes, can reduce the attack surface 51165fd4cb6SThomas Gleixner significantly, but interrupts, soft interrupts and kernel threads can 51265fd4cb6SThomas Gleixner still expose valuable data to a potential attacker. See 51365fd4cb6SThomas Gleixner :ref:`guest_confinement`. 51465fd4cb6SThomas Gleixner 51565fd4cb6SThomas Gleixner - Interrupt isolation: 51665fd4cb6SThomas Gleixner 51765fd4cb6SThomas Gleixner Isolating the guest CPUs from interrupts can reduce the attack surface 51865fd4cb6SThomas Gleixner further, but still allows a malicious guest to explore a limited amount 51965fd4cb6SThomas Gleixner of host physical memory. This can at least be used to gain knowledge 52065fd4cb6SThomas Gleixner about the host address space layout. The interrupts which have a fixed 52165fd4cb6SThomas Gleixner affinity to the CPUs which run the untrusted guests can depending on 52265fd4cb6SThomas Gleixner the scenario still trigger soft interrupts and schedule kernel threads 52365fd4cb6SThomas Gleixner which might expose valuable information. See 52465fd4cb6SThomas Gleixner :ref:`interrupt_isolation`. 52565fd4cb6SThomas Gleixner 52665fd4cb6SThomas GleixnerThe above three mitigation methods combined can provide protection to a 52765fd4cb6SThomas Gleixnercertain degree, but the risk of the remaining attack surface has to be 52865fd4cb6SThomas Gleixnercarefully analyzed. For full protection the following methods are 52965fd4cb6SThomas Gleixneravailable: 53065fd4cb6SThomas Gleixner 53165fd4cb6SThomas Gleixner - Disabling SMT: 53265fd4cb6SThomas Gleixner 53365fd4cb6SThomas Gleixner Disabling SMT and enforcing the L1D flushing provides the maximum 53465fd4cb6SThomas Gleixner amount of protection. 
This mitigation is not depending on any of the 53565fd4cb6SThomas Gleixner above mitigation methods. 53665fd4cb6SThomas Gleixner 53765fd4cb6SThomas Gleixner SMT control and L1D flushing can be tuned by the command line 53865fd4cb6SThomas Gleixner parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run 53965fd4cb6SThomas Gleixner time with the matching sysfs control files. See :ref:`smt_control`, 54065fd4cb6SThomas Gleixner :ref:`mitigation_control_command_line` and 54165fd4cb6SThomas Gleixner :ref:`mitigation_control_kvm`. 54265fd4cb6SThomas Gleixner 54365fd4cb6SThomas Gleixner - Disabling EPT: 54465fd4cb6SThomas Gleixner 54565fd4cb6SThomas Gleixner Disabling EPT provides the maximum amount of protection as well. It is 54665fd4cb6SThomas Gleixner not depending on any of the above mitigation methods. SMT can stay 54765fd4cb6SThomas Gleixner enabled and L1D flushing is not required, but the performance impact is 54865fd4cb6SThomas Gleixner significant. 54965fd4cb6SThomas Gleixner 55065fd4cb6SThomas Gleixner EPT can be disabled in the hypervisor via the 'kvm-intel.ept' 55165fd4cb6SThomas Gleixner parameter. 55265fd4cb6SThomas Gleixner 55365fd4cb6SThomas Gleixner3.4. Nested virtual machines 55465fd4cb6SThomas Gleixner"""""""""""""""""""""""""""" 55565fd4cb6SThomas Gleixner 55665fd4cb6SThomas GleixnerWhen nested virtualization is in use, three operating systems are involved: 55765fd4cb6SThomas Gleixnerthe bare metal hypervisor, the nested hypervisor and the nested virtual 55865fd4cb6SThomas Gleixnermachine. VMENTER operations from the nested hypervisor into the nested 55965fd4cb6SThomas Gleixnerguest will always be processed by the bare metal hypervisor. 
If KVM is the 56065fd4cb6SThomas Gleixnerbare metal hypervisor it will: 56165fd4cb6SThomas Gleixner 56265fd4cb6SThomas Gleixner - Flush the L1D cache on every switch from the nested hypervisor to the 56365fd4cb6SThomas Gleixner nested virtual machine, so that the nested hypervisor's secrets are not 56465fd4cb6SThomas Gleixner exposed to the nested virtual machine; 56565fd4cb6SThomas Gleixner 56665fd4cb6SThomas Gleixner - Flush the L1D cache on every switch from the nested virtual machine to 56765fd4cb6SThomas Gleixner the nested hypervisor; this is a complex operation, and flushing the L1D 56865fd4cb6SThomas Gleixner cache avoids that the bare metal hypervisor's secrets are exposed to the 56965fd4cb6SThomas Gleixner nested virtual machine; 57065fd4cb6SThomas Gleixner 57165fd4cb6SThomas Gleixner - Instruct the nested hypervisor to not perform any L1D cache flush. This 57265fd4cb6SThomas Gleixner is an optimization to avoid double L1D flushing. 57365fd4cb6SThomas Gleixner 57465fd4cb6SThomas Gleixner 57565fd4cb6SThomas Gleixner.. _default_mitigations: 57665fd4cb6SThomas Gleixner 57765fd4cb6SThomas GleixnerDefault mitigations 57865fd4cb6SThomas Gleixner------------------- 57965fd4cb6SThomas Gleixner 58065fd4cb6SThomas Gleixner The kernel default mitigations for vulnerable processors are: 58165fd4cb6SThomas Gleixner 58265fd4cb6SThomas Gleixner - PTE inversion to protect against malicious user space. This is done 58365fd4cb6SThomas Gleixner unconditionally and cannot be controlled. The swap storage is limited 58465fd4cb6SThomas Gleixner to ~16TB. 58565fd4cb6SThomas Gleixner 58665fd4cb6SThomas Gleixner - L1D conditional flushing on VMENTER when EPT is enabled for 58765fd4cb6SThomas Gleixner a guest. 58865fd4cb6SThomas Gleixner 58965fd4cb6SThomas Gleixner The kernel does not by default enforce the disabling of SMT, which leaves 59065fd4cb6SThomas Gleixner SMT systems vulnerable when running untrusted guests with EPT enabled. 
59165fd4cb6SThomas Gleixner 59265fd4cb6SThomas Gleixner The rationale for this choice is: 59365fd4cb6SThomas Gleixner 59465fd4cb6SThomas Gleixner - Force disabling SMT can break existing setups, especially with 59565fd4cb6SThomas Gleixner unattended updates. 59665fd4cb6SThomas Gleixner 59765fd4cb6SThomas Gleixner - If regular users run untrusted guests on their machine, then L1TF is 59865fd4cb6SThomas Gleixner just an add on to other malware which might be embedded in an untrusted 59965fd4cb6SThomas Gleixner guest, e.g. spam-bots or attacks on the local network. 60065fd4cb6SThomas Gleixner 60165fd4cb6SThomas Gleixner There is no technical way to prevent a user from running untrusted code 60265fd4cb6SThomas Gleixner on their machines blindly. 60365fd4cb6SThomas Gleixner 60465fd4cb6SThomas Gleixner - It's technically extremely unlikely and from today's knowledge even 60565fd4cb6SThomas Gleixner impossible that L1TF can be exploited via the most popular attack 60665fd4cb6SThomas Gleixner mechanisms like JavaScript because these mechanisms have no way to 60765fd4cb6SThomas Gleixner control PTEs. If this would be possible and not other mitigation would 60865fd4cb6SThomas Gleixner be possible, then the default might be different. 60965fd4cb6SThomas Gleixner 61065fd4cb6SThomas Gleixner - The administrators of cloud and hosting setups have to carefully 61165fd4cb6SThomas Gleixner analyze the risk for their scenarios and make the appropriate 61265fd4cb6SThomas Gleixner mitigation choices, which might even vary across their deployed 61365fd4cb6SThomas Gleixner machines and also result in other changes of their overall setup. 61465fd4cb6SThomas Gleixner There is no way for the kernel to provide a sensible default for this 61565fd4cb6SThomas Gleixner kind of scenarios. 616
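
As an illustration only, the decision steps of cases 1 to 3.3 of the
mitigation selection guide (see :ref:`mitigation_selection`) can be sketched
as a small Python helper. This is not part of the kernel or of any existing
tool. The names ``HostSetup``, ``Advice`` and ``select_mitigation`` are
hypothetical and were chosen for this example, and the sketch assumes that
the administrator already knows whether guests are trusted and whether SMT
and EPT are active. Nested virtualization (case 3.4) is not covered.

.. code-block:: python

   from dataclasses import dataclass
   from enum import Enum


   class Advice(Enum):
       # Case 1: the kernel protects the system unconditionally.
       NO_ACTION = "no further action required"
       # Case 2: protected; the default flush may be disabled to save overhead.
       FLUSH_OPTIONAL = "protected, L1D flushing on VMENTER may be disabled"
       # Case 3.1: SMT is off, so only the L1D flush on VMENTER is required.
       FLUSH_REQUIRED = "L1D flushing on VMENTER required"
       # Case 3.2: without EPT neither the flush nor disabling SMT is needed.
       PROTECTED_WITHOUT_EPT = "fully protected without EPT"
       # Case 3.3: full protection needs SMT off plus flushing, or EPT off.
       DISABLE_SMT_OR_EPT = "disable SMT and enforce L1D flushing, or disable EPT"


   @dataclass(frozen=True)
   class HostSetup:
       virtualization: bool  # are any guests run on this host?
       guests_trusted: bool  # trusted source AND guest kernel has L1TF mitigations
       smt_active: bool      # SMT supported and enabled
       ept_active: bool      # EPT supported and enabled in the hypervisor


   def select_mitigation(setup: HostSetup) -> Advice:
       """Map a host setup to the matching case of the selection guide."""
       if not setup.virtualization:
           return Advice.NO_ACTION
       if setup.guests_trusted:
           return Advice.FLUSH_OPTIONAL
       # Untrusted guests from here on. EPT is checked first because a host
       # without EPT needs no flushing, whatever the SMT state is.
       if not setup.ept_active:
           return Advice.PROTECTED_WITHOUT_EPT
       if not setup.smt_active:
           return Advice.FLUSH_REQUIRED
       return Advice.DISABLE_SMT_OR_EPT

For example, ``select_mitigation(HostSetup(True, False, True, True))``
describes a host which runs untrusted guests with SMT and EPT active and
yields ``Advice.DISABLE_SMT_OR_EPT``. The partial measures listed for case
3.3 (guest confinement, interrupt isolation) are deliberately left out,
because they only reduce the attack surface and do not provide full
protection.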