L1TF - L1 Terminal Fault
========================

L1 Terminal Fault is a hardware vulnerability which allows unprivileged
speculative access to data which is available in the Level 1 Data Cache
when the page table entry controlling the virtual address, which is used
for the access, has the Present bit cleared or other reserved bits set.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non-Intel vendors

   - Older processor models, where the CPU family is < 6

   - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
     Penwell, Pineview, Silvermont, Airmont, Merrifield)

   - The Intel XEON PHI family

   - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
     by the Meltdown vulnerability either. These CPUs should become
     available by end of 2018.

Whether a processor is affected or not can be read out from the L1TF
vulnerability file in sysfs. See :ref:`l1tf_sys_info`.

Related CVEs
------------

The following CVE entries are related to the L1TF vulnerability:

   =============  =================  ==============================
   CVE-2018-3615  L1 Terminal Fault  SGX related aspects
   CVE-2018-3620  L1 Terminal Fault  OS, SMM related aspects
   CVE-2018-3646  L1 Terminal Fault  Virtualization related aspects
   =============  =================  ==============================

Problem
-------

If an instruction accesses a virtual address for which the relevant page
table entry (PTE) has the Present bit cleared or other reserved bits set,
then speculative execution ignores the invalid PTE and loads the referenced
data if it is present in the Level 1 Data Cache, as if the page referenced
by the address bits in the PTE were still present and accessible.

While this is a purely speculative mechanism and the instruction will raise
a page fault when it is retired eventually, the pure act of loading the
data and making it available to other speculative instructions opens up the
opportunity for side channel attacks by unprivileged malicious code,
similar to the Meltdown attack.

While Meltdown breaks the user space to kernel space protection, L1TF
allows attacks on any physical memory address in the system and the attack
works across all protection domains. It allows an attack on SGX and also
works from inside virtual machines because the speculation bypasses the
extended page table (EPT) protection mechanism.


Attack scenarios
----------------

1. Malicious user space
^^^^^^^^^^^^^^^^^^^^^^^

   Operating Systems store arbitrary information in the address bits of a
   PTE which is marked non-present. This allows a malicious user space
   application to attack the physical memory to which these PTEs resolve.
   In some cases user space can maliciously influence the information
   encoded in the address bits of the PTE, thus making attacks more
   deterministic and more practical.

   The Linux kernel contains a mitigation for this attack vector, PTE
   inversion, which is permanently enabled and has no performance
   impact. The kernel ensures that the address bits of PTEs, which are not
   marked present, never point to cacheable physical memory space.

   A system with an up-to-date kernel is protected against attacks from
   malicious user space applications.
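   
The idea behind PTE inversion can be illustrated with a small model. This is
a simplified sketch, not the kernel's implementation: it assumes the x86-64
page table layout (Present bit at bit 0, physical address in bits 12 to 51),
and the names ``PRESENT``, ``ADDR_MASK``, ``invert_if_not_present`` and
``phys_addr`` are hypothetical.

```python
# Simplified model of PTE inversion; not the kernel's actual code.
# Assumed x86-64 layout: bit 0 is the Present bit, bits 12..51 hold
# the physical address of the page frame.
PRESENT = 1 << 0
ADDR_MASK = ((1 << 52) - 1) & ~((1 << 12) - 1)


def invert_if_not_present(pte: int) -> int:
    """Flip all address bits of a PTE whose Present bit is clear."""
    if pte & PRESENT:
        return pte
    return pte ^ ADDR_MASK


def phys_addr(pte: int) -> int:
    """Physical address a speculative access would derive from the PTE."""
    return pte & ADDR_MASK


if __name__ == "__main__":
    # A non-present PTE that reuses its address bits for other information.
    swap_pte = 0x1234 << 12
    stored = invert_if_not_present(swap_pte)
    print(hex(phys_addr(swap_pte)), "->", hex(phys_addr(stored)))
```

In this model a small value stored in the address bits ends up with the
topmost address bit set after inversion, so it no longer resolves to low
physical memory. Applying the function twice restores the original value,
which is why the stored information is not lost.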
   
2. Malicious guest in a virtual machine
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The fact that L1TF breaks all domain protections allows malicious guest
   OSes, which can control the PTEs directly, and malicious guest user
   space applications, which run on an unprotected guest kernel lacking the
   PTE inversion mitigation for L1TF, to attack physical host memory.

   A special aspect of L1TF in the context of virtualization is simultaneous
   multithreading (SMT). The Intel implementation of SMT is called
   HyperThreading. The fact that Hyperthreads on the affected processors
   share the L1 Data Cache (L1D) is important for this. As the flaw only
   allows attacks on data which is present in L1D, a malicious guest running
   on one Hyperthread can attack the data which is brought into the L1D by
   the context which runs on the sibling Hyperthread of the same physical
   core. This context can be host OS, host user space or a different guest.

   If the processor does not support Extended Page Tables, the attack is
   only possible when the hypervisor does not sanitize the content of the
   effective (shadow) page tables.

   While solutions exist to mitigate these attack vectors fully, these
   mitigations are not enabled by default in the Linux kernel because they
   can affect performance significantly. The kernel provides several
   mechanisms which can be utilized to address the problem depending on the
   deployment scenario. The mitigations, their protection scope and impact
   are described in the next sections.

   The default mitigations and the rationale for choosing them are explained
   at the end of this document. See :ref:`default_mitigations`.

.. _l1tf_sys_info:

L1TF system information
-----------------------

The Linux kernel provides a sysfs interface to enumerate the current L1TF
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/l1tf

The possible values in this file are:

  ===========================   ===============================
  'Not affected'                The processor is not vulnerable
  'Mitigation: PTE Inversion'   The host protection is active
  ===========================   ===============================

If KVM/VMX is enabled and the processor is vulnerable then the following
information is appended to the 'Mitigation: PTE Inversion' part:

  - SMT status:

    =====================  ================
    'VMX: SMT vulnerable'  SMT is enabled
    'VMX: SMT disabled'    SMT is disabled
    =====================  ================

  - L1D Flush mode:

    ================================  ====================================
    'L1D vulnerable'                  L1D flushing is disabled

    'L1D conditional cache flushes'   L1D flush is conditionally enabled

    'L1D cache flushes'               L1D flush is unconditionally enabled
    ================================  ====================================

The resulting grade of protection is discussed in the following sections.
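   
As a rough illustration, the status file can be classified by matching the
keywords from the tables above. This is a sketch under assumptions: the
exact way the kernel joins the parts into one line is not specified here,
so the code only looks for substrings, and the function names are
hypothetical.

```python
from pathlib import Path

L1TF_FILE = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def parse_l1tf_status(text: str) -> dict:
    """Classify the content of the l1tf sysfs file by keyword matching."""
    text = text.strip()
    status = {"raw": text, "affected": None, "pte_inversion": False,
              "smt": None, "l1d_flush": None}
    if text.startswith("Not affected"):
        status["affected"] = False
        return status
    status["affected"] = True
    status["pte_inversion"] = "PTE Inversion" in text
    if "SMT vulnerable" in text:
        status["smt"] = "enabled"
    elif "SMT disabled" in text:
        status["smt"] = "disabled"
    # Check the longer phrase first: it contains the shorter one.
    if "conditional cache flushes" in text:
        status["l1d_flush"] = "conditional"
    elif "cache flushes" in text:
        status["l1d_flush"] = "unconditional"
    elif "L1D vulnerable" in text:
        status["l1d_flush"] = "disabled"
    return status


def read_l1tf_status(path: Path = L1TF_FILE):
    """Return the parsed status, or None if the file cannot be read."""
    try:
        return parse_l1tf_status(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_l1tf_status())
```

Fields that cannot be determined stay ``None``, which keeps the sketch
usable if a kernel version words the status line differently.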
   

Host mitigation mechanism
-------------------------

The kernel is unconditionally protected against L1TF attacks from malicious
user space running on the host.


Guest mitigation mechanisms
---------------------------

.. _l1d_flush:

1. L1D flush on VMENTER
^^^^^^^^^^^^^^^^^^^^^^^

   To make sure that a guest cannot attack data which is present in the L1D
   the hypervisor flushes the L1D before entering the guest.

   Flushing the L1D evicts not only the data which should not be accessed
   by a potentially malicious guest, it also flushes the guest
   data. Flushing the L1D has a performance impact as the processor has to
   bring the flushed guest data back into the L1D. Depending on the
   frequency of VMEXIT/VMENTER and the type of computations in the guest
   performance degradation in the range of 1% to 50% has been observed. For
   scenarios where guest VMEXIT/VMENTER are rare the performance impact is
   minimal. Virtio and mechanisms like posted interrupts are designed to
   confine the VMEXITs to a bare minimum, but specific configurations and
   application scenarios might still suffer from a high VMEXIT rate.

   The kernel provides two L1D flush modes:

   - conditional ('cond')
   - unconditional ('always')

   The conditional mode avoids L1D flushing after VMEXITs which execute
   only audited code paths before the corresponding VMENTER. These code
   paths have been verified not to expose secrets or other interesting
   data to an attacker, but they can leak information about the address
   space layout of the hypervisor.

   Unconditional mode flushes L1D on all VMENTER invocations and provides
   maximum protection. It has a higher overhead than the conditional
   mode. The overhead cannot be quantified correctly as it depends on the
   workload scenario and the resulting number of VMEXITs.

   The general recommendation is to enable L1D flush on VMENTER. The kernel
   defaults to conditional mode on affected processors.

   **Note** that L1D flush does not prevent the SMT problem because the
   sibling thread will also bring back its data into the L1D which makes it
   attackable again.

   L1D flush can be controlled by the administrator via the kernel command
   line and sysfs control files. See :ref:`mitigation_control_command_line`
   and :ref:`mitigation_control_kvm`.

.. _guest_confinement:

2. Guest VCPU confinement to dedicated physical cores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   To address the SMT problem, it is possible to make a guest or a group of
   guests affine to one or more physical cores. The proper mechanism for
   that is to utilize exclusive cpusets to ensure that no other guest or
   host tasks can run on these cores.

   If only a single guest or related guests run on sibling SMT threads on
   the same physical core then they can only attack their own memory and
   restricted parts of the host memory.

   Host memory is attackable when one of the sibling SMT threads runs in
   host OS (hypervisor) context and the other in guest context. The amount
   of valuable information from the host OS context depends on the context
   which the host OS executes, i.e. interrupts, soft interrupts and kernel
   threads. The amount of valuable data from these contexts cannot be
   declared as non-interesting for an attacker without deep inspection of
   the code.

   **Note** that assigning guests to a fixed set of physical cores affects
   the ability of the scheduler to do load balancing and might have
   negative effects on CPU utilization depending on the hosting
   scenario. Disabling SMT might be a viable alternative for particular
   scenarios.

   For further information about confining guests to a single or to a group
   of cores consult the cpusets documentation:

   https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst
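   
Confining guests to whole physical cores requires knowing which logical
CPUs are SMT siblings. The following sketch groups logical CPUs by core
using the ``thread_siblings_list`` topology files in sysfs. The helper
names are hypothetical, and the sketch only reports the grouping; creating
the cpusets themselves is left to the cpusets documentation.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> list:
    """Expand a CPU list such as "0-2,8" into [0, 1, 2, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def physical_cores(root: Path = CPU_ROOT) -> list:
    """Return the distinct sibling groups, one tuple per physical core."""
    cores = set()
    for f in root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        try:
            cores.add(tuple(parse_cpu_list(f.read_text())))
        except OSError:
            continue  # topology not readable for this CPU
    return sorted(cores)


if __name__ == "__main__":
    for siblings in physical_cores():
        print("core with logical CPUs:", siblings)
```

A cpuset for a guest would then be built from complete tuples only, so that
no sibling of a guest CPU is left available to other guests or host tasks.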
   
.. _interrupt_isolation:

3. Interrupt affinity
^^^^^^^^^^^^^^^^^^^^^

   Interrupts can be made affine to logical CPUs. This is not universally
   true because there are types of interrupts which are truly per CPU
   interrupts, e.g. the local timer interrupt. Aside from that, multi-queue
   devices affine their interrupts to single CPUs or groups of CPUs per
   queue without allowing the administrator to control the affinities.

   Moving the interrupts, which can be affinity controlled, away from CPUs
   which run untrusted guests, reduces the attack vector space.

   Whether the interrupts which are affine to CPUs, which run untrusted
   guests, provide interesting data for an attacker depends on the system
   configuration and the scenarios which run on the system. While for some
   of the interrupts it can be assumed that they won't expose interesting
   information beyond exposing hints about the host OS memory layout, there
   is no way to make general assumptions.

   Interrupt affinity can be controlled by the administrator via the
   /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
   available at:

   https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
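   
The ``smp_affinity`` file takes a hexadecimal CPU bitmask, while
``smp_affinity_list`` takes a CPU list. A minimal sketch of building the
bitmask form is shown below. The function names are hypothetical, and the
write helper needs root privileges and may be rejected for interrupts whose
affinity cannot be changed.

```python
def cpus_to_affinity_mask(cpus) -> str:
    """Format CPU numbers as the hex bitmask used by smp_affinity."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    # Split into 32-bit groups, least significant group first.
    groups = []
    while True:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    # Emit the most significant group first, comma separated.
    head, *tail = reversed(groups)
    return ",".join([f"{head:x}"] + [f"{g:08x}" for g in tail])


def set_irq_affinity(irq: int, cpus) -> None:
    """Write the mask for one IRQ; needs root and may be rejected."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpus_to_affinity_mask(cpus))
```

For example, steering an interrupt to the CPUs reserved for host work means
passing only those CPU numbers, leaving out every CPU that runs untrusted
guests.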
   
.. _smt_control:

4. SMT control
^^^^^^^^^^^^^^

   To prevent the SMT issues of L1TF it might be necessary to disable SMT
   completely. Disabling SMT can have a significant performance impact, but
   the impact depends on the hosting scenario and the type of workloads.
   The impact of disabling SMT also needs to be weighed against the impact
   of other mitigation solutions like confining guests to dedicated cores.

   The kernel provides a sysfs interface to retrieve the status of SMT and
   to control it. It also provides a kernel command line interface to
   control SMT.

   The kernel command line interface consists of the following options:

     =========== ==========================================================
     nosmt       Affects the bring up of the secondary CPUs during boot. The
                 kernel tries to bring all present CPUs online during the
                 boot process. "nosmt" makes sure that from each physical
                 core only one, the so-called primary (hyper) thread, is
                 activated. Due to a design flaw of Intel processors related
                 to Machine Check Exceptions the non-primary siblings have
                 to be brought up at least partially and are then shut down
                 again.  "nosmt" can be undone via the sysfs interface.

     nosmt=force Has the same effect as "nosmt" but it does not allow
                 undoing the SMT disable via the sysfs interface.
     =========== ==========================================================

   The sysfs interface provides two files:

   - /sys/devices/system/cpu/smt/control
   - /sys/devices/system/cpu/smt/active

   /sys/devices/system/cpu/smt/control:

     This file allows reading out the SMT control state and provides the
     ability to disable or (re)enable SMT. The possible states are:

        ==============  ===================================================
        on              SMT is supported by the CPU and enabled. All
                        logical CPUs can be onlined and offlined without
                        restrictions.

        off             SMT is supported by the CPU and disabled. Only
                        the so-called primary SMT threads can be onlined
                        and offlined without restrictions. An attempt to
                        online a non-primary sibling is rejected.

        forceoff        Same as 'off' but the state cannot be controlled.
                        Attempts to write to the control file are rejected.

        notsupported    The processor does not support SMT. It's therefore
                        not affected by the SMT implications of L1TF.
                        Attempts to write to the control file are rejected.
        ==============  ===================================================

     The possible states which can be written into this file to control SMT
     state are:

     - on
     - off
     - forceoff

   /sys/devices/system/cpu/smt/active:

     This file reports whether SMT is enabled and active, i.e. if on any
     physical core two or more sibling threads are online.

   SMT control is also possible at boot time via the l1tf kernel command
   line parameter in combination with L1D flush control. See
   :ref:`mitigation_control_command_line`.
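   
The rules for the control file can be summarised in a small sketch. The
helper names are hypothetical, and the code assumes that the ``active``
file contains ``1`` when SMT is active, which is not stated above.

```python
from pathlib import Path

SMT_DIR = Path("/sys/devices/system/cpu/smt")
WRITABLE_VALUES = ("on", "off", "forceoff")
LOCKED_STATES = ("forceoff", "notsupported")


def smt_write_allowed(current: str, requested: str) -> bool:
    """Apply the rules from the state table to a requested write."""
    if requested not in WRITABLE_VALUES:
        return False
    # In these states all writes to the control file are rejected.
    return current not in LOCKED_STATES


def read_smt_state(smt_dir: Path = SMT_DIR):
    """Return (control, active) or None if the files are unavailable."""
    try:
        control = (smt_dir / "control").read_text().strip()
        # Assumption: the file holds "1" when SMT is active.
        active = (smt_dir / "active").read_text().strip() == "1"
    except OSError:
        return None
    return control, active


if __name__ == "__main__":
    print(read_smt_state())
```

Checking ``smt_write_allowed()`` before writing avoids a rejected write on
systems booted with "nosmt=force" or on processors without SMT.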
   
5. Disabling EPT
^^^^^^^^^^^^^^^^

  Disabling EPT for virtual machines provides full mitigation for L1TF even
  with SMT enabled, because the effective page tables for guests are
  managed and sanitized by the hypervisor. However, disabling EPT has a
  significant performance impact, especially when the Meltdown mitigation
  KPTI is enabled.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.

There is ongoing research and development for new mitigation mechanisms to
address the performance impact of disabling SMT or EPT.

.. _mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows controlling the L1TF mitigations at boot
time with the option "l1tf=". The valid arguments for this option are:

  ============  =============================================================
  full          Provides all available mitigations for the L1TF
                vulnerability. Disables SMT and enables all mitigations in
                the hypervisors, i.e. unconditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot.  Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  full,force    Same as 'full', but disables SMT and L1D flush runtime
                control. Implies the 'nosmt=force' command line option.
                (i.e. sysfs control of SMT is disabled.)

  flush         Leaves SMT enabled and enables the default hypervisor
                mitigation, i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot.  Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nosmt   Disables SMT and enables the default hypervisor mitigation,
                i.e. conditional L1D flushing.

                SMT control and L1D flush control via the sysfs interface
                is still possible after boot.  Hypervisors will issue a
                warning when the first VM is started in a potentially
                insecure configuration, i.e. SMT enabled or L1D flush
                disabled.

  flush,nowarn  Same as 'flush', but hypervisors will not warn when a VM is
                started in a potentially insecure configuration.

  off           Disables hypervisor mitigations and doesn't emit any
                warnings.
                It also drops the swap size and available RAM limit
                restrictions on both hypervisor and bare metal.

  ============  =============================================================

The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
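   
The table can be transcribed into a small lookup to see what a given
command line selects. This is a sketch, not kernel code: the names are
hypothetical, ``None`` marks a property the table does not state, and the
code assumes that the last "l1tf=" occurrence on the command line wins and
that unknown arguments should be reported as an error.

```python
def _effect(smt_disabled, l1d_flush, warn, runtime_control):
    return {"smt_disabled": smt_disabled, "l1d_flush": l1d_flush,
            "warn": warn, "runtime_control": runtime_control}


# Transcribed from the table above; None means "not stated there".
L1TF_OPTIONS = {
    "full":         _effect(True,  "always", True,  True),
    "full,force":   _effect(True,  "always", True,  False),
    "flush":        _effect(False, "cond",   True,  True),
    "flush,nosmt":  _effect(True,  "cond",   True,  True),
    "flush,nowarn": _effect(False, "cond",   False, True),
    "off":          _effect(False, "never",  False, None),
}
DEFAULT_OPTION = "flush"


def l1tf_option_from_cmdline(cmdline: str) -> str:
    """Return the l1tf= argument found on a kernel command line."""
    option = DEFAULT_OPTION
    for word in cmdline.split():
        if word.startswith("l1tf="):
            option = word[len("l1tf="):]  # assumption: last one wins
    return option


def describe(cmdline: str) -> dict:
    """Look up the documented effect of the selected l1tf= argument."""
    option = l1tf_option_from_cmdline(cmdline)
    if option not in L1TF_OPTIONS:
        raise ValueError(f"unknown l1tf argument: {option!r}")
    return dict(L1TF_OPTIONS[option], option=option)


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(describe(f.read()))
    except (OSError, ValueError) as exc:
        print("cannot determine l1tf setting:", exc)
```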
   

.. _mitigation_control_kvm:

Mitigation control for KVM - module parameter
---------------------------------------------

The KVM hypervisor mitigation mechanism, flushing the L1D cache when
entering a guest, can be controlled with a module parameter.

The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
following arguments:

  ============  ==============================================================
  always        L1D cache flush on every VMENTER.

  cond          Flush L1D on VMENTER only when the code between VMEXIT and
                VMENTER can leak host memory which is considered
                interesting for an attacker. This still can leak host memory
                which allows e.g. to determine the host's address space
                layout.

  never         Disables the mitigation.
  ============  ==============================================================

The parameter can be provided on the kernel command line, as a module
parameter when loading the modules, and modified at runtime via the sysfs
file:

/sys/module/kvm_intel/parameters/vmentry_l1d_flush

The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
module parameter is ignored and writes to the sysfs file are rejected.
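   
The precedence between the "l1tf=" option and the module parameter can be
sketched as follows. Only the 'full,force' rule is stated explicitly above;
that an explicitly given module parameter otherwise takes priority over the
mode implied by "l1tf=" is an assumption of this sketch, and the helper
names are hypothetical.

```python
from pathlib import Path

PARAM_FILE = Path("/sys/module/kvm_intel/parameters/vmentry_l1d_flush")
VALID_MODES = ("always", "cond", "never")

# Flush mode implied by each l1tf= argument when no module parameter
# is given, following the command line table.
IMPLIED_BY_L1TF = {
    "full": "always",
    "flush": "cond",
    "flush,nosmt": "cond",
    "flush,nowarn": "cond",
    "off": "never",
}


def effective_flush_mode(l1tf_option: str = "flush", module_param=None) -> str:
    """Resolve the L1D flush mode from both settings."""
    if l1tf_option == "full,force":
        return "always"  # the module parameter is ignored
    if module_param is not None:
        if module_param not in VALID_MODES:
            raise ValueError(f"invalid mode: {module_param!r}")
        return module_param  # assumption: explicit parameter wins
    return IMPLIED_BY_L1TF[l1tf_option]


def read_flush_mode(path: Path = PARAM_FILE):
    """Return the current sysfs value, or None if KVM is not loaded."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("current vmentry_l1d_flush:", read_flush_mode())
```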
44765fd4cb6SThomas Gleixner
4485999bbe7SThomas Gleixner.. _mitigation_selection:
44965fd4cb6SThomas Gleixner
45065fd4cb6SThomas GleixnerMitigation selection guide
45165fd4cb6SThomas Gleixner--------------------------
45265fd4cb6SThomas Gleixner
45365fd4cb6SThomas Gleixner1. No virtualization in use
45465fd4cb6SThomas Gleixner^^^^^^^^^^^^^^^^^^^^^^^^^^^
45565fd4cb6SThomas Gleixner
45665fd4cb6SThomas Gleixner   The system is protected by the kernel unconditionally and no further
45765fd4cb6SThomas Gleixner   action is required.
45865fd4cb6SThomas Gleixner
45965fd4cb6SThomas Gleixner2. Virtualization with trusted guests
46065fd4cb6SThomas Gleixner^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
46165fd4cb6SThomas Gleixner
46265fd4cb6SThomas Gleixner   If the guest comes from a trusted source and the guest OS kernel is
46365fd4cb6SThomas Gleixner   guaranteed to have the L1TF mitigations in place the system is fully
46465fd4cb6SThomas Gleixner   protected against L1TF and no further action is required.
46565fd4cb6SThomas Gleixner
46665fd4cb6SThomas Gleixner   To avoid the overhead of the default L1D flushing on VMENTER the
46765fd4cb6SThomas Gleixner   administrator can disable the flushing via the kernel command line and
46865fd4cb6SThomas Gleixner   sysfs control files. See :ref:`mitigation_control_command_line` and
46965fd4cb6SThomas Gleixner   :ref:`mitigation_control_kvm`.


3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

3.1. SMT not supported or disabled
""""""""""""""""""""""""""""""""""

  If SMT is not supported by the processor or disabled in the BIOS or by
  the kernel, the only requirement is to enforce L1D flushing on VMENTER.

  Conditional L1D flushing is the default behaviour and can be tuned. See
  :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

3.2. EPT not supported or disabled
""""""""""""""""""""""""""""""""""

  If EPT is not supported by the processor or disabled in the hypervisor,
  the system is fully protected. SMT can stay enabled and L1D flushing on
  VMENTER is not required.

  EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
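
Whether case 3.1 or 3.2 applies to a running host can be checked from sysfs.
The sketch below assumes the SMT state file /sys/devices/system/cpu/smt/active
(see :ref:`smt_control`), which reads 1 or 0, and the kvm_intel module
parameter file /sys/module/kvm_intel/parameters/ept, which reads Y or N. The
helper names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs locations; both may be absent on some systems.
SMT_ACTIVE = Path("/sys/devices/system/cpu/smt/active")
KVM_INTEL_EPT = Path("/sys/module/kvm_intel/parameters/ept")


def _read(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def smt_is_active(path=SMT_ACTIVE):
    """True if sibling threads are online, False if not, None if unknown."""
    value = _read(path)
    return None if value is None else value == "1"


def ept_is_enabled(path=KVM_INTEL_EPT):
    """True if kvm_intel uses EPT, False if not, None if unknown."""
    value = _read(path)
    return None if value is None else value == "Y"
```

A None result means the state could not be determined, for example because
kvm_intel is not loaded, and should not be read as "disabled".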

3.3. SMT and EPT supported and active
"""""""""""""""""""""""""""""""""""""

  If SMT and EPT are supported and active, then various degrees of
  mitigation can be employed:

  - L1D flushing on VMENTER:

    L1D flushing on VMENTER is the minimal protection requirement, but it
    is only effective in combination with other mitigation methods.

    Conditional L1D flushing is the default behaviour and can be tuned. See
    :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.

  - Guest confinement:

    Confining guests to a single physical core or a group of physical cores
    which are not running any other processes can reduce the attack surface
    significantly, but interrupts, soft interrupts and kernel threads can
    still expose valuable data to a potential attacker. See
    :ref:`guest_confinement`.

  - Interrupt isolation:

    Isolating the guest CPUs from interrupts can reduce the attack surface
    further, but still allows a malicious guest to explore a limited amount
    of host physical memory. This can at least be used to gain knowledge
    about the host address space layout. Interrupts which have a fixed
    affinity to the CPUs which run the untrusted guests can, depending on
    the scenario, still trigger soft interrupts and schedule kernel threads
    which might expose valuable information. See
    :ref:`interrupt_isolation`.

The above three mitigation methods combined can provide protection to a
certain degree, but the risk of the remaining attack surface has to be
carefully analyzed. For full protection the following methods are
available:

  - Disabling SMT:

    Disabling SMT and enforcing the L1D flushing provides the maximum
    amount of protection. This mitigation does not depend on any of the
    above mitigation methods.

    SMT control and L1D flushing can be tuned by the command line
    parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
    time with the matching sysfs control files. See :ref:`smt_control`,
    :ref:`mitigation_control_command_line` and
    :ref:`mitigation_control_kvm`.

  - Disabling EPT:

    Disabling EPT provides the maximum amount of protection as well. It
    does not depend on any of the above mitigation methods. SMT can stay
    enabled and L1D flushing is not required, but the performance impact is
    significant.

    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
    parameter.
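
Cases 1 to 3.3 of this guide form a small decision tree, which can be
sketched as a pure function. The function name, its parameters and the
wording of the returned advice are illustrative assumptions and not a kernel
interface. EPT is checked before SMT because, per 3.2, a host without EPT
needs neither SMT changes nor L1D flushing.

```python
def select_l1tf_mitigation(virtualization, guests_trusted=False,
                           smt_active=True, ept_enabled=True):
    """Return (case, advice) following the selection guide above.

    guests_trusted means: the guest comes from a trusted source AND its
    kernel is guaranteed to have the L1TF mitigations in place.
    """
    if not virtualization:
        return ("1", "No action required; the kernel protects the system.")
    if guests_trusted:
        return ("2", "No action required; L1D flushing on VMENTER may be "
                     "disabled to avoid its overhead.")
    if not ept_enabled:
        return ("3.2", "Fully protected; SMT can stay enabled and L1D "
                       "flushing on VMENTER is not required.")
    if not smt_active:
        return ("3.1", "Enforce L1D flushing on VMENTER.")
    return ("3.3", "For full protection disable SMT and enforce L1D "
                   "flushing, or disable EPT. Otherwise combine L1D "
                   "flushing, guest confinement and interrupt isolation "
                   "and analyze the remaining risk.")
```

For example, select_l1tf_mitigation(True, guests_trusted=False,
smt_active=False) selects case 3.1. The result is only a pointer into this
guide; the risk analysis itself remains with the administrator.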

3.4. Nested virtual machines
""""""""""""""""""""""""""""

When nested virtualization is in use, three operating systems are involved:
the bare metal hypervisor, the nested hypervisor and the nested virtual
machine.  VMENTER operations from the nested hypervisor into the nested
guest will always be processed by the bare metal hypervisor. If KVM is the
bare metal hypervisor, it will:

 - Flush the L1D cache on every switch from the nested hypervisor to the
   nested virtual machine, so that the nested hypervisor's secrets are not
   exposed to the nested virtual machine;

 - Flush the L1D cache on every switch from the nested virtual machine to
   the nested hypervisor; this is a complex operation, and flushing the L1D
   cache prevents the bare metal hypervisor's secrets from being exposed to
   the nested virtual machine;

 - Instruct the nested hypervisor to not perform any L1D cache flush. This
   is an optimization to avoid double L1D flushing.


.. _default_mitigations:

Default mitigations
-------------------

  The kernel default mitigations for vulnerable processors are:

  - PTE inversion to protect against malicious user space. This is done
    unconditionally and cannot be controlled. The swap storage is limited
    to ~16TB.

  - L1D conditional flushing on VMENTER when EPT is enabled for
    a guest.

  The kernel does not by default enforce the disabling of SMT, which leaves
  SMT systems vulnerable when running untrusted guests with EPT enabled.

  The rationale for this choice is:

  - Force-disabling SMT can break existing setups, especially with
    unattended updates.

  - If regular users run untrusted guests on their machine, then L1TF is
    just an add-on to other malware which might be embedded in an untrusted
    guest, e.g. spam-bots or attacks on the local network.

    There is no technical way to prevent users from blindly running
    untrusted code on their machines.

  - It is technically extremely unlikely, and from today's knowledge even
    impossible, that L1TF can be exploited via the most popular attack
    mechanisms like JavaScript, because these mechanisms have no way to
    control PTEs. If this were possible and no other mitigation were
    available, then the default might be different.

  - The administrators of cloud and hosting setups have to carefully
    analyze the risk for their scenarios and make the appropriate
    mitigation choices, which might even vary across their deployed
    machines and also result in other changes of their overall setup.
    There is no way for the kernel to provide a sensible default for this
    kind of scenario.
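
The mitigations that are actually in effect, including the defaults listed
in this section, are reported by the L1TF vulnerability file in sysfs (see
:ref:`l1tf_sys_info`). The sketch below parses one such status line. It
assumes the line looks like 'Mitigation: PTE Inversion; VMX: conditional
cache flushes, SMT vulnerable'; the function name and the returned
dictionary keys are hypothetical.

```python
from pathlib import Path

# Assumed location of the L1TF status file.
L1TF_STATUS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def parse_l1tf_status(text):
    """Split one status line into PTE inversion, VMX and SMT fields."""
    text = text.strip()
    status = {
        "raw": text,
        "pte_inversion": "PTE Inversion" in text,
        "vmx": None,   # e.g. 'conditional cache flushes'
        "smt": None,   # e.g. 'vulnerable' or 'disabled'
    }
    if "VMX:" in text:
        # Everything after 'VMX:' is a comma separated list of states.
        fields = [f.strip() for f in text.split("VMX:", 1)[1].split(",")]
        status["vmx"] = fields[0]
        for field in fields[1:]:
            if field.startswith("SMT"):
                status["smt"] = field[len("SMT"):].strip()
    return status


def read_l1tf_status(path=L1TF_STATUS):
    """Parse the status file, or return None if it does not exist."""
    try:
        return parse_l1tf_status(Path(path).read_text())
    except FileNotFoundError:
        return None
```

With the default mitigations on an SMT system, such a line would be expected
to report conditional cache flushes together with an SMT state of
'vulnerable', which matches the default described above.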