.. SPDX-License-Identifier: GPL-2.0

==============
Nitro Enclaves
==============

Overview
========

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM
can be separated from other applications running in the same VM. This
application then runs in a VM separate from the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup suits the needs of
low-latency applications. The resources that are allocated for the enclave, such
as memory and CPUs, are carved out of the primary VM. Each enclave is mapped to
a process running in the primary VM that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

There is an NE emulated PCI device exposed to the primary VM. The driver for
this new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
ioctl maps to an enclave start PCI command. The PCI device commands are then
translated into actions taken on the hypervisor side, that is, by the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
to the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be aligned, physically contiguous memory regions of 2 MiB or 1 GiB (or a
multiple of these sizes, e.g. 8 MiB). The memory can be allocated, for example,
by using hugetlbfs from user space [2][3]. The memory size for an enclave needs
to be at least 64 MiB. The enclave memory and CPUs need to be from the same NUMA
node.

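As an illustration, the size and alignment rules above can be checked from user
space before memory regions are handed to the NE driver. The following Python
sketch is not part of the NE driver or its tooling. The helper names
``check_region`` and ``check_enclave_memory`` are hypothetical, and the sketch
assumes that "aligned" means a region starts on a 2 MiB boundary. Since 1 GiB is
a multiple of 2 MiB, one check covers both sizes. Physical contiguity and NUMA
placement cannot be verified with this arithmetic and are left out.

```python
from typing import Iterable, Tuple

MIB = 1024 * 1024
GIB = 1024 * MIB

MIN_REGION_ALIGN = 2 * MIB      # smallest region granularity named in the text
MIN_ENCLAVE_MEMORY = 64 * MIB   # minimum total enclave memory named in the text


def check_region(addr: int, size: int) -> bool:
    """Return True if a region starts on a 2 MiB boundary (assumed meaning of
    "aligned") and its size is a non-zero multiple of 2 MiB."""
    return (
        size > 0
        and addr % MIN_REGION_ALIGN == 0
        and size % MIN_REGION_ALIGN == 0
    )


def check_enclave_memory(regions: Iterable[Tuple[int, int]]) -> bool:
    """Return True if every (addr, size) region passes check_region and the
    region sizes add up to at least 64 MiB."""
    regions = list(regions)
    if not all(check_region(addr, size) for addr, size in regions):
        return False
    return sum(size for _addr, size in regions) >= MIN_ENCLAVE_MEMORY
```

For instance, thirty-two 2 MiB regions or a single 1 GiB region pass the check,
while a single 32 MiB region fails it because the total is below 64 MiB.
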
An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by a
user with admin capability. See the cpu list section of the kernel
documentation [4] for the format of a CPU pool.

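To make the CPU pool format more concrete, the following Python sketch parses a
cpu list string and rejects a pool that contains CPU 0. It is an illustration
only: the function names are hypothetical, and it handles just plain CPU numbers
and ``first-last`` ranges, not every form described in [4]. Checking the
siblings of CPU 0 needs CPU topology information and is not covered here.

```python
def parse_cpu_list(text: str) -> set:
    """Parse a cpu list such as "1-3,5" into a set of CPU numbers.

    Only single numbers and "first-last" ranges are handled in this sketch.
    """
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError("invalid CPU range: " + part)
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def pool_is_acceptable(text: str) -> bool:
    """Return True if the pool is non-empty and leaves CPU 0 to the primary VM.

    The siblings of CPU 0 are not checked here.
    """
    cpus = parse_cpu_list(text)
    return bool(cpus) and 0 not in cpus
```

With this sketch, a pool such as ``2,18`` is accepted, while ``0-1`` is rejected
because it would take CPU 0 away from the primary VM.
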
An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
virtio-mmio device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS (e.g. kernel, ramdisk, init) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol [6].

The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
Enclave Image Format (EIF), together with an EIF header that includes metadata
such as the magic number, EIF version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
the ramdisk(s). These are used, for example, to check that the enclave image
that is loaded in the enclave VM is the one that was intended to be run.

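The idea of comparing a loaded image against an expected hash can be sketched in
Python as follows. This is only an illustration of the principle, not the
measurement scheme used by NE: the text above does not name the hash algorithm
or say how several ramdisks are combined, so the use of SHA-384, the simple
concatenation of ramdisks and all function names are assumptions.

```python
import hashlib
import hmac


def measure(data: bytes) -> str:
    """Return a hex digest of data (SHA-384 is an assumption of this sketch)."""
    return hashlib.sha384(data).hexdigest()


def measure_image(eif: bytes, kernel: bytes, ramdisks: list) -> dict:
    """Compute one digest each for the whole image, the kernel and the
    ramdisks (hashed back to back, which is also an assumption)."""
    ramdisk_hash = hashlib.sha384()
    for ramdisk in ramdisks:
        ramdisk_hash.update(ramdisk)
    return {
        "image": measure(eif),
        "kernel": measure(kernel),
        "ramdisks": ramdisk_hash.hexdigest(),
    }


def image_matches(eif: bytes, expected_hex: str) -> bool:
    """Return True if the image digest equals the expected digest."""
    return hmac.compare_digest(measure(eif), expected_hex)
```

A caller would record the digests when the image is built and later call
``image_matches`` on the image that is about to be run.
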
These cryptographic measurements are included in a signed attestation document
generated by the Nitro Hypervisor and are further used to prove the identity of
the enclave. KMS is an example of a service that NE is integrated with and that
checks the attestation document.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM on a
predefined port (9000) and sends a heartbeat value (0xb7). This mechanism is
used to check in the primary VM that the enclave has booted. The CID of the
primary VM is 3.

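The primary VM side of this heartbeat check could look roughly like the Python
sketch below. It is not the code used by any NE tool. It assumes that the
heartbeat is sent as a single byte over a stream socket and that the primary VM
listens on any local CID; the function names are hypothetical. Only the port
(9000) and the value (0xb7) come from the text above. ``socket.AF_VSOCK`` needs
Linux with vsock support.

```python
import socket

HEARTBEAT_PORT = 9000    # predefined port from the text
HEARTBEAT_VALUE = 0xB7   # heartbeat value from the text


def read_heartbeat(conn: socket.socket) -> bool:
    """Read one byte from a connected socket and compare it with 0xb7.

    A one-byte heartbeat is an assumption of this sketch.
    """
    data = conn.recv(1)
    return len(data) == 1 and data[0] == HEARTBEAT_VALUE


def wait_for_enclave_boot(timeout_s: float = 30.0) -> bool:
    """Listen on vsock port 9000 in the primary VM and wait for the heartbeat.

    Returns False if no connection or no data arrives within timeout_s.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as listener:
        listener.settimeout(timeout_s)
        listener.bind((socket.VMADDR_CID_ANY, HEARTBEAT_PORT))
        listener.listen(1)
        try:
            conn, _peer = listener.accept()
        except socket.timeout:
            return False
        with conn:
            conn.settimeout(timeout_s)
            try:
                return read_heartbeat(conn)
            except socket.timeout:
                return False
```

Keeping ``read_heartbeat`` separate from the vsock setup lets the comparison be
exercised with any connected socket, for example one end of a socket pair.
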
If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

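From the user space side, waiting for such a notification might be sketched in
Python as below. The text above only says that a poll notification mechanism is
used, so treating the event as a hang-up (``POLLHUP``) on the enclave file
descriptor is an assumption of this sketch, and ``enclave_fd`` is a hypothetical
name for the descriptor the enclave process holds for its enclave.

```python
import select


def enclave_has_exited(enclave_fd: int, timeout_ms: int) -> bool:
    """Poll enclave_fd and return True if a hang-up event is reported.

    Returns False if the timeout expires without a hang-up. POLLHUP as the
    exit signal is an assumption of this sketch.
    """
    poller = select.poll()
    # POLLHUP is reported regardless of the requested event mask.
    poller.register(enclave_fd, select.POLLIN)
    for _fd, revents in poller.poll(timeout_ms):
        if revents & select.POLLHUP:
            return True
    return False
```

An enclave process could call this in a loop with a timeout and exit once it
returns True.
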
[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
[3] https://lwn.net/Articles/807108/
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html
[6] https://www.kernel.org/doc/html/latest/x86/boot.html