.. SPDX-License-Identifier: GPL-2.0

==============
Nitro Enclaves
==============

Overview
========

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM
can be separated from other applications running in the same VM. This
application then runs in a VM separate from the primary VM, namely an enclave.
It runs alongside the VM that spawned it. This setup suits the needs of
low-latency applications.

The architectures currently supported by the NE kernel driver, available in the
upstream Linux kernel, are x86 and ARM64.

The resources that are allocated for the enclave, such as memory and CPUs, are
carved out of the primary VM. Each enclave is mapped to a process running in the
primary VM that communicates with the NE kernel driver via an ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

There is an NE emulated PCI device exposed to the primary VM. The driver for
this new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
ioctl maps to an enclave start PCI command. The PCI device commands are then
translated into actions taken on the hypervisor side; that's the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
to the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be physically contiguous, 2 MiB or 1 GiB aligned memory regions (or a multiple
of this size, e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs
from user space [2][3][7]. The memory size for an enclave needs to be at least
64 MiB. The enclave memory and CPUs need to be from the same NUMA node.

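The size and alignment rules above can be sketched as a small user space
pre-check. This is only an illustration: the function name, the
(physical address, size) tuple layout and the ``page_size`` parameter are
assumptions made for the example and are not part of the NE driver interface.
The NUMA node requirement is not covered here.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
MIN_ENCLAVE_MEM = 64 * MIB  # minimum enclave memory size stated above


def check_enclave_memory(regions, page_size=2 * MIB):
    """Validate hypothetical (phys_addr, size) regions; return the total size.

    page_size is the huge page size backing the regions: 2 MiB or 1 GiB.
    """
    if page_size not in (2 * MIB, GIB):
        raise ValueError("page size must be 2 MiB or 1 GiB")
    for addr, size in regions:
        # Each region has to start on a huge page boundary ...
        if addr % page_size:
            raise ValueError(f"region at {addr:#x} is not aligned")
        # ... and span a whole number of huge pages (e.g. 8 MiB = 4 x 2 MiB).
        if size == 0 or size % page_size:
            raise ValueError(f"region size {size} is not a multiple of the page size")
    total = sum(size for _addr, size in regions)
    if total < MIN_ENCLAVE_MEM:
        raise ValueError("enclave memory must be at least 64 MiB")
    return total
```

For instance, 32 regions of 2 MiB each add up to exactly the 64 MiB minimum,
while a single 8 MiB region is rejected as too small.
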
An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by a
user with admin capability. See the cpu list section from the kernel
documentation [4] for the CPU pool format.

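As a rough illustration of the CPU pool rule, the sketch below parses the
simple forms of a cpu list ("N" and "N-M", comma separated) and rejects a pool
that contains CPU 0 or its siblings. The function names and the
``cpu0_siblings`` parameter are hypothetical; the grouped "N-M:A/S" form
described in [4] is deliberately not handled.

```python
def parse_cpu_list(text):
    """Parse a cpu list such as "1-3,5" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if end < start:
            raise ValueError(f"invalid range: {part}")
        cpus.update(range(start, end + 1))
    return cpus


def check_ne_cpu_pool(text, cpu0_siblings=frozenset({0})):
    """Return the pool as a set, refusing CPU 0 and its siblings.

    cpu0_siblings is an assumption of this sketch: the caller passes in the
    CPUs that must stay with the primary VM.
    """
    pool = parse_cpu_list(text)
    reserved = pool & set(cpu0_siblings)
    if reserved:
        raise ValueError(f"CPUs {sorted(reserved)} must remain in the primary VM")
    return pool
```

On a system with hyperthreading, the sibling set for CPU 0 could be read from
/sys/devices/system/cpu/cpu0/topology/thread_siblings_list and passed in as
``cpu0_siblings``.
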
An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
virtio-mmio device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS (e.g. kernel, ramdisk, init) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol [6][8].

The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
Enclave Image Format (EIF), together with an EIF header that includes metadata
such as the magic number, EIF version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). These are used, for example, to check that the enclave image that
is loaded in the enclave VM is the one that was intended to be run.

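The idea of checking an image against an expected hash value can be sketched as
follows. The text above does not name the hash function, so SHA-384 is an
assumption of this example, as are the function names; the sketch also hashes a
plain byte string rather than following the actual EIF measurement procedure.

```python
import hashlib
import hmac


def measure(blob):
    """Return a hex digest of blob (SHA-384 is an assumption of this sketch)."""
    return hashlib.sha384(blob).hexdigest()


def image_matches(image, expected_hex):
    """Check that image hashes to the expected value.

    hmac.compare_digest avoids leaking where the two digests differ.
    """
    return hmac.compare_digest(measure(image), expected_hex.lower())
```

A caller would record ``measure(image)`` when the image is built and later call
``image_matches()`` on the bytes that are about to be loaded.
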
These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of a service that NE is integrated with and that
checks the attestation document.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted. The CID of the
primary VM is 3.

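A minimal sketch of the primary VM side of this heartbeat check is shown below,
assuming a Linux system where Python exposes AF_VSOCK. The port and heartbeat
value come from the text above; the function names, the timeout and the choice
to read exactly one byte are assumptions made for the example.

```python
import socket

HEARTBEAT_PORT = 9000   # predefined port from the text above
HEARTBEAT_VALUE = 0xB7  # heartbeat value from the text above


def read_heartbeat(conn):
    """Read one byte from a connected socket and compare it to the heartbeat."""
    data = conn.recv(1)
    return data == bytes([HEARTBEAT_VALUE])


def wait_for_enclave_boot(timeout_s=30.0):
    """Listen on the heartbeat port and report whether a heartbeat arrived.

    Raises socket.timeout if no enclave connects within timeout_s.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as server:
        server.settimeout(timeout_s)
        # Accept connections addressed to any local CID on the heartbeat port.
        server.bind((socket.VMADDR_CID_ANY, HEARTBEAT_PORT))
        server.listen(1)
        conn, _peer = server.accept()
        with conn:
            conn.settimeout(timeout_s)
            return read_heartbeat(conn)
```

Keeping ``read_heartbeat()`` separate from the vsock setup makes it possible to
exercise the comparison with any connected stream socket, without an enclave.
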
If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

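The user space side of such a poll notification could look roughly like the
sketch below. The text does not say which poll events the driver reports, so
treating POLLIN or POLLHUP on the enclave file descriptor as the exit signal is
an assumption, as is the function name.

```python
import select


def wait_for_enclave_exit(enclave_fd, timeout_ms=None):
    """Block until the fd reports an event; return False on timeout.

    Assumption of this sketch: the exit notification shows up as POLLIN or
    POLLHUP on the enclave file descriptor.
    """
    poller = select.poll()
    poller.register(enclave_fd, select.POLLIN | select.POLLHUP)
    for _fd, revents in poller.poll(timeout_ms):
        if revents & (select.POLLIN | select.POLLHUP):
            return True
    return False
```

Once this returns True, the enclave process can release its resources and exit,
as described above.
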
[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
[3] https://lwn.net/Articles/807108/
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html
[6] https://www.kernel.org/doc/html/latest/x86/boot.html
[7] https://www.kernel.org/doc/html/latest/arm64/hugetlbpage.html
[8] https://www.kernel.org/doc/html/latest/arm64/booting.html