.. SPDX-License-Identifier: GPL-2.0

==============
Nitro Enclaves
==============

Overview
========

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM
can be separated from other applications running in the same VM. This
application then runs in a VM separate from the primary VM, namely an enclave.
The enclave runs alongside the VM that spawned it. This setup suits the needs
of low-latency applications.

The architectures currently supported by the NE kernel driver, available in the
upstream Linux kernel, are x86 and ARM64.

The resources that are allocated for the enclave, such as memory and CPUs, are
carved out of the primary VM. Each enclave is mapped to a process running in the
primary VM that communicates with the NE kernel driver via an ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

There is an NE emulated PCI device exposed to the primary VM. The driver for
this new PCI device is included in the NE driver.

The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
ioctl maps to an enclave start PCI command. The PCI device commands are then
translated into actions taken on the hypervisor side; that's the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
to the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be aligned, physically contiguous memory regions of 2 MiB / 1 GiB (or a
multiple of this size, e.g. 8 MiB). The memory can be allocated, for example, by
using hugetlbfs from user space [2][3][7]. The memory size for an enclave needs
to be at least 64 MiB. The enclave memory and CPUs need to be from the same
NUMA node.

An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by a
user with admin capability. See the cpu list section from the kernel
documentation [4] for how a CPU pool format looks.
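
The memory and CPU pool constraints above can be illustrated with a small user
space sanity check. This is only a sketch, not part of the NE driver or of any
NE tooling: the helper names are hypothetical, a region is assumed to be given
as an (address, size) pair, and only the simple "N" and "N-M" cpu list forms
are handled. Physical contiguity, NUMA placement and CPU siblings cannot be
checked this way.

```python
MIB = 1 << 20
MIN_REGION_SIZE = 2 * MIB      # smallest region granularity named in the text
MIN_ENCLAVE_MEMORY = 64 * MIB  # minimum total enclave memory named in the text


def check_memory_regions(regions):
    """Return a list of problems for (address, size) pairs; empty means OK.

    Hypothetical helper: checks alignment, size granularity and the total only.
    """
    problems = []
    for addr, size in regions:
        if addr % MIN_REGION_SIZE:
            problems.append(f"region at {addr:#x} is not 2 MiB aligned")
        if size == 0 or size % MIN_REGION_SIZE:
            problems.append(f"region at {addr:#x}: size is not a multiple of 2 MiB")
    total = sum(size for _, size in regions)
    if total < MIN_ENCLAVE_MEMORY:
        problems.append(f"total of {total // MIB} MiB is below 64 MiB")
    return problems


def parse_cpu_list(text):
    """Expand a simple cpu list such as "1,3" or "2-5,8" into sorted CPU ids."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        low = int(first)
        high = int(last) if sep else low
        if high < low:
            raise ValueError(f"bad range: {part!r}")
        cpus.update(range(low, high + 1))
    return sorted(cpus)


def check_cpu_pool(text):
    """Parse a CPU pool and reject it if it contains CPU 0."""
    cpus = parse_cpu_list(text)
    if 0 in cpus:
        raise ValueError("CPU 0 has to remain available for the primary VM")
    return cpus
```

For instance, ``check_memory_regions([(0x40000000, 64 * MIB)])`` returns an
empty list, while ``check_cpu_pool("0-3")`` raises ``ValueError``.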

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
virtio-mmio device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS (e.g. kernel, ramdisk, init) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol [6][8].

The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
Enclave Image Format (EIF), together with an EIF header that includes metadata
such as magic number, EIF version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). These are used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.
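
The idea of measuring the image, the kernel and the ramdisk(s) and comparing
the result against expected values can be sketched as follows. This is an
illustration only: the digest algorithm (SHA-384 here), the way the ramdisks
are combined into one digest, and the function names are all assumptions made
for the example, and do not describe how the actual measurements are computed.

```python
import hashlib
import hmac


def measure_image(image, kernel, ramdisks, algorithm="sha384"):
    """Return hex digests for the whole image, the kernel and the ramdisks.

    Assumption for this sketch: the ramdisks are hashed back to back into a
    single digest, in the order given.
    """
    ramdisk_hash = hashlib.new(algorithm)
    for ramdisk in ramdisks:
        ramdisk_hash.update(ramdisk)
    return {
        "image": hashlib.new(algorithm, image).hexdigest(),
        "kernel": hashlib.new(algorithm, kernel).hexdigest(),
        "ramdisks": ramdisk_hash.hexdigest(),
    }


def matches_expected(measured, expected):
    """True if both dicts hold the same keys and identical digests."""
    if measured.keys() != expected.keys():
        return False
    return all(
        hmac.compare_digest(measured[name], expected[name]) for name in measured
    )
```

A caller would compute the measurements from the bytes that were actually
loaded and compare them with values recorded when the image was built; any
change to the kernel or a ramdisk changes the corresponding digest.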

The hash values are crypto measurements that are included in a signed
attestation document generated by the Nitro Hypervisor and are further used to
prove the identity of the enclave; KMS is an example of a service that NE is
integrated with and that checks the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted. The CID of the
primary VM is 3.

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
[3] https://lwn.net/Articles/807108/
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html
[6] https://www.kernel.org/doc/html/latest/x86/boot.html
[7] https://www.kernel.org/doc/html/latest/arm64/hugetlbpage.html
[8] https://www.kernel.org/doc/html/latest/arm64/booting.html
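
As an illustration of the heartbeat mechanism described in the overview (port
9000, value 0xb7), the primary VM side could be sketched as below. The function
names are hypothetical, and listening on ``VMADDR_CID_ANY`` is an assumption
about how the primary VM accepts the connection from the enclave's init
process; the text only states where the enclave connects to. The listener
needs a Linux system with vsock support.

```python
import socket

HEARTBEAT_PORT = 9000   # predefined port named in the text
HEARTBEAT_VALUE = 0xB7  # heartbeat value named in the text


def read_heartbeat(conn, timeout=None):
    """Read one byte from a connected socket; True if it is the heartbeat."""
    conn.settimeout(timeout)
    data = conn.recv(1)
    return data == bytes([HEARTBEAT_VALUE])


def wait_for_enclave_boot(timeout=30.0):
    """Listen on the heartbeat port and wait for the enclave to report in.

    Raises a timeout error if nothing connects or sends data in time.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as server:
        server.bind((socket.VMADDR_CID_ANY, HEARTBEAT_PORT))
        server.listen(1)
        server.settimeout(timeout)
        conn, _peer = server.accept()
        with conn:
            return read_heartbeat(conn, timeout)
```

``read_heartbeat()`` is kept separate from the vsock setup so that it can be
exercised with any connected stream socket, for example one created by
``socket.socketpair()``.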