.. SPDX-License-Identifier: GPL-2.0

==============
Nitro Enclaves
==============

Overview
========

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM
can be separated from other applications running in the same VM. This
application then runs in a VM separate from the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup suits the needs of
low-latency applications. The resources that are allocated for the enclave, such
as memory and CPUs, are carved out of the primary VM. Each enclave is mapped to
a process running in the primary VM that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

An NE emulated PCI device is exposed to the primary VM. The driver for this
new PCI device is included in the NE driver.


The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
ioctl maps to an enclave start PCI command. The PCI device commands are then
translated into actions taken on the hypervisor side; that's the Nitro
hypervisor running on the host where the primary VM is running. The Nitro
hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
to the enclave VM. An enclave does not have persistent storage attached.

The memory regions carved out of the primary VM and given to an enclave need to
be 2 MiB / 1 GiB aligned, physically contiguous memory regions (or a multiple
of this size, e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs
from user space [2][3]. The memory size for an enclave needs to be at least
64 MiB. The enclave memory and CPUs need to be from the same NUMA node.

An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by a
user with admin capability. See the cpu list section from the kernel
documentation [4] for the CPU pool format.

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5].
The primary VM has a virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
virtio-mmio device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS (e.g. kernel, ramdisk, init) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol [6].

The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
Enclave Image Format (EIF), plus an EIF header including metadata such as magic
number, EIF version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). These hashes are used, for example, to check that the enclave image
that is loaded in the enclave VM is the one that was intended to be run.

These cryptographic measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of a service that NE is integrated with and that
checks the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB.
The init process in the enclave connects to the vsock CID of the primary VM and
a predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted. The CID of the
primary VM is 3.

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
[3] https://lwn.net/Articles/807108/
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html
[6] https://www.kernel.org/doc/html/latest/x86/boot.html
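
The boot heartbeat described in the overview (port 9000, value 0xb7) could be
checked from the primary VM roughly as sketched below. This is an illustration
only, not the actual tooling: the function names, the timeout and the choice to
read exactly one byte are assumptions, and the listener part needs a Linux
system with vsock support.

```python
import socket

HEARTBEAT_PORT = 9000    # predefined port from the overview
HEARTBEAT_VALUE = 0xB7   # heartbeat value from the overview


def read_heartbeat(conn):
    """Read one byte from a connected socket and compare it to the heartbeat."""
    data = conn.recv(1)
    return len(data) == 1 and data[0] == HEARTBEAT_VALUE


def wait_for_enclave_boot(timeout_s=30.0):
    """Listen on the vsock heartbeat port until the enclave init connects.

    Hypothetical helper: accepts one connection and returns True if the
    first byte received equals the heartbeat value.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as listener:
        listener.settimeout(timeout_s)
        listener.bind((socket.VMADDR_CID_ANY, HEARTBEAT_PORT))
        listener.listen(1)
        conn, _peer = listener.accept()
        with conn:
            conn.settimeout(timeout_s)
            return read_heartbeat(conn)
```

Keeping the byte check in ``read_heartbeat`` separate from the vsock listener
lets the comparison be exercised with any connected stream socket, without an
enclave present.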