# eMMC Storage Design

Author: Adriana Kobylak < anoo! >

Other contributors: Joel Stanley < shenki! >, Milton Miller

Created: 2019-06-20

## Problem Description

Proposal to define an initial storage design for an eMMC device. This includes
filesystem type, partitioning, volume management, boot options and
initialization, etc.

## Background and References

OpenBMC currently supports raw flash such as the SPI NOR found in the systems
based on AST2400 and AST2500, but there is no design for managed NAND.

## Requirements

- Security: Ability to enforce read-only, verification of official/signed images
  for production.

- Updatable: Ensure that the filesystem design allows for an effective and
  simple update mechanism to be implemented.

- Simplicity: Make the system easy to understand, so that it is easy to develop,
  test, use, and recover.

- Code reuse: Try to use something that already exists instead of re-inventing
  the wheel.

## Proposed Design

- The eMMC image layout and characteristics are specified in a meta layer. This
  allows OpenBMC to support different layouts and configurations. The tarball to
  perform a code update is still built by image_types_phosphor, so a separate
  IMAGE_TYPES would need to be created to support a different filesystem type.

- Code update: Support two versions on flash. This allows a known good image to
  be retained and a new image to be validated.

- GPT partitioning for the eMMC User Data Area: This is chosen over dynamic
  partitioning due to the lack of offline tools to build an LVM image (see
  Logical Volumes in the Alternatives section below).

- Initramfs: An initramfs is needed to run sgdisk on first boot to move the
  secondary GPT to the end of the device where it belongs, since the Yocto wic
  tool does not currently support building an image of a specified size and
  therefore the generated image may not be exactly the size of the device that
  it is flashed into.

- Read-only and read-write filesystem: ext4. This is a stable and widely used
  filesystem for eMMC.

- Filesystem layout: The root filesystem is hosted in a read-only volume. The
  /var directory is mounted in a read-write volume that persists through code
  updates. The /home directory needs to be writable to store user data such as
  ssh keys, so it is a bind mount to a directory in the read-write volume. A
  bind mount is more reliable than an overlay, and has been around longer. Since
  there are no contents delivered by the image in the /home directory, a bind
  mount can be used. On the other hand, the /etc directory has content delivered
  by the image, so it is an overlayfs to have the ability to restore its
  configuration content on a factory reset.

      +------------------+ +-----------------------------+
      | Read-only volume | | Read-write volume           |
      |------------------| |-----------------------------|
      |                  | |                             |
      | / (rootfs)       | | /var                        |
      |                  | |                             |
      | /etc  +------------->/var/etc-work/  (overlayfs) |
      |                  | |                             |
      | /home +------------->/var/home-work/ (bind mount)|
      |                  | |                             |
      |                  | |                             |
      +------------------+ +-----------------------------+

- Provisioning: OpenBMC will produce as a build artifact a flashable eMMC image
  as it currently does for NOR chips.

## Alternatives Considered

- Store U-Boot and the Linux kernel in a separate SPI NOR flash device, since
  SoCs such as the AST2500 do not support executing U-Boot from an eMMC. In
  addition, having the Linux kernel on the NOR removes the need for U-Boot
  support for the eMMC.
  The U-Boot and kernel are less than 10MB in size, so a fairly small chip such
  as a 32MB one would suffice. Therefore, in order to support two firmware
  versions, the kernel for each version would need to be stored in the NOR. A
  second NOR device could be added as redundancy in case U-Boot or the kernel
  failed to run.

  Format the NOR as it is currently done for a system that supports UBI: a fixed
  MTD partition for U-Boot, one for its environment, and a UBI volume spanning
  the remainder of the flash. Store the dual kernel volumes in the UBI
  partition. This approach allows the re-use of the existing code update
  interfaces, since the static approach does not currently support storing two
  kernel images. Selection of the desired kernel image would be done with the
  existing U-Boot environment approach.

  Static MTD partitions could be created to store the kernel images, but
  additional work would be required to introduce a new method to select the
  desired kernel image, because the static layout does not currently have dual
  image support.

  The AST2600 supports executing U-Boot from the eMMC, so that provides the
  flexibility of just having the eMMC chip on a system, or still having U-Boot
  in a separate chip for recovery in cases where the eMMC goes bad.

- Filesystem: f2fs (Flash-Friendly File System). The f2fs is an up-and-coming
  filesystem, and therefore it may be seen as less mature and stable than the
  ext4 filesystem, although it is unknown how either of the two would perform in
  an OpenBMC environment.

  A suitable alternative would be btrfs, which has checksums for both metadata
  and data in the filesystem, and therefore provides stronger guarantees on the
  data integrity.

- All Code update artifacts combined into a single image.

  This provides simple code maintenance where an image is intact or not, and
  works or not, with no additional fragments lying around.
  U-Boot has one choice to make: which image to load, and one piece of
  information to forward to the kernel.

  To reduce boot time by limiting IO reading unneeded sectors into memory, a
  small FS is placed at the beginning of the partition to contain any artifacts
  that must be accessed by U-Boot.

  This file system will be selected from ext2, FAT12, and cramfs, as these are
  all supported in both the Linux kernel and U-Boot. (If we desire the U-Boot
  environment to be per-side, then choose one of ext2 or FAT12. Squashfs support
  has not been merged; it was last updated in 2018, two years ago.)

- No initramfs: It may be possible to boot the rootfs by passing the UUID of the
  logical volume to the kernel, although a [pre-init script][] will likely still
  be needed. Therefore, having an initramfs would offer a more standard
  implementation for initialization.

- FAT MBR partitioning: FAT is a simple and well understood partition table
  format. There is space for 4 independent partitions. Alternatively one slot
  can be chained into extended partitions, but each partition in the chain
  depends on the prior partition. Four partitions may be sufficient to meet the
  initial demand for a shared (single) boot filesystem design (boot, rofs-a,
  rofs-b, and read-write). Additional partitions would be needed for a dual boot
  volume design.

  If common space is needed for the U-Boot environment, it is redundantly stored
  as a file in partition 1. The U-Boot SPL will be located here. If this is not
  needed, partition 1 can remain unallocated.

  The two code sides are created in slots 2 and 3.

  The read-write filesystem occupies partition 4.

  If in the future there is demand for additional partitions, a partition can be
  moved into an extended partition in a future code update.

- Device Mapper: The eMMC is divided using the device-mapper linear target,
  which allows for the expansion of devices if necessary without having to
  physically repartition since the device-mapper devices expose logical blocks.
  This is achieved by changing the device-mapper configuration table entries
  provided to the kernel to append unused physical blocks.

- Logical Volumes:

  - Volume management: LVM. This allows for dynamic partition creation/removal,
    similar to the current UBI implementation. LVM support increases the size of
    the kernel by ~100kB, but the increase in size is worth the ability to
    resize the partition if needed. In addition, UBI volume management works in
    a similar way, so it would not be complex to implement LVM management in the
    code update application.

  - Partitioning: If the eMMC is used to store the boot loader, an ext4 (or
    vfat) partition would hold the FIT image containing the kernel, initrd and
    device tree. This volume would be mounted as /boot. This allows U-Boot to
    load the kernel since it doesn't have support for LVM. After the boot
    partition, assign the remaining eMMC flash as a single physical volume
    containing logical volumes, instead of fixed-size partitions. This provides
    flexibility for cases where the contents of a partition outgrow a fixed
    size. This also means that other firmware images, such as BIOS and PSU, can
    be stored in volumes in the single eMMC device.

  - Initramfs: Use an initramfs, which is the default in OpenBMC, to boot the
    rootfs from a logical volume. An initramfs allows for flexibility if
    additional boot actions are needed, such as mounting overlays. It also
    provides a point of departure (environment) to provision and format the eMMC
    volume(s). To boot the rootfs, the initramfs would search for the desired
    rootfs volume to be mounted, instead of using the U-Boot environments.

  - Mount points: For firmware images such as BIOS that currently reside in
    separate SPI NOR modules, the logical volume in the eMMC would be mounted at
    the same paths, to avoid changes to the applications that rely on the
    location of that data.

  - Provisioning: Since the LVM userspace tools don't offer an offline mode,
    it's not straightforward to assemble an LVM disk image from a bitbake task.
    Therefore, have the initramfs create the LVM volume and fetch the rootfs
    file into tmpfs from an external source to flash the volume. The rootfs file
    can be fetched using DHCP, UART, USB key, etc. An alternative option is to
    build the image from QEMU; this would require booting QEMU as part of the
    build process to set up the LVM volume and create the image file.

## Impacts

This design would impact the OpenBMC build process and code update internal
implementations but should not affect the external interfaces.

- openbmc/linux: Kernel changes to support the eMMC chip and its filesystem.
- openbmc/openbmc: Changes to create an eMMC image.
- openbmc/openpower-pnor-code-mgmt: Changes to support updating the new
  filesystem.
- openbmc/phosphor-bmc-code-mgmt: Changes to support updating the new
  filesystem.

## Testing

Verify OpenBMC functionality in a system containing an eMMC. This system could
be added to the CI pool.

[pre-init script]:
  https://github.com/openbmc/openbmc/blob/master/meta-phosphor/recipes-phosphor/preinit-mounts/preinit-mounts/init
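
As an illustration of why the Proposed Design moves the secondary GPT on first
boot, the following minimal sketch computes where the backup GPT structures sit
for a given disk size. It assumes 512-byte logical sectors and the common
128-entry, 128-bytes-per-entry partition array. The function names and the
example sizes are hypothetical and are not part of the OpenBMC code base. When
the flashed image is smaller than the device, the backup header written at build
time sits at the end of the image rather than at the last sector of the device,
which is the condition the initramfs step corrects.

```python
SECTOR_SIZE = 512          # assumed logical sector size in bytes
GPT_ENTRY_COUNT = 128      # assumed number of partition entries
GPT_ENTRY_SIZE = 128       # assumed size of one partition entry in bytes


def backup_gpt_layout(size_bytes, sector=SECTOR_SIZE):
    """Return (entries_start_lba, header_lba) of the backup GPT.

    The backup header occupies the last LBA of the disk, and the backup
    partition entry array occupies the sectors immediately before it.
    """
    if size_bytes <= 0 or size_bytes % sector:
        raise ValueError("size must be a positive whole number of sectors")
    last_lba = size_bytes // sector - 1
    entry_sectors = (GPT_ENTRY_COUNT * GPT_ENTRY_SIZE) // sector
    return last_lba - entry_sectors, last_lba


def needs_relocation(image_bytes, device_bytes):
    """True when the backup GPT from the image is not at the device's end."""
    if image_bytes > device_bytes:
        raise ValueError("image does not fit on the device")
    return backup_gpt_layout(image_bytes) != backup_gpt_layout(device_bytes)


if __name__ == "__main__":
    image = 1 << 30    # hypothetical 1 GiB build artifact
    device = 4 << 30   # hypothetical 4 GiB eMMC user data area
    print("backup GPT in image:  ", backup_gpt_layout(image))
    print("backup GPT on device: ", backup_gpt_layout(device))
    print("relocation needed:    ", needs_relocation(image, device))
```

On a real system the relocation itself would be done by sgdisk, for example
through its `--move-second-header` option; the exact invocation used by the
initramfs is not specified in this design.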