.. SPDX-License-Identifier: GPL-2.0

===========================
Ramfs, rootfs and initramfs
===========================

October 17, 2005

:Author: Rob Landley <rob@landley.net>

What is ramfs?
--------------

Ramfs is a very simple filesystem that exports Linux's disk caching
mechanisms (the page cache and dentry cache) as a dynamically resizable
RAM-based filesystem.

Normally all files are cached in memory by Linux.  Pages of data read from
backing store (usually the block device the filesystem is mounted on) are kept
around in case they're needed again, but marked as clean (freeable) in case the
Virtual Memory system needs the memory for something else.  Similarly, data
written to files is marked clean as soon as it has been written to backing
store, but kept around for caching purposes until the VM reallocates the
memory.  A similar mechanism (the dentry cache) greatly speeds up access to
directories.

With ramfs, there is no backing store.  Files written into ramfs allocate
dentries and page cache as usual, but there's nowhere to write them to.
This means the pages are never marked clean, so they can't be freed by the
VM when it's looking to recycle memory.

The amount of code required to implement ramfs is tiny, because all the
work is done by the existing Linux caching infrastructure.  Basically,
you're mounting the disk cache as a filesystem.  Because of this, ramfs is not
an optional component removable via menuconfig, since there would be negligible
space savings.

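Since ramfs cannot be configured out, a running kernel should always list it
among its registered filesystem types.  A minimal sketch of such a check
follows.  The helper name fs_registered and its optional second argument (a
file to read instead of /proc/filesystems) are assumptions made for
illustration only:

```shell
#!/bin/sh
# fs_registered NAME [FILE]: succeed if filesystem type NAME is listed
# in FILE (default: /proc/filesystems).  The type name is the last
# whitespace-separated field of each line; "nodev" may precede it.
fs_registered()
{
  awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
    "${2:-/proc/filesystems}" 2>/dev/null
}

if fs_registered ramfs
then
  echo "ramfs is registered"
else
  echo "ramfs not found (is /proc mounted?)"
fi
```

Actually mounting an instance (for example "mount -t ramfs none /mnt")
requires root.
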
ramfs and ramdisk:
------------------

The older "ram disk" mechanism created a synthetic block device out of
an area of RAM and used it as backing store for a filesystem.  This block
device was of fixed size, so the filesystem mounted on it was of fixed
size.  Using a ram disk also required unnecessarily copying memory from the
fake block device into the page cache (and copying changes back out), as well
as creating and destroying dentries.  Plus it needed a filesystem driver
(such as ext2) to format and interpret this data.

Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
unnecessary work for the CPU, and pollutes the CPU caches.  (There are tricks
to avoid this copying by playing with the page tables, but they're unpleasantly
complicated and turn out to be about as expensive as the copying anyway.)
More to the point, all the work ramfs is doing has to happen _anyway_,
since all file access goes through the page and dentry caches.  The RAM
disk is simply unnecessary; ramfs is internally much simpler.

Another reason ramdisks are semi-obsolete is that the introduction of
loopback devices offered a more flexible and convenient way to create
synthetic block devices, now from files instead of from chunks of memory.
See losetup(8) for details.

ramfs and tmpfs:
----------------

One downside of ramfs is you can keep writing data into it until you fill
up all memory, and the VM can't free it because the VM thinks that files
should get written to backing store (rather than swap space), but ramfs hasn't
got any backing store.  Because of this, only root (or a trusted user) should
be allowed write access to a ramfs mount.

A ramfs derivative called tmpfs was created to add size limits, and the ability
to write the data to swap space.  Normal users can be allowed write access to
tmpfs mounts.  See Documentation/filesystems/tmpfs.rst for more information.

What is rootfs?
---------------

Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
always present in 2.6 systems.  You can't unmount rootfs for approximately the
same reason you can't kill the init process; rather than having special code
to check for and handle an empty list, it's smaller and simpler for the kernel
to just make sure certain lists can't become empty.

Most systems just mount another filesystem over rootfs and ignore it.  The
amount of space an empty instance of ramfs takes up is tiny.

If CONFIG_TMPFS is enabled, rootfs will use tmpfs instead of ramfs by
default.  To force ramfs, add "rootfstype=ramfs" to the kernel command
line.

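To see whether a running system was booted with such an override, the kernel
command line can be inspected from userspace.  The following is only a sketch:
the helper name rootfstype_from_cmdline is made up for this example, it splits
on whitespace without handling quoted values, and it assumes that the last
occurrence of the parameter is the one that counts:

```shell
#!/bin/sh
# rootfstype_from_cmdline "CMDLINE": print the value of the last
# rootfstype= parameter found in CMDLINE, or an empty line if none.
# Simple whitespace splitting only; quoted values are not handled.
rootfstype_from_cmdline()
{
  result=
  for arg in $1
  do
    case "$arg" in
      rootfstype=*) result=${arg#rootfstype=} ;;
    esac
  done
  echo "$result"
}

# /proc/cmdline holds the command line the running kernel was given.
cmdline=$(cat /proc/cmdline 2>/dev/null)
echo "rootfstype requested: $(rootfstype_from_cmdline "$cmdline")"
```

An empty result means no override was given, so the default described above
applies.
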
What is initramfs?
------------------

All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
extracted into rootfs when the kernel boots up.  After extracting, the kernel
checks to see if rootfs contains a file "init", and if so it executes it as PID
1.  If found, this init process is responsible for bringing the system the
rest of the way up, including locating and mounting the real root device (if
any).  If rootfs does not contain an init program after the embedded cpio
archive is extracted into it, the kernel will fall through to the older code
to locate and mount a root partition, then exec some variant of /sbin/init
out of that.

All this differs from the old initrd in several ways:

  - The old initrd was always a separate file, while the initramfs archive is
    linked into the linux kernel image.  (The directory ``linux-*/usr`` is
    devoted to generating this archive during the build.)

  - The old initrd file was a gzipped filesystem image (in some file format,
    such as ext2, that needed a driver built into the kernel), while the new
    initramfs archive is a gzipped cpio archive (like tar only simpler,
    see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).
    The kernel's cpio extraction code is not only extremely small, it's also
    __init text and data that can be discarded during the boot process.

  - The program run by the old initrd (which was called /initrd, not /init) did
    some setup and then returned to the kernel, while the init program from
    initramfs is not expected to return to the kernel.  (If /init needs to hand
    off control it can overmount / with a new root device and exec another init
    program.  See the switch_root utility, below.)

  - When switching to another root device, initrd would pivot_root and then
    umount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
    rootfs, nor unmount it.  Instead delete everything out of rootfs to
    free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
    with the new root (cd /newmount; mount --move . /; chroot .), attach
    stdin/stdout/stderr to the new /dev/console, and exec the new init.

    Since this is a remarkably persnickety process (and involves deleting
    commands before you can run them), the klibc package introduced a helper
    program (utils/run_init.c) to do all this for you.  Most other packages
    (such as busybox) have named this command "switch_root".

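The hand-off described in the list above might be sketched as the following
/init script, written out with a here-document.  Everything specific in it is
an assumption for illustration: the root device (/dev/sda1, with its device
node already present in the archive), the filesystem type (ext2), the mount
point /newroot, and the availability of a busybox-style switch_root.  A real
/init has to work out where the root device is for itself:

```shell
cat > init << 'EOF'
#!/bin/sh
# Hypothetical /init for an initramfs; device and paths are examples.

mount -t proc proc /proc
mount -t sysfs sysfs /sys

# Mount the real root filesystem (assumed device and type).  Drop to
# a shell if that fails, so the problem can be inspected by hand.
mkdir -p /newroot
mount -t ext2 -o ro /dev/sda1 /newroot || exec /bin/sh

# Release the kernel filesystems mounted above before switching.
umount /proc
umount /sys

# switch_root empties rootfs, moves /newroot over / and execs the new
# init.  It has to be exec'd so that it keeps running as PID 1.
exec switch_root /newroot /sbin/init
EOF
chmod +x init
```

The resulting file can be packaged as /init in the archive, for example via a
"file /init ..." line in a configuration file of the kind shown in the next
section.
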
Populating initramfs:
---------------------

The 2.6 kernel build process always creates a gzipped cpio format initramfs
archive and links it into the resulting kernel binary.  By default, this
archive is empty (consuming 134 bytes on x86).

The config option CONFIG_INITRAMFS_SOURCE (in General Setup in menuconfig,
and living in usr/Kconfig) can be used to specify a source for the
initramfs archive, which will automatically be incorporated into the
resulting binary.  This option can point to an existing gzipped cpio
archive, a directory containing files to be archived, or a text file
specification such as the following example::

  dir /dev 755 0 0
  nod /dev/console 644 0 0 c 5 1
  nod /dev/loop0 644 0 0 b 7 0
  dir /bin 755 1000 1000
  slink /bin/sh busybox 777 0 0
  file /bin/busybox initramfs/busybox 755 0 0
  dir /proc 755 0 0
  dir /sys 755 0 0
  dir /mnt 755 0 0
  file /init initramfs/init.sh 755 0 0

Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
documenting the above file format.

One advantage of the configuration file is that root access is not required to
set permissions or create device nodes in the new archive.  (Note that those
two example "file" entries expect to find files named "init.sh" and "busybox" in
a directory called "initramfs", under the linux-2.6.* directory.  See
Documentation/driver-api/early-userspace/early_userspace_support.rst for more details.)

The kernel does not depend on external cpio tools.  If you specify a
directory instead of a configuration file, the kernel's build infrastructure
creates a configuration file from that directory (usr/Makefile calls
usr/gen_initramfs.sh), and proceeds to package up that directory
using the config file (by feeding it to usr/gen_init_cpio, which is created
from usr/gen_init_cpio.c).  The kernel's build-time cpio creation code is
entirely self-contained, and the kernel's boot-time extractor is also
(obviously) self-contained.

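To give a feel for what turning a directory into a configuration file
involves, here is a much simplified sketch.  It is not the code in
usr/gen_initramfs.sh: the function name gen_spec is made up, only directories,
regular files and symlinks are handled, every entry is recorded as owned by
uid 0 and gid 0, file names are assumed to contain no whitespace, and
"stat -c" is assumed to be available (as in GNU coreutils or busybox):

```shell
#!/bin/sh
# gen_spec DIR: print a gen_init_cpio-style description of DIR.
# Example: gen_spec initramfs > initramfs_list
gen_spec()
{
  (cd "$1" || exit 1
   # Plain find lists each directory before its contents; the sort
   # keeps that property and makes the output deterministic.
   find . | LC_ALL=C sort | while read -r path
   do
     [ "$path" = . ] && continue
     name=${path#.}                    # "./bin/sh" -> "/bin/sh"
     if [ -L "$path" ]                 # test symlinks first: -d follows them
     then
       echo "slink $name $(readlink "$path") 777 0 0"
     elif [ -d "$path" ]
     then
       echo "dir $name $(stat -c %a "$path") 0 0"
     elif [ -f "$path" ]
     then
       echo "file $name $1/${path#./} $(stat -c %a "$path") 0 0"
     fi
   done)
}
```

The source locations in the "file" lines are relative to the directory
gen_spec was called from, so the result has to be used from that same
directory.
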
The one thing you might need external cpio utilities installed for is creating
or extracting your own preprepared cpio files to feed to the kernel build
(instead of a config file or directory).

The following command line can extract a cpio image (created either by the
script below or by the kernel build) back into its component files::

  cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames

The following shell script can create a prebuilt cpio archive you can
use in place of the above config file::

  #!/bin/sh

  # Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
  # Licensed under GPL version 2

  if [ $# -ne 2 ]
  then
    echo "usage: mkinitramfs directory imagename.cpio.gz"
    exit 1
  fi

  if [ -d "$1" ]
  then
    echo "creating $2 from $1"
    (cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
  else
    echo "First argument must be a directory"
    exit 1
  fi

.. Note::

   The cpio man page contains some bad advice that will break your initramfs
   archive if you follow it.  It says "A typical way to generate the list
   of filenames is with the find command; you should give find the -depth
   option to minimize problems with permissions on directories that are
   unwritable or not searchable."  Don't do this when creating
   initramfs.cpio.gz images; it won't work.  The Linux kernel cpio extractor
   won't create files in a directory that doesn't exist, so the directory
   entries must go before the files that go in those directories.
   The above script gets them in the right order.

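The difference between the two orderings is easy to see on a tiny tree.  The
sketch below builds one in a temporary directory (the names root, bin and
busybox are arbitrary) and lists it both ways:

```shell
#!/bin/sh
# Build a tiny tree and compare the two orderings.
tmp=$(mktemp -d)
mkdir -p "$tmp/root/bin"
: > "$tmp/root/bin/busybox"

# Default traversal: each directory is listed before its contents,
# which is the order the kernel's extractor needs.
good=$(cd "$tmp/root" && find .)

# -depth traversal: contents are listed before their directory.
bad=$(cd "$tmp/root" && find . -depth)

echo "find . lists:";        echo "$good"
echo "find . -depth lists:"; echo "$bad"
rm -rf "$tmp"
```

In the first listing ./bin comes before ./bin/busybox; in the second the file
comes first, so an archive built from it would ask the extractor to create
bin/busybox before bin exists.
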
External initramfs images:
--------------------------

If the kernel has initrd support enabled, an external cpio.gz archive can also
be passed into a 2.6 kernel in place of an initrd.  In this case, the kernel
will autodetect the type (initramfs, not initrd) and extract the external cpio
archive into rootfs before trying to run /init.

This has the memory efficiency advantages of initramfs (no ramdisk block
device) but the separate packaging of initrd (which is nice if you have
non-GPL code you'd like to run from initramfs, without conflating it with
the GPL licensed Linux kernel binary).

It can also be used to supplement the kernel's built-in initramfs image.  The
files in the external archive will overwrite any conflicting files in
the built-in initramfs archive.  Some distributors also prefer to customize
a single kernel image with task-specific initramfs images, without recompiling.

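Telling such images apart by their leading bytes can be approximated from
userspace.  The sketch below is not the kernel's detection code, and the
helper name image_type is invented for the example.  It relies only on two
well-known signatures: gzip data starts with the bytes 0x1f 0x8b, and an
uncompressed "newc" cpio archive starts with the ASCII string "070701":

```shell
#!/bin/sh
# image_type [FILE]: guess the format of an image from its first six
# bytes.  Reads standard input when no FILE is given.
image_type()
{
  magic=$(od -A n -t x1 -N 6 ${1:+"$1"} | tr -d ' \n')
  case "$magic" in
    1f8b*)        echo gzip ;;        # gzip header
    303730373031) echo cpio-newc ;;   # "070701" in hexadecimal
    *)            echo unknown ;;
  esac
}
```

A gzipped image may hold either a cpio archive or a filesystem image, so the
check has to be repeated on the decompressed data, for example with
"gzip -dc test.cpio.gz | image_type".
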
Contents of initramfs:
----------------------

An initramfs archive is a complete self-contained root filesystem for Linux.
If you don't already understand what shared libraries, devices, and paths
you need to get a minimal root filesystem up and running, here are some
references:

- https://www.tldp.org/HOWTO/Bootdisk-HOWTO/
- https://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
- http://www.linuxfromscratch.org/lfs/view/stable/

The "klibc" package (https://www.kernel.org/pub/linux/libs/klibc) is
designed to be a tiny C library to statically link early userspace
code against, along with some related utilities.  It is BSD licensed.

I use uClibc (https://www.uclibc.org) and busybox (https://www.busybox.net)
myself.  These are LGPL and GPL, respectively.  (A self-contained initramfs
package is planned for the busybox 1.3 release.)

In theory you could use glibc, but that's not well suited for small embedded
uses like this.  (A "hello world" program statically linked against glibc is
over 400k.  With uClibc it's 7k.  Also note that glibc dlopens libnss to do
name lookups, even when otherwise statically linked.)

A good first step is to get initramfs to run a statically linked "hello world"
program as init, and test it under an emulator like qemu (www.qemu.org) or
User Mode Linux, like so::

  cat > hello.c << EOF
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char *argv[])
  {
    printf("Hello world!\n");
    sleep(999999999);
  }
  EOF
  gcc -static hello.c -o init
  echo init | cpio -o -H newc | gzip > test.cpio.gz
  # Testing external initramfs using the initrd loading mechanism.
  qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero

When debugging a normal root filesystem, it's nice to be able to boot with
"init=/bin/sh".  The initramfs equivalent is "rdinit=/bin/sh", and it's
just as useful.

Why cpio rather than tar?
-------------------------

This decision was made back in December, 2001.  The discussion started here:

  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html

And spawned a second thread (specifically on tar vs cpio), starting here:

  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html

The quick and dirty summary version (which is no substitute for reading
the above threads) is:

1) cpio is a standard.  It's decades old (from the AT&T days), and already
   widely used on Linux (inside RPM, Red Hat's device driver disks).  Here's
   a Linux Journal article about it from 1996:

      http://www.linuxjournal.com/article/1213

   It's not as popular as tar because the traditional cpio command line tools
   require _truly_hideous_ command line arguments.  But that says nothing
   either way about the archive format, and there are alternative tools,
   such as:

     http://freecode.com/projects/afio

2) The cpio archive format chosen by the kernel is simpler and cleaner (and
   thus easier to create and parse) than any of the (literally dozens of)
   various tar archive formats.  The complete initramfs archive format is
   explained in Documentation/driver-api/early-userspace/buffer-format.rst,
   created in usr/gen_init_cpio.c, and extracted in init/initramfs.c.  All
   three together come to less than 26k total of human-readable text.

3) The GNU project standardizing on tar is approximately as relevant as
   Windows standardizing on zip.  Linux is not part of either, and is free
   to make its own technical decisions.

4) Since this is a kernel internal format, it could easily have been
   something brand new.  The kernel provides its own tools to create and
   extract this format anyway.  Using an existing standard was preferable,
   but not essential.

5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
   supported on the kernel side"):

      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html

   explained his reasoning:

     - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
     - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html

   and, most importantly, designed and implemented the initramfs code.

Future directions:
------------------

Today (2.6.16), initramfs is always compiled in, but not always used.  The
kernel falls back to legacy boot code that is reached only if initramfs does
not contain an /init program.  The fallback is legacy code, there to ensure a
smooth transition and to allow early boot functionality to gradually move to
"early userspace" (i.e. initramfs).

The move to early userspace is necessary because finding and mounting the real
root device is complex.  Root partitions can span multiple devices (raid or
separate journal).  They can be out on the network (requiring dhcp, setting a
specific MAC address, logging into a server, etc.).  They can live on removable
media, with dynamically allocated major/minor numbers and persistent naming
issues requiring a full udev implementation to sort out.  They can be
compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
and so on.

This kind of complexity (which inevitably includes policy) is rightly handled
in userspace.  Both klibc and busybox/uClibc are working on simple initramfs
packages to drop into a kernel build.

The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
The kernel's current early boot code (partition detection, etc.) will probably
be migrated into a default initramfs, automatically created and used by the
kernel build.