Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates.  Kernel requirements are
a bit looser: for example, the kernel doesn't care if the <file_data>
items are swapped around, though it does require the directory entries
(inodes) of a given directory to be contiguous, since readdir relies on
this.

All data is currently in host-endian format; neither mkcramfs nor the
kernel ever does byte swapping ("swabbing").  (See section `Block Size'
below.)

<filesystem>:
	<superblock>
	<directory_structure>
	<data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
	For each file:
		struct cramfs_inode (see cramfs_fs.h).
		Filename.  Not generally null-terminated, but it is
		 null-padded to a multiple of 4 bytes.
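
The sketch below illustrates the 12-byte inode and the filename padding.
The bit-field widths (16/16, 24/8, 6/26) are recalled from cramfs_fs.h
and should be checked against the real header; the struct name and the
helpers are hypothetical, and bit-field layout is compiler-specific (this
matches GCC/Clang on common ABIs).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct cramfs_inode (verify widths against
 * cramfs_fs.h).  Three 32-bit words, 12 bytes in total. */
struct cramfs_inode_sketch {
	uint32_t mode:16, uid:16;
	uint32_t size:24, gid:8;
	uint32_t namelen:6, offset:26; /* both counted in 4-byte units */
};

_Static_assert(sizeof(struct cramfs_inode_sketch) == 12,
	       "cramfs inode is expected to be 12 bytes");

/* Bytes a filename of len bytes occupies once null-padded to 4. */
static size_t padded_name_len(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Size of one <directory_structure> entry: inode plus padded name. */
static size_t dirent_size(size_t name_len)
{
	return sizeof(struct cramfs_inode_sketch) + padded_name_len(name_len);
}
```

For example, a 5-byte name such as "hello" is stored in 8 bytes, so its
entry takes 12 + 8 = 20 bytes and the next inode stays 4-byte aligned.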

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.
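
A minimal sketch of the width-first order over a hypothetical in-memory
tree (struct node and the helpers are illustrative, not mkcramfs code):
each directory's entries are emitted first, then each subdirectory is
visited in the same order.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory tree node; nchildren > 0 marks a directory
 * that has entries. */
struct node {
	const char *name;
	struct node *const *children;
	size_t nchildren;
};

/* Append name to buf, space-separated; buf must start as "". */
static void emit(char *buf, size_t cap, const char *name)
{
	size_t used = strlen(buf);

	snprintf(buf + used, cap - used, "%s%s", used ? " " : "", name);
}

/* Width-first: list all of dir's entries, then recurse into each
 * subdirectory in the same order. */
static void width_first(const struct node *dir, char *buf, size_t cap)
{
	for (size_t i = 0; i < dir->nchildren; i++)
		emit(buf, cap, dir->children[i]->name);
	for (size_t i = 0; i < dir->nchildren; i++)
		if (dir->children[i]->nchildren > 0)
			width_first(dir->children[i], buf, cap);
}
```

For a root containing a/ (with x and y), b, and c/ (with z), this yields
"a b c x y z", whereas a plain depth-first listing would give
"a x y b c z".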

Beginning in 2.4.7, directory entries are sorted.  This optimization
allows cramfs_lookup to return more quickly when a filename does not
exist, speeds up user-space directory sorts, etc.
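
The early exit that sorting enables can be sketched as below.  This
assumes entries are sorted in strcmp() byte order and uses C strings for
simplicity (on disk, names are null-padded rather than null-terminated);
sorted_lookup() is a hypothetical helper, and the kernel's cramfs_lookup
may differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Scan a directory's entries, assumed sorted in strcmp() order.
 * Returns the index of name, or -1 if it is absent. */
static long sorted_lookup(const char *const *names, size_t n,
			  const char *name)
{
	for (size_t i = 0; i < n; i++) {
		int cmp = strcmp(names[i], name);

		if (cmp == 0)
			return (long)i;
		if (cmp > 0)
			break;	/* all later entries compare greater too */
	}
	return -1;
}
```

A linear scan is used because directory entries are variable-length
(inode plus padded name), so a binary search would need an index that the
on-disk format does not provide; sorting still lets a miss stop early.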

<data>:
	One <file_data> for each file that's either a symlink or a
	 regular file of non-zero st_size.

<file_data>:
	nblocks * <block_pointer>
	 (where nblocks = (st_size - 1) / blksize + 1)
	nblocks * <block>
	padding to multiple of 4 bytes
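
The nblocks formula is a ceiling division that is valid only for
st_size > 0, which always holds here because only non-empty files get a
<file_data>.  A small sketch (the helper names are hypothetical):

```c
#include <stdint.h>

/* ceil(st_size / blksize), as in the formula above; st_size > 0. */
static uint32_t cramfs_nblocks(uint32_t st_size, uint32_t blksize)
{
	return (st_size - 1) / blksize + 1;
}

/* Bytes taken by the <block_pointer> array (32 bits per pointer). */
static uint32_t ptr_array_bytes(uint32_t st_size, uint32_t blksize)
{
	return cramfs_nblocks(st_size, blksize) * 4;
}
```

With blksize 4096, a 10000-byte file has 3 blocks and therefore a
12-byte pointer array before its first <block>.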

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one).  The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.
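
Because each pointer marks an end, the start of block i is the previous
pointer, except for block 0, which starts right after the pointer array.
The sketch below assumes plain pointers (no extension flags set) and
offsets measured in the same space as ptr_base, the hypothetical offset
of the file's first <block_pointer>.

```c
#include <stdint.h>

/* Hypothetical helper: [start, end) byte range of block i. */
static void block_range(uint32_t ptr_base, const uint32_t *ptrs,
			uint32_t nblocks, uint32_t i,
			uint32_t *start, uint32_t *end)
{
	/* Block 0 begins just past the nblocks 32-bit pointers. */
	*start = (i == 0) ? ptr_base + 4 * nblocks : ptrs[i - 1];
	*end = ptrs[i];
}
```

The stored size of block i is then end - start.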

When the CRAMFS_FLAG_EXT_BLOCK_POINTERS capability bit is set, the
top bits of each <block_pointer> may contain special flags as follows:

CRAMFS_BLK_FLAG_UNCOMPRESSED (bit 31):
	The block data is not compressed and should be copied verbatim.

CRAMFS_BLK_FLAG_DIRECT_PTR (bit 30):
	The <block_pointer> stores the block's start offset rather than
	its end, shifted right by 2 bits; the block must therefore be
	aligned to a 4-byte boundary.  If CRAMFS_BLK_FLAG_UNCOMPRESSED
	is also set, the block size is blksize (or less for the last
	block); otherwise the compressed data length is stored in the
	first 2 bytes of the block data.  This allows a discontiguous
	data layout and specific data block alignments, e.g. for XIP
	applications.
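
Decoding a <block_pointer> under CRAMFS_FLAG_EXT_BLOCK_POINTERS can be
sketched as follows.  The bit positions come from the text above; the
local macro definitions, BLK_FLAG_MASK, struct blk_ptr_info and
decode_blk_ptr() are illustrative stand-ins rather than the kernel's
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRAMFS_BLK_FLAG_UNCOMPRESSED	(UINT32_C(1) << 31)
#define CRAMFS_BLK_FLAG_DIRECT_PTR	(UINT32_C(1) << 30)
#define BLK_FLAG_MASK	(CRAMFS_BLK_FLAG_UNCOMPRESSED | \
			 CRAMFS_BLK_FLAG_DIRECT_PTR)

struct blk_ptr_info {
	bool uncompressed;
	bool direct;
	uint32_t offset;	/* start offset if direct, else end offset */
};

static struct blk_ptr_info decode_blk_ptr(uint32_t raw)
{
	struct blk_ptr_info info;
	uint32_t val = raw & ~BLK_FLAG_MASK;	/* strip flag bits */

	info.uncompressed = (raw & CRAMFS_BLK_FLAG_UNCOMPRESSED) != 0;
	info.direct = (raw & CRAMFS_BLK_FLAG_DIRECT_PTR) != 0;
	/* A direct pointer holds start >> 2, so shift it back. */
	info.offset = info.direct ? val << 2 : val;
	return info;
}
```

The flags are only meaningful when the capability bit is set; without
it, the whole 32-bit value is an end offset.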


The order of <file_data> items is a depth-first descent of the
directory tree, i.e. the same order as
`find -size +0 \( -type f -o -type l \) -print'.


<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data if the
corresponding <block_pointer>'s CRAMFS_BLK_FLAG_UNCOMPRESSED bit is not
set; otherwise it is the input data directly.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size.  (See <block_pointer> above.)
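
How much input each <block> covers follows directly from blksize: a full
chunk for every block except possibly the last.  The hypothetical helper
below computes only that chunk length (it does not compress anything) and
assumes i < nblocks.

```c
#include <stdint.h>

/* Uncompressed bytes covered by block i of a file of st_size bytes. */
static uint32_t block_input_len(uint32_t st_size, uint32_t blksize,
				uint32_t i)
{
	uint32_t left = st_size - i * blksize;	/* bytes from chunk start */

	return left < blksize ? left : blksize;
}
```

For a 10000-byte file with blksize 4096, the chunks are 4096, 4096 and
1808 bytes.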

<block>s are merely byte-aligned, not generally u32-aligned.

When CRAMFS_BLK_FLAG_DIRECT_PTR is set, the corresponding <block> may
be located anywhere, not necessarily contiguous with the previous or
next blocks; in that case it is at least u32-aligned.  If
CRAMFS_BLK_FLAG_UNCOMPRESSED is also set, the block size is always
blksize, except for the last block, which is limited by the file
length.  If CRAMFS_BLK_FLAG_DIRECT_PTR is set and
CRAMFS_BLK_FLAG_UNCOMPRESSED is not, the first 2 bytes of the block
contain the size of the remaining block data, since this cannot be
determined from the placement of logically adjacent blocks.
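
The two cases above can be sketched as one hypothetical helper.  It
assumes the 2-byte length is in host byte order, consistent with the
host-endian format described at the top of this file, and that file_left
is the number of file bytes not yet covered by earlier blocks.

```c
#include <stdint.h>
#include <string.h>

/* Returns the number of data bytes in a direct-pointer block and sets
 * *data to where they begin. */
static uint32_t direct_block_len(const unsigned char *blk, int uncompressed,
				 uint32_t blksize, uint32_t file_left,
				 const unsigned char **data)
{
	if (uncompressed) {
		/* Raw data: blksize, or what's left for the last block. */
		*data = blk;
		return file_left < blksize ? file_left : blksize;
	}
	uint16_t len;

	memcpy(&len, blk, sizeof len);	/* avoids an unaligned u16 load */
	*data = blk + 2;		/* compressed data follows the length */
	return len;
}
```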


Holes
-----

This kernel supports cramfs holes (i.e. an efficient representation of
blocks of uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for and create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes.  Run mkcramfs
with -z if you want it to create files that can have holes in them.
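
The test that decides whether a chunk qualifies as a hole is simple; a
sketch follows.  How a hole is then encoded in the image is not described
in this file, so the hypothetical chunk_is_hole() below only detects
candidates; check mkcramfs (-z) and the kernel reader for the encoding.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if an uncompressed chunk consists entirely of NUL bytes. */
static bool chunk_is_hole(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return false;
	return true;
}
```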


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time.  It's intended to be somewhere around
PAGE_SIZE for cramfs_read_folio's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_SIZE may grow in future (if I interpret the comment
correctly).

Currently, mkcramfs #defines PAGE_SIZE as 4096 and uses that
for blksize, whereas Linux-2.3.39 uses its own PAGE_SIZE, which
can be as large as 32KB on arm.  This discrepancy is a bug, though
it's not clear which side should be changed.

One option is to change mkcramfs to take its PAGE_SIZE from
<asm/page.h>.  Personally I don't like this option, but it does
require the least amount of change: just change `#define
PAGE_SIZE (4096)' to `#include <asm/page.h>'.  The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable.

One part of that is addressing endianness.  The two options here are
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'.  Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

The cost of swabbing is changing the code to use the le32_to_cpu
etc. macros as used by ext2fs.  We don't need to swab the compressed
data, only the superblock, inodes and block pointers.
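
Under the little-endian option, a user-space tool reading the image could
assemble each field from bytes, which gives the same result on big- and
little-endian hosts.  The helper name get_le32() is hypothetical; in the
kernel the equivalent conversion is done with le32_to_cpu.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

Only the superblock, inode and block-pointer fields would go through such
a conversion; compressed block data is a byte stream and needs none.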


The other part of making cramfs more sharable is choosing a block
size.  The options are:

  1. Always 4096 bytes.

  2. Writer chooses blocksize; kernel adapts but rejects blocksize >
     PAGE_SIZE.

  3. Writer chooses blocksize; kernel adapts even to blocksize >
     PAGE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_SIZE: just make cramfs_read_folio read multiple blocks.
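
Option 2 can be sketched as a mount-time check plus the arithmetic for
reading several blocks per page.  This assumes a hypothetical superblock
field carrying the writer's blksize, and the power-of-two requirement is
my own assumption (it keeps every page covered by a whole number of
blocks); neither helper exists in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept a power-of-two blksize no larger than the page size. */
static bool blksize_ok(uint32_t blksize, uint32_t page_size)
{
	return blksize != 0 && (blksize & (blksize - 1)) == 0 &&
	       blksize <= page_size;
}

/* Consecutive blocks to decompress to fill one page; the caller is
 * expected to have checked blksize_ok() first. */
static uint32_t blocks_per_page(uint32_t blksize, uint32_t page_size)
{
	return page_size / blksize;
}
```

For example, a 4096-byte-block image on a kernel with 16KB pages would
need 4 blocks per page, while a 32KB-block image would be rejected by a
4KB-page kernel.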

The cost of option 1 is that kernels with a larger PAGE_SIZE
value don't get as good compression as they could.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants.  The gain is that people
with kernels having larger PAGE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get read_folio to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting read_folio to read into all the covered pages is harder.
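
For the simple form of option 3, read_folio must work out which block
covers the requested page and where the page's bytes sit inside the
decompressed block.  A sketch, assuming blksize is a multiple of the page
size; struct page_loc and locate_page() are hypothetical.

```c
#include <stdint.h>

struct page_loc {
	uint32_t block;		/* index of the covering block */
	uint32_t offset;	/* byte offset of the page within it */
};

static struct page_loc locate_page(uint64_t page_index, uint32_t page_size,
				   uint32_t blksize)
{
	uint64_t pos = page_index * page_size;	/* byte position in file */
	struct page_loc loc = {
		(uint32_t)(pos / blksize),
		(uint32_t)(pos % blksize),
	};

	return loc;
}
```

With 4KB pages and 16KB blocks, page 5 lies in block 1 at offset 4096:
the whole block would be decompressed into the MAX_BLKSIZE buffer and
only bytes 4096..8191 kept, which is the CPU inefficiency mentioned above.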

The main advantage of option 3 over options 1 and 2 is better
compression.  The cost is greater complexity.  Probably not worth it,
but I hope someone will disagree.  (If it is implemented, then I'll
re-use that code in e2compr.)


Another cost of options 2 and 3 over option 1 is making mkcramfs use a
different block size, but that just means adding and parsing a -b
option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes.  Inodes other than the root inode are followed
by a filename, so the expansion doesn't even have to be a multiple of
4 bytes.
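
The arithmetic behind that last remark: if the name padding were computed
over inode plus name together, any inode size would still leave the next
inode u32-aligned.  This is a hypothetical layout, not the current
format, in which only the name itself is padded.

```c
#include <stddef.h>

/* Hypothetical: directory entry size when inode and name are padded
 * together to a multiple of 4 bytes. */
static size_t entry_size(size_t inode_size, size_t name_len)
{
	return (inode_size + name_len + 3) & ~(size_t)3;
}
```

With today's 12-byte inode this gives the same sizes as padding the name
alone; a 14-byte inode with a 5-byte name would still take 20 bytes.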