Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates.  Kernel requirements are
a bit looser: e.g. the kernel doesn't care if the <file_data> items are
swapped around, though it does care that the directory entries (inodes)
of a given directory are contiguous, as readdir relies on this.

All data is currently in host-endian format; neither mkcramfs nor the
kernel ever does swabbing (byte swapping).  (See section `Block Size'
below.)

<filesystem>:
	<superblock>
	<directory_structure>
	<data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
	For each file:
		struct cramfs_inode (see cramfs_fs.h).
		Filename.  Not generally null-terminated, but it is
		 null-padded to a multiple of 4 bytes.

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.

Beginning in 2.4.7, directory entries are sorted.
This optimization allows cramfs_lookup to return more quickly when a
filename does not exist, speeds up user-space directory sorts, etc.

<data>:
	One <file_data> for each file that's either a symlink or a
	 regular file of non-zero st_size.

<file_data>:
	nblocks * <block_pointer>
	 (where nblocks = (st_size - 1) / blksize + 1, using integer
	 division, i.e. st_size / blksize rounded up)
	nblocks * <block>
	padding to multiple of 4 bytes

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one).  The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.

When the CRAMFS_FLAG_EXT_BLOCK_POINTERS capability bit is set, each
<block_pointer>'s top bits may contain special flags as follows:

CRAMFS_BLK_FLAG_UNCOMPRESSED (bit 31):
	The block data is not compressed and should be copied verbatim.

CRAMFS_BLK_FLAG_DIRECT_PTR (bit 30):
	The <block_pointer> stores the block's start offset, shifted
	right by 2 bits, rather than its end offset.  The block must
	therefore be aligned to a 4-byte boundary.
	If CRAMFS_BLK_FLAG_UNCOMPRESSED is also set, the block size is
	blksize; otherwise the compressed data length is stored in the
	first 2 bytes of the block data.  This allows a discontiguous
	data layout and specific data block alignments, e.g. for XIP
	applications.


The order of <file_data> items is a depth-first descent of the
directory tree, i.e. the same order as `find -size +0 \( -type f -o
-type l \) -print'.


<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data if the
corresponding CRAMFS_BLK_FLAG_UNCOMPRESSED <block_pointer> bit is not
set; otherwise it is the input data directly.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size.  (See <block_pointer> above.)

<block>s are merely byte-aligned, not generally u32-aligned.

When CRAMFS_BLK_FLAG_DIRECT_PTR is specified, the corresponding
<block> may be located anywhere and is not necessarily contiguous with
the previous/next blocks.  In that case it is at least u32-aligned.
If CRAMFS_BLK_FLAG_UNCOMPRESSED is also specified, the size is always
blksize, except for the last block, which is limited by the file length.
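The pointer rules above can be illustrated with a small C sketch that finds the stored bytes of block i. It assumes the whole image is in memory in host byte order, that block pointer values are byte offsets within that image, that CRAMFS_FLAG_EXT_BLOCK_POINTERS is set (or no pointer has its top bits set), and that a non-direct block never follows a direct one; a real reader must also handle that last case. The names `locate_block`, `rd32`, `rd16`, `struct blk_range` and `BLK_FLAG_MASK` are hypothetical, and `data_off` is the image offset of the file's first <block_pointer>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag bits as described above (bit 31 and bit 30). */
#define CRAMFS_BLK_FLAG_UNCOMPRESSED (UINT32_C(1) << 31)
#define CRAMFS_BLK_FLAG_DIRECT_PTR   (UINT32_C(1) << 30)
#define BLK_FLAG_MASK (CRAMFS_BLK_FLAG_UNCOMPRESSED | CRAMFS_BLK_FLAG_DIRECT_PTR)

/* Hypothetical result type: where the stored bytes of one block live. */
struct blk_range {
	uint32_t start;		/* image offset of the first stored byte */
	uint32_t len;		/* number of stored bytes */
	int uncompressed;	/* copy verbatim rather than decompress */
};

/* Host-endian reads; memcpy avoids unaligned access. */
static uint32_t rd32(const uint8_t *img, uint32_t off)
{
	uint32_t v;
	memcpy(&v, img + off, sizeof v);
	return v;
}

static uint16_t rd16(const uint8_t *img, uint32_t off)
{
	uint16_t v;
	memcpy(&v, img + off, sizeof v);
	return v;
}

/* nblocks = (st_size - 1) / blksize + 1, for st_size > 0 */
static uint32_t cramfs_nblocks(uint32_t st_size, uint32_t blksize)
{
	assert(st_size > 0 && blksize > 0);
	return (st_size - 1) / blksize + 1;
}

/*
 * Locate block i of a file whose pointer table starts at data_off.
 * Assumes: host-endian image, EXT pointers enabled, and no non-direct
 * block following a direct one.
 */
static struct blk_range locate_block(const uint8_t *img, uint32_t data_off,
				     uint32_t st_size, uint32_t blksize,
				     uint32_t i)
{
	uint32_t nblocks = cramfs_nblocks(st_size, blksize);
	uint32_t p, v;
	struct blk_range r;

	assert(i < nblocks);
	p = rd32(img, data_off + 4 * i);
	v = p & ~BLK_FLAG_MASK;
	r.uncompressed = (p & CRAMFS_BLK_FLAG_UNCOMPRESSED) != 0;

	if (p & CRAMFS_BLK_FLAG_DIRECT_PTR) {
		/* Start offset is stored shifted right by 2 bits. */
		r.start = v << 2;
		if (r.uncompressed) {
			/* blksize, except the last block stops at EOF */
			r.len = (i == nblocks - 1) ? st_size - i * blksize
						   : blksize;
		} else {
			/* 2-byte length prefix, then the compressed data */
			r.len = rd16(img, r.start);
			r.start += 2;
		}
	} else {
		/* End pointer: the block starts where the previous one
		 * ended, or right after the pointer table for block 0. */
		r.start = (i == 0) ? data_off + 4 * nblocks
				   : (rd32(img, data_off + 4 * (i - 1)) & ~BLK_FLAG_MASK);
		r.len = v - r.start;
	}
	return r;
}
```

For example, a 10000-byte file with blksize 4096 has three block pointers; if the pointer table starts at image offset 8, block 0 starts at offset 20, right after the 12-byte table.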

If CRAMFS_BLK_FLAG_DIRECT_PTR is set and CRAMFS_BLK_FLAG_UNCOMPRESSED
is not, the first 2 bytes of the block contain the size of the
remaining block data, since this cannot be determined from the
placement of logically adjacent blocks.


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for or create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes.  Run mkcramfs
with -z if you want it to create files that can have holes in them.


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time.  It's intended to be somewhere around
PAGE_SIZE for cramfs_read_folio's convenience.)
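When blksize equals PAGE_SIZE, page i of a file is exactly block i, which is the convenience referred to above. The following C sketch computes which blocks hold the bytes of a given page, which also shows what a reader faces with a smaller or larger block size. The helper `blocks_for_page` and its parameters are hypothetical, and it assumes the page starts inside the file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the inclusive range [*first, *last] of block indices that
 * hold the bytes of page 'pgidx' of a file of 'st_size' bytes.
 * Assumes the page starts inside the file.
 */
static void blocks_for_page(uint64_t st_size, uint32_t blksize,
			    uint32_t page_size, uint64_t pgidx,
			    uint64_t *first, uint64_t *last)
{
	uint64_t start = pgidx * page_size;	/* first byte of the page */
	uint64_t end = start + page_size;	/* one past the page */

	assert(blksize > 0 && page_size > 0 && start < st_size);
	if (end > st_size)
		end = st_size;			/* the last page may be partial */
	*first = start / blksize;
	*last = (end - 1) / blksize;
}
```

With a 10000-byte file and 4096-byte pages, blksize 4096 maps page 2 to block 2 alone, blksize 1024 makes page 0 span blocks 0 to 3, and blksize 8192 makes pages 0 and 1 share block 0.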

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_SIZE may grow in the future (if I interpret the comment
correctly).

Currently, mkcramfs #defines PAGE_SIZE as 4096 and uses that for
blksize, whereas Linux-2.3.39 uses the kernel's PAGE_SIZE (which can
be as large as 32KB on arm).  This discrepancy is a bug, though it's
not clear which side should be changed.

One option is to change mkcramfs to take its PAGE_SIZE from
<asm/page.h>.  Personally I don't like this option, but it does
require the least amount of change: just change `#define
PAGE_SIZE (4096)' to `#include <asm/page.h>'.  The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable.

One part of that is addressing endianness.  The two options here are
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'.  Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

The cost of swabbing is changing the code to use the le32_to_cpu
etc. macros as used by ext2fs.  We don't need to swab the compressed
data, only the superblock, inodes and block pointers.


The other part of making cramfs more sharable is choosing a block
size.  The options are:

  1. Always 4096 bytes.

  2. Writer chooses blocksize; kernel adapts but rejects blocksize >
     PAGE_SIZE.

  3. Writer chooses blocksize; kernel adapts even to blocksize >
     PAGE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_SIZE: just make cramfs_read_folio read multiple blocks.

The cost of option 1 is that kernels with a larger PAGE_SIZE
value don't get as good compression as they could.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants.  The gain is that people
with kernels having a larger PAGE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g.
get read_folio to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting read_folio to read into all the covered pages is harder.

The main advantage of option 3 over options 1 and 2 is better
compression.  The cost is greater complexity.  Probably not worth it,
but I hope someone will disagree.  (If it is implemented, then I'll
re-use that code in e2compr.)


Another cost of options 2 and 3 over option 1 is making mkcramfs use
a different block size, but that just means adding and parsing a -b
option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes.  Inodes other than the root inode are followed
by the filename, so the expansion doesn't even have to be a multiple
of 4 bytes.
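For reference, the current 12-byte inode can be sketched as three 32-bit words of bit-fields, with the padded filename following it. The field widths below mirror struct cramfs_inode in cramfs_fs.h as I understand it (check the header before relying on them), bit-field packing is compiler-specific, and `struct inode_sketch`, `name_units` and `entry_size` are hypothetical names. The sketch assumes namelen records the name length in 4-byte units.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the current 12-byte on-disk inode: three 32-bit words.
 * Assumed widths; packing is implementation-defined. */
struct inode_sketch {
	uint32_t mode:16, uid:16;
	uint32_t size:24, gid:8;
	uint32_t namelen:6, offset:26;	/* namelen assumed in 4-byte units */
};

_Static_assert(sizeof(struct inode_sketch) == 12, "expected 12-byte inode");

/* Name length as stored: bytes rounded up to 4-byte units. */
static uint32_t name_units(uint32_t name_bytes)
{
	return (name_bytes + 3) / 4;
}

/* Bytes an entry occupies in <directory_structure>: inode + padded name. */
static uint32_t entry_size(uint32_t name_bytes)
{
	return (uint32_t)sizeof(struct inode_sketch) + 4 * name_units(name_bytes);
}
```

Any growth of the inode adds directly to entry_size() for every file, which is the space trade-off the paragraph above weighs against CD-sized media.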