.. SPDX-License-Identifier: GPL-2.0

======================================
Enhanced Read-Only File System - EROFS
======================================

Overview
========

EROFS stands for Enhanced Read-Only File System. Unlike other read-only
file systems, it is designed for flexibility and scalability while
remaining simple and offering high performance.

It is designed as a better filesystem solution for the following scenarios:

 - read-only storage media;

 - part of a fully trusted read-only solution, which must be immutable
   and bit-for-bit identical to the official golden image of a release,
   for security and other reasons;

 - saving storage space while keeping guaranteed end-to-end performance,
   by using reduced metadata and transparent file compression, especially
   on embedded devices with limited memory (e.g. smartphones).

Here are the main features of EROFS:

 - Little endian on-disk design;

 - Currently 4KB block size (nobh) and therefore a maximum 16TB address space;

 - Metadata and data can be mixed by design;

 - 2 inode versions for different requirements:

   =====================  ============  =====================================
                          compact (v1)  extended (v2)
   =====================  ============  =====================================
   Inode metadata size    32 bytes      64 bytes
   Max file size          4 GB          16 EB (also limited by max. vol size)
   Max uids/gids          65536         4294967296
   File change time       no            yes (64 + 32-bit timestamp)
   Max hardlinks          65536         4294967296
   Metadata reserved      4 bytes       14 bytes
   =====================  ============  =====================================

 - Support extended attributes (xattrs) as an option;

 - Support xattr inline and tail-end data inline for all files;

 - Support POSIX.1e ACLs by using xattrs;

 - Support transparent file compression as an option:
   LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
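
The 16TB address space and the 4 GB compact file size limit listed above
follow from simple arithmetic. A worked sketch, assuming 32-bit block
addresses and a 32-bit file size field in compact inodes (these field widths
are assumptions, they are not stated in this document):

.. math::

   2^{32} \text{ blocks} \times 2^{12} \text{ bytes per block} = 2^{44} \text{ bytes} = 16 \text{ TB}

   2^{32} \text{ bytes} = 4 \text{ GB}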

The following git tree provides the user-space tools, which are still under
development (e.g. the formatting tool mkfs.erofs):

- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git

Bug reports and patches are welcome. Please send them to the linux-erofs
mailing list:

- linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>

Mount options
=============

===================    =========================================================
(no)user_xattr         Set up extended user attributes. Note: xattr is enabled
                       by default if CONFIG_EROFS_FS_XATTR is selected.
(no)acl                Set up POSIX access control lists. Note: acl is enabled
                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
cache_strategy=%s      Select a strategy for cached decompression from now on:

                       ==========  =============================================
                         disabled  In-place I/O decompression only;
                        readahead  Cache the last incomplete compressed physical
                                   cluster for further reading. In-place I/O
                                   decompression is still used for the remaining
                                   compressed physical clusters;
                       readaround  Cache both ends of incomplete compressed
                                   physical clusters for further reading.
                                   In-place I/O decompression is still used for
                                   the remaining compressed physical clusters.
                       ==========  =============================================
===================    =========================================================

On-disk details
===============

Summary
-------
Unlike other read-only file systems, an EROFS volume is designed
to be as simple as possible::

                                |-> aligned with the block size
   ____________________________________________________________
  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
  |_|__|_|_____|__________|_____|______|__________|_____|______|
  0 +1K

All data areas should be aligned with the block size, but metadata areas
need not be. All metadata can currently be observed in two different
spaces (views):

 1. Inode metadata space

    Each valid inode should be aligned with an inode slot, which has a fixed
    size (32 bytes) designed to match the compact inode size.

    Each inode can be found directly with the following formula::

        inode offset = meta_blkaddr * block_size + 32 * nid

    ::

                                 |-> aligned with 8B
                                            |-> followed closely
     + meta_blkaddr blocks                                      |-> another slot
      _____________________________________________________________________
     |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
     |________|_______|(optional)|(optional)|__(optional)_|_____|__________
              |-> aligned with the inode slot size
                   .                   .
                 .                         .
               .                              .
             .                                    .
           .                                         .
         .                                              .
       .____________________________________________________|-> aligned with 4B
       | xattr_ibody_header | shared xattrs | inline xattrs |
       |____________________|_______________|_______________|
       |->    12 bytes    <-|->x * 4 bytes<-|               .
                           .                .                 .
                     .                      .                   .
                .                           .                     .
            ._______________________________.______________________.
            | id | id | id | id |  ... | id | ent | ... | ent| ... |
            |____|____|____|____|______|____|_____|_____|____|_____|
                                            |-> aligned with 4B
                                                        |-> aligned with 4B

    An inode is either 32 or 64 bytes long. The two sizes are distinguished
    by a field common to all inode versions, i_format::

        __________________               __________________
       |     i_format     |             |     i_format     |
       |__________________|             |__________________|
       |        ...       |             |        ...       |
       |                  |             |                  |
       |__________________| 32 bytes    |                  |
                                        |                  |
                                        |__________________| 64 bytes

    Xattrs, extents and inline data follow the corresponding inode with
    proper alignment, and they may be optional depending on the data mapping.
    Currently, 4 valid data mappings are supported:

    ==  ====================================================================
     0  flat file data without data inline (no extent);
     1  fixed-sized output data compression (with non-compacted indexes);
     2  flat file data with tail packing data inline (no extent);
     3  fixed-sized output data compression (with compacted indexes, v5.3+).
    ==  ====================================================================

    The size of the optional xattrs is indicated by i_xattr_count in the
    inode header. Large xattrs, or xattrs shared by many different files, can
    be stored in the shared xattrs metadata rather than inlined right after
    the inode.

 2. Shared xattrs metadata space

    The shared xattrs space is similar to the inode space above. It starts
    at a specific block indicated by xattr_blkaddr, and its entries are
    stored one after another with proper alignment.

    Each shared xattr can also be found directly with the following formula::

        xattr offset = xattr_blkaddr * block_size + 4 * xattr_id

    ::

              |-> aligned by 4 bytes
     + xattr_blkaddr blocks                     |-> aligned with 4 bytes
      _________________________________________________________________________
     |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
     |________|_____________|_____________|_____|______________|_______________
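
The two lookup formulas above can be sketched in C as follows. This is only
an illustration: the function and macro names are hypothetical (they are not
kernel or erofs-utils APIs), the block size is fixed at the 4KB mentioned in
the feature list, and 32-bit block addresses are assumed.

.. code-block:: c

   #include <stdint.h>

   #define SKETCH_BLOCK_SIZE 4096u  /* 4KB block size (from the feature list) */
   #define SKETCH_SLOT_SIZE    32u  /* inode slot size in bytes */

   /* Byte offset of the inode whose number is nid. */
   static uint64_t sketch_inode_offset(uint32_t meta_blkaddr, uint64_t nid)
   {
       return (uint64_t)meta_blkaddr * SKETCH_BLOCK_SIZE + SKETCH_SLOT_SIZE * nid;
   }

   /* Byte offset of the shared xattr whose index is xattr_id. */
   static uint64_t sketch_shared_xattr_offset(uint32_t xattr_blkaddr,
                                              uint32_t xattr_id)
   {
       return (uint64_t)xattr_blkaddr * SKETCH_BLOCK_SIZE + 4ull * xattr_id;
   }

For example, with meta_blkaddr = 1 the inode with nid 3 starts at byte
4096 + 32 * 3 = 4192. Since an extended inode is 64 bytes long, it occupies
two consecutive 32-byte slots.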

Directories
-----------
All directories are organized in a compact on-disk format. Each directory
block is divided into an index area and a name area to support random file
lookup, and all directory entries are *strictly* recorded in alphabetical
order to support an improved prefix binary search algorithm (refer to the
related source code for details).

::

                     ___________________________
                    /                           |
                   /              ______________|________________
                  /              /              | nameoff1       | nameoffN-1
     ____________.______________._______________v________________v__________
    | dirent | dirent | ... | dirent | filename | filename | ... | filename |
    |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
         \                           ^
          \                          |                           * could have
           \                         |                             trailing '\0'
            \________________________| nameoff0

                                 Directory block

Note that, apart from being the offset of the first filename, nameoff0 also
indicates the total number of directory entries in this block, so there is
no need to introduce another on-disk field.
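
The directory block layout described above can be walked with a few lines of
C. The sketch below rests on assumptions that are not stated in this
document: each dirent is taken to be 12 bytes with a little-endian 16-bit
nameoff at byte 8 (check struct erofs_dirent in erofs_fs.h), "alphabetical
order" is taken to mean bytewise order, and all function names are
hypothetical. It uses a plain binary search, not the kernel's improved prefix
binary search.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define SKETCH_DIRENT_SIZE 12u  /* assumed size of one on-disk dirent */

   /* Read the little-endian nameoff field of dirent i (assumed at byte 8). */
   static uint16_t sketch_nameoff(const uint8_t *blk, unsigned int i)
   {
       const uint8_t *p = blk + i * SKETCH_DIRENT_SIZE + 8;

       return (uint16_t)(p[0] | (p[1] << 8));
   }

   /* The first name starts right after the last dirent, so nameoff0
    * divided by the dirent size gives the number of entries. */
   static unsigned int sketch_dirent_count(const uint8_t *blk)
   {
       return sketch_nameoff(blk, 0) / SKETCH_DIRENT_SIZE;
   }

   /* A name ends where the next one starts, or at the end of the block;
    * a trailing '\0' (possible for the last name) cuts it short. */
   static size_t sketch_namelen(const uint8_t *blk, size_t blksz, unsigned int i)
   {
       size_t start = sketch_nameoff(blk, i);
       size_t end = (i + 1 < sketch_dirent_count(blk)) ?
                    sketch_nameoff(blk, i + 1) : blksz;
       const uint8_t *nul = memchr(blk + start, 0, end - start);

       return nul ? (size_t)(nul - (blk + start)) : end - start;
   }

   /* Return the index of the dirent named name, or -1 if it is absent. */
   static int sketch_dirent_find(const uint8_t *blk, size_t blksz,
                                 const char *name)
   {
       size_t len = strlen(name);
       int lo = 0, hi = (int)sketch_dirent_count(blk) - 1;

       while (lo <= hi) {
           int mid = lo + (hi - lo) / 2;
           size_t mlen = sketch_namelen(blk, blksz, (unsigned int)mid);
           size_t n = len < mlen ? len : mlen;
           int c = memcmp(name, blk + sketch_nameoff(blk, (unsigned int)mid), n);

           if (c == 0)  /* equal prefix: the shorter name sorts first */
               c = (len > mlen) - (len < mlen);
           if (c == 0)
               return mid;
           if (c < 0)
               hi = mid - 1;
           else
               lo = mid + 1;
       }
       return -1;
   }

Because the entry count is derived from nameoff0, the search needs nothing
beyond the block itself and its size.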

Compression
-----------
Currently, EROFS supports 4KB fixed-sized output transparent file compression,
as illustrated below::

             |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
             clusterofs                      clusterofs            clusterofs
             |                               |                     |   logical data
    _________v_______________________________v_____________________v_______________
    ... |    .        |             |        .    |             |  .          | ...
    ____|____.________|_____________|________.____|_____________|__.__________|____
        |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
             size          size          size          size          size
              .                             .                .                   .
               .                       .               .                  .
                .                  .              .                .
          _______._____________._____________._____________._____________________
             ... |             |             |             | ... physical data
          _______|_____________|_____________|_____________|_____________________
                 |-> cluster <-|-> cluster <-|-> cluster <-|
                      size          size          size

Currently, each on-disk physical cluster can contain at most 4KB of
(un)compressed data. For each logical cluster, there is a corresponding
on-disk index that describes its cluster type, physical cluster address, etc.

See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
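
To illustrate how per-cluster indexes could be used, the sketch below finds
the logical cluster whose index starts the extent covering a given file
offset. It is not the on-disk format: the struct is a hypothetical in-memory
form, and the three cluster types and the "distance back to the head" field
are assumptions made for this example. Refer to erofs_fs.h for the real
definitions.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define SKETCH_CLUSTER_SIZE 4096u  /* 4KB logical cluster */

   enum sketch_cluster_type {
       SKETCH_PLAIN,    /* an uncompressed extent starts in this cluster */
       SKETCH_HEAD,     /* a compressed extent starts in this cluster */
       SKETCH_NONHEAD,  /* no extent starts here; it began in an earlier one */
   };

   struct sketch_index {               /* hypothetical, one per logical cluster */
       enum sketch_cluster_type type;
       uint16_t clusterofs;            /* PLAIN/HEAD: where the extent starts */
       uint32_t blkaddr;               /* PLAIN/HEAD: physical cluster address */
       uint16_t delta0;                /* NONHEAD: clusters back to the head */
   };

   /* Return the logical cluster number whose index starts the extent
    * covering byte pos, or -1 if no such extent can be found. */
   static int64_t sketch_find_head(const struct sketch_index *idx, size_t n,
                                   uint64_t pos)
   {
       uint64_t lcn = pos / SKETCH_CLUSTER_SIZE;
       uint32_t ofs = (uint32_t)(pos % SKETCH_CLUSTER_SIZE);

       if (lcn >= n)
           return -1;
       if (idx[lcn].type != SKETCH_NONHEAD) {
           if (ofs >= idx[lcn].clusterofs)
               return (int64_t)lcn;
           if (lcn == 0)
               return -1;
           lcn--;  /* pos lies before clusterofs: previous extent */
       }
       if (idx[lcn].type == SKETCH_NONHEAD) {
           if (idx[lcn].delta0 == 0 || idx[lcn].delta0 > lcn)
               return -1;
           lcn -= idx[lcn].delta0;
       }
       return idx[lcn].type != SKETCH_NONHEAD ? (int64_t)lcn : -1;
   }

In the diagram above, an offset that falls in the second logical cluster
would resolve to the first one, where the first extent's clusterofs is
recorded. The blkaddr of that index then gives the physical cluster to read.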