1.. SPDX-License-Identifier: GPL-2.0
2
3Directory Entries
4-----------------
5
6In an ext4 filesystem, a directory is more or less a flat file that maps
7an arbitrary byte string (usually ASCII) to an inode number on the
8filesystem. There can be many directory entries across the filesystem
9that reference the same inode number--these are known as hard links, and
10that is why hard links cannot reference files on other filesystems. As
11such, directory entries are found by reading the data block(s)
12associated with a directory file for the particular directory entry that
13is desired.
14
15Linear (Classic) Directories
16~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17
18By default, each directory lists its entries in an “almost-linear”
19array. I write “almost” because it's not a linear array in the memory
20sense because directory entries are not split across filesystem blocks.
21Therefore, it is more accurate to say that a directory is a series of
22data blocks and that each block contains a linear array of directory
23entries. The end of each per-block array is signified by reaching the
24end of the block; the last entry in the block has a record length that
25takes it all the way to the end of the block. The end of the entire
26directory is of course signified by reaching the end of the file. Unused
27directory entries are signified by inode = 0. By default the filesystem
28uses ``struct ext4_dir_entry_2`` for directory entries unless the
29“filetype” feature flag is not set, in which case it uses
30``struct ext4_dir_entry``.
31
32The original directory entry format is ``struct ext4_dir_entry``, which
33is at most 263 bytes long, though on disk you'll need to reference
34``dirent.rec_len`` to know for sure.
35
36.. list-table::
37   :widths: 8 8 24 40
38   :header-rows: 1
39
40   * - Offset
41     - Size
42     - Name
43     - Description
44   * - 0x0
45     - __le32
46     - inode
47     - Number of the inode that this directory entry points to.
48   * - 0x4
49     - __le16
50     - rec_len
51     - Length of this directory entry. Must be a multiple of 4.
52   * - 0x6
53     - __le16
54     - name_len
55     - Length of the file name.
56   * - 0x8
57     - char
58     - name[EXT4_NAME_LEN]
59     - File name.
60
61Since file names cannot be longer than 255 bytes, the new directory
62entry format shortens the name_len field and uses the space for a file
63type flag, probably to avoid having to load every inode during directory
64tree traversal. This format is ``ext4_dir_entry_2``, which is at most
65263 bytes long, though on disk you'll need to reference
66``dirent.rec_len`` to know for sure.
67
68.. list-table::
69   :widths: 8 8 24 40
70   :header-rows: 1
71
72   * - Offset
73     - Size
74     - Name
75     - Description
76   * - 0x0
77     - __le32
78     - inode
79     - Number of the inode that this directory entry points to.
80   * - 0x4
81     - __le16
82     - rec_len
83     - Length of this directory entry.
84   * - 0x6
85     - __u8
86     - name_len
87     - Length of the file name.
88   * - 0x7
89     - __u8
90     - file_type
91     - File type code, see ftype_ table below.
92   * - 0x8
93     - char
94     - name[EXT4_NAME_LEN]
95     - File name.
96
97.. _ftype:
98
99The directory file type is one of the following values:
100
101.. list-table::
102   :widths: 16 64
103   :header-rows: 1
104
105   * - Value
106     - Description
107   * - 0x0
108     - Unknown.
109   * - 0x1
110     - Regular file.
111   * - 0x2
112     - Directory.
113   * - 0x3
114     - Character device file.
115   * - 0x4
116     - Block device file.
117   * - 0x5
118     - FIFO.
119   * - 0x6
120     - Socket.
121   * - 0x7
122     - Symbolic link.
123
124To support directories that are both encrypted and casefolded directories, we
125must also include hash information in the directory entry. We append
126``ext4_extended_dir_entry_2`` to ``ext4_dir_entry_2`` except for the entries
127for dot and dotdot, which are kept the same. The structure follows immediately
128after ``name`` and is included in the size listed by ``rec_len`` If a directory
129entry uses this extension, it may be up to 271 bytes.
130
131.. list-table::
132   :widths: 8 8 24 40
133   :header-rows: 1
134
135   * - Offset
136     - Size
137     - Name
138     - Description
139   * - 0x0
140     - __le32
141     - hash
142     - The hash of the directory name
143   * - 0x4
144     - __le32
145     - minor_hash
146     - The minor hash of the directory name
147
148
149In order to add checksums to these classic directory blocks, a phony
150``struct ext4_dir_entry`` is placed at the end of each leaf block to
151hold the checksum. The directory entry is 12 bytes long. The inode
152number and name_len fields are set to zero to fool old software into
153ignoring an apparently empty directory entry, and the checksum is stored
154in the place where the name normally goes. The structure is
155``struct ext4_dir_entry_tail``:
156
157.. list-table::
158   :widths: 8 8 24 40
159   :header-rows: 1
160
161   * - Offset
162     - Size
163     - Name
164     - Description
165   * - 0x0
166     - __le32
167     - det_reserved_zero1
168     - Inode number, which must be zero.
169   * - 0x4
170     - __le16
171     - det_rec_len
172     - Length of this directory entry, which must be 12.
173   * - 0x6
174     - __u8
175     - det_reserved_zero2
176     - Length of the file name, which must be zero.
177   * - 0x7
178     - __u8
179     - det_reserved_ft
180     - File type, which must be 0xDE.
181   * - 0x8
182     - __le32
183     - det_checksum
184     - Directory leaf block checksum.
185
186The leaf directory block checksum is calculated against the FS UUID, the
187directory's inode number, the directory's inode generation number, and
188the entire directory entry block up to (but not including) the fake
189directory entry.
190
191Hash Tree Directories
192~~~~~~~~~~~~~~~~~~~~~
193
194A linear array of directory entries isn't great for performance, so a
195new feature was added to ext3 to provide a faster (but peculiar)
196balanced tree keyed off a hash of the directory entry name. If the
197EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a
198hashed btree (htree) to organize and find directory entries. For
199backwards read-only compatibility with ext2, this tree is actually
200hidden inside the directory file, masquerading as “empty” directory data
201blocks! It was stated previously that the end of the linear directory
202entry table was signified with an entry pointing to inode 0; this is
203(ab)used to fool the old linear-scan algorithm into thinking that the
204rest of the directory block is empty so that it moves on.
205
206The root of the tree always lives in the first data block of the
207directory. By ext2 custom, the '.' and '..' entries must appear at the
208beginning of this first block, so they are put here as two
209``struct ext4_dir_entry_2`` s and not stored in the tree. The rest of
210the root node contains metadata about the tree and finally a hash->block
211map to find nodes that are lower in the htree. If
212``dx_root.info.indirect_levels`` is non-zero then the htree has two
213levels; the data block pointed to by the root node's map is an interior
214node, which is indexed by a minor hash. Interior nodes in this tree
215contains a zeroed out ``struct ext4_dir_entry_2`` followed by a
216minor_hash->block map to find leafe nodes. Leaf nodes contain a linear
217array of all ``struct ext4_dir_entry_2``; all of these entries
218(presumably) hash to the same value. If there is an overflow, the
219entries simply overflow into the next leaf node, and the
220least-significant bit of the hash (in the interior node map) that gets
221us to this next leaf node is set.
222
223To traverse the directory as a htree, the code calculates the hash of
224the desired file name and uses it to find the corresponding block
225number. If the tree is flat, the block is a linear array of directory
226entries that can be searched; otherwise, the minor hash of the file name
227is computed and used against this second block to find the corresponding
228third block number. That third block number will be a linear array of
229directory entries.
230
231To traverse the directory as a linear array (such as the old code does),
232the code simply reads every data block in the directory. The blocks used
233for the htree will appear to have no entries (aside from '.' and '..')
234and so only the leaf nodes will appear to have any interesting content.
235
236The root of the htree is in ``struct dx_root``, which is the full length
237of a data block:
238
239.. list-table::
240   :widths: 8 8 24 40
241   :header-rows: 1
242
243   * - Offset
244     - Type
245     - Name
246     - Description
247   * - 0x0
248     - __le32
249     - dot.inode
250     - inode number of this directory.
251   * - 0x4
252     - __le16
253     - dot.rec_len
254     - Length of this record, 12.
255   * - 0x6
256     - u8
257     - dot.name_len
258     - Length of the name, 1.
259   * - 0x7
260     - u8
261     - dot.file_type
262     - File type of this entry, 0x2 (directory) (if the feature flag is set).
263   * - 0x8
264     - char
265     - dot.name[4]
266     - “.\0\0\0”
267   * - 0xC
268     - __le32
269     - dotdot.inode
270     - inode number of parent directory.
271   * - 0x10
272     - __le16
273     - dotdot.rec_len
274     - block_size - 12. The record length is long enough to cover all htree
275       data.
276   * - 0x12
277     - u8
278     - dotdot.name_len
279     - Length of the name, 2.
280   * - 0x13
281     - u8
282     - dotdot.file_type
283     - File type of this entry, 0x2 (directory) (if the feature flag is set).
284   * - 0x14
285     - char
286     - dotdot_name[4]
287     - “..\0\0”
288   * - 0x18
289     - __le32
290     - struct dx_root_info.reserved_zero
291     - Zero.
292   * - 0x1C
293     - u8
294     - struct dx_root_info.hash_version
295     - Hash type, see dirhash_ table below.
296   * - 0x1D
297     - u8
298     - struct dx_root_info.info_length
299     - Length of the tree information, 0x8.
300   * - 0x1E
301     - u8
302     - struct dx_root_info.indirect_levels
303     - Depth of the htree. Cannot be larger than 3 if the INCOMPAT_LARGEDIR
304       feature is set; cannot be larger than 2 otherwise.
305   * - 0x1F
306     - u8
307     - struct dx_root_info.unused_flags
308     -
309   * - 0x20
310     - __le16
311     - limit
312     - Maximum number of dx_entries that can follow this header, plus 1 for
313       the header itself.
314   * - 0x22
315     - __le16
316     - count
317     - Actual number of dx_entries that follow this header, plus 1 for the
318       header itself.
319   * - 0x24
320     - __le32
321     - block
322     - The block number (within the directory file) that goes with hash=0.
323   * - 0x28
324     - struct dx_entry
325     - entries[0]
326     - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
327
328.. _dirhash:
329
330The directory hash is one of the following values:
331
332.. list-table::
333   :widths: 16 64
334   :header-rows: 1
335
336   * - Value
337     - Description
338   * - 0x0
339     - Legacy.
340   * - 0x1
341     - Half MD4.
342   * - 0x2
343     - Tea.
344   * - 0x3
345     - Legacy, unsigned.
346   * - 0x4
347     - Half MD4, unsigned.
348   * - 0x5
349     - Tea, unsigned.
350   * - 0x6
351     - Siphash.
352
353Interior nodes of an htree are recorded as ``struct dx_node``, which is
354also the full length of a data block:
355
356.. list-table::
357   :widths: 8 8 24 40
358   :header-rows: 1
359
360   * - Offset
361     - Type
362     - Name
363     - Description
364   * - 0x0
365     - __le32
366     - fake.inode
367     - Zero, to make it look like this entry is not in use.
368   * - 0x4
369     - __le16
370     - fake.rec_len
371     - The size of the block, in order to hide all of the dx_node data.
372   * - 0x6
373     - u8
374     - name_len
375     - Zero. There is no name for this “unused” directory entry.
376   * - 0x7
377     - u8
378     - file_type
379     - Zero. There is no file type for this “unused” directory entry.
380   * - 0x8
381     - __le16
382     - limit
383     - Maximum number of dx_entries that can follow this header, plus 1 for
384       the header itself.
385   * - 0xA
386     - __le16
387     - count
388     - Actual number of dx_entries that follow this header, plus 1 for the
389       header itself.
390   * - 0xE
391     - __le32
392     - block
393     - The block number (within the directory file) that goes with the lowest
394       hash value of this block. This value is stored in the parent block.
395   * - 0x12
396     - struct dx_entry
397     - entries[0]
398     - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block.
399
400The hash maps that exist in both ``struct dx_root`` and
401``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes
402long:
403
404.. list-table::
405   :widths: 8 8 24 40
406   :header-rows: 1
407
408   * - Offset
409     - Type
410     - Name
411     - Description
412   * - 0x0
413     - __le32
414     - hash
415     - Hash code.
416   * - 0x4
417     - __le32
418     - block
419     - Block number (within the directory file, not filesystem blocks) of the
420       next node in the htree.
421
422(If you think this is all quite clever and peculiar, so does the
423author.)
424
425If metadata checksums are enabled, the last 8 bytes of the directory
426block (precisely the length of one dx_entry) are used to store a
427``struct dx_tail``, which contains the checksum. The ``limit`` and
428``count`` entries in the dx_root/dx_node structures are adjusted as
429necessary to fit the dx_tail into the block. If there is no space for
430the dx_tail, the user is notified to run e2fsck -D to rebuild the
431directory index (which will ensure that there's space for the checksum.
432The dx_tail structure is 8 bytes long and looks like this:
433
434.. list-table::
435   :widths: 8 8 24 40
436   :header-rows: 1
437
438   * - Offset
439     - Type
440     - Name
441     - Description
442   * - 0x0
443     - u32
444     - dt_reserved
445     - Zero.
446   * - 0x4
447     - __le32
448     - dt_checksum
449     - Checksum of the htree directory block.
450
451The checksum is calculated against the FS UUID, the htree index header
452(dx_root or dx_node), all of the htree indices (dx_entry) that are in
453use, and the tail block (dx_tail).
454