1.. SPDX-License-Identifier: GPL-2.0 2 3Directory Entries 4----------------- 5 6In an ext4 filesystem, a directory is more or less a flat file that maps 7an arbitrary byte string (usually ASCII) to an inode number on the 8filesystem. There can be many directory entries across the filesystem 9that reference the same inode number--these are known as hard links, and 10that is why hard links cannot reference files on other filesystems. As 11such, directory entries are found by reading the data block(s) 12associated with a directory file for the particular directory entry that 13is desired. 14 15Linear (Classic) Directories 16~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 17 18By default, each directory lists its entries in an “almost-linear” 19array. I write “almost” because it's not a linear array in the memory 20sense because directory entries are not split across filesystem blocks. 21Therefore, it is more accurate to say that a directory is a series of 22data blocks and that each block contains a linear array of directory 23entries. The end of each per-block array is signified by reaching the 24end of the block; the last entry in the block has a record length that 25takes it all the way to the end of the block. The end of the entire 26directory is of course signified by reaching the end of the file. Unused 27directory entries are signified by inode = 0. By default the filesystem 28uses ``struct ext4_dir_entry_2`` for directory entries unless the 29“filetype” feature flag is not set, in which case it uses 30``struct ext4_dir_entry``. 31 32The original directory entry format is ``struct ext4_dir_entry``, which 33is at most 263 bytes long, though on disk you'll need to reference 34``dirent.rec_len`` to know for sure. 35 36.. list-table:: 37 :widths: 8 8 24 40 38 :header-rows: 1 39 40 * - Offset 41 - Size 42 - Name 43 - Description 44 * - 0x0 45 - \_\_le32 46 - inode 47 - Number of the inode that this directory entry points to. 48 * - 0x4 49 - \_\_le16 50 - rec\_len 51 - Length of this directory entry. Must be a multiple of 4. 52 * - 0x6 53 - \_\_le16 54 - name\_len 55 - Length of the file name. 56 * - 0x8 57 - char 58 - name[EXT4\_NAME\_LEN] 59 - File name. 60 61Since file names cannot be longer than 255 bytes, the new directory 62entry format shortens the name\_len field and uses the space for a file 63type flag, probably to avoid having to load every inode during directory 64tree traversal. This format is ``ext4_dir_entry_2``, which is at most 65263 bytes long, though on disk you'll need to reference 66``dirent.rec_len`` to know for sure. 67 68.. list-table:: 69 :widths: 8 8 24 40 70 :header-rows: 1 71 72 * - Offset 73 - Size 74 - Name 75 - Description 76 * - 0x0 77 - \_\_le32 78 - inode 79 - Number of the inode that this directory entry points to. 80 * - 0x4 81 - \_\_le16 82 - rec\_len 83 - Length of this directory entry. 84 * - 0x6 85 - \_\_u8 86 - name\_len 87 - Length of the file name. 88 * - 0x7 89 - \_\_u8 90 - file\_type 91 - File type code, see ftype_ table below. 92 * - 0x8 93 - char 94 - name[EXT4\_NAME\_LEN] 95 - File name. 96 97.. _ftype: 98 99The directory file type is one of the following values: 100 101.. list-table:: 102 :widths: 16 64 103 :header-rows: 1 104 105 * - Value 106 - Description 107 * - 0x0 108 - Unknown. 109 * - 0x1 110 - Regular file. 111 * - 0x2 112 - Directory. 113 * - 0x3 114 - Character device file. 115 * - 0x4 116 - Block device file. 117 * - 0x5 118 - FIFO. 119 * - 0x6 120 - Socket. 121 * - 0x7 122 - Symbolic link. 123 124In order to add checksums to these classic directory blocks, a phony 125``struct ext4_dir_entry`` is placed at the end of each leaf block to 126hold the checksum. The directory entry is 12 bytes long. The inode 127number and name\_len fields are set to zero to fool old software into 128ignoring an apparently empty directory entry, and the checksum is stored 129in the place where the name normally goes. The structure is 130``struct ext4_dir_entry_tail``: 131 132.. list-table:: 133 :widths: 8 8 24 40 134 :header-rows: 1 135 136 * - Offset 137 - Size 138 - Name 139 - Description 140 * - 0x0 141 - \_\_le32 142 - det\_reserved\_zero1 143 - Inode number, which must be zero. 144 * - 0x4 145 - \_\_le16 146 - det\_rec\_len 147 - Length of this directory entry, which must be 12. 148 * - 0x6 149 - \_\_u8 150 - det\_reserved\_zero2 151 - Length of the file name, which must be zero. 152 * - 0x7 153 - \_\_u8 154 - det\_reserved\_ft 155 - File type, which must be 0xDE. 156 * - 0x8 157 - \_\_le32 158 - det\_checksum 159 - Directory leaf block checksum. 160 161The leaf directory block checksum is calculated against the FS UUID, the 162directory's inode number, the directory's inode generation number, and 163the entire directory entry block up to (but not including) the fake 164directory entry. 165 166Hash Tree Directories 167~~~~~~~~~~~~~~~~~~~~~ 168 169A linear array of directory entries isn't great for performance, so a 170new feature was added to ext3 to provide a faster (but peculiar) 171balanced tree keyed off a hash of the directory entry name. If the 172EXT4\_INDEX\_FL (0x1000) flag is set in the inode, this directory uses a 173hashed btree (htree) to organize and find directory entries. For 174backwards read-only compatibility with ext2, this tree is actually 175hidden inside the directory file, masquerading as “empty” directory data 176blocks! It was stated previously that the end of the linear directory 177entry table was signified with an entry pointing to inode 0; this is 178(ab)used to fool the old linear-scan algorithm into thinking that the 179rest of the directory block is empty so that it moves on. 180 181The root of the tree always lives in the first data block of the 182directory. By ext2 custom, the '.' and '..' entries must appear at the 183beginning of this first block, so they are put here as two 184``struct ext4_dir_entry_2``\ s and not stored in the tree. The rest of 185the root node contains metadata about the tree and finally a hash->block 186map to find nodes that are lower in the htree. If 187``dx_root.info.indirect_levels`` is non-zero then the htree has two 188levels; the data block pointed to by the root node's map is an interior 189node, which is indexed by a minor hash. Interior nodes in this tree 190contains a zeroed out ``struct ext4_dir_entry_2`` followed by a 191minor\_hash->block map to find leafe nodes. Leaf nodes contain a linear 192array of all ``struct ext4_dir_entry_2``; all of these entries 193(presumably) hash to the same value. If there is an overflow, the 194entries simply overflow into the next leaf node, and the 195least-significant bit of the hash (in the interior node map) that gets 196us to this next leaf node is set. 197 198To traverse the directory as a htree, the code calculates the hash of 199the desired file name and uses it to find the corresponding block 200number. If the tree is flat, the block is a linear array of directory 201entries that can be searched; otherwise, the minor hash of the file name 202is computed and used against this second block to find the corresponding 203third block number. That third block number will be a linear array of 204directory entries. 205 206To traverse the directory as a linear array (such as the old code does), 207the code simply reads every data block in the directory. The blocks used 208for the htree will appear to have no entries (aside from '.' and '..') 209and so only the leaf nodes will appear to have any interesting content. 210 211The root of the htree is in ``struct dx_root``, which is the full length 212of a data block: 213 214.. list-table:: 215 :widths: 8 8 24 40 216 :header-rows: 1 217 218 * - Offset 219 - Type 220 - Name 221 - Description 222 * - 0x0 223 - \_\_le32 224 - dot.inode 225 - inode number of this directory. 226 * - 0x4 227 - \_\_le16 228 - dot.rec\_len 229 - Length of this record, 12. 230 * - 0x6 231 - u8 232 - dot.name\_len 233 - Length of the name, 1. 234 * - 0x7 235 - u8 236 - dot.file\_type 237 - File type of this entry, 0x2 (directory) (if the feature flag is set). 238 * - 0x8 239 - char 240 - dot.name[4] 241 - “.\\0\\0\\0” 242 * - 0xC 243 - \_\_le32 244 - dotdot.inode 245 - inode number of parent directory. 246 * - 0x10 247 - \_\_le16 248 - dotdot.rec\_len 249 - block\_size - 12. The record length is long enough to cover all htree 250 data. 251 * - 0x12 252 - u8 253 - dotdot.name\_len 254 - Length of the name, 2. 255 * - 0x13 256 - u8 257 - dotdot.file\_type 258 - File type of this entry, 0x2 (directory) (if the feature flag is set). 259 * - 0x14 260 - char 261 - dotdot\_name[4] 262 - “..\\0\\0” 263 * - 0x18 264 - \_\_le32 265 - struct dx\_root\_info.reserved\_zero 266 - Zero. 267 * - 0x1C 268 - u8 269 - struct dx\_root\_info.hash\_version 270 - Hash type, see dirhash_ table below. 271 * - 0x1D 272 - u8 273 - struct dx\_root\_info.info\_length 274 - Length of the tree information, 0x8. 275 * - 0x1E 276 - u8 277 - struct dx\_root\_info.indirect\_levels 278 - Depth of the htree. Cannot be larger than 3 if the INCOMPAT\_LARGEDIR 279 feature is set; cannot be larger than 2 otherwise. 280 * - 0x1F 281 - u8 282 - struct dx\_root\_info.unused\_flags 283 - 284 * - 0x20 285 - \_\_le16 286 - limit 287 - Maximum number of dx\_entries that can follow this header, plus 1 for 288 the header itself. 289 * - 0x22 290 - \_\_le16 291 - count 292 - Actual number of dx\_entries that follow this header, plus 1 for the 293 header itself. 294 * - 0x24 295 - \_\_le32 296 - block 297 - The block number (within the directory file) that goes with hash=0. 298 * - 0x28 299 - struct dx\_entry 300 - entries[0] 301 - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. 302 303.. _dirhash: 304 305The directory hash is one of the following values: 306 307.. list-table:: 308 :widths: 16 64 309 :header-rows: 1 310 311 * - Value 312 - Description 313 * - 0x0 314 - Legacy. 315 * - 0x1 316 - Half MD4. 317 * - 0x2 318 - Tea. 319 * - 0x3 320 - Legacy, unsigned. 321 * - 0x4 322 - Half MD4, unsigned. 323 * - 0x5 324 - Tea, unsigned. 325 326Interior nodes of an htree are recorded as ``struct dx_node``, which is 327also the full length of a data block: 328 329.. list-table:: 330 :widths: 8 8 24 40 331 :header-rows: 1 332 333 * - Offset 334 - Type 335 - Name 336 - Description 337 * - 0x0 338 - \_\_le32 339 - fake.inode 340 - Zero, to make it look like this entry is not in use. 341 * - 0x4 342 - \_\_le16 343 - fake.rec\_len 344 - The size of the block, in order to hide all of the dx\_node data. 345 * - 0x6 346 - u8 347 - name\_len 348 - Zero. There is no name for this “unused” directory entry. 349 * - 0x7 350 - u8 351 - file\_type 352 - Zero. There is no file type for this “unused” directory entry. 353 * - 0x8 354 - \_\_le16 355 - limit 356 - Maximum number of dx\_entries that can follow this header, plus 1 for 357 the header itself. 358 * - 0xA 359 - \_\_le16 360 - count 361 - Actual number of dx\_entries that follow this header, plus 1 for the 362 header itself. 363 * - 0xE 364 - \_\_le32 365 - block 366 - The block number (within the directory file) that goes with the lowest 367 hash value of this block. This value is stored in the parent block. 368 * - 0x12 369 - struct dx\_entry 370 - entries[0] 371 - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. 372 373The hash maps that exist in both ``struct dx_root`` and 374``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes 375long: 376 377.. list-table:: 378 :widths: 8 8 24 40 379 :header-rows: 1 380 381 * - Offset 382 - Type 383 - Name 384 - Description 385 * - 0x0 386 - \_\_le32 387 - hash 388 - Hash code. 389 * - 0x4 390 - \_\_le32 391 - block 392 - Block number (within the directory file, not filesystem blocks) of the 393 next node in the htree. 394 395(If you think this is all quite clever and peculiar, so does the 396author.) 397 398If metadata checksums are enabled, the last 8 bytes of the directory 399block (precisely the length of one dx\_entry) are used to store a 400``struct dx_tail``, which contains the checksum. The ``limit`` and 401``count`` entries in the dx\_root/dx\_node structures are adjusted as 402necessary to fit the dx\_tail into the block. If there is no space for 403the dx\_tail, the user is notified to run e2fsck -D to rebuild the 404directory index (which will ensure that there's space for the checksum. 405The dx\_tail structure is 8 bytes long and looks like this: 406 407.. list-table:: 408 :widths: 8 8 24 40 409 :header-rows: 1 410 411 * - Offset 412 - Type 413 - Name 414 - Description 415 * - 0x0 416 - u32 417 - dt\_reserved 418 - Zero. 419 * - 0x4 420 - \_\_le32 421 - dt\_checksum 422 - Checksum of the htree directory block. 423 424The checksum is calculated against the FS UUID, the htree index header 425(dx\_root or dx\_node), all of the htree indices (dx\_entry) that are in 426use, and the tail block (dx\_tail). 427