.. SPDX-License-Identifier: GPL-2.0

Layout
------

The layout of a standard block group is approximately as follows (each
of these fields is discussed in a separate section below):

.. list-table::
   :widths: 1 1 1 1 1 1 1 1
   :header-rows: 1

   * - Group 0 Padding
     - ext4 Super Block
     - Group Descriptors
     - Reserved GDT Blocks
     - Data Block Bitmap
     - inode Bitmap
     - inode Table
     - Data Blocks
   * - 1024 bytes
     - 1 block
     - many blocks
     - many blocks
     - 1 block
     - 1 block
     - many blocks
     - many more blocks

For the special case of block group 0, the first 1024 bytes are unused,
to allow for the installation of x86 boot sectors and other oddities.
The superblock will start at offset 1024 bytes, whichever block that
happens to be (usually 0). However, if the block size is 1024 bytes,
then block 0 is marked in use and the superblock goes in block 1.
For all other block groups, there is no padding.

The ext4 driver primarily works with the superblock and the group
descriptors that are found in block group 0. Redundant copies of the
superblock and group descriptors are written to some of the block groups
across the disk in case the beginning of the disk gets trashed, though
not all block groups necessarily host a redundant copy (see following
paragraph for more details). If the group does not have a redundant
copy, the block group begins with the data block bitmap. Note also that
when the filesystem is freshly formatted, mkfs will allocate “reserve
GDT block” space after the block group descriptors and before the start
of the block bitmaps to allow for future expansion of the filesystem. By
default, a filesystem is allowed to increase in size by a factor of
1024 over the original filesystem size.

The location of the inode table is given by ``grp.bg_inode_table_*``. It
is a contiguous range of blocks large enough to contain
``sb.s_inodes_per_group * sb.s_inode_size`` bytes.

As for the ordering of items in a block group, it is generally
established that the super block and the group descriptor table, if
present, will be at the beginning of the block group. The bitmaps and
the inode table can be anywhere, and it is quite possible for the
bitmaps to come after the inode table, or for both to be in different
groups (flex_bg). Leftover space is used for file data blocks, indirect
block maps, extent tree blocks, and extended attributes.

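
The two size rules in this section (the superblock sits at byte offset
1024, and the inode table spans ``sb.s_inodes_per_group *
sb.s_inode_size`` bytes) can be sketched as a small calculation. This is
only an illustration: the function names ``superblock_location`` and
``inode_table_blocks`` are hypothetical helpers, not kernel or e2fsprogs
interfaces, and the sample values are arbitrary.

```python
SUPERBLOCK_OFFSET = 1024  # bytes of group 0 padding before the superblock


def superblock_location(block_size):
    """Return (block number, byte offset within that block) of the
    primary superblock for a given block size in bytes."""
    return divmod(SUPERBLOCK_OFFSET, block_size)


def inode_table_blocks(inodes_per_group, inode_size, block_size):
    """Return how many blocks one group's inode table occupies,
    rounding up to a whole number of blocks."""
    table_bytes = inodes_per_group * inode_size
    return (table_bytes + block_size - 1) // block_size


print(superblock_location(1024))            # (1, 0): superblock is in block 1
print(superblock_location(4096))            # (0, 1024): inside block 0
print(inode_table_blocks(8192, 256, 4096))  # 512
```

With a 1024-byte block size the padding fills all of block 0, which is
why the superblock lands in block 1; with any larger block size it
shares block 0 with the padding.
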

Flexible Block Groups
---------------------

Starting in ext4, there is a new feature called flexible block groups
(flex_bg). In a flex_bg, several block groups are tied together as one
logical block group; the bitmap spaces and the inode table space in the
first block group of the flex_bg are expanded to include the bitmaps
and inode tables of all other block groups in the flex_bg. For example,
if the flex_bg size is 4, then group 0 will contain (in order) the
superblock, group descriptors, data block bitmaps for groups 0-3, inode
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
space in group 0 is for file data. The effect of this is to group the
block group metadata close together for faster loading, and to enable
large files to be contiguous on disk. Backup copies of the superblock
and group descriptors are always at the beginning of block groups, even
if flex_bg is enabled. The number of block groups that make up a
flex_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.

Meta Block Groups
-----------------

Without the META_BG option, for safety reasons, all block group
descriptor copies are kept in the first block group. Given the default
128 MiB (2^27 bytes) block group size and 64-byte group descriptors,
ext4 can have at most 2^27/64 = 2^21 block groups.
This limits the entire filesystem size to 2^21 * 2^27 = 2^48 bytes, or
256 TiB.

The solution to this problem is to use the metablock group feature
(META_BG), which is already in ext3 for all 2.6 releases. With the
META_BG feature, ext4 filesystems are partitioned into many metablock
groups. Each metablock group is a cluster of block groups whose group
descriptor structures can be stored in a single disk block. For ext4
filesystems with a 4 KiB block size, a single metablock group partition
includes 64 block groups, or 8 GiB of disk space. The metablock group
feature moves the location of the group descriptors from the congested
first block group of the whole filesystem into the first group of each
metablock group itself. The backups are in the second and last group of
each metablock group. This increases the 2^21 maximum block groups limit
to the hard limit of 2^32, allowing support for a 512 PiB filesystem.

The change in the filesystem format replaces the current scheme where
the superblock is followed by a variable-length set of block group
descriptors. Instead, the superblock and a single block group descriptor
block are placed at the beginning of the first, second, and last block
groups in a meta-block group. A meta-block group is a collection of
block groups which can be described by a single block group descriptor
block.
Since the size of the block group descriptor structure is 64
bytes, a meta-block group contains 16 block groups for filesystems with
a 1 KiB block size, and 64 block groups for filesystems with a 4 KiB
block size. Filesystems can either be created using this new block group
descriptor layout, or existing filesystems can be resized on-line, and
the field ``s_first_meta_bg`` in the superblock will indicate the first
block group using this new layout.

Please see an important note about ``BLOCK_UNINIT`` in the section about
block and inode bitmaps.

Lazy Block Group Initialization
-------------------------------

A new feature for ext4 is a set of three block group descriptor flags
that enable mkfs to skip initializing other parts of the block group
metadata. Specifically, the INODE_UNINIT and BLOCK_UNINIT flags mean
that the inode and block bitmaps for that group can be calculated and
therefore the on-disk bitmap blocks are not initialized. This is
generally the case for an empty block group or a block group containing
only fixed-location block group metadata. The INODE_ZEROED flag means
that the inode table has been initialized; mkfs will unset this flag and
rely on the kernel to initialize the inode tables in the background.

By not writing zeroes to the bitmaps and inode table, mkfs time is
reduced considerably.
Note the feature flag is RO_COMPAT_GDT_CSUM,
but the dumpe2fs output prints this as “uninit_bg”. They are the same
thing.
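
As a rough illustration of how the three flags described in this
section might be read from a group descriptor's flags field, the sketch
below decodes a flags value into the statements made above. The numeric
values are the ``EXT4_BG_*`` constants as they appear in the kernel's
``fs/ext4/ext4.h`` and are not given in this document, so treat them as
an assumption to verify against your kernel tree. The function
``describe_bg_flags`` is a hypothetical helper, not an existing API.

```python
# Assumed values, taken from the EXT4_BG_* definitions in fs/ext4/ext4.h.
EXT4_BG_INODE_UNINIT = 0x0001  # inode bitmap is calculated, not on disk
EXT4_BG_BLOCK_UNINIT = 0x0002  # block bitmap is calculated, not on disk
EXT4_BG_INODE_ZEROED = 0x0004  # inode table has been initialized


def describe_bg_flags(bg_flags):
    """Translate a block group descriptor flags value into what it says
    about the group's on-disk metadata."""
    return {
        "inode_bitmap_initialized": not (bg_flags & EXT4_BG_INODE_UNINIT),
        "block_bitmap_initialized": not (bg_flags & EXT4_BG_BLOCK_UNINIT),
        "inode_table_zeroed": bool(bg_flags & EXT4_BG_INODE_ZEROED),
    }


# A freshly formatted, empty group: both bitmaps are left uninitialized
# and the inode table is still waiting for background zeroing.
fresh = describe_bg_flags(EXT4_BG_INODE_UNINIT | EXT4_BG_BLOCK_UNINIT)
print(fresh["inode_bitmap_initialized"])  # False
print(fresh["block_bitmap_initialized"])  # False
print(fresh["inode_table_zeroed"])        # False
```

A reader that finds either ``*_UNINIT`` flag set should compute the
corresponding bitmap instead of trusting the on-disk block, which is the
behavior the flags exist to permit.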