.. SPDX-License-Identifier: GPL-2.0

Bigalloc
--------

At the moment, the default size of a block is 4KiB, which is a page size
supported by most MMU-capable hardware. This is fortunate, as ext4 code
is not prepared to handle the case where the block size exceeds the page
size. However, for a filesystem of mostly huge files, it is desirable to
be able to allocate disk blocks in units of multiple blocks to reduce
both fragmentation and metadata overhead. The bigalloc feature provides
exactly this ability.

The bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to
use clustered allocation, so that each bit in the ext4 block allocation
bitmap addresses a power-of-two number of blocks. For example, if the
file system is mainly going to be storing large files in the 4-32
megabyte range, it might make sense to set a cluster size of 1 megabyte.
This means that each bit in the block allocation bitmap now addresses
256 4KiB blocks. This shrinks the total size of the block allocation
bitmaps for a 2TiB file system from 64 megabytes to 256 kilobytes. It
also means that a block group addresses 32 gigabytes instead of 128
megabytes, which further reduces the file system's metadata overhead.

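The figures in the example above can be checked with a few lines of
arithmetic. The sketch below is illustrative only and is not part of
ext4; the helper names are made up, and it assumes one bitmap bit per
allocation unit and a per-group bitmap that occupies exactly one block.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB


def bitmap_bytes(fs_bytes, unit_bytes):
    """Total allocation bitmap size, at one bit per allocation unit."""
    return fs_bytes // unit_bytes // 8


def group_span_bytes(block_bytes, unit_bytes):
    """Bytes covered by one block group.

    Assumes the group's bitmap fills exactly one block, so the group
    holds block_bytes * 8 allocation units.
    """
    return block_bytes * 8 * unit_bytes


block = 4 * KIB    # block size from the example
cluster = 1 * MIB  # cluster size from the example

print(cluster // block)                          # 256 blocks per cluster
print(bitmap_bytes(2 * TIB, block) // MIB)       # 64 (MiB, no bigalloc)
print(bitmap_bytes(2 * TIB, cluster) // KIB)     # 256 (KiB, 1MiB clusters)
print(group_span_bytes(block, block) // MIB)     # 128 (MiB per group)
print(group_span_bytes(block, cluster) // GIB)   # 32 (GiB per group)
```
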
The administrator can set a block cluster size at mkfs time (which is
stored in the s_log_cluster_size field in the superblock); from then
on, the block bitmaps track clusters, not individual blocks. This means
that block groups can be several gigabytes in size (instead of just
128MiB); however, the minimum allocation unit becomes a cluster, not a
block, even for directories. TaoBao had a patchset to extend the “use
units of clusters instead of blocks” approach to the extent tree, though
it is not clear where those patches went. They eventually morphed into
“extent tree v2”, but that code has not landed as of May 2015.

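As a rough illustration of how the stored value relates to the cluster
size, the sketch below assumes that s_log_cluster_size, like the
superblock's s_log_block_size field (not discussed in this section), is
a base-2 logarithm relative to 1KiB. The function name is hypothetical
and the code is not taken from the kernel.

```python
def cluster_geometry(s_log_block_size, s_log_cluster_size):
    """Return (block_bytes, cluster_bytes, blocks_per_cluster).

    Assumption: both fields encode log2(size in bytes) - 10, so a value
    of 2 means 4KiB and a value of 10 means 1MiB.
    """
    block_bytes = 1024 << s_log_block_size
    cluster_bytes = 1024 << s_log_cluster_size
    if cluster_bytes < block_bytes:
        raise ValueError("cluster size must not be smaller than block size")
    return block_bytes, cluster_bytes, cluster_bytes // block_bytes


# 4KiB blocks with 1MiB clusters: each bitmap bit covers 256 blocks.
print(cluster_geometry(2, 10))  # (4096, 1048576, 256)

# Equal values describe a cluster that is a single block.
print(cluster_geometry(2, 2))   # (4096, 4096, 1)
```

Since the allocation unit is a whole cluster, under these assumptions
even a one-block directory on the first file system above would consume
a full 1MiB cluster.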