mballoc.c - OpenGrok history log for /openbmc/linux/fs/ext4/mballoc.c

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: v6.1.41, v6.1.40, v6.1.39
# 79ebf48c	07-Jul-2023	Lu Hongfei <luhongfei@vivo.com>	ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple() Signed-off-by: Lu Hongfei <luhongfei@vivo.com> Link: https://lore.kernel.org/r/20230707115907.26637-1-luhongfei@vivo.com Signed-of ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple() Signed-off-by: Lu Hongfei <luhongfei@vivo.com> Link: https://lore.kernel.org/r/20230707115907.26637-1-luhongfei@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 89cadf6e	07-Jul-2023	Lu Hongfei <luhongfei@vivo.com>	ext4: change the type of blocksize in ext4_mb_init_cache() The return value type of i_blocksize() is 'unsigned int', so the type of blocksize has been modified from 'int' to 'unsigned int' to ensure ext4: change the type of blocksize in ext4_mb_init_cache() The return value type of i_blocksize() is 'unsigned int', so the type of blocksize has been modified from 'int' to 'unsigned int' to ensure data type consistency. Signed-off-by: Lu Hongfei <luhongfei@vivo.com> Link: https://lore.kernel.org/r/20230707105516.9156-1-luhongfei@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 772c9f69	16-Jul-2023	Ritesh Harjani <ritesh.list@gmail.com>	ext4: don't use CR_BEST_AVAIL_LEN for non-regular files Using CR_BEST_AVAIL_LEN only make sense for regular files, as for non-regular files we never normalize the allocation request length i.e. goal ext4: don't use CR_BEST_AVAIL_LEN for non-regular files Using CR_BEST_AVAIL_LEN only make sense for regular files, as for non-regular files we never normalize the allocation request length i.e. goal len is same as original length (ac_g_ex.fe_len == ac_o_ex.fe_len). Hence there is no scope of trimming the goal length to make it satisfy original request len. Thus this patch avoids using CR_BEST_AVAIL_LEN criteria for non-regular files request. Cc: stable@kernel.org Fixes: 33122aa930f1 ("ext4: Add allocation criteria 1.5 (CR1_5)") Reported-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/2a694c748ff8b8c4b416995a24f06f07b55047a8.1689516047.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 4eea9fbe	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: correct some stale comment of criteria We named criteria with CR_XXX, correct stale comment to criteria with raw number. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Rit ext4: correct some stale comment of criteria We named criteria with CR_XXX, correct stale comment to criteria with raw number. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-11-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# bcb123ac	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: return found group directly in ext4_mb_choose_next_group_best_avail Return good group when it's found in loop to remove futher check if good group is found after loop. Signed-off-by: Kemeng S ext4: return found group directly in ext4_mb_choose_next_group_best_avail Return good group when it's found in loop to remove futher check if good group is found after loop. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-10-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# b50675a4	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: return found group directly in ext4_mb_choose_next_group_goal_fast Return good group when it's found in loop to remove futher check if good group is found after loop. Signed-off-by: Kemeng Sh ext4: return found group directly in ext4_mb_choose_next_group_goal_fast Return good group when it's found in loop to remove futher check if good group is found after loop. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-9-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# de8bf0e5	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: replace the traditional ternary conditional operator with with max()/min() Replace the traditional ternary conditional operator with with max()/min() Signed-off-by: Kemeng Shi <shikemeng@huaw ext4: replace the traditional ternary conditional operator with with max()/min() Replace the traditional ternary conditional operator with with max()/min() Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# ad635507	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: remove unnecessary return for void function The return at end of void function is unnecessary, just remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjan ext4: remove unnecessary return for void function The return at end of void function is unnecessary, just remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# bb60caa2	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: use is_power_of_2 helper in ext4_mb_regular_allocator Use intuitive is_power_of_2 helper in ext4_mb_regular_allocator. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Rites ext4: use is_power_of_2 helper in ext4_mb_regular_allocator Use intuitive is_power_of_2 helper in ext4_mb_regular_allocator. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 919eb90c	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: return found group directly in ext4_mb_choose_next_group_p2_aligned Return good group when it's found in loop to remove unnecessary NULL initialization of grp and futher check if good group is ext4: return found group directly in ext4_mb_choose_next_group_p2_aligned Return good group when it's found in loop to remove unnecessary NULL initialization of grp and futher check if good group is found after loop. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 60c672b7	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: avoid potential data overflow in next_linear_group ngroups is ext4_group_t (unsigned int) while next_linear_group treat it in int. If ngroups is bigger than max number described by int, it wil ext4: avoid potential data overflow in next_linear_group ngroups is ext4_group_t (unsigned int) while next_linear_group treat it in int. If ngroups is bigger than max number described by int, it will be treat as a negative number. Then "return group + 1 >= ngroups ? 0 : group + 1;" may keep returning 0. Switch int to ext4_group_t in next_linear_group to fix the overflow. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# a9ce5993	01-Aug-2023	Kemeng Shi <shikemeng@huaweicloud.com>	ext4: correct grp validation in ext4_mb_good_group Group corruption check will access memory of grp and will trigger kernel crash if grp is NULL. So do NULL check before corruption check. Fixes: 53 ext4: correct grp validation in ext4_mb_good_group Group corruption check will access memory of grp and will trigger kernel crash if grp is NULL. So do NULL check before corruption check. Fixes: 5354b2af3406 ("ext4: allow ext4_get_group_info() to fail") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230801143204.2284343-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
Revision tags: v6.1.38, v6.1.37
# 304749c0	30-Jun-2023	Ojaswin Mujoo <ojaswin@linux.ibm.com>	ext4: replace CR_FAST macro with inline function for readability Replace CR_FAST with ext4_mb_cr_expensive() inline function for better readability. This function returns true if the criteria is one ext4: replace CR_FAST macro with inline function for readability Replace CR_FAST with ext4_mb_cr_expensive() inline function for better readability. This function returns true if the criteria is one of the expensive/slower ones where lots of disk IO/prefetching is acceptable. No functional changes are intended in this patch. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230630085927.140137-1-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
Revision tags: v6.1.36, v6.4, v6.1.35
# 95257987	16-Jun-2023	Jan Kara <jack@suse.cz>	ext4: drop EXT4_MF_FS_ABORTED flag EXT4_MF_FS_ABORTED flag has practically the same intent as EXT4_FLAGS_SHUTDOWN flag. The shutdown flag is checked in many more places than the aborted flag which i ext4: drop EXT4_MF_FS_ABORTED flag EXT4_MF_FS_ABORTED flag has practically the same intent as EXT4_FLAGS_SHUTDOWN flag. The shutdown flag is checked in many more places than the aborted flag which is mostly the historical artifact where we were relying on SB_RDONLY checks instead of the aborted flag checks. There are only three places - ext4_sync_file(), __ext4_remount(), and mballoc debug code - which check aborted flag and not shutdown flag and this is arguably a bug. Avoid these inconsistencies by removing EXT4_MF_FS_ABORTED flag and using EXT4_FLAGS_SHUTDOWN everywhere. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230616165109.21695-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# bedc5d34	24-Jul-2023	Baokun Li <libaokun1@huawei.com>	ext4: avoid overlapping preallocations due to overflow Let's say we want to allocate 2 blocks starting from 4294966386, after predicting the file size, start is aligned to 4294965248, len is changed ext4: avoid overlapping preallocations due to overflow Let's say we want to allocate 2 blocks starting from 4294966386, after predicting the file size, start is aligned to 4294965248, len is changed to 2048, then end = start + size = 0x100000000. Since end is of type ext4_lblk_t, i.e. uint, end is truncated to 0. This causes (pa->pa_lstart >= end) to always hold when checking if the current extent to be allocated crosses already preallocated blocks, so the resulting ac_g_ex may cross already preallocated blocks. Hence we convert the end type to loff_t and use pa_logical_end() to avoid overflow. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230724121059.11834-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# bc056e71	24-Jul-2023	Baokun Li <libaokun1@huawei.com>	ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow When we calculate the end position of ext4_free_extent, this position may be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow When we calculate the end position of ext4_free_extent, this position may be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not the first case of adjusting the best extent, that is, new_bex_end > 0, the following BUG_ON will be triggered: ========================================================= kernel BUG at fs/ext4/mballoc.c:5116! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279 RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430 Call Trace: <TASK> ext4_mb_use_best_found+0x203/0x2f0 ext4_mb_try_best_found+0x163/0x240 ext4_mb_regular_allocator+0x158/0x1550 ext4_mb_new_blocks+0x86a/0xe10 ext4_ext_map_blocks+0xb0c/0x13a0 ext4_map_blocks+0x2cd/0x8f0 ext4_iomap_begin+0x27b/0x400 iomap_iter+0x222/0x3d0 __iomap_dio_rw+0x243/0xcb0 iomap_dio_rw+0x16/0x80 ========================================================= A simple reproducer demonstrating the problem: mkfs.ext4 -F /dev/sda -b 4096 100M mount /dev/sda /tmp/test fallocate -l1M /tmp/test/tmp fallocate -l10M /tmp/test/file fallocate -i -o 1M -l16777203M /tmp/test/file fsstress -d /tmp/test -l 0 -n 100000 -p 8 & sleep 10 && killall -9 fsstress rm -f /tmp/test/tmp xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192" We simply refactor the logic for adjusting the best extent by adding a temporary ext4_free_extent ex and use extent_logical_end() to avoid overflow, which also simplifies the code. Cc: stable@kernel.org # 6.4 Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 43bbddc0	24-Jul-2023	Baokun Li <libaokun1@huawei.com>	ext4: add two helper functions extent_logical_end() and pa_logical_end() When we use lstart + len to calculate the end of free extent or prealloc space, it may exceed the maximum value of 4294967295 ext4: add two helper functions extent_logical_end() and pa_logical_end() When we use lstart + len to calculate the end of free extent or prealloc space, it may exceed the maximum value of 4294967295(0xffffffff) supported by ext4_lblk_t and cause overflow, which may lead to various problems. Therefore, we add two helper functions, extent_logical_end() and pa_logical_end(), to limit the type of end to loff_t, and also convert lstart to loff_t for calculation to avoid overflow. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230724121059.11834-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 9d3de7ee	22-Jul-2023	Ojaswin Mujoo <ojaswin@linux.ibm.com>	ext4: fix rbtree traversal bug in ext4_mb_use_preallocated During allocations, while looking for preallocations(PA) in the per inode rbtree, we can't do a direct traversal of the tree because ext4_m ext4: fix rbtree traversal bug in ext4_mb_use_preallocated During allocations, while looking for preallocations(PA) in the per inode rbtree, we can't do a direct traversal of the tree because ext4_mb_discard_group_preallocation() can paralelly mark the pa deleted and that can cause direct traversal to skip some entries. This was leading to a BUG_ON() being hit [1] when we missed a PA that could satisfy our request and ultimately tried to create a new PA that would overlap with the missed one. To makes sure we handle that case while still keeping the performance of the rbtree, we make use of the fact that the only pa that could possibly overlap the original goal start is the one that satisfies the below conditions: 1. It must have it's logical start immediately to the left of (ie less than) original logical start. 2. It must not be deleted To find this pa we use the following traversal method: 1. Descend into the rbtree normally to find the immediate neighboring PA. Here we keep descending irrespective of if the PA is deleted or if it overlaps with our request etc. The goal is to find an immediately adjacent PA. 2. If the found PA is on right of original goal, use rb_prev() to find the left adjacent PA. 3. Check if this PA is deleted and keep moving left with rb_prev() until a non deleted PA is found. 4. This is the PA we are looking for. Now we can check if it can satisfy the original request and proceed accordingly. This approach also takes care of having deleted PAs in the tree. (While we are at it, also fix a possible overflow bug in calculating the end of a PA) [1] https://lore.kernel.org/linux-ext4/CA+G9fYv2FRpLqBZf34ZinR8bU2_ZRAUOjKAD3+tKRFaEQHtt8Q@mail.gmail.com/ Cc: stable@kernel.org # 6.4 Fixes: 3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list") Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com Tested-by: Ritesh Harjani (IBM) ritesh.list@gmail.com Link: https://lore.kernel.org/r/edd2efda6a83e6343c5ace9deea44813e71dbe20.1690045963.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
Revision tags: v6.1.34
# 5d5460fa	09-Jun-2023	Ojaswin Mujoo <ojaswin@linux.ibm.com>	ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail() In ext4_mb_choose_next_group_best_avail(), we want the start order to be 1 less than goal length and the min_order to be, at max, ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail() In ext4_mb_choose_next_group_best_avail(), we want the start order to be 1 less than goal length and the min_order to be, at max, 1 more than the original length. This commit fixes an off by one issue that arose due to the fact that 1 << fls(n) > (n). After all the processing: order = 1 order below goal len min_order = maximum of the three:- - order - trim_order - 1 order below B2C(s_stripe) - 1 order above original len Cc: stable@kernel.org Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)") Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
Revision tags: v6.1.33
# 4c0cfebd	08-Jun-2023	Theodore Ts'o <tytso@mit.edu>	ext4: clean up mballoc criteria comments Line wrap and slightly clarify the comments describing mballoc's cirtiera. Define EXT4_MB_NUM_CRS as part of the enum, so that it will automatically get upd ext4: clean up mballoc criteria comments Line wrap and slightly clarify the comments describing mballoc's cirtiera. Define EXT4_MB_NUM_CRS as part of the enum, so that it will automatically get updated when criteria is added or removed. Also fix a potential unitialized use of 'cr' variable if CONFIG_EXT4_DEBUG is enabled. Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
Revision tags: v6.1.32, v6.1.31
# f52f3d2b	30-May-2023	Ojaswin Mujoo <ojaswin@linux.ibm.com>	ext4: Give symbolic names to mballoc criterias mballoc criterias have historically been called by numbers like CR0, CR1... however this makes it confusing to understand what each criteria is about. ext4: Give symbolic names to mballoc criterias mballoc criterias have historically been called by numbers like CR0, CR1... however this makes it confusing to understand what each criteria is about. Change these criterias from numbers to symbolic names and add relevant comments. While we are at it, also reformat and add some comments to ext4_seq_mb_stats_show() for better readability. Additionally, define CR_FAST which signifies the criteria below which we can make quicker decisions like: * quitting early if (free block < requested len) * avoiding to scan free extents smaller than required len. * avoiding to initialize buddy cache and work with existing cache * limiting prefetches Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/a2dc6ec5aea5e5e68cf8e788c2a964ffead9c8b0.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 7e170922	30-May-2023	Ojaswin Mujoo <ojaswin@linux.ibm.com>	ext4: Add allocation criteria 1.5 (CR1_5) CR1_5 aims to optimize allocations which can't be satisfied in CR1. The fact that we couldn't find a group in CR1 suggests that it would be difficult to fin ext4: Add allocation criteria 1.5 (CR1_5) CR1_5 aims to optimize allocations which can't be satisfied in CR1. The fact that we couldn't find a group in CR1 suggests that it would be difficult to find a continuous extent to compleltely satisfy our allocations. So before falling to the slower CR2, in CR1.5 we proactively trim the the preallocations so we can find a group with (free / fragments) big enough. This speeds up our allocation at the cost of slightly reduced preallocation. The patch also adds a new sysfs tunable: * /sys/fs/ext4/<partition>/mb_cr1_5_max_trim_order This controls how much CR1.5 can trim a request before falling to CR2. For example, for a request of order 7 and max trim order 2, CR1.5 can trim this upto order 5. Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/150fdf65c8e4cc4dba71e020ce0859bcf636a5ff.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 856d865c	30-May-2023	Ojaswin Mujoo <ojaswin@linux.ibm.com>	ext4: Abstract out logic to search average fragment list Make the logic of searching average fragment list of a given order reusable by abstracting it out to a differnet function. This will also avo ext4: Abstract out logic to search average fragment list Make the logic of searching average fragment list of a given order reusable by abstracting it out to a differnet function. This will also avoid code duplication in upcoming patches. No functional changes. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/028c11d95b17ce0285f45456709a0ca922df1b83.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 4f3d1e45	30-May-2023	Ojaswin Mujoo <ojaswin@linux.ibm.com>	ext4: Ensure ext4_mb_prefetch_fini() is called for all prefetched BGs Before this patch, the call stack in ext4_run_li_request is as follows: /* * nr = no. of BGs we want to fetch (=s_mb_prefe ext4: Ensure ext4_mb_prefetch_fini() is called for all prefetched BGs Before this patch, the call stack in ext4_run_li_request is as follows: /* * nr = no. of BGs we want to fetch (=s_mb_prefetch) * prefetch_ios = no. of BGs not uptodate after * ext4_read_block_bitmap_nowait() */ next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios); ext4_mb_prefetch_fini(sb, next_group prefetch_ios); ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in range [next_group - prefetch_ios, next_group). This is incorrect since sometimes (prefetch_ios < nr), which causes ext4_mb_prefetch_fini() to incorrectly ignore some of the BGs that might need initialization. This issue is more notable now with the previous patch enabling "fetching" of BLOCK_UNINIT BGs which are marked buffer_uptodate by default. Fix this by passing nr to ext4_mb_prefetch_fini() instead of prefetch_ios so that it considers the right range of groups. Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT groups that would need buddy initialization. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/05e648ae04ec5b754207032823e9c1de9a54f87a.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
# 3c629604	30-May-2023	Ojaswin Mujoo <ojaswin@linux.ibm.com>	ext4: Don't skip prefetching BLOCK_UNINIT groups Currently, ext4_mb_prefetch() and ext4_mb_prefetch_fini() skip BLOCK_UNINIT groups since fetching their bitmaps doesn't need disk IO. As a consequenc ext4: Don't skip prefetching BLOCK_UNINIT groups Currently, ext4_mb_prefetch() and ext4_mb_prefetch_fini() skip BLOCK_UNINIT groups since fetching their bitmaps doesn't need disk IO. As a consequence, we end not initializing the buddy structures and CR0/1 lists for these BGs, even though it can be done without any disk IO overhead. Hence, don't skip such BGs during prefetch and prefetch_fini. This improves the accuracy of CR0/1 allocation as earlier, we could have essentially empty BLOCK_UNINIT groups being ignored by CR0/1 due to their buddy not being initialized, leading to slower CR2 allocations. With this patch CR0/1 will be able to discover these groups as well, thus improving performance. Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/dc3130b8daf45ffe63d8a3c1edcf00eb8ba70e1f.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> show more ...
123 4 5 6 7 8 9 10 >>...42