Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26 |
|
#
73aa8ea0 |
| 09-Apr-2024 |
Qu Wenruo <wqu@suse.com> |
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
commit fe1c6c7acce10baf9521d6dccc17268d91ee2305 upstream.
[BUG] During my extent_map cleanup/refactor, with extra sanity c
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
commit fe1c6c7acce10baf9521d6dccc17268d91ee2305 upstream.
[BUG] During my extent_map cleanup/refactor, with extra sanity checks, extent-map-tests::test_case_7() would not pass the checks.
The problem is, after btrfs_drop_extent_map_range(), the resulted extent_map has a @block_start way too large. Meanwhile my btrfs_file_extent_item based members are returning a correct @disk_bytenr/@offset combination.
The extent map layout looks like this:
0 16K 32K 48K | PINNED | | Regular |
The regular em at [32K, 48K) also has 32K @block_start.
Then drop range [0, 36K), which should shrink the regular one to be [36K, 48K). However the @block_start is incorrect, we expect 32K + 4K, but got 52K.
[CAUSE] Inside btrfs_drop_extent_map_range() function, if we hit an extent_map that covers the target range but is still beyond it, we need to split that extent map into half:
|<-- drop range -->| |<----- existing extent_map --->|
And if the extent map is not compressed, we need to forward extent_map::block_start by the difference between the end of drop range and the extent map start.
However in that particular case, the difference is calculated using (start + len - em->start).
The problem is @start can be modified if the drop range covers any pinned extent.
This leads to wrong calculation, and would be caught by my later extent_map sanity checks, which checks the em::block_start against btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.
This is a regression caused by commit c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range"), which removed the @len update for pinned extents.
[FIX] Fix it by avoiding using @start completely, and use @end - em->start instead, which @end is exclusive bytenr number.
And update the test case to verify the @block_start to prevent such problem from happening.
Thankfully this is not going to lead to any data corruption, as IO path does not utilize btrfs_drop_extent_map_range() with @skip_pinned set.
So this fix is only here for the sake of consistency/correctness.
CC: stable@vger.kernel.org # 6.5+ Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26 |
|
#
73aa8ea0 |
| 09-Apr-2024 |
Qu Wenruo <wqu@suse.com> |
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
commit fe1c6c7acce10baf9521d6dccc17268d91ee2305 upstream.
[BUG] During my extent_map cleanup/refactor, with extra sanity c
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
commit fe1c6c7acce10baf9521d6dccc17268d91ee2305 upstream.
[BUG] During my extent_map cleanup/refactor, with extra sanity checks, extent-map-tests::test_case_7() would not pass the checks.
The problem is, after btrfs_drop_extent_map_range(), the resulted extent_map has a @block_start way too large. Meanwhile my btrfs_file_extent_item based members are returning a correct @disk_bytenr/@offset combination.
The extent map layout looks like this:
0 16K 32K 48K | PINNED | | Regular |
The regular em at [32K, 48K) also has 32K @block_start.
Then drop range [0, 36K), which should shrink the regular one to be [36K, 48K). However the @block_start is incorrect, we expect 32K + 4K, but got 52K.
[CAUSE] Inside btrfs_drop_extent_map_range() function, if we hit an extent_map that covers the target range but is still beyond it, we need to split that extent map into half:
|<-- drop range -->| |<----- existing extent_map --->|
And if the extent map is not compressed, we need to forward extent_map::block_start by the difference between the end of drop range and the extent map start.
However in that particular case, the difference is calculated using (start + len - em->start).
The problem is @start can be modified if the drop range covers any pinned extent.
This leads to wrong calculation, and would be caught by my later extent_map sanity checks, which checks the em::block_start against btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.
This is a regression caused by commit c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range"), which removed the @len update for pinned extents.
[FIX] Fix it by avoiding using @start completely, and use @end - em->start instead, which @end is exclusive bytenr number.
And update the test case to verify the @block_start to prevent such problem from happening.
Thankfully this is not going to lead to any data corruption, as IO path does not utilize btrfs_drop_extent_map_range() with @skip_pinned set.
So this fix is only here for the sake of consistency/correctness.
CC: stable@vger.kernel.org # 6.5+ Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26 |
|
#
73aa8ea0 |
| 09-Apr-2024 |
Qu Wenruo <wqu@suse.com> |
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
commit fe1c6c7acce10baf9521d6dccc17268d91ee2305 upstream.
[BUG] During my extent_map cleanup/refactor, with extra sanity c
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
commit fe1c6c7acce10baf9521d6dccc17268d91ee2305 upstream.
[BUG] During my extent_map cleanup/refactor, with extra sanity checks, extent-map-tests::test_case_7() would not pass the checks.
The problem is, after btrfs_drop_extent_map_range(), the resulted extent_map has a @block_start way too large. Meanwhile my btrfs_file_extent_item based members are returning a correct @disk_bytenr/@offset combination.
The extent map layout looks like this:
0 16K 32K 48K | PINNED | | Regular |
The regular em at [32K, 48K) also has 32K @block_start.
Then drop range [0, 36K), which should shrink the regular one to be [36K, 48K). However the @block_start is incorrect, we expect 32K + 4K, but got 52K.
[CAUSE] Inside btrfs_drop_extent_map_range() function, if we hit an extent_map that covers the target range but is still beyond it, we need to split that extent map into half:
|<-- drop range -->| |<----- existing extent_map --->|
And if the extent map is not compressed, we need to forward extent_map::block_start by the difference between the end of drop range and the extent map start.
However in that particular case, the difference is calculated using (start + len - em->start).
The problem is @start can be modified if the drop range covers any pinned extent.
This leads to wrong calculation, and would be caught by my later extent_map sanity checks, which checks the em::block_start against btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.
This is a regression caused by commit c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range"), which removed the @len update for pinned extents.
[FIX] Fix it by avoiding using @start completely, and use @end - em->start instead, which @end is exclusive bytenr number.
And update the test case to verify the @block_start to prevent such problem from happening.
Thankfully this is not going to lead to any data corruption, as IO path does not utilize btrfs_drop_extent_map_range() with @skip_pinned set.
So this fix is only here for the sake of consistency/correctness.
CC: stable@vger.kernel.org # 6.5+ Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
c962098c |
| 17-Aug-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: fix incorrect splitting in btrfs_drop_extent_map_range
In production we were seeing a variety of WARN_ON()'s in the extent_map code, specifically in btrfs_drop_extent_map_range() when we have
btrfs: fix incorrect splitting in btrfs_drop_extent_map_range
In production we were seeing a variety of WARN_ON()'s in the extent_map code, specifically in btrfs_drop_extent_map_range() when we have to call add_extent_mapping() for our second split.
Consider the following extent map layout
PINNED [0 16K) [32K, 48K)
and then we call btrfs_drop_extent_map_range for [0, 36K), with skip_pinned == true. The initial loop will have
start = 0 end = 36K len = 36K
we will find the [0, 16k) extent, but since we are pinned we will skip it, which has this code
start = em_end; if (end != (u64)-1) len = start + len - em_end;
em_end here is 16K, so now the values are
start = 16K len = 16K + 36K - 16K = 36K
len should instead be 20K. This is a problem when we find the next extent at [32K, 48K), we need to split this extent to leave [36K, 48k), however the code for the split looks like this
split->start = start + len; split->len = em_end - (start + len);
In this case we have
em_end = 48K split->start = 16K + 36K // this should be 16K + 20K split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K
and now we have an invalid extent_map in the tree that potentially overlaps other entries in the extent map. Even in the non-overlapping case we will have split->start set improperly, which will cause problems with any block related calculations.
We don't actually need len in this loop, we can simply use end as our end point, and only adjust start up when we find a pinned extent we need to skip.
Adjust the logic to do this, which keeps us from inserting an invalid extent map.
We only skip_pinned in the relocation case, so this is relatively rare, except in the case where you are running relocation a lot, which can happen with auto relocation on.
Fixes: 55ef68990029 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30 |
|
#
f000bc6f |
| 24-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: pass the new logical address to split_extent_map
split_extent_map splits off the first chunk of an extent map into a new one. One of the two users is the zoned I/O completion code that wants
btrfs: pass the new logical address to split_extent_map
split_extent_map splits off the first chunk of an extent map into a new one. One of the two users is the zoned I/O completion code that wants to rewrite the logical block start address right after this split. Pass in the logical address to be set in the split off first extent_map as an argument to avoid an extra extent tree lookup for this case.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
a6f3e205 |
| 24-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: move split_extent_map to extent_map.c
split_extent_map doesn't have anything to do with the other code in inode.c, so move it to extent_map.c.
This also allows marking replace_extent_mapping
btrfs: move split_extent_map to extent_map.c
split_extent_map doesn't have anything to do with the other code in inode.c, so move it to extent_map.c.
This also allows marking replace_extent_mapping static.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
1d126800 |
| 24-May-2023 |
David Sterba <dsterba@suse.com> |
btrfs: drop gfp from parameter extent state helpers
Now that all extent state bit helpers effectively take the GFP_NOFS mask (and GFP_NOWAIT is encoded in the bits) we can remove the parameter. This
btrfs: drop gfp from parameter extent state helpers
Now that all extent state bit helpers effectively take the GFP_NOFS mask (and GFP_NOWAIT is encoded in the bits) we can remove the parameter. This reduces stack consumption in many functions and simplifies a lot of code.
Net effect on module on a release build:
text data bss dec hex filename 1250432 20985 16088 1287505 13a551 pre/btrfs.ko 1247074 20985 16088 1284147 139833 post/btrfs.ko
DELTA: -3358
Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
62bc6047 |
| 24-May-2023 |
David Sterba <dsterba@suse.com> |
btrfs: pass NOWAIT for set/clear extent bits as another bit
The only flags we now pass to set_extent_bit/__clear_extent_bit are GFP_NOFS and GFP_NOWAIT (a few functions handling mappings). This requ
btrfs: pass NOWAIT for set/clear extent bits as another bit
The only flags we now pass to set_extent_bit/__clear_extent_bit are GFP_NOFS and GFP_NOWAIT (a few functions handling mappings). This requires an extra parameter to be passed everywhere but is almost always the same.
Encode the GFP_NOWAIT as an artificial extent bit and extract the real bits and gfp mask in the lowest level helpers. Now the passed gfp mask is not actually used and can be removed.
Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
e85de967 |
| 24-May-2023 |
David Sterba <dsterba@suse.com> |
btrfs: open code set_extent_bits_nowait
The helper only passes GFP_NOWAIT as gfp flags and is used two times.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signe
btrfs: open code set_extent_bits_nowait
The helper only passes GFP_NOWAIT as gfp flags and is used two times.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15 |
|
#
e4cc1483 |
| 27-Feb-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: fix extent map logging bit not cleared for split maps after dropping range
At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING bit on a 'flags' variable that was not init
btrfs: fix extent map logging bit not cleared for split maps after dropping range
At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING bit on a 'flags' variable that was not initialized. This makes static checkers complain about it, so initialize the 'flags' variable before clearing the bit.
In practice this has no consequences, because EXTENT_FLAG_LOGGING should not be set when btrfs_drop_extent_map_range() is called, as an fsync locks the inode in exclusive mode, locks the inode's mmap semaphore in exclusive mode too and it always flushes all delalloc.
Also add a comment about why we clear EXTENT_FLAG_LOGGING on a copy of the flags of the split extent map.
Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/linux-btrfs/Y%2FyipSVozUDEZKow@kili/ Fixes: db21370bffbc ("btrfs: drop extent map range more efficiently") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79 |
|
#
cfd7a17d |
| 11-Nov-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove no longer used btrfs_next_extent_map()
There are no more users of btrfs_next_extent_map(), the previous patch in the series ("btrfs: search for delalloc more efficiently during lseek/f
btrfs: remove no longer used btrfs_next_extent_map()
There are no more users of btrfs_next_extent_map(), the previous patch in the series ("btrfs: search for delalloc more efficiently during lseek/fiemap") removed the last usage of the function, so delete it.
This change is part of a patchset that has the goal to make performance better for applications that use lseek's SEEK_HOLE and SEEK_DATA modes to iterate over the extents of a file. Two examples are the cp program from coreutils 9.0+ and the tar program (when using its --sparse / -S option). A sample test and results are listed in the changelog of the last patch in the series:
1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree 2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap 3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap 4/9 btrfs: search for delalloc more efficiently during lseek/fiemap 5/9 btrfs: remove no longer used btrfs_next_extent_map() 6/9 btrfs: allow passing a cached state record to count_range_bits() 7/9 btrfs: update stale comment for count_range_bits() 8/9 btrfs: use cached state when looking for delalloc ranges with fiemap 9/9 btrfs: use cached state when looking for delalloc ranges with lseek
Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/ Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69 |
|
#
d52a1365 |
| 16-Sep-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: selftests: remove impossible inline extent at non-zero file offset
In our inode-tests.c, we create an inline offset at file offset 5, which is no longer possible since the introduction of tre
btrfs: selftests: remove impossible inline extent at non-zero file offset
In our inode-tests.c, we create an inline offset at file offset 5, which is no longer possible since the introduction of tree-checker.
Thus I don't think we should spend time maintaining some corner cases which are already ruled out by tree-checker.
So this patch will:
- Change the inline extent to start at file offset 0
Also change its length to 6 to cover the original length
- Add an extra ASSERT() for btrfs_add_extent_mapping()
This is to make sure tree-checker is working correctly.
- Update the inode selftest
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
43dd529a |
| 27-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: update function comments
Update, reformat or reword function comments. This also removes the kdoc marker so we don't get reports when the function name is missing.
Changes made:
- remove kd
btrfs: update function comments
Update, reformat or reword function comments. This also removes the kdoc marker so we don't get reports when the function name is missing.
Changes made:
- remove kdoc markers - reformat the brief description to be a proper sentence - reword to imperative voice - align parameter list - fix typos
Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
9b569ea0 |
| 19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move the printk helpers out of ctree.h
We have a bunch of printk helpers that are in ctree.h. These have nothing to do with ctree.c, so move them into their own header. Subsequent patches wi
btrfs: move the printk helpers out of ctree.h
We have a bunch of printk helpers that are in ctree.h. These have nothing to do with ctree.c, so move them into their own header. Subsequent patches will cleanup the printk helpers.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
d47704bd |
| 11-Oct-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: get the next extent map during fiemap/lseek more efficiently
At find_delalloc_subrange(), when we need to get the next extent map, we do a full search on the extent map tree (a red black tree
btrfs: get the next extent map during fiemap/lseek more efficiently
At find_delalloc_subrange(), when we need to get the next extent map, we do a full search on the extent map tree (a red black tree). This is fine but it's a lot more efficient to simply use rb_next(), which typically requires iterating over less nodes of the tree and never needs to compare the ranges of nodes with the one we are looking for.
So add a public helper to extent_map.{h,c} to get the extent map that immediately follows another extent map, using rb_next(), and use that helper at find_delalloc_subrange().
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
db21370b |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: drop extent map range more efficiently
Currently when dropping extent maps for a file range, through btrfs_drop_extent_map_range(), we do the following non-optimal things:
1) We lookup for e
btrfs: drop extent map range more efficiently
Currently when dropping extent maps for a file range, through btrfs_drop_extent_map_range(), we do the following non-optimal things:
1) We lookup for extent maps one by one, always starting the search from the root of the extent map tree. This is not efficient if we have multiple extent maps in the range;
2) We check on every iteration if we have the 'split' and 'split2' spare extent maps in case we need to split an extent map that intersects our range but also crosses its boundaries (to the left, to the right or both cases). If our target range is for example:
[2M, 8M)
And we have 3 extents maps in the range:
[1M, 3M) [3M, 6M) [6M, 10M[
The on the first iteration we allocate two extent maps for 'split' and 'split2', and use the 'split' to split the first extent map, so after the split we set 'split' to 'split2' and then set 'split2' to NULL.
On the second iteration, we don't need to split the second extent map, but because 'split2' is now NULL, we allocate a new extent map for 'split2'.
On the third iteration we need to split the third extent map, so we use the extent map pointed by 'split'.
So we ended up allocating 3 extent maps for splitting, but all we needed was 2 extent maps. We never need to allocate more than 2, because extent maps that need to be split are always the first one and the last one in the target range.
Improve on this by:
1) Using rb_next() to move on to the next extent map. This results in iterating over less nodes of the tree and it does not require comparing the ranges of nodes to our start/end offset;
2) Allocate the 2 extent maps for splitting before entering the loop and never allocate more than 2. In practice it's very rare to have the combination of both extent map allocations fail, since we have a dedicated slab for extent maps, and also have the need to split two extent maps.
This patch is part of a patchset comprised of the following patches:
btrfs: fix missed extent on fsync after dropping extent maps btrfs: move btrfs_drop_extent_cache() to extent_map.c btrfs: use extent_map_end() at btrfs_drop_extent_map_range() btrfs: use cond_resched_rwlock_write() during inode eviction btrfs: move open coded extent map tree deletion out of inode eviction btrfs: add helper to replace extent map range with a new extent map btrfs: remove the refcount warning/check at free_extent_map() btrfs: remove unnecessary extent map initializations btrfs: assert tree is locked when clearing extent map from logging btrfs: remove unnecessary NULL pointer checks when searching extent maps btrfs: remove unnecessary next extent map search btrfs: avoid pointless extent map tree search when flushing delalloc btrfs: drop extent map range more efficiently
And the following fio test was done before and after applying the whole patchset, on a non-debug kernel (Debian's default kernel config) on a 12 cores Intel box with 64G of ram:
$ cat test.sh #!/bin/bash
DEV=/dev/nvme0n1 MNT=/mnt/nvme0n1 MOUNT_OPTIONS="-o ssd" MKFS_OPTIONS="-R free-space-tree -O no-holes"
cat <<EOF > /tmp/fio-job.ini [writers] rw=randwrite fsync=8 fallocate=none group_reporting=1 direct=0 bssplit=4k/20:8k/20:16k/20:32k/10:64k/10:128k/5:256k/5:512k/5:1m/5 ioengine=psync filesize=2G runtime=300 time_based directory=$MNT numjobs=8 thread EOF
echo performance | \ tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo echo "Using config:" echo cat /tmp/fio-job.ini echo
umount $MNT &> /dev/null mkfs.btrfs -f $MKFS_OPTIONS $DEV mount $MOUNT_OPTIONS $DEV $MNT
fio /tmp/fio-job.ini
umount $MNT
Result before applying the patchset:
WRITE: bw=197MiB/s (206MB/s), 197MiB/s-197MiB/s (206MB/s-206MB/s), io=57.7GiB (61.9GB), run=300188-300188msec
Result after applying the patchset:
WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s), io=59.5GiB (63.9GB), run=300019-300019msec
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
6c05813e |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove unnecessary next extent map search
At __tree_search(), and its single caller __lookup_extent_mapping(), there is no point in finding the next extent map that starts after the search of
btrfs: remove unnecessary next extent map search
At __tree_search(), and its single caller __lookup_extent_mapping(), there is no point in finding the next extent map that starts after the search offset if we were able to find the previous extent map that ends before our search offset, because __lookup_extent_mapping() ignores the next acceptable extent map if we were able to find the previous one.
So just return immediately if we were able to find the previous extent map, therefore avoiding wasting time iterating the tree looking for the next extent map which will not be used by __lookup_extent_mapping().
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
08f088dd |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove unnecessary NULL pointer checks when searching extent maps
The previous and next pointer arguments passed to __tree_search() are never NULL as the only caller of this function, __looku
btrfs: remove unnecessary NULL pointer checks when searching extent maps
The previous and next pointer arguments passed to __tree_search() are never NULL as the only caller of this function, __lookup_extent_mapping(), always passes the address of two on stack pointers. So remove the NULL checks and add assertions to verify the pointers.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
74333c7d |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: assert tree is locked when clearing extent map from logging
When calling clear_em_logging() we should have a write lock on the extent map tree, as we will try to merge the extent map with the
btrfs: assert tree is locked when clearing extent map from logging
When calling clear_em_logging() we should have a write lock on the extent map tree, as we will try to merge the extent map with the previous and next ones in the tree. So assert that we have a write lock.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
2e0cdaa0 |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove unnecessary extent map initializations
When allocating an extent map, we use kmem_cache_zalloc() which guarantees the returned memory is initialized to zeroes, therefore it's pointless
btrfs: remove unnecessary extent map initializations
When allocating an extent map, we use kmem_cache_zalloc() which guarantees the returned memory is initialized to zeroes, therefore it's pointless to initialize the generation and flags of the extent map to zero again.
Remove those initializations, as they are pointless and slightly increase the object text size.
Before removing them:
$ size fs/btrfs/extent_map.o text data bss dec hex filename 9241 274 24 9539 2543 fs/btrfs/extent_map.o
After removing them:
$ size fs/btrfs/extent_map.o text data bss dec hex filename 9209 274 24 9507 2523 fs/btrfs/extent_map.o
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
ad5d6e91 |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove the refcount warning/check at free_extent_map()
At free_extent_map(), it's pointless to have a WARN_ON() to check if the refcount of the extent map is zero. Such check is already done
btrfs: remove the refcount warning/check at free_extent_map()
At free_extent_map(), it's pointless to have a WARN_ON() to check if the refcount of the extent map is zero. Such check is already done by the refcount_t module and refcount_dec_and_test(), which loudly complains if we try to decrement a reference count that is currently 0.
The WARN_ON() dates back to the time when used a regular atomic_t type for the reference counter, before we switched to the refcount_t type. The main goal of the refcount_t type/module is precisely to catch such types of bugs and loudly complain if they happen.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
a1ba4c08 |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: add helper to replace extent map range with a new extent map
We have several places that need to drop all the extent maps in a given file range and then add a new extent map for that range. C
btrfs: add helper to replace extent map range with a new extent map
We have several places that need to drop all the extent maps in a given file range and then add a new extent map for that range. Currently they call btrfs_drop_extent_map_range() to delete all extent maps in the range and then keep trying to add the new extent map in a loop that keeps retrying while the insertion of the new extent map fails with -EEXIST.
So instead of repeating this logic, add a helper to extent_map.c that does these steps and name it btrfs_replace_extent_map_range(). Also add a comment about why the retry loop is necessary.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
9c9d1b4f |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: move open coded extent map tree deletion out of inode eviction
Move the loop that removes all the extent maps from the inode's extent map tree during inode eviction out of inode.c and into ex
btrfs: move open coded extent map tree deletion out of inode eviction
Move the loop that removes all the extent maps from the inode's extent map tree during inode eviction out of inode.c and into extent_map.c, to btrfs_drop_extent_map_range(). Anything manipulating extent maps or the extent map tree should be in extent_map.c.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
f3109e33 |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
Instead of open coding the end offset calculation of an extent map, use the helper extent_map_end() and cache its result in a local varia
btrfs: use extent_map_end() at btrfs_drop_extent_map_range()
Instead of open coding the end offset calculation of an extent map, use the helper extent_map_end() and cache its result in a local variable, since it's used several times.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
4c0c8cfc |
| 19-Sep-2022 |
Filipe Manana <fdmanana@suse.com> |
btrfs: move btrfs_drop_extent_cache() to extent_map.c
The function btrfs_drop_extent_cache() doesn't really belong at file.c because what it does is drop a range of extent maps for a file range. It
btrfs: move btrfs_drop_extent_cache() to extent_map.c
The function btrfs_drop_extent_cache() doesn't really belong at file.c because what it does is drop a range of extent maps for a file range. It directly allocates and manipulates extent maps, by dropping, splitting and replacing them in an extent map tree, so it should be located at extent_map.c, where all manipulations of an extent map tree and its extent maps are supposed to be done.
So move it out of file.c and into extent_map.c. Additionally do the following changes:
1) Rename it into btrfs_drop_extent_map_range(), as this makes it more clear about what it does. The term "cache" is a bit confusing as it's not widely used, "extent maps" or "extent mapping" is much more common;
2) Change its 'skip_pinned' argument from int to bool;
3) Turn several of its local variables from int to bool, since they are used as booleans;
4) Move the declaration of some variables out of the function's main scope and into the scopes where they are used;
5) Remove pointless assignment of false to 'modified' early in the while loop, as later that variable is set and it's not used before that second assignment;
6) Remove checks for NULL before calling free_extent_map().
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|