Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3 |
|
#
d5e09e38 |
| 12-Sep-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: abort transaction on generation mismatch when marking eb as dirty
[ Upstream commit 50564b651d01c19ce732819c5b3c3fd60707188e ]
When marking an extent buffer as dirty, at btrfs_mark_buffer_dirty(), we check if its generation matches the running transaction and if not we just print a warning. Such a mismatch indicates that something really went wrong, and only printing a warning message (and stack trace) is not enough to prevent corruption. Allowing a transaction to commit with such an extent buffer will trigger an error if we ever try to read it from disk, due to a generation mismatch with its parent generation.
So abort the current transaction with -EUCLEAN if we notice a generation mismatch. For this we need to pass a transaction handle to btrfs_mark_buffer_dirty() which is always available except in test code, in which case we can pass NULL since it operates on dummy extent buffers and all test roots have a single node/leaf (root node at level 0).
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
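The check described in this commit can be illustrated with a small userspace sketch. Everything here is hypothetical: the sketch_* structures and function are simplified stand-ins, not the kernel's types or the real btrfs_mark_buffer_dirty() signature, and returning an error code without dirtying the buffer is a choice made for the sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; fallback for platforms that lack it */
#endif

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_trans {
	uint64_t transid;  /* id of the running transaction */
	int aborted_errno; /* 0 while healthy, negative errno once aborted */
};

struct sketch_eb {
	uint64_t generation; /* generation stored in the buffer header */
	int dirty;
};

/*
 * Mark a buffer dirty. On a generation mismatch, abort the transaction
 * with -EUCLEAN instead of only warning. A NULL handle models the
 * test-code case described above, where no check is performed.
 */
static int sketch_mark_buffer_dirty(struct sketch_trans *trans,
				    struct sketch_eb *eb)
{
	assert(eb != NULL);

	if (trans && eb->generation != trans->transid) {
		trans->aborted_errno = -EUCLEAN;
		return -EUCLEAN;
	}
	eb->dirty = 1;
	return 0;
}
```

A caller would pass its transaction handle on every real path and NULL only from self-tests that operate on dummy buffers.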
|
Revision tags: v6.5.2 |
|
#
b4c639f6 |
| 05-Sep-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: initialize start_slot in btrfs_log_prealloc_extents
Jens reported a compiler warning when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this
  fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’:
  fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used uninitialized [-Wmaybe-uninitialized]
   4828 |                 ret = copy_items(trans, inode, dst_path, path,
        |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   4829 |                                  start_slot, ins_nr, 1, 0);
        |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~
  fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here
   4725 |         int start_slot;
        |             ^~~~~~~~~~
The compiler is incorrect, as we only use this code when ins_len > 0, and when ins_len > 0 we have start_slot properly initialized. However we generally find the -Wmaybe-uninitialized warnings valuable, so initialize start_slot to get rid of the warning.
Reported-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
Revision tags: v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45 |
|
#
84af994b |
| 09-Aug-2023 |
Ruan Jinjie <ruanjinjie@huawei.com> |
btrfs: use LIST_HEAD() to initialize the list_head
Use LIST_HEAD() to initialize the list_head instead of open-coding it.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
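For readers unfamiliar with the idiom, the following is a minimal userspace re-creation of the list primitives involved, written for illustration only. The macro and helper bodies mirror the well-known kernel definitions, while sketch_both_styles_empty() is a hypothetical function that contrasts the open-coded form with LIST_HEAD().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's list primitives, for illustration. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Returns 1 when both initialisation styles produce an empty list. */
static int sketch_both_styles_empty(void)
{
	/* Open-coded: declare, then initialise in a second statement. */
	struct list_head open_coded;
	INIT_LIST_HEAD(&open_coded);

	/* With the macro: declare and initialise in one step. */
	LIST_HEAD(with_macro);

	assert(with_macro.prev == &with_macro);
	return list_empty(&open_coded) && list_empty(&with_macro);
}
```

Both forms yield a list head that points at itself; the macro removes the window in which the variable exists but is not yet initialised.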
|
Revision tags: v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4 |
|
#
966de47f |
| 22-Jun-2023 |
Colin Ian King <colin.i.king@gmail.com> |
btrfs: remove redundant initialization of variables in log_new_ancestors
The variables leaf and slot are initialized when declared but the values assigned to them are never read as they are being re-assigned later on. The initializations are redundant and can be removed. Cleans up clang scan build warnings:
  fs/btrfs/tree-log.c:6797:25: warning: Value stored to 'leaf' during its initialization is never read [deadcode.DeadStores]
  fs/btrfs/tree-log.c:6798:7: warning: Value stored to 'slot' during its initialization is never read [deadcode.DeadStores]
It's been there since b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks") without any usage so it's safe to be removed.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
2a9462de |
| 05-Jul-2023 |
Jeff Layton <jlayton@kernel.org> |
btrfs: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-27-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
Revision tags: v6.1.35, v6.1.34 |
|
#
fc4026e2 |
| 12-Jun-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: do not BUG_ON() when dropping inode items from log root
When dropping inode items from a log tree at drop_inode_items(), we BUG_ON() on the result of btrfs_search_slot() because we don't expect an exact match, since having a key with an offset of (u64)-1 is unexpected. That is generally true, but for dir index keys, for example, we can get a key with such an offset value, even though it's very unlikely and it would take ages to increase the sequence counter for a dir index up to (u64)-1. We can deal with an exact match: we just have to delete the key at that slot, so there is really no need to BUG_ON(), error out or trigger any warning. So remove the BUG_ON().
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
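The handling of an exact match can be sketched in userspace as follows. This is a simplified model, not the kernel code: keys are reduced to a sorted array of offsets, and sketch_search() and sketch_drop_items() are hypothetical helpers whose return convention (0 for an exact match, 1 otherwise) only imitates that of a tree search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Binary search in a sorted array. Returns 0 and the slot on an exact
 * match, or 1 and the slot where the key would be inserted.
 */
static int sketch_search(const uint64_t *keys, size_t nr, uint64_t key,
			 size_t *slot)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	*slot = lo;
	return (lo < nr && keys[lo] == key) ? 0 : 1;
}

/* Drop every item by repeatedly searching for the largest possible key. */
static size_t sketch_drop_items(uint64_t *keys, size_t nr)
{
	size_t dropped = 0;

	assert(keys != NULL || nr == 0);
	while (nr > 0) {
		size_t slot;
		int ret = sketch_search(keys, nr, UINT64_MAX, &slot);

		if (ret > 0) {
			/* No exact match: step back to the previous item. */
			if (slot == 0)
				break;
			slot--;
		}
		/* Exact match (ret == 0): delete the key at that slot. */
		memmove(&keys[slot], &keys[slot + 1],
			(nr - slot - 1) * sizeof(*keys));
		nr--;
		dropped++;
	}
	return dropped;
}
```

An array that happens to contain UINT64_MAX is emptied just like any other, which is the point of treating the exact match as a normal case.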
|
Revision tags: v6.1.33, v6.1.32, v6.1.31, v6.1.30 |
|
#
5cfe76f8 |
| 24-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: rename the bytenr field in struct btrfs_ordered_sum to logical
btrfs_ordered_sum::bytenr stores a logical address. Make that clear by renaming it to ->logical.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
59fcf388 |
| 17-May-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: change for_rename argument of btrfs_record_unlink_dir() to bool
The for_rename argument of btrfs_record_unlink_dir() is defined as an integer, but the argument is in fact used as a boolean. So change it to a boolean to make its use more clear.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
acfb5a4f |
| 17-May-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove pointless label and goto at btrfs_record_unlink_dir()
There's no point in having a label and goto at btrfs_record_unlink_dir() because the function is trivial. So remove the label and goto and instead return early if we are not in a rename context.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
1e75ef03 |
| 17-May-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: update comments at btrfs_record_unlink_dir() to be more clear
Update the comments at btrfs_record_unlink_dir() so that they mention where new names are logged and where old names are removed. Also, while at it make the width of the comments closer to 80 columns and capitalize the sentences and finish them with punctuation.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
d67ba263 |
| 17-May-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: use inode_logged() at btrfs_record_unlink_dir()
At btrfs_record_unlink_dir() we directly check the logged_trans field of the given inodes to check if they were previously logged in the current transaction, and if any of them were, then we can avoid setting the field last_unlink_trans of the directory to the id of the current transaction if we are in a rename path. Avoiding that can later prevent falling back to a transaction commit if anyone attempts to log the directory.
However the logged_trans field, stored in struct btrfs_inode, is transient, not persisted in the inode item on its subvolume b+tree, so if an inode is evicted and then loaded again, its original value is lost and it's reset to 0. So directly checking the logged_trans field can lead to false negatives, which only result in a performance impact as mentioned before.
Instead of directly checking the logged_trans field of the inodes, use the inode_logged() helper, which will check in the log tree if an inode was logged before in case its logged_trans field has a value of 0. This way we can avoid setting the directory inode's last_unlink_trans and causing future logging attempts of it to fall back to transaction commits. The following test script shows one example where this happens without this patch:
  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nullb0
  MNT=/mnt/nullb0

  num_init_files=10000
  num_new_files=10000

  mkfs.btrfs -f $DEV
  mount -o ssd $DEV $MNT

  mkdir $MNT/testdir
  for ((i = 1; i <= $num_init_files; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  echo -n > $MNT/testdir/foo

  sync

  # Add some files so that there's more work in the transaction other
  # than just renaming file foo.
  for ((i = 1; i <= $num_new_files; i++)); do
      echo -n > $MNT/testdir/new_file_$i
  done

  # Change the file, fsync it.
  setfattr -n user.x1 -v 123 $MNT/testdir/foo
  xfs_io -c "fsync" $MNT/testdir/foo

  # Now trigger eviction of file foo but no eviction for our test
  # directory, since it is being used by the process below. This will
  # set logged_trans of the file's inode to 0 once it is loaded again.
  (
      cd $MNT/testdir
      while true; do
          :
      done
  ) &
  pid=$!

  echo 2 > /proc/sys/vm/drop_caches

  kill $pid
  wait $pid

  # Move foo out of our testdir. This will set last_unlink_trans
  # of the directory inode to the current transaction, because
  # logged_trans of both the directory and the file are set to 0.
  mv $MNT/testdir/foo $MNT/foo

  # Change file foo again and fsync it.
  # This fsync will result in a transaction commit because the rename
  # above has set last_unlink_trans of the parent directory to the id
  # of the current transaction and because our inode for file foo has
  # last_unlink_trans set to the current transaction, since it was
  # evicted and reloaded and it was previously modified in the current
  # transaction (the xattr addition).
  xfs_io -c "pwrite 0 64K" $MNT/foo
  start=$(date +%s%N)
  xfs_io -c "fsync" $MNT/foo
  end=$(date +%s%N)
  dur=$(( (end - start) / 1000000 ))

  echo "file fsync took: $dur milliseconds"

  umount $MNT

Before this patch: fsync took 19 milliseconds
After this patch:  fsync took 5 milliseconds
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
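The difference between trusting the transient field and falling back to the log tree can be sketched in userspace C. The structures and sketch_* functions below are hypothetical simplifications: the log tree is modelled as a plain array of inode numbers, and caching the lookup result in logged_trans is an assumption of the sketch rather than a statement about the kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory inode: logged_trans is not persisted on disk. */
struct sketch_inode {
	uint64_t ino;
	uint64_t logged_trans; /* reset to 0 when the inode is reloaded */
};

/* Hypothetical log tree: the inode numbers logged in this transaction. */
struct sketch_log_tree {
	const uint64_t *logged_inos;
	int nr;
};

static bool sketch_log_tree_has(const struct sketch_log_tree *log,
				uint64_t ino)
{
	for (int i = 0; i < log->nr; i++)
		if (log->logged_inos[i] == ino)
			return true;
	return false;
}

/* Naive check: trusts the transient field only. */
static bool sketch_logged_naive(const struct sketch_inode *inode,
				uint64_t transid)
{
	return inode->logged_trans == transid;
}

/* Fallback check: consult the log tree when the field is 0 (unknown). */
static bool sketch_inode_logged(struct sketch_inode *inode, uint64_t transid,
				const struct sketch_log_tree *log)
{
	assert(transid > 0);

	if (inode->logged_trans == transid)
		return true;
	if (inode->logged_trans != 0)
		return false; /* known value from another transaction */
	if (sketch_log_tree_has(log, inode->ino)) {
		inode->logged_trans = transid; /* cache (sketch assumption) */
		return true;
	}
	return false;
}
```

For an inode that was logged and then evicted, the naive check reports a false negative while the fallback check finds it in the log, which is the case the test script above exercises.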
|
#
bf1f4fd3 |
| 17-May-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: use inode_logged() at need_log_inode()
At need_log_inode() we directly check the ->logged_trans field of the given inode to check if it was previously logged in the transaction, with the goal of skipping logging the inode again when it's not necessary. The ->logged_trans field is not persisted in the inode item or elsewhere, it's only stored in memory (struct btrfs_inode), so it's transient and lost once the inode is evicted and then loaded again. Once an inode is loaded, we are conservative and set ->logged_trans to 0, which may mean that either the inode was never logged in the current transaction or it was logged but evicted before being loaded again.
Instead of checking the inode's ->logged_trans field directly, we can use the helper inode_logged(), which will really check if the inode was logged before in the current transaction in case we have a ->logged_trans field with a value of 0. This prevents unnecessarily logging an inode, and in some cases prevents a transaction commit, in case the logging requires a fallback to a transaction commit. The following test script shows a scenario where, due to eviction, we fall back to a transaction commit when trying to fsync a file that was renamed:
  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nullb0
  MNT=/mnt/nullb0

  num_init_files=10000
  num_new_files=10000

  mkfs.btrfs -f $DEV
  mount -o ssd $DEV $MNT

  mkdir $MNT/testdir
  for ((i = 1; i <= $num_init_files; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  echo -n > $MNT/testdir/foo

  sync

  # Add some files so that there's more work in the transaction other
  # than just renaming file foo.
  for ((i = 1; i <= $num_new_files; i++)); do
      echo -n > $MNT/testdir/new_file_$i
  done

  # Fsync the directory first.
  xfs_io -c "fsync" $MNT/testdir

  # Rename file foo.
  mv $MNT/testdir/foo $MNT/testdir/bar

  # Now trigger eviction of the test directory's inode.
  # Once loaded again, it will have logged_trans set to 0 and
  # last_unlink_trans set to the current transaction.
  echo 2 > /proc/sys/vm/drop_caches

  # Fsync file bar (ex-foo).
  # Before the patch the fsync would result in a transaction commit
  # because the inode for file bar has last_unlink_trans set to the
  # current transaction, so it will attempt to log the parent directory
  # as well, which will fallback to a full transaction commit because
  # it also has its last_unlink_trans set to the current transaction,
  # due to the inode eviction.
  start=$(date +%s%N)
  xfs_io -c "fsync" $MNT/testdir/bar
  end=$(date +%s%N)
  dur=$(( (end - start) / 1000000 ))

  echo "file fsync took: $dur milliseconds"

  umount $MNT

Before this patch: fsync took 22 milliseconds
After this patch:  fsync took 8 milliseconds
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
Revision tags: v6.1.29 |
|
#
8fd9f423 |
| 15-May-2023 |
Shida Zhang <zhangshida@kylinos.cn> |
btrfs: fix an uninitialized variable warning in btrfs_log_inode
This fixes the following warning reported by gcc 10.2.1 under x86_64:
  ../fs/btrfs/tree-log.c: In function ‘btrfs_log_inode’:
  ../fs/btrfs/tree-log.c:6211:9: error: ‘last_range_start’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
   6211 |   ret = insert_dir_log_key(trans, log, path, key.objectid,
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   6212 |       first_dir_index, last_dir_index);
        |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ../fs/btrfs/tree-log.c:6161:6: note: ‘last_range_start’ was declared here
   6161 |  u64 last_range_start;
        |      ^~~~~~~~~~~~~~~~
This might be a false positive fixed in later compiler versions but we want to have it fixed.
Reported-by: k2ci <kernel-bot@kylinos.cn> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
Revision tags: v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23 |
|
#
5d3e4f1d |
| 05-Apr-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: use log root when iterating over index keys when logging directory
When logging dir dentries of a directory, we iterate over the subvolume tree to find dir index keys on leaves modified in the current transaction. This however is heavy on locking, since btrfs_search_forward() may often keep locks on extent buffers for quite a while when walking the tree to find a suitable leaf modified in the current transaction and with a key not smaller than the provided minimum key. That means it will block other tasks trying to access the subvolume tree, which may be common fs operations like creating, renaming, linking, unlinking, reflinking files, etc.
A better solution is to iterate the log tree instead: it is much smaller than a subvolume tree, it only contains dir index keys added in the current transaction, and it can be walked with plain btrfs_search_slot() (or the wrapper btrfs_for_each_slot()).
The following bonnie++ test on a non-debug kernel (with Debian's default kernel config) on a 20G null block device, was used to measure the impact:
  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nullb0
  MNT=/mnt/nullb0

  NR_DIRECTORIES=20
  NR_FILES=20480 # must be a multiple of 1024
  DATASET_SIZE=$(( (8 * 1024 * 1024 * 1024) / 1048576 )) # 8 GiB as megabytes
  DIRECTORY_SIZE=$(( DATASET_SIZE / NR_FILES ))
  NR_FILES=$(( NR_FILES / 1024 ))

  umount $DEV &> /dev/null
  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  bonnie++ -u root -d $MNT \
      -n $NR_FILES:$DIRECTORY_SIZE:$DIRECTORY_SIZE:$NR_DIRECTORIES \
      -r 0 -s $DATASET_SIZE -b

  umount $MNT
Before patchset:
  Version 2.00a       ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  debian0          8G  376k  99  1.1g  98  939m  92 1527k  99  3.2g  99  9060 256
  Latency             24920us     207us     680ms    5594us     171us    2891us
  Version 2.00a       ------Sequential Create------ --------Random Create--------
  debian0             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                20/20 20480  96 +++++ +++ 20480  95 20480  99 +++++ +++ 20480  97
  Latency              8708us     137us    5128us    6743us      60us   19712us
After patchset:
  Version 2.00a       ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  debian0          8G  384k  99  1.2g  99  971m  91 1533k  99  3.3g  99  9180 309
  Latency             24930us     125us     661ms    5587us      46us    2020us
  Version 2.00a       ------Sequential Create------ --------Random Create--------
  debian0             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                20/20 20480  90 +++++ +++ 20480  99 20480  99 +++++ +++ 20480  97
  Latency              7030us      61us    1246us    4942us      56us   16855us
The patchset consists of this patch plus a previous one that has the following subject:
"btrfs: avoid iterating over all indexes when logging directory"
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
fa4b8cb1 |
| 05-Apr-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: avoid iterating over all indexes when logging directory
When logging a directory, after copying all directory index items from the subvolume tree to the log tree, we iterate over the subvolume tree to find all dir index items that are located in leaves COWed (or created) in the current transaction. If we keep logging a directory several times during the same transaction, we end up iterating over the same dir index items every time we log the directory, wasting time and adding extra lock contention on the subvolume tree.
So just keep track of the last logged dir index offset in order to start the search for that index (+1) the next time the directory is logged, as dir index values (key offsets) come from a monotonically increasing counter.
The following test measures the difference before and after this change:
  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nullb0
  MNT=/mnt/nullb0

  umount $DEV &> /dev/null
  mkfs.btrfs -f $DEV
  mount -o ssd $DEV $MNT

  # Time values in milliseconds.
  declare -a fsync_times
  # Total number of files added to the test directory.
  num_files=1000000
  # Fsync directory after every N files are added.
  fsync_period=100

  mkdir $MNT/testdir

  fsync_total_time=0
  for ((i = 1; i <= $num_files; i++)); do
      echo -n > $MNT/testdir/file_$i

      if [ $((i % fsync_period)) -eq 0 ]; then
          start=$(date +%s%N)
          xfs_io -c "fsync" $MNT/testdir
          end=$(date +%s%N)
          fsync_total_time=$((fsync_total_time + (end - start)))
          fsync_times[i]=$(( (end - start) / 1000000 ))
          echo -n -e "Progress $i / $num_files\r"
      fi
  done

  echo -e "\nHistogram of directory fsync duration in ms:\n"

  printf '%s\n' "${fsync_times[@]}" | \
      perl -MStatistics::Histogram -e '@d = <>; print get_histogram(\@d);'

  fsync_total_time=$((fsync_total_time / 1000000))
  echo -e "\nTotal time spent in fsync: $fsync_total_time ms\n"
  echo

  umount $MNT
The test was run on a non-debug kernel (Debian's default kernel config) against a 15G null block device.
Result before this change:
Histogram of directory fsync duration in ms:
  Count: 10000
  Range:  3.000 - 362.000; Mean: 34.556; Median: 31.000; Stddev: 25.751
  Percentiles:  90th: 71.000; 95th: 77.000; 99th: 81.000
     3.000 -   5.278:  1423 #################################
     5.278 -   8.854:  1173 ###########################
     8.854 -  14.467:   591 ##############
    14.467 -  23.277:  1025 #######################
    23.277 -  37.105:  1422 #################################
    37.105 -  58.809:  2036 ###############################################
    58.809 -  92.876:  2316 #####################################################
    92.876 - 146.346:     6 |
   146.346 - 230.271:     6 |
   230.271 - 362.000:     2 |
Total time spent in fsync: 350527 ms
Result after this change:
Histogram of directory fsync duration in ms:
  Count: 10000
  Range:  3.000 - 1088.000; Mean:  8.704; Median:  8.000; Stddev: 12.576
  Percentiles:  90th: 12.000; 95th: 14.000; 99th: 17.000
     3.000 -    6.007:  3222 #################################
     6.007 -   11.276:  5197 #####################################################
    11.276 -   20.506:  1551 ################
    20.506 -   36.674:    24 |
    36.674 -  201.552:     1 |
   201.552 -  353.841:     4 |
   353.841 - 1088.000:     1 |
Total time spent in fsync: 92114 ms
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
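The idea of resuming from the last logged dir index can be sketched as follows. This is a userspace model under stated assumptions: the directory is a sorted array of index values, and struct sketch_dir and sketch_log_dir() are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical directory state; index values come from a growing counter. */
struct sketch_dir {
	const uint64_t *indexes; /* dir index offsets, sorted ascending */
	size_t nr;               /* number of entries currently present */
	uint64_t last_logged;    /* highest index already logged, 0 if none */
};

/*
 * Log the directory and return how many index items had to be examined.
 * With resume == 0 every call rescans from the start (old behaviour);
 * otherwise the scan starts at last_logged + 1.
 */
static size_t sketch_log_dir(struct sketch_dir *dir, int resume)
{
	uint64_t min;
	size_t lo = 0, hi, examined = 0;

	assert(dir != NULL);
	min = resume ? dir->last_logged + 1 : 0;
	hi = dir->nr;

	/* Binary search for the first index >= min. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (dir->indexes[mid] < min)
			lo = mid + 1;
		else
			hi = mid;
	}
	for (size_t i = lo; i < dir->nr; i++) {
		examined++;
		dir->last_logged = dir->indexes[i];
	}
	return examined;
}
```

Because index values only grow, anything at or below last_logged has already been seen, so repeated fsyncs of the same directory touch only the newly added entries.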
|
#
e6b430f8 |
| 03-Apr-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: tree-log: factor out a clean_log_buffer helper
The tree-log code has three almost identical copies of the accounting for an extent_buffer that doesn't need to be written any more. The only difference is that walk_down_log_tree passes the bytenr used to find the buffer instead of extent_buffer.start and calculates the length using the nodesize, while the other two callers look at the extent_buffer.len field that must always be equivalent to the nodesize.
Factor the code into a common helper.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
Revision tags: v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14 |
|
#
fdf8d595 |
| 23-Feb-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: open code btrfs_bin_search()
btrfs_bin_search() is a simple wrapper that searches for the whole slots by calling btrfs_generic_bin_search() with the starting slot/first_slot preset to 0.
This simple wrapper can be open coded as btrfs_bin_search().
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
Revision tags: v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9 |
|
#
79b02ec1 |
| 26-Jan-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback
This is used in the tree-log code and is a holdover from previous iterations of extent buffer writeback. We can simply use wait_on_extent_buffer_writeback here, and remove btrfs_wait_tree_block_writeback completely as it's equivalent (waiting on page writeback).
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
190a8339 |
| 26-Jan-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: rename btrfs_clean_tree_block to btrfs_clear_buffer_dirty
btrfs_clean_tree_block is a misnomer, it's just clear_extent_buffer_dirty with some extra accounting around it. Rename this to btrfs_clear_buffer_dirty to make it more clear it belongs with its setter, btrfs_mark_buffer_dirty.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
c4e54a65 |
| 26-Jan-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: replace clearing extent buffer dirty bit with btrfs_clean_block
Now that we're passing in the trans into btrfs_clean_tree_block, we can easily roll in the handling of the !trans case and replace all occurrences of
  if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags))
          clear_extent_buffer_dirty(eb);
with
  btrfs_tree_lock(eb);
  btrfs_clean_tree_block(eb);
  btrfs_tree_unlock(eb);
We need the lock because if we are actually dirty we need to make sure we aren't racing with anything that's starting writeout currently. This also makes sure that we're accounting fs_info->dirty_metadata_bytes appropriately.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
ed25dab3 |
| 26-Jan-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: add trans argument to btrfs_clean_tree_block
We check the header generation in the extent buffer against the current running transaction id to see if it's safe to clear DIRTY on this buffer. Generally speaking if we're clearing the buffer dirty we're holding the transaction open, but in the case of cleaning up an aborted transaction we don't, so we have extra checks in that path to check the transid. To allow for a future cleanup go ahead and pass in the trans handle so we don't have to rely on ->running_transaction being set.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
Revision tags: v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19 |
|
#
235e1c7b |
| 10-Jan-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: use a single variable to track return value for log_dir_items()
We currently use 'ret' and 'err' to track the return value for log_dir_items(), which is confusing and likely the cause for previous bugs where log_dir_items() did not return an error when it should, fixed in previous patches.
So change this and use only a single variable, 'ret', to track the return value. This is simpler and makes it similar to most of the existing code.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
5cce1780 |
| 10-Jan-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT
Currently we use the value 1 for BTRFS_LOG_FORCE_COMMIT, but that value has a few inconveniences:
1) If it's ever used by btrfs_log_inode(), or any function down the call chain, we have to remember to btrfs_set_log_full_commit(), which is repetitive and has a chance to be forgotten in future use cases. btrfs_log_inode_parent() only calls btrfs_set_log_full_commit() when it gets a negative value from btrfs_log_inode();
2) Down the call chain of btrfs_log_inode(), we may have functions that need to force a log commit, but can return either an error (negative value), false (0) or true (1). So they are forced to return some random negative to force a log commit - using BTRFS_LOG_FORCE_COMMIT would make the intention more clear. Currently the only example is flush_dir_items_batch().
So turn BTRFS_LOG_FORCE_COMMIT into a negative value. The chosen value is -(MAX_ERRNO + 1), so that it does not overlap any errno value and makes it easier to debug.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
6afaed53 |
| 10-Jan-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: simplify update of last_dir_index_offset when logging a directory
When logging a directory, we always set the inode's last_dir_index_offset to the offset of the last dir index item we found. This is using an extra field in the log context structure, and it makes more sense to update it only after we insert dir index items, and we could directly update the inode's last_dir_index_offset field instead.
So make this simpler by updating the inode's last_dir_index_offset only when we actually insert dir index keys in the log tree, and getting rid of the last_dir_item_offset field in the log context structure.
Reported-by: David Arendt <admin@prnet.org> Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/ Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com> Link: https://lore.kernel.org/linux-btrfs/Y8voyTXdnPDz8xwY@mail.gmail.com/ Reported-by: Hunter Wardlaw <wardlawhunter@gmail.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1207231 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216851 CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
|
#
09e44868 |
| 10-Jan-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: do not abort transaction on failure to update log root
When syncing a log, if we fail to update a log root in the log root tree, we are aborting the transaction if the failure was not -ENOSPC. This is excessive because there is a chance that a transaction commit can succeed, and therefore avoid turning the filesystem into RO mode. All we need to be careful about is to mark the log for a full commit, which we already do, to make sure no one commits a super block pointing to an outdated log root tree.
So don't abort the transaction if we fail to update a log root in the log root tree, and log an error if the failure is not -ENOSPC, so that it does not go completely unnoticed.
CC: stable@vger.kernel.org # 6.0+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
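The resulting error policy can be sketched in a few lines of userspace C. The state structure and sketch_handle_update_failure() are hypothetical; they only model the three outcomes the message describes (mark the log for a full commit, report non-ENOSPC errors, never abort).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified transaction state. */
struct sketch_trans_state {
	bool aborted;         /* an abort would turn the filesystem RO */
	bool log_full_commit; /* next fsync must commit the transaction */
	int errors_logged;
};

/* Handle the result of updating a log root in the log root tree. */
static int sketch_handle_update_failure(struct sketch_trans_state *st, int ret)
{
	assert(st != NULL);

	if (ret == 0)
		return 0;

	/* Never let anyone rely on a possibly outdated log root tree. */
	st->log_full_commit = true;

	/* Report unexpected failures so they do not go unnoticed. */
	if (ret != -ENOSPC) {
		fprintf(stderr, "failed to update log root: %d\n", ret);
		st->errors_logged++;
	}

	/* Deliberately no abort: a full transaction commit may still work. */
	return ret;
}
```

The caller still sees the error and falls back to a transaction commit, but the filesystem stays writable unless that commit itself fails.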
|