History log of /openbmc/linux/fs/btrfs/tree-log.c (Results 1 – 25 of 1209)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3
# d5e09e38 12-Sep-2023 Filipe Manana <fdmanana@suse.com>

btrfs: abort transaction on generation mismatch when marking eb as dirty

[ Upstream commit 50564b651d01c19ce732819c5b3c3fd60707188e ]

When marking an extent buffer as dirty, at btrfs_mark_buffer_di

btrfs: abort transaction on generation mismatch when marking eb as dirty

[ Upstream commit 50564b651d01c19ce732819c5b3c3fd60707188e ]

When marking an extent buffer as dirty, at btrfs_mark_buffer_dirty(),
we check if its generation matches the running transaction and if not we
just print a warning. Such mismatch is an indicator that something really
went wrong and only printing a warning message (and stack trace) is not
enough to prevent a corruption. Allowing a transaction to commit with such
an extent buffer will trigger an error if we ever try to read it from disk
due to a generation mismatch with its parent generation.

So abort the current transaction with -EUCLEAN if we notice a generation
mismatch. For this we need to pass a transaction handle to
btrfs_mark_buffer_dirty() which is always available except in test code,
in which case we can pass NULL since it operates on dummy extent buffers
and all test roots have a single node/leaf (root node at level 0).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.5.2
# b4c639f6 05-Sep-2023 Josef Bacik <josef@toxicpanda.com>

btrfs: initialize start_slot in btrfs_log_prealloc_extents

Jens reported a compiler warning when using
CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this

fs/btrfs/tree-log.c: In function ‘btrfs_l

btrfs: initialize start_slot in btrfs_log_prealloc_extents

Jens reported a compiler warning when using
CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this

fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’:
fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used
uninitialized [-Wmaybe-uninitialized]
4828 | ret = copy_items(trans, inode, dst_path, path,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4829 | start_slot, ins_nr, 1, 0);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here
4725 | int start_slot;
| ^~~~~~~~~~

The compiler is incorrect, as we only use this code when ins_len > 0,
and when ins_len > 0 we have start_slot properly initialized. However
we generally find the -Wmaybe-uninitialized warnings valuable, so
initialize start_slot to get rid of the warning.

Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45
# 84af994b 09-Aug-2023 Ruan Jinjie <ruanjinjie@huawei.com>

btrfs: use LIST_HEAD() to initialize the list_head

Use LIST_HEAD() to initialize the list_head instead of open-coding it.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: David Sterb

btrfs: use LIST_HEAD() to initialize the list_head

Use LIST_HEAD() to initialize the list_head instead of open-coding it.

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4
# 966de47f 22-Jun-2023 Colin Ian King <colin.i.king@gmail.com>

btrfs: remove redundant initialization of variables in log_new_ancestors

The variables leaf and slot are initialized when declared but the values
assigned to them are never read as they are being re

btrfs: remove redundant initialization of variables in log_new_ancestors

The variables leaf and slot are initialized when declared but the values
assigned to them are never read as they are being re-assigned later on.
The initializations are redundant and can be removed. Cleans up clang
scan build warnings:

fs/btrfs/tree-log.c:6797:25: warning: Value stored to 'leaf' during its
initialization is never read [deadcode.DeadStores]
fs/btrfs/tree-log.c:6798:7: warning: Value stored to 'slot' during its
initialization is never read [deadcode.DeadStores]

It's been there since b8aa330d2acb ("Btrfs: improve performance on fsync
of files with multiple hardlinks") without any usage so it's safe to be
removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 2a9462de 05-Jul-2023 Jeff Layton <jlayton@kernel.org>

btrfs: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

btrfs: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-27-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


Revision tags: v6.1.35, v6.1.34
# fc4026e2 12-Jun-2023 Filipe Manana <fdmanana@suse.com>

btrfs: do not BUG_ON() when dropping inode items from log root

When dropping inode items from a log tree at drop_inode_items(), we this
BUG_ON() on the result of btrfs_search_slot() because we don't

btrfs: do not BUG_ON() when dropping inode items from log root

When dropping inode items from a log tree at drop_inode_items(), we this
BUG_ON() on the result of btrfs_search_slot() because we don't expect an
exact match since having a key with an offset of (u64)-1 is unexpected.
That is generally true, but for dir index keys for example, we can get a
key with such an offset value, even though it's very unlikely and it would
take ages to increase the sequence counter for a dir index up to (u64)-1.
We can deal with an exact match, we just have to delete the key at that
slot, so there is really no need to BUG_ON(), error out or trigger any
warning. So remove the BUG_ON().

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.33, v6.1.32, v6.1.31, v6.1.30
# 5cfe76f8 24-May-2023 Christoph Hellwig <hch@lst.de>

btrfs: rename the bytenr field in struct btrfs_ordered_sum to logical

btrfs_ordered_sum::bytendr stores a logical address. Make that clear by
renaming it to ->logical.

Reviewed-by: Johannes Thumsh

btrfs: rename the bytenr field in struct btrfs_ordered_sum to logical

btrfs_ordered_sum::bytendr stores a logical address. Make that clear by
renaming it to ->logical.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 59fcf388 17-May-2023 Filipe Manana <fdmanana@suse.com>

btrfs: change for_rename argument of btrfs_record_unlink_dir() to bool

The for_rename argument of btrfs_record_unlink_dir() is defined as an
integer, but the argument is in fact used as a boolean. S

btrfs: change for_rename argument of btrfs_record_unlink_dir() to bool

The for_rename argument of btrfs_record_unlink_dir() is defined as an
integer, but the argument is in fact used as a boolean. So change it to
a boolean to make its use more clear.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# acfb5a4f 17-May-2023 Filipe Manana <fdmanana@suse.com>

btrfs: remove pointless label and goto at btrfs_record_unlink_dir()

There's no point of having a label and goto at btrfs_record_unlink_dir()
because the function is trivial and can just return early

btrfs: remove pointless label and goto at btrfs_record_unlink_dir()

There's no point of having a label and goto at btrfs_record_unlink_dir()
because the function is trivial and can just return early if we are not
in a rename context. So remove the label and goto and instead return
early if we are not in a rename.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 1e75ef03 17-May-2023 Filipe Manana <fdmanana@suse.com>

btrfs: update comments at btrfs_record_unlink_dir() to be more clear

Update the comments at btrfs_record_unlink_dir() so that they mention
where new names are logged and where old names are removed.

btrfs: update comments at btrfs_record_unlink_dir() to be more clear

Update the comments at btrfs_record_unlink_dir() so that they mention
where new names are logged and where old names are removed. Also, while
at it make the width of the comments closer to 80 columns and capitalize
the sentences and finish them with punctuation.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# d67ba263 17-May-2023 Filipe Manana <fdmanana@suse.com>

btrfs: use inode_logged() at btrfs_record_unlink_dir()

At btrfs_record_unlink_dir() we directly check the logged_trans field of
the given inodes to check if they were previously logged in the curren

btrfs: use inode_logged() at btrfs_record_unlink_dir()

At btrfs_record_unlink_dir() we directly check the logged_trans field of
the given inodes to check if they were previously logged in the current
transaction, and if any of them were, then we can avoid setting the field
last_unlink_trans of the directory to the id of the current transaction if
we are in a rename path. Avoiding that can later prevent falling back to
a transaction commit if anyone attempts to log the directory.

However the logged_trans field, store in struct btrfs_inode, is transient,
not persisted in the inode item on its subvolume b+tree, so that means
that if an inode is evicted and then loaded again, its original value is
lost and it's reset to 0. So directly checking the logged_trans field can
lead to some false negative, and that only results in a performance impact
as mentioned before.

Instead of directly checking the logged_trans field of the inodes, use the
inode_logged() helper, which will check in the log tree if an inode was
logged before in case its logged_trans field has a value of 0. This way
we can avoid setting the directory inode's last_unlink_trans and cause
future logging attempts of it to fallback to transaction commits. The
following test script shows one example where this happens without this
patch:

$ cat test.sh
#!/bin/bash

DEV=/dev/nullb0
MNT=/mnt/nullb0

num_init_files=10000
num_new_files=10000

mkfs.btrfs -f $DEV
mount -o ssd $DEV $MNT

mkdir $MNT/testdir
for ((i = 1; i <= $num_init_files; i++)); do
echo -n > $MNT/testdir/file_$i
done

echo -n > $MNT/testdir/foo

sync

# Add some files so that there's more work in the transaction other
# than just renaming file foo.
for ((i = 1; i <= $num_new_files; i++)); do
echo -n > $MNT/testdir/new_file_$i
done

# Change the file, fsync it.
setfattr -n user.x1 -v 123 $MNT/testdir/foo
xfs_io -c "fsync" $MNT/testdir/foo

# Now triggger eviction of file foo but no eviction for our test
# directory, since it is being used by the process below. This will
# set logged_trans of the file's inode to 0 once it is loaded again.
(
cd $MNT/testdir
while true; do
:
done
) &
pid=$!

echo 2 > /proc/sys/vm/drop_caches

kill $pid
wait $pid

# Move foo out of our testdir. This will set last_unlink_trans
# of the directory inode to the current transaction, because
# logged_trans of both the directory and the file are set to 0.
mv $MNT/testdir/foo $MNT/foo

# Change file foo again and fsync it.
# This fsync will result in a transaction commit because the rename
# above has set last_unlink_trans of the parent directory to the id
# of the current transaction and because our inode for file foo has
# last_unlink_trans set to the current transaction, since it was
# evicted and reloaded and it was previously modified in the current
# transaction (the xattr addition).
xfs_io -c "pwrite 0 64K" $MNT/foo
start=$(date +%s%N)
xfs_io -c "fsync" $MNT/foo
end=$(date +%s%N)
dur=$(( (end - start) / 1000000 ))

echo "file fsync took: $dur milliseconds"

umount $MNT

Before this patch: fsync took 19 milliseconds
After this patch: fsync took 5 milliseconds

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# bf1f4fd3 17-May-2023 Filipe Manana <fdmanana@suse.com>

btrfs: use inode_logged() at need_log_inode()

At need_log_inode() we directly check the ->logged_trans field of the
given inode to check if it was previously logged in the transaction, with
the goal

btrfs: use inode_logged() at need_log_inode()

At need_log_inode() we directly check the ->logged_trans field of the
given inode to check if it was previously logged in the transaction, with
the goal of skipping logging the inode again when it's not necessary.
The ->logged_trans field in not persisted in the inode item or elsewhere,
it's only stored in memory (struct btrfs_inode), so it's transient and
lost once the inode is evicted and then loaded again. Once an inode is
loaded, we are conservative and set ->logged_trans to 0, which may mean
that either the inode was never logged in the current transaction or it
was logged but evicted before being loaded again.

Instead of checking the inode's ->logged_trans field directly, we can
use instead the helper inode_logged(), which will really check if the
inode was logged before in the current transaction in case we have a
->logged_trans field with a value of 0. This will prevent unnecessarily
logging an inode when it's not needed, and in some cases preventing a
transaction commit, in case the logging requires a fallback to a
transaction commit. The following test script shows a scenario where
due to eviction we fallback a transaction commit when trying to fsync
a file that was renamed:

$ cat test.sh
#!/bin/bash

DEV=/dev/nullb0
MNT=/mnt/nullb0

num_init_files=10000
num_new_files=10000

mkfs.btrfs -f $DEV
mount -o ssd $DEV $MNT

mkdir $MNT/testdir
for ((i = 1; i <= $num_init_files; i++)); do
echo -n > $MNT/testdir/file_$i
done

echo -n > $MNT/testdir/foo

sync

# Add some files so that there's more work in the transaction other
# than just renaming file foo.
for ((i = 1; i <= $num_new_files; i++)); do
echo -n > $MNT/testdir/new_file_$i
done

# Fsync the directory first.
xfs_io -c "fsync" $MNT/testdir

# Rename file foo.
mv $MNT/testdir/foo $MNT/testdir/bar

# Now trigger eviction of the test directory's inode.
# Once loaded again, it will have logged_trans set to 0 and
# last_unlink_trans set to the current transaction.
echo 2 > /proc/sys/vm/drop_caches

# Fsync file bar (ex-foo).
# Before the patch the fsync would result in a transaction commit
# because the inode for file bar has last_unlink_trans set to the
# current transaction, so it will attempt to log the parent directory
# as well, which will fallback to a full transaction commit because
# it also has its last_unlink_trans set to the current transaction,
# due to the inode eviction.
start=$(date +%s%N)
xfs_io -c "fsync" $MNT/testdir/bar
end=$(date +%s%N)
dur=$(( (end - start) / 1000000 ))

echo "file fsync took: $dur milliseconds"

umount $MNT

Before this patch: fsync took 22 milliseconds
After this patch: fsync took 8 milliseconds

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.29
# 8fd9f423 15-May-2023 Shida Zhang <zhangshida@kylinos.cn>

btrfs: fix an uninitialized variable warning in btrfs_log_inode

This fixes the following warning reported by gcc 10.2.1 under x86_64:

../fs/btrfs/tree-log.c: In function ‘btrfs_log_inode’:
../fs/bt

btrfs: fix an uninitialized variable warning in btrfs_log_inode

This fixes the following warning reported by gcc 10.2.1 under x86_64:

../fs/btrfs/tree-log.c: In function ‘btrfs_log_inode’:
../fs/btrfs/tree-log.c:6211:9: error: ‘last_range_start’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
6211 | ret = insert_dir_log_key(trans, log, path, key.objectid,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6212 | first_dir_index, last_dir_index);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../fs/btrfs/tree-log.c:6161:6: note: ‘last_range_start’ was declared here
6161 | u64 last_range_start;
| ^~~~~~~~~~~~~~~~

This might be a false positive fixed in later compiler versions but we
want to have it fixed.

Reported-by: k2ci <kernel-bot@kylinos.cn>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23
# 5d3e4f1d 05-Apr-2023 Filipe Manana <fdmanana@suse.com>

btrfs: use log root when iterating over index keys when logging directory

When logging dir dentries of a directory, we iterate over the subvolume
tree to find dir index keys on leaves modified in th

btrfs: use log root when iterating over index keys when logging directory

When logging dir dentries of a directory, we iterate over the subvolume
tree to find dir index keys on leaves modified in the current transaction.
This however is heavy on locking, since btrfs_search_forward() may often
keep locks on extent buffers for quite a while when walking the tree to
find a suitable leaf modified in the current transaction and with a key
not smaller than then the provided minimum key. That means it will block
other tasks trying to access the subvolume tree, which may be common fs
operations like creating, renaming, linking, unlinking, reflinking files,
etc.

A better solution is to iterate the log tree, since it's much smaller than
a subvolume tree and just use plain btrfs_search_slot() (or the wrapper
btrfs_for_each_slot()) and only contains dir index keys added in the
current transaction.

The following bonnie++ test on a non-debug kernel (with Debian's default
kernel config) on a 20G null block device, was used to measure the impact:

$ cat test.sh
#!/bin/bash

DEV=/dev/nullb0
MNT=/mnt/nullb0

NR_DIRECTORIES=20
NR_FILES=20480 # must be a multiple of 1024
DATASET_SIZE=$(( (8 * 1024 * 1024 * 1024) / 1048576 )) # 8 GiB as megabytes
DIRECTORY_SIZE=$(( DATASET_SIZE / NR_FILES ))
NR_FILES=$(( NR_FILES / 1024 ))

umount $DEV &> /dev/null
mkfs.btrfs -f $DEV
mount $DEV $MNT

bonnie++ -u root -d $MNT \
-n $NR_FILES:$DIRECTORY_SIZE:$DIRECTORY_SIZE:$NR_DIRECTORIES \
-r 0 -s $DATASET_SIZE -b

umount $MNT

Before patchset:

Version 2.00a ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
debian0 8G 376k 99 1.1g 98 939m 92 1527k 99 3.2g 99 9060 256
Latency 24920us 207us 680ms 5594us 171us 2891us
Version 2.00a ------Sequential Create------ --------Random Create--------
debian0 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
20/20 20480 96 +++++ +++ 20480 95 20480 99 +++++ +++ 20480 97
Latency 8708us 137us 5128us 6743us 60us 19712us

After patchset:

Version 2.00a ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
debian0 8G 384k 99 1.2g 99 971m 91 1533k 99 3.3g 99 9180 309
Latency 24930us 125us 661ms 5587us 46us 2020us
Version 2.00a ------Sequential Create------ --------Random Create--------
debian0 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
20/20 20480 90 +++++ +++ 20480 99 20480 99 +++++ +++ 20480 97
Latency 7030us 61us 1246us 4942us 56us 16855us

The patchset consists of this patch plus a previous one that has the
following subject:

"btrfs: avoid iterating over all indexes when logging directory"

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# fa4b8cb1 05-Apr-2023 Filipe Manana <fdmanana@suse.com>

btrfs: avoid iterating over all indexes when logging directory

When logging a directory, after copying all directory index items from the
subvolume tree to the log tree, we iterate over the subvolum

btrfs: avoid iterating over all indexes when logging directory

When logging a directory, after copying all directory index items from the
subvolume tree to the log tree, we iterate over the subvolume tree to find
all dir index items that are located in leaves COWed (or created) in the
current transaction. If we keep logging a directory several times during
the same transaction, we end up iterating over the same dir index items
everytime we log the directory, wasting time and adding extra lock
contention on the subvolume tree.

So just keep track of the last logged dir index offset in order to start
the search for that index (+1) the next time the directory is logged, as
dir index values (key offsets) come from a monotonically increasing
counter.

The following test measures the difference before and after this change:

$ cat test.sh
#!/bin/bash

DEV=/dev/nullb0
MNT=/mnt/nullb0

umount $DEV &> /dev/null
mkfs.btrfs -f $DEV
mount -o ssd $DEV $MNT

# Time values in milliseconds.
declare -a fsync_times
# Total number of files added to the test directory.
num_files=1000000
# Fsync directory after every N files are added.
fsync_period=100

mkdir $MNT/testdir

fsync_total_time=0
for ((i = 1; i <= $num_files; i++)); do
echo -n > $MNT/testdir/file_$i

if [ $((i % fsync_period)) -eq 0 ]; then
start=$(date +%s%N)
xfs_io -c "fsync" $MNT/testdir
end=$(date +%s%N)
fsync_total_time=$((fsync_total_time + (end - start)))
fsync_times[i]=$(( (end - start) / 1000000 ))
echo -n -e "Progress $i / $num_files\r"
fi
done

echo -e "\nHistogram of directory fsync duration in ms:\n"

printf '%s\n' "${fsync_times[@]}" | \
perl -MStatistics::Histogram -e '@d = <>; print get_histogram(\@d);'

fsync_total_time=$((fsync_total_time / 1000000))
echo -e "\nTotal time spent in fsync: $fsync_total_time ms\n"
echo

umount $MNT

The test was run on a non-debug kernel (Debian's default kernel config)
against a 15G null block device.

Result before this change:

Histogram of directory fsync duration in ms:

Count: 10000
Range: 3.000 - 362.000; Mean: 34.556; Median: 31.000; Stddev: 25.751
Percentiles: 90th: 71.000; 95th: 77.000; 99th: 81.000
3.000 - 5.278: 1423 #################################
5.278 - 8.854: 1173 ###########################
8.854 - 14.467: 591 ##############
14.467 - 23.277: 1025 #######################
23.277 - 37.105: 1422 #################################
37.105 - 58.809: 2036 ###############################################
58.809 - 92.876: 2316 #####################################################
92.876 - 146.346: 6 |
146.346 - 230.271: 6 |
230.271 - 362.000: 2 |

Total time spent in fsync: 350527 ms

Result after this change:

Histogram of directory fsync duration in ms:

Count: 10000
Range: 3.000 - 1088.000; Mean: 8.704; Median: 8.000; Stddev: 12.576
Percentiles: 90th: 12.000; 95th: 14.000; 99th: 17.000
3.000 - 6.007: 3222 #################################
6.007 - 11.276: 5197 #####################################################
11.276 - 20.506: 1551 ################
20.506 - 36.674: 24 |
36.674 - 201.552: 1 |
201.552 - 353.841: 4 |
353.841 - 1088.000: 1 |

Total time spent in fsync: 92114 ms

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# e6b430f8 03-Apr-2023 Christoph Hellwig <hch@lst.de>

btrfs: tree-log: factor out a clean_log_buffer helper

The tree-log code has three almost identical copies for the accounting on
an extent_buffer that doesn't need to be written any more. The only
d

btrfs: tree-log: factor out a clean_log_buffer helper

The tree-log code has three almost identical copies for the accounting on
an extent_buffer that doesn't need to be written any more. The only
difference is that walk_down_log_tree passed the bytenr used to find the
buffer instead of extent_buffer.start and calculates the length using the
nodesize, while the other two callers look at the extent_buffer.len
field that must always be equivalent to the nodesize.

Factor the code into a common helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14
# fdf8d595 23-Feb-2023 Anand Jain <anand.jain@oracle.com>

btrfs: open code btrfs_bin_search()

btrfs_bin_search() is a simple wrapper that searches for the whole slots
by calling btrfs_generic_bin_search() with the starting slot/first_slot
preset to 0.

Thi

btrfs: open code btrfs_bin_search()

btrfs_bin_search() is a simple wrapper that searches for the whole slots
by calling btrfs_generic_bin_search() with the starting slot/first_slot
preset to 0.

This simple wrapper can be open coded as btrfs_bin_search().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9
# 79b02ec1 26-Jan-2023 Josef Bacik <josef@toxicpanda.com>

btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback

This is used in the tree-log code and is a holdover from previous
iterations of extent buffer writeback. We can sim

btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback

This is used in the tree-log code and is a holdover from previous
iterations of extent buffer writeback. We can simply use
wait_on_extent_buffer_writeback here, and remove
btrfs_wait_tree_block_writeback completely as it's equivalent (waiting
on page write writeback).

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 190a8339 26-Jan-2023 Josef Bacik <josef@toxicpanda.com>

btrfs: rename btrfs_clean_tree_block to btrfs_clear_buffer_dirty

btrfs_clean_tree_block is a misnomer, it's just
clear_extent_buffer_dirty with some extra accounting around it. Rename
this to btrfs

btrfs: rename btrfs_clean_tree_block to btrfs_clear_buffer_dirty

btrfs_clean_tree_block is a misnomer, it's just
clear_extent_buffer_dirty with some extra accounting around it. Rename
this to btrfs_clear_buffer_dirty to make it more clear it belongs with
it's setter, btrfs_mark_buffer_dirty.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# c4e54a65 26-Jan-2023 Josef Bacik <josef@toxicpanda.com>

btrfs: replace clearing extent buffer dirty bit with btrfs_clean_block

Now that we're passing in the trans into btrfs_clean_tree_block, we can
easily roll in the handling of the !trans case and repl

btrfs: replace clearing extent buffer dirty bit with btrfs_clean_block

Now that we're passing in the trans into btrfs_clean_tree_block, we can
easily roll in the handling of the !trans case and replace all
occurrences of

if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags))
clear_extent_buffer_dirty(eb);

with

btrfs_tree_lock(eb);
btrfs_clean_tree_block(eb);
btrfs_tree_unlock(eb);

We need the lock because if we are actually dirty we need to make sure
we aren't racing with anything that's starting writeout currently. This
also makes sure that we're accounting fs_info->dirty_metadata_bytes
appropriately.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# ed25dab3 26-Jan-2023 Josef Bacik <josef@toxicpanda.com>

btrfs: add trans argument to btrfs_clean_tree_block

We check the header generation in the extent buffer against the current
running transaction id to see if it's safe to clear DIRTY on this
buffer.

btrfs: add trans argument to btrfs_clean_tree_block

We check the header generation in the extent buffer against the current
running transaction id to see if it's safe to clear DIRTY on this
buffer. Generally speaking if we're clearing the buffer dirty we're
holding the transaction open, but in the case of cleaning up an aborted
transaction we don't, so we have extra checks in that path to check the
transid. To allow for a future cleanup go ahead and pass in the trans
handle so we don't have to rely on ->running_transaction being set.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19
# 235e1c7b 10-Jan-2023 Filipe Manana <fdmanana@suse.com>

btrfs: use a single variable to track return value for log_dir_items()

We currently use 'ret' and 'err' to track the return value for
log_dir_items(), which is confusing and likely the cause for pre

btrfs: use a single variable to track return value for log_dir_items()

We currently use 'ret' and 'err' to track the return value for
log_dir_items(), which is confusing and likely the cause for previous
bugs where log_dir_items() did not return an error when it should, fixed
in previous patches.

So change this and use only a single variable, 'ret', to track the return
value. This is simpler and makes it similar to most of the existing code.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 5cce1780 10-Jan-2023 Filipe Manana <fdmanana@suse.com>

btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT

Currently we use the value 1 for BTRFS_LOG_FORCE_COMMIT, but that value
has a few inconveniences:

1) If it's ever used by btrfs_log_inode(), o

btrfs: use a negative value for BTRFS_LOG_FORCE_COMMIT

Currently we use the value 1 for BTRFS_LOG_FORCE_COMMIT, but that value
has a few inconveniences:

1) If it's ever used by btrfs_log_inode(), or any function down the call
chain, we have to remember to btrfs_set_log_full_commit(), which is
repetitive and has a chance to be forgotten in future use cases.
btrfs_log_inode_parent() only calls btrfs_set_log_full_commit() when
it gets a negative value from btrfs_log_inode();

2) Down the call chain of btrfs_log_inode(), we may have functions that
need to force a log commit, but can return either an error (negative
value), false (0) or true (1). So they are forced to return some
random negative to force a log commit - using BTRFS_LOG_FORCE_COMMIT
would make the intention more clear. Currently the only example is
flush_dir_items_batch().

So turn BTRFS_LOG_FORCE_COMMIT into a negative value. The chosen value
is -(MAX_ERRNO + 1), so that it does not overlap any errno value and makes
it easier to debug.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 6afaed53 10-Jan-2023 Filipe Manana <fdmanana@suse.com>

btrfs: simplify update of last_dir_index_offset when logging a directory

When logging a directory, we always set the inode's last_dir_index_offset
to the offset of the last dir index item we found.

btrfs: simplify update of last_dir_index_offset when logging a directory

When logging a directory, we always set the inode's last_dir_index_offset
to the offset of the last dir index item we found. This is using an extra
field in the log context structure, and it makes more sense to update it
only after we insert dir index items, and we could directly update the
inode's last_dir_index_offset field instead.

So make this simpler by updating the inode's last_dir_index_offset only
when we actually insert dir index keys in the log tree, and getting rid
of the last_dir_item_offset field in the log context structure.

Reported-by: David Arendt <admin@prnet.org>
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
Reported-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/Y8voyTXdnPDz8xwY@mail.gmail.com/
Reported-by: Hunter Wardlaw <wardlawhunter@gmail.com>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1207231
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216851
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 09e44868 10-Jan-2023 Filipe Manana <fdmanana@suse.com>

btrfs: do not abort transaction on failure to update log root

When syncing a log, if we fail to update a log root in the log root tree,
we are aborting the transaction if the failure was not -ENOSPC

btrfs: do not abort transaction on failure to update log root

When syncing a log, if we fail to update a log root in the log root tree,
we are aborting the transaction if the failure was not -ENOSPC. This is
excessive because there is a chance that a transaction commit can succeed,
and therefore avoid to turn the filesystem into RO mode. All we need to be
careful about is to mark the log for a full commit, which we already do,
to make sure no one commits a super block pointing to an outdated log root
tree.

So don't abort the transaction if we fail to update a log root in the log
root tree, and log an error if the failure is not -ENOSPC, so that it does
not go completely unnoticed.

CC: stable@vger.kernel.org # 6.0+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


12345678910>>...49