History log of /openbmc/linux/fs/btrfs/scrub.c (Results 1 – 25 of 875)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# e498cc00 09-Apr-2024 Naohiro Aota <naohiro.aota@wdc.com>

btrfs: scrub: run relocation repair when/only needed

commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.

When btrfs scrub finds an error, it reads mirrors to find correct data. If
all the err

btrfs: scrub: run relocation repair when/only needed

commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.

When btrfs scrub finds an error, it reads mirrors to find correct data. If
all the errors are fixed, sctx->error_bitmap is cleared for the stripe
range. However, in the zoned mode, it runs relocation to repair scrub
errors when the bitmap is *not* empty, which is a flipped condition.

Also, it runs the relocation even if the scrub is read-only. This was
missed by a fix in commit 1f2030ff6e49 ("btrfs: scrub: respect the
read-only flag during repair").

The repair is only necessary when there is a repaired sector and should be
done on read-write scrub. So, tweak the condition for both regular and
zoned case.

Fixes: 54765392a1b9 ("btrfs: scrub: introduce helper to queue a stripe for scrub")
Fixes: 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# e498cc00 09-Apr-2024 Naohiro Aota <naohiro.aota@wdc.com>

btrfs: scrub: run relocation repair when/only needed

commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.

When btrfs scrub finds an error, it reads mirrors to find correct data. If
all the err

btrfs: scrub: run relocation repair when/only needed

commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.

When btrfs scrub finds an error, it reads mirrors to find correct data. If
all the errors are fixed, sctx->error_bitmap is cleared for the stripe
range. However, in the zoned mode, it runs relocation to repair scrub
errors when the bitmap is *not* empty, which is a flipped condition.

Also, it runs the relocation even if the scrub is read-only. This was
missed by a fix in commit 1f2030ff6e49 ("btrfs: scrub: respect the
read-only flag during repair").

The repair is only necessary when there is a repaired sector and should be
done on read-write scrub. So, tweak the condition for both regular and
zoned case.

Fixes: 54765392a1b9 ("btrfs: scrub: introduce helper to queue a stripe for scrub")
Fixes: 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26
# e498cc00 09-Apr-2024 Naohiro Aota <naohiro.aota@wdc.com>

btrfs: scrub: run relocation repair when/only needed

commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.

When btrfs scrub finds an error, it reads mirrors to find correct data. If
all the err

btrfs: scrub: run relocation repair when/only needed

commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.

When btrfs scrub finds an error, it reads mirrors to find correct data. If
all the errors are fixed, sctx->error_bitmap is cleared for the stripe
range. However, in the zoned mode, it runs relocation to repair scrub
errors when the bitmap is *not* empty, which is a flipped condition.

Also, it runs the relocation even if the scrub is read-only. This was
missed by a fix in commit 1f2030ff6e49 ("btrfs: scrub: respect the
read-only flag during repair").

The repair is only necessary when there is a repaired sector and should be
done on read-write scrub. So, tweak the condition for both regular and
zoned case.

Fixes: 54765392a1b9 ("btrfs: scrub: introduce helper to queue a stripe for scrub")
Fixes: 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


Revision tags: v6.6.25, v6.6.24, v6.6.23
# 8ca8aac4 26-Feb-2024 Johannes Thumshirn <johannes.thumshirn@wdc.com>

btrfs: zoned: use zone aware sb location for scrub

commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.

At the moment scrub_supers() doesn't grab the super block's location via
the zoned devic

btrfs: zoned: use zone aware sb location for scrub

commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.

At the moment scrub_supers() doesn't grab the super block's location via
the zoned device aware btrfs_sb_log_location() but via btrfs_sb_offset().

This leads to checksum errors on 'scrub' as we're not accessing the
correct location of the super block.

So use btrfs_sb_log_location() for getting the super blocks location on
scrub.

Reported-by: WA AM <waautomata@gmail.com>
Link: http://lore.kernel.org/linux-btrfs/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@mail.gmail.com
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


Revision tags: v6.6.25, v6.6.24, v6.6.23
# 8ca8aac4 26-Feb-2024 Johannes Thumshirn <johannes.thumshirn@wdc.com>

btrfs: zoned: use zone aware sb location for scrub

commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.

At the moment scrub_supers() doesn't grab the super block's location via
the zoned devic

btrfs: zoned: use zone aware sb location for scrub

commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.

At the moment scrub_supers() doesn't grab the super block's location via
the zoned device aware btrfs_sb_log_location() but via btrfs_sb_offset().

This leads to checksum errors on 'scrub' as we're not accessing the
correct location of the super block.

So use btrfs_sb_log_location() for getting the super blocks location on
scrub.

Reported-by: WA AM <waautomata@gmail.com>
Link: http://lore.kernel.org/linux-btrfs/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@mail.gmail.com
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13
# 642b9c52 16-Jan-2024 Qu Wenruo <wqu@suse.com>

btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned

commit f546c4282673497a06ecb6190b50ae7f6c85b02f upstream.

[BUG]
There is a bug report that, on a ext4-converted btrfs, scrub

btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned

commit f546c4282673497a06ecb6190b50ae7f6c85b02f upstream.

[BUG]
There is a bug report that, on a ext4-converted btrfs, scrub leads to
various problems, including:

- "unable to find chunk map" errors
BTRFS info (device vdb): scrub: started on devid 1
BTRFS critical (device vdb): unable to find chunk map for logical 2214744064 length 4096
BTRFS critical (device vdb): unable to find chunk map for logical 2214744064 length 45056

This would lead to unrepariable errors.

- Use-after-free KASAN reports:
==================================================================
BUG: KASAN: slab-use-after-free in __blk_rq_map_sg+0x18f/0x7c0
Read of size 8 at addr ffff8881013c9040 by task btrfs/909
CPU: 0 PID: 909 Comm: btrfs Not tainted 6.7.0-x64v3-dbg #11 c50636e9419a8354555555245df535e380563b2b
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2023.11-2 12/24/2023
Call Trace:
<TASK>
dump_stack_lvl+0x43/0x60
print_report+0xcf/0x640
kasan_report+0xa6/0xd0
__blk_rq_map_sg+0x18f/0x7c0
virtblk_prep_rq.isra.0+0x215/0x6a0 [virtio_blk 19a65eeee9ae6fcf02edfad39bb9ddee07dcdaff]
virtio_queue_rqs+0xc4/0x310 [virtio_blk 19a65eeee9ae6fcf02edfad39bb9ddee07dcdaff]
blk_mq_flush_plug_list.part.0+0x780/0x860
__blk_flush_plug+0x1ba/0x220
blk_finish_plug+0x3b/0x60
submit_initial_group_read+0x10a/0x290 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
flush_scrub_stripes+0x38e/0x430 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
scrub_stripe+0x82a/0xae0 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
scrub_chunk+0x178/0x200 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
scrub_enumerate_chunks+0x4bc/0xa30 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
btrfs_scrub_dev+0x398/0x810 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
btrfs_ioctl+0x4b9/0x3020 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965]
__x64_sys_ioctl+0xbd/0x100
do_syscall_64+0x5d/0xe0
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f47e5e0952b

- Crash, mostly due to above use-after-free

[CAUSE]
The converted fs has the following data chunk layout:

item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 2214658048) itemoff 16025 itemsize 80
length 86016 owner 2 stripe_len 65536 type DATA|single

For above logical bytenr 2214744064, it's at the chunk end
(2214658048 + 86016 = 2214744064).

This means btrfs_submit_bio() would split the bio, and trigger endio
function for both of the two halves.

However scrub_submit_initial_read() would only expect the endio function
to be called once, not any more.
This means the first endio function would already free the bbio::bio,
leaving the bvec freed, thus the 2nd endio call would lead to
use-after-free.

[FIX]
- Make sure scrub_read_endio() only updates bits in its range
Since we may read less than 64K at the end of the chunk, we should not
touch the bits beyond chunk boundary.

- Make sure scrub_submit_initial_read() only to read the chunk range
This is done by calculating the real number of sectors we need to
read, and add sector-by-sector to the bio.

Thankfully the scrub read repair path won't need extra fixes:

- scrub_stripe_submit_repair_read()
With above fixes, we won't update error bit for range beyond chunk,
thus scrub_stripe_submit_repair_read() should never submit any read
beyond the chunk.

Reported-by: Rongrong <i@rong.moe>
Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Tested-by: Rongrong <i@rong.moe>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ Use min_t() to fix a compiling error due to difference types ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


Revision tags: v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6
# 6927a91c 27-Oct-2023 Qu Wenruo <wqu@suse.com>

btrfs: make found_logical_ret parameter mandatory for function queue_scrub_stripe()

[ Upstream commit 47e2b06b7b5cb356a987ba3429550c3a89ea89d6 ]

[BUG]
There is a compilation warning reported on com

btrfs: make found_logical_ret parameter mandatory for function queue_scrub_stripe()

[ Upstream commit 47e2b06b7b5cb356a987ba3429550c3a89ea89d6 ]

[BUG]
There is a compilation warning reported on commit ae76d8e3e135 ("btrfs:
scrub: fix grouping of read IO"), where gcc (14.0.0 20231022 experimental)
is reporting the following uninitialized variable:

fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’:
fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]]
2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN;
fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here
2040 | u64 found_logical;
| ^~~~~~~~~~~~~

[CAUSE]
This is a false alert, as @found_logical is passed as parameter
@found_logical_ret of function queue_scrub_stripe().

As long as queue_scrub_stripe() returned 0, we would update
@found_logical_ret. And if queue_scrub_stripe() returned >0 or <0, the
caller would not utilized @found_logical, thus there should be nothing
wrong.

Although the triggering gcc is still experimental, it looks like the
extra check on "if (found_logical_ret)" can sometimes confuse the
compiler.

Meanwhile the only caller of queue_scrub_stripe() is always passing a
valid pointer, there is no need for such check at all.

[FIX]
Although the report itself is a false alert, we can still make it more
explicit by:

- Replace the check for @found_logical_ret with ASSERT()

- Initialize @found_logical to U64_MAX

- Add one extra ASSERT() to make sure @found_logical got updated

Link: https://lore.kernel.org/linux-btrfs/87fs1x1p93.fsf@gentoo.org/
Fixes: ae76d8e3e135 ("btrfs: scrub: fix grouping of read IO")
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43
# 4fe44f9d 03-Aug-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker()

Currently the scrub_stripe_read_repair_worker() only does reads to
rebuild the corrupted sectors, it doesn't do

btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker()

Currently the scrub_stripe_read_repair_worker() only does reads to
rebuild the corrupted sectors, it doesn't do any writeback.

The design is mostly to put writeback into a more ordered manner, to
co-operate with dev-replace with zoned mode, which requires every write
to be submitted in their bytenr order.

However the writeback for repaired sectors into the original mirror
doesn't need such strong sync requirement, as it can only happen for
non-zoned devices.

This patch would move the writeback for repaired sectors into
scrub_stripe_read_repair_worker(), which removes two calls sites for
repaired sectors writeback. (one from flush_scrub_stripes(), one from
scrub_raid56_parity_stripe())

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 39dc7bd9 03-Aug-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: don't go ordered workqueue for dev-replace

The workqueue fs_info->scrub_worker would go ordered workqueue if it's a
device replace operation.

However the scrub is relying on multiple

btrfs: scrub: don't go ordered workqueue for dev-replace

The workqueue fs_info->scrub_worker would go ordered workqueue if it's a
device replace operation.

However the scrub is relying on multiple workers to do data csum
verification, and we always submit several read requests in a row.

Thus there is no need to use ordered workqueue just for dev-replace.
We have extra synchronization (the main thread will always
submit-and-wait for dev-replace writes) to handle it for zoned devices.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# ae76d8e3 03-Aug-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: fix grouping of read IO

[REGRESSION]
There are several regression reports about the scrub performance with
v6.4 kernel.

On a PCIe 3.0 device, the old v6.3 kernel can go 3GB/s scrub sp

btrfs: scrub: fix grouping of read IO

[REGRESSION]
There are several regression reports about the scrub performance with
v6.4 kernel.

On a PCIe 3.0 device, the old v6.3 kernel can go 3GB/s scrub speed, but
v6.4 can only go 1GB/s, an obvious 66% performance drop.

[CAUSE]
Iostat shows a very different behavior between v6.3 and v6.4 kernel:

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util
nvme0n1p3 9731.00 3425544.00 17237.00 63.92 2.18 352.02 21.18 100.00
nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00

The upper one is v6.3 while the lower one is v6.4.

There are several obvious differences:

- Very few read merges
This turns out to be a behavior change that we no longer do bio
plug/unplug.

- Very low aqu-sz
This is due to the submit-and-wait behavior of flush_scrub_stripes(),
and extra extent/csum tree search.

Both behaviors are not that obvious on SATA SSDs, as SATA SSDs have NCQ
to merge the reads, while SATA SSDs can not handle high queue depth well
either.

[FIX]
For now this patch focuses on the read speed fix. Dev-replace replace
speed needs more work.

For the read part, we go two directions to fix the problems:

- Re-introduce blk plug/unplug to merge read requests
This is pretty simple, and the behavior is pretty easy to observe.

This would enlarge the average read request size to 512K.

- Introduce multi-group reads and no longer wait for each group
Instead of the old behavior, which submits 8 stripes and waits for
them, here we would enlarge the total number of stripes to 16 * 8.
Which is 8M per device, the same limit as the old scrub in-flight
bios size limit.

Now every time we fill a group (8 stripes), we submit them and
continue to next stripes.

Only when the full 16 * 8 stripes are all filled, we submit the
remaining ones (the last group), and wait for all groups to finish.
Then submit the repair writes and dev-replace writes.

This should enlarge the queue depth.

This would greatly improve the merge rate (thus read block size) and
queue depth:

Before (with regression, and cached extent/csum path):

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util
nvme0n1p3 20666.00 1318240.00 10.00 0.05 0.08 63.79 1.63 100.00

After (with all patches applied):

nvme0n1p3 5165.00 2278304.00 30557.00 85.54 0.55 441.10 2.81 100.00

i.e. 1287 to 2224 MB/s.

CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 3c771c19 03-Aug-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: avoid unnecessary csum tree search preparing stripes

One of the bottleneck of the new scrub code is the extra csum tree
search.

The old code would only do the csum tree search for eac

btrfs: scrub: avoid unnecessary csum tree search preparing stripes

One of the bottleneck of the new scrub code is the extra csum tree
search.

The old code would only do the csum tree search for each scrub bio,
which can be as large as 512KiB, thus they can afford to allocate a new
path each time.

But the new scrub code is doing csum tree search for each stripe, which
is only 64KiB, this means we'd better re-use the same csum path during
each search.

This patch would introduce a per-sctx path for csum tree search, as we
don't need to re-allocate the path every time we need to do a csum tree
search.

With this change we can further improve the queue depth and improve the
scrub read performance:

Before (with regression and cached extent tree path):

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util
nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00

After (with both cached extent/csum tree path):

nvme0n1p3 17759.00 1133280.00 10.00 0.06 0.08 63.81 1.50 100.00

Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 1dc4888e 03-Aug-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: avoid unnecessary extent tree search preparing stripes

Since commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror()
to scrub_stripe infrastructure"), scrub no longer re-use t

btrfs: scrub: avoid unnecessary extent tree search preparing stripes

Since commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror()
to scrub_stripe infrastructure"), scrub no longer re-use the same path
for extent tree search.

This can lead to unnecessary extent tree search, especially for the new
stripe based scrub, as we have way more stripes to prepare.

This patch would re-introduce a shared path for extent tree search, and
properly release it when the block group is scrubbed.

This change alone can improve scrub performance slightly by reducing the
time spend preparing the stripe thus improving the queue depth.

Before (with regression):

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util
nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00

After (with this patch):

nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00

Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34
# 17353a34 14-Jun-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: remove unused btrfs_path in scrub_simple_mirror()

The @path in scrub_simple_mirror() is no longer utilized after commit
e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scr

btrfs: scrub: remove unused btrfs_path in scrub_simple_mirror()

The @path in scrub_simple_mirror() is no longer utilized after commit
e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe
infrastructure").

Before that commit, we call find_first_extent_item() directly, which
needs a path and that path can be reused. But after that switch commit,
the extent search is done inside queue_scrub_stripe(), which will no
longer accept a path from outside.

So the @path variable can be safely removed.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ remove the stale comment ]
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# cf4ac2b9 21-Jun-2023 Colin Ian King <colin.i.king@gmail.com>

btrfs: scrub: remove redundant division of stripe_nr

Variable stripe_nr is being divided by map->num_stripes however the
result is never read. The division and assignment are redundant and
can be re

btrfs: scrub: remove redundant division of stripe_nr

Variable stripe_nr is being divided by map->num_stripes however the
result is never read. The division and assignment are redundant and
can be removed. Cleans up clang scan build warning:

fs/btrfs/scrub.c:1264:3: warning: Value stored to 'stripe_nr' is
never read [deadcode.DeadStores]

The code is a leftover from 6ded22c1bfe6 ("btrfs: reduce div64 calls by
limiting the number of stripes of a chunk to u32") that converted div64
to normal division, it's the same but previous version did not trigger a
warning.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# b471965f 28-Jul-2023 Anand Jain <anand.jain@oracle.com>

btrfs: fix replace/scrub failure with metadata_uuid

Fstests with POST_MKFS_CMD="btrfstune -m" (as in the mailing list)
reported a few of the test cases failing.

The failure scenario can be summariz

btrfs: fix replace/scrub failure with metadata_uuid

Fstests with POST_MKFS_CMD="btrfstune -m" (as in the mailing list)
reported a few of the test cases failing.

The failure scenario can be summarized and simplified as follows:

$ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb1 /dev/sdb2 :0
$ btrfstune -m /dev/sdb1 :0
$ wipefs -a /dev/sdb1 :0
$ mount -o degraded /dev/sdb2 /btrfs :0
$ btrfs replace start -B -f -r 1 /dev/sdb1 /btrfs :1
STDERR:
ERROR: ioctl(DEV_REPLACE_START) failed on "/btrfs": Input/output error

[11290.583502] BTRFS warning (device sdb2): tree block 22036480 mirror 2 has bad fsid, has 99835c32-49f0-4668-9e66-dc277a96b4a6 want da40350c-33ac-4872-92a8-4948ed8c04d0
[11290.586580] BTRFS error (device sdb2): unable to fix up (regular) error at logical 22020096 on dev /dev/sdb8 physical 1048576

As above, the replace is failing because we are verifying the header with
fs_devices::fsid instead of fs_devices::metadata_uuid, despite the
metadata_uuid actually being present.

To fix this, use fs_devices::metadata_uuid. We copy fsid into
fs_devices::metadata_uuid if there is no metadata_uuid, so its fine.

Fixes: a3ddbaebc7c9 ("btrfs: scrub: introduce a helper to verify one metadata block")
CC: stable@vger.kernel.org # 6.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# cb091225 22-Jun-2023 Qu Wenruo <wqu@suse.com>

btrfs: fix remaining u32 overflows when left shifting stripe_nr

There was regression caused by a97699d1d610 ("btrfs: replace
map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by
a729

btrfs: fix remaining u32 overflows when left shifting stripe_nr

There was regression caused by a97699d1d610 ("btrfs: replace
map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by
a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr").
To avoid code churn the fix was open coding the type casts but
unfortunately missed one which was still possible to hit [1].

The missing place was assignment of bioc->full_stripe_logical inside
btrfs_map_block().

Fix it by adding a helper that does the safe calculation of the offset
and use it everywhere even though it may not be strictly necessary due
to already using u64 types. This replaces all remaining
"<< BTRFS_STRIPE_LEN_SHIFT" calls.

[1] https://lore.kernel.org/linux-btrfs/20230622065438.86402-1-wqu@suse.com/

Fixes: a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 81db6ae8 12-Jun-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers

Since the scrub rework introduced by commit 2af2aaf98205 ("btrfs:
scrub: introduce structure for new BTRFS_STRIPE_LEN based interface"

btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers

Since the scrub rework introduced by commit 2af2aaf98205 ("btrfs:
scrub: introduce structure for new BTRFS_STRIPE_LEN based interface")
and later commits, scrub only needs one single workqueue,
fs_info::scrub_worker.

That scrub_wr_completion_workers is initially to handle the delay work
after write bios finished. But the new scrub code goes submit-and-wait
for write bios, thus all the work are done inside the scrub_worker.

The last user of fs_info::scrub_wr_completion_workers is removed in
commit 16f93993498b ("btrfs: scrub: remove the old writeback
infrastructure"), so we can safely remove the workqueue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# c2bbc0ba 12-Jun-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: remove scrub_ctx::csum_list member

Since the rework of scrub introduced by commit 2af2aaf98205 ("btrfs:
scrub: introduce structure for new BTRFS_STRIPE_LEN based interface")
and later

btrfs: scrub: remove scrub_ctx::csum_list member

Since the rework of scrub introduced by commit 2af2aaf98205 ("btrfs:
scrub: introduce structure for new BTRFS_STRIPE_LEN based interface")
and later commits, scrub no longer keeps its data checksum inside sctx.

Instead we have scrub_stripe::csums for the checksum of the stripe.
Thus we can remove the unused scrub_ctx::csum_list member.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.33, v6.1.32
# 723b8bb1 30-May-2023 Christoph Hellwig <hch@lst.de>

btrfs: open code btrfs_map_sblock

btrfs_map_sblock just hard codes three arguments and calls
btrfs_map_sblock. Remove it as it doesn't provide any real value, but
makes following the btrfs_map_bloc

btrfs: open code btrfs_map_sblock

btrfs_map_sblock just hard codes three arguments and calls
btrfs_map_sblock. Remove it as it doesn't provide any real value, but
makes following the btrfs_map_block call chains harder.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.31
# 58e814fc 25-May-2023 Tejun Heo <tj@kernel.org>

btrfs: use alloc_ordered_workqueue() to create ordered workqueues

BACKGROUND
==========

When multiple work items are queued to a workqueue, their execution order
doesn't match the queueing order. T

btrfs: use alloc_ordered_workqueue() to create ordered workqueues

BACKGROUND
==========

When multiple work items are queued to a workqueue, their execution order
doesn't match the queueing order. They may get executed in any order and
simultaneously. When fully serialized execution - one by one in the queueing
order - is needed, an ordered workqueue should be used which can be created
with alloc_ordered_workqueue().

However, alloc_ordered_workqueue() was a later addition. Before it, an
ordered workqueue could be obtained by creating an UNBOUND workqueue with
@max_active==1. This originally was an implementation side-effect which was
broken by 4c16bd327c74 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered"). Because there were users that depended on the ordered execution,
5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
made workqueue allocation path to implicitly promote UNBOUND workqueues w/
@max_active==1 to ordered workqueues.

While this has worked okay, overloading the UNBOUND allocation interface
this way creates other issues. It's difficult to tell whether a given
workqueue actually needs to be ordered and users that legitimately want a
min concurrency level wq unexpectedly gets an ordered one instead. With
planned UNBOUND workqueue updates to improve execution locality and more
prevalence of chiplet designs which can benefit from such improvements, this
isn't a state we wanna be in forever.

This patch series audits all call sites that create an UNBOUND workqueue w/
@max_active==1 and converts them to alloc_ordered_workqueue() as necessary.

BTRFS
=====

* fs_info->scrub_workers initialized in scrub_workers_get() was setting
@max_active to 1 when @is_dev_replace is set and it seems that the
workqueue actually needs to be ordered if @is_dev_replace. Update the code
so that alloc_ordered_workqueue() is used if @is_dev_replace.

* fs_info->discard_ctl.discard_workers initialized in
btrfs_init_workqueues() was directly using alloc_workqueue() w/
@max_active==1. Converted to alloc_ordered_workqueue().

* fs_info->fixup_workers and fs_info->qgroup_rescan_workers initialized in
btrfs_queue_work() use the btrfs's workqueue wrapper, btrfs_workqueue,
which are allocated with btrfs_alloc_workqueue().

btrfs_workqueue implements automatic @max_active adjustment which is
disabled when the specified max limit is below a certain threshold, so
calling btrfs_alloc_workqueue() with @limit_active==1 yields an ordered
workqueue whose @max_active won't be changed as the auto-tuning is
disabled.

This is rather brittle in that nothing clearly indicates that the two
workqueues should be ordered or btrfs_alloc_workqueue() must disable
auto-tuning when @limit_active==1.

This patch factors out the common btrfs_workqueue init code into
btrfs_init_workqueue() and add explicit btrfs_alloc_ordered_workqueue().
The two workqueues are converted to use the new ordered allocation
interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.30, v6.1.29
# b9cb105e 12-May-2023 Jiapeng Chong <jiapeng.chong@linux.alibaba.com>

btrfs: scrub: remove more unused functions

These functions are defined in the scrub.c file, but last callers were
removed in e9255d6c4054 ("btrfs: scrub: remove the old scrub recheck
code").

fs/btr

btrfs: scrub: remove more unused functions

These functions are defined in the scrub.c file, but last callers were
removed in e9255d6c4054 ("btrfs: scrub: remove the old scrub recheck
code").

fs/btrfs/scrub.c:553:20: warning: unused function 'scrub_stripe_index_and_offset'.
fs/btrfs/scrub.c:543:19: warning: unused function 'scrub_nr_raid_mirrors'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4937
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.28
# b7f9945a 11-May-2023 Qu Wenruo <wqu@suse.com>

btrfs: handle tree backref walk error properly

[BUG]
Smatch reports the following errors related to commit ("btrfs: output
affected files when relocation fails"):

fs/btrfs/inode.c:283 print_data_r

btrfs: handle tree backref walk error properly

[BUG]
Smatch reports the following errors related to commit ("btrfs: output
affected files when relocation fails"):

fs/btrfs/inode.c:283 print_data_reloc_error()
error: uninitialized symbol 'ref_level'.

[CAUSE]
That part of code is mostly copied from scrub, but unfortunately scrub
code from the beginning is not doing the error handling properly.

The offending code looks like this:

do {
ret = tree_backref_for_extent();
btrfs_warn_rl();
} while (ret != 1);

There are several problems involved:

- No error handling
If that tree_backref_for_extent() failed, we would output the same
error again and again, never really exit as it requires ret == 1 to
exit.

- Always do one extra output
As tree_backref_for_extent() only return > 0 if there is no more
backref item.
This means after the last item we hit, we would output an invalid
error message for ret > 0 case.

[FIX]
Fix the old code by:

- Move @ref_root and @ref_level into the if branch
And do not initialize them, so we can catch such uninitialized values
just like what we do in the inode.c

- Explicitly check the return value of tree_backref_for_extent()
And handle ret < 0 and ret > 0 cases properly.

- No more do {} while () loop
Instead go while (true) {} loop since we will handle @ret manually.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


Revision tags: v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24
# 94ead93e 13-Apr-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: use recovered data stripes as cache to avoid unnecessary read

For P/Q stripe scrub, we have quite some duplicated read IO:

- Data stripes read for verification
This is triggered by

btrfs: scrub: use recovered data stripes as cache to avoid unnecessary read

For P/Q stripe scrub, we have quite some duplicated read IO:

- Data stripes read for verification
This is triggered by the scrub_submit_initial_read() inside
scrub_raid56_parity_stripe().

- Data stripes read (again) for P/Q stripe verification
This is triggered by scrub_assemble_read_bios() from scrub_rbio().

Although we can have hit rbio cache and avoid unnecessary read, the
chance is very low, as scrub would easily flush the whole rbio cache.

This means, even we're just scrubbing a single P/Q stripe, we would read
the data stripes twice for the best case scenario. If we need to
recover some data stripes, it would cause more reads on the same data
stripes, again and again.

However before we call raid56_parity_submit_scrub_rbio() we already
have all data stripes repaired and their contents ready to use.
But RAID56 cache is unaware about the scrub cache, thus RAID56 layer
itself still needs to re-read the data stripes.

To avoid such cache miss, this patch would:

- Introduce a new helper, raid56_parity_cache_data_pages()
This function would grab the pages from an array, and copy the content
to the rbio, marking all the involved sectors uptodate.

The page copy is unavoidable because of the cache pages of rbio are all
self managed, thus can not utilize outside pages without screwing up
the lifespan.

- Use the repaired data stripes as cache inside
scrub_raid56_parity_stripe()

By this, we ensure all the data sectors of the scrub rbio are already
uptodate, and no need to read them again from disk.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# b50f2d04 14-Jun-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: fix a return value overwrite in scrub_stripe()

[RETURN VALUE OVERWRITE]
Inside scrub_stripe(), we would submit all the remaining stripes after
iterating all extents.

But since flush_s

btrfs: scrub: fix a return value overwrite in scrub_stripe()

[RETURN VALUE OVERWRITE]
Inside scrub_stripe(), we would submit all the remaining stripes after
iterating all extents.

But since flush_scrub_stripes() can return error, we need to avoid
overwriting the existing @ret if there is any error.

However the existing check is doing the wrong check:

ret2 = flush_scrub_stripes();
if (!ret2)
ret = ret2;

This would overwrite the existing @ret to 0 as long as the final flush
detects no critical errors.

[FIX]
We should check @ret other than @ret2 in that case.

Fixes: 8eb3dd17eadd ("btrfs: dev-replace: error out if we have unrepaired metadata error during")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


# 79b8ee70 06-Jun-2023 Qu Wenruo <wqu@suse.com>

btrfs: scrub: also report errors hit during the initial read

[BUG]
After the recent scrub rework introduced in commit e02ee89baa66 ("btrfs:
scrub: switch scrub_simple_mirror() to scrub_stripe infras

btrfs: scrub: also report errors hit during the initial read

[BUG]
After the recent scrub rework introduced in commit e02ee89baa66 ("btrfs:
scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"),
btrfs scrub no longer reports repaired errors any more:

# mkfs.btrfs -f $dev -d DUP
# mount $dev $mnt
# xfs_io -f -d -c "pwrite -b 64K -S 0xaa 0 64" $mnt/file
# umount $dev
# xfs_io -f -c "pwrite -S 0xff $phy1 64K" $dev # Corrupt the first mirror
# mount $dev $mnt
# btrfs scrub start -BR $mnt
scrub done for 725e7cb7-8a4a-4c77-9f2a-86943619e218
Scrub started: Tue Jun 6 14:56:50 2023
Status: finished
Duration: 0:00:00
data_extents_scrubbed: 2
tree_extents_scrubbed: 18
data_bytes_scrubbed: 131072
tree_bytes_scrubbed: 294912
read_errors: 0
csum_errors: 0 <<< No errors here
verify_errors: 0
[...]
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 16 <<< Only corrected errors
last_physical: 2723151872

This can confuse btrfs-progs, as it relies on the csum_errors to
determine if there is anything wrong.

While on v6.3.x kernels, the report is different:

csum_errors: 16 <<<
verify_errors: 0
[...]
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 16 <<<

[CAUSE]
In the reworked scrub, we update the scrub progress inside
scrub_stripe_report_errors(), using various bitmaps to update the
result.

For example for csum_errors, we use bitmap_weight() of
stripe->csum_error_bitmap.

Unfortunately at that stage, all error bitmaps (except
init_error_bitmap) are the result of the latest repair attempt, thus if
the stripe is fully repaired, those error bitmaps will all be empty,
resulting the above output mismatch.

To fix this, record the number of errors into stripe->init_nr_*_errors.
Since we don't really care about where those errors are, we only need to
record the number of errors.

Then in scrub_stripe_report_errors(), use those initial numbers to
update the progress other than using the latest error bitmaps.

Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

show more ...


12345678910>>...35