Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26 |
|
#
e498cc00 |
| 09-Apr-2024 |
Naohiro Aota <naohiro.aota@wdc.com> |
btrfs: scrub: run relocation repair when/only needed
commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.
When btrfs scrub finds an error, it reads mirrors to find correct data. If all the err
btrfs: scrub: run relocation repair when/only needed
commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.
When btrfs scrub finds an error, it reads mirrors to find correct data. If all the errors are fixed, sctx->error_bitmap is cleared for the stripe range. However, in the zoned mode, it runs relocation to repair scrub errors when the bitmap is *not* empty, which is a flipped condition.
Also, it runs the relocation even if the scrub is read-only. This was missed by a fix in commit 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair").
The repair is only necessary when there is a repaired sector and should be done on read-write scrub. So, tweak the condition for both regular and zoned case.
Fixes: 54765392a1b9 ("btrfs: scrub: introduce helper to queue a stripe for scrub") Fixes: 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26 |
|
#
e498cc00 |
| 09-Apr-2024 |
Naohiro Aota <naohiro.aota@wdc.com> |
btrfs: scrub: run relocation repair when/only needed
commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.
When btrfs scrub finds an error, it reads mirrors to find correct data. If all the err
btrfs: scrub: run relocation repair when/only needed
commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.
When btrfs scrub finds an error, it reads mirrors to find correct data. If all the errors are fixed, sctx->error_bitmap is cleared for the stripe range. However, in the zoned mode, it runs relocation to repair scrub errors when the bitmap is *not* empty, which is a flipped condition.
Also, it runs the relocation even if the scrub is read-only. This was missed by a fix in commit 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair").
The repair is only necessary when there is a repaired sector and should be done on read-write scrub. So, tweak the condition for both regular and zoned case.
Fixes: 54765392a1b9 ("btrfs: scrub: introduce helper to queue a stripe for scrub") Fixes: 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26 |
|
#
e498cc00 |
| 09-Apr-2024 |
Naohiro Aota <naohiro.aota@wdc.com> |
btrfs: scrub: run relocation repair when/only needed
commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.
When btrfs scrub finds an error, it reads mirrors to find correct data. If all the err
btrfs: scrub: run relocation repair when/only needed
commit 7192833c4e55b26e8f15ef58577867a1bc808036 upstream.
When btrfs scrub finds an error, it reads mirrors to find correct data. If all the errors are fixed, sctx->error_bitmap is cleared for the stripe range. However, in the zoned mode, it runs relocation to repair scrub errors when the bitmap is *not* empty, which is a flipped condition.
Also, it runs the relocation even if the scrub is read-only. This was missed by a fix in commit 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair").
The repair is only necessary when there is a repaired sector and should be done on read-write scrub. So, tweak the condition for both regular and zoned case.
Fixes: 54765392a1b9 ("btrfs: scrub: introduce helper to queue a stripe for scrub") Fixes: 1f2030ff6e49 ("btrfs: scrub: respect the read-only flag during repair") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.25, v6.6.24, v6.6.23 |
|
#
8ca8aac4 |
| 26-Feb-2024 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
btrfs: zoned: use zone aware sb location for scrub
commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.
At the moment scrub_supers() doesn't grab the super block's location via the zoned devic
btrfs: zoned: use zone aware sb location for scrub
commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.
At the moment scrub_supers() doesn't grab the super block's location via the zoned device aware btrfs_sb_log_location() but via btrfs_sb_offset().
This leads to checksum errors on 'scrub' as we're not accessing the correct location of the super block.
So use btrfs_sb_log_location() for getting the super blocks location on scrub.
Reported-by: WA AM <waautomata@gmail.com> Link: http://lore.kernel.org/linux-btrfs/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@mail.gmail.com CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.25, v6.6.24, v6.6.23 |
|
#
8ca8aac4 |
| 26-Feb-2024 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
btrfs: zoned: use zone aware sb location for scrub
commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.
At the moment scrub_supers() doesn't grab the super block's location via the zoned devic
btrfs: zoned: use zone aware sb location for scrub
commit 74098a989b9c3370f768140b7783a7aaec2759b3 upstream.
At the moment scrub_supers() doesn't grab the super block's location via the zoned device aware btrfs_sb_log_location() but via btrfs_sb_offset().
This leads to checksum errors on 'scrub' as we're not accessing the correct location of the super block.
So use btrfs_sb_log_location() for getting the super blocks location on scrub.
Reported-by: WA AM <waautomata@gmail.com> Link: http://lore.kernel.org/linux-btrfs/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@mail.gmail.com CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13 |
|
#
642b9c52 |
| 16-Jan-2024 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned
commit f546c4282673497a06ecb6190b50ae7f6c85b02f upstream.
[BUG] There is a bug report that, on a ext4-converted btrfs, scrub
btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned
commit f546c4282673497a06ecb6190b50ae7f6c85b02f upstream.
[BUG] There is a bug report that, on a ext4-converted btrfs, scrub leads to various problems, including:
- "unable to find chunk map" errors BTRFS info (device vdb): scrub: started on devid 1 BTRFS critical (device vdb): unable to find chunk map for logical 2214744064 length 4096 BTRFS critical (device vdb): unable to find chunk map for logical 2214744064 length 45056
This would lead to unrepariable errors.
- Use-after-free KASAN reports: ================================================================== BUG: KASAN: slab-use-after-free in __blk_rq_map_sg+0x18f/0x7c0 Read of size 8 at addr ffff8881013c9040 by task btrfs/909 CPU: 0 PID: 909 Comm: btrfs Not tainted 6.7.0-x64v3-dbg #11 c50636e9419a8354555555245df535e380563b2b Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2023.11-2 12/24/2023 Call Trace: <TASK> dump_stack_lvl+0x43/0x60 print_report+0xcf/0x640 kasan_report+0xa6/0xd0 __blk_rq_map_sg+0x18f/0x7c0 virtblk_prep_rq.isra.0+0x215/0x6a0 [virtio_blk 19a65eeee9ae6fcf02edfad39bb9ddee07dcdaff] virtio_queue_rqs+0xc4/0x310 [virtio_blk 19a65eeee9ae6fcf02edfad39bb9ddee07dcdaff] blk_mq_flush_plug_list.part.0+0x780/0x860 __blk_flush_plug+0x1ba/0x220 blk_finish_plug+0x3b/0x60 submit_initial_group_read+0x10a/0x290 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] flush_scrub_stripes+0x38e/0x430 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] scrub_stripe+0x82a/0xae0 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] scrub_chunk+0x178/0x200 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] scrub_enumerate_chunks+0x4bc/0xa30 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] btrfs_scrub_dev+0x398/0x810 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] btrfs_ioctl+0x4b9/0x3020 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] __x64_sys_ioctl+0xbd/0x100 do_syscall_64+0x5d/0xe0 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7f47e5e0952b
- Crash, mostly due to above use-after-free
[CAUSE] The converted fs has the following data chunk layout:
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 2214658048) itemoff 16025 itemsize 80 length 86016 owner 2 stripe_len 65536 type DATA|single
For above logical bytenr 2214744064, it's at the chunk end (2214658048 + 86016 = 2214744064).
This means btrfs_submit_bio() would split the bio, and trigger endio function for both of the two halves.
However scrub_submit_initial_read() would only expect the endio function to be called once, not any more. This means the first endio function would already free the bbio::bio, leaving the bvec freed, thus the 2nd endio call would lead to use-after-free.
[FIX] - Make sure scrub_read_endio() only updates bits in its range Since we may read less than 64K at the end of the chunk, we should not touch the bits beyond chunk boundary.
- Make sure scrub_submit_initial_read() only to read the chunk range This is done by calculating the real number of sectors we need to read, and add sector-by-sector to the bio.
Thankfully the scrub read repair path won't need extra fixes:
- scrub_stripe_submit_repair_read() With above fixes, we won't update error bit for range beyond chunk, thus scrub_stripe_submit_repair_read() should never submit any read beyond the chunk.
Reported-by: Rongrong <i@rong.moe> Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") Tested-by: Rongrong <i@rong.moe> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [ Use min_t() to fix a compiling error due to difference types ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6 |
|
#
6927a91c |
| 27-Oct-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: make found_logical_ret parameter mandatory for function queue_scrub_stripe()
[ Upstream commit 47e2b06b7b5cb356a987ba3429550c3a89ea89d6 ]
[BUG] There is a compilation warning reported on com
btrfs: make found_logical_ret parameter mandatory for function queue_scrub_stripe()
[ Upstream commit 47e2b06b7b5cb356a987ba3429550c3a89ea89d6 ]
[BUG] There is a compilation warning reported on commit ae76d8e3e135 ("btrfs: scrub: fix grouping of read IO"), where gcc (14.0.0 20231022 experimental) is reporting the following uninitialized variable:
fs/btrfs/scrub.c: In function ‘scrub_simple_mirror.isra’: fs/btrfs/scrub.c:2075:29: error: ‘found_logical’ may be used uninitialized [-Werror=maybe-uninitialized[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmaybe-uninitialized]] 2075 | cur_logical = found_logical + BTRFS_STRIPE_LEN; fs/btrfs/scrub.c:2040:21: note: ‘found_logical’ was declared here 2040 | u64 found_logical; | ^~~~~~~~~~~~~
[CAUSE] This is a false alert, as @found_logical is passed as parameter @found_logical_ret of function queue_scrub_stripe().
As long as queue_scrub_stripe() returned 0, we would update @found_logical_ret. And if queue_scrub_stripe() returned >0 or <0, the caller would not utilized @found_logical, thus there should be nothing wrong.
Although the triggering gcc is still experimental, it looks like the extra check on "if (found_logical_ret)" can sometimes confuse the compiler.
Meanwhile the only caller of queue_scrub_stripe() is always passing a valid pointer, there is no need for such check at all.
[FIX] Although the report itself is a false alert, we can still make it more explicit by:
- Replace the check for @found_logical_ret with ASSERT()
- Initialize @found_logical to U64_MAX
- Add one extra ASSERT() to make sure @found_logical got updated
Link: https://lore.kernel.org/linux-btrfs/87fs1x1p93.fsf@gentoo.org/ Fixes: ae76d8e3e135 ("btrfs: scrub: fix grouping of read IO") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43 |
|
#
4fe44f9d |
| 03-Aug-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker()
Currently the scrub_stripe_read_repair_worker() only does reads to rebuild the corrupted sectors, it doesn't do
btrfs: scrub: move write back of repaired sectors to scrub_stripe_read_repair_worker()
Currently the scrub_stripe_read_repair_worker() only does reads to rebuild the corrupted sectors, it doesn't do any writeback.
The design is mostly to put writeback into a more ordered manner, to co-operate with dev-replace with zoned mode, which requires every write to be submitted in their bytenr order.
However the writeback for repaired sectors into the original mirror doesn't need such strong sync requirement, as it can only happen for non-zoned devices.
This patch would move the writeback for repaired sectors into scrub_stripe_read_repair_worker(), which removes two calls sites for repaired sectors writeback. (one from flush_scrub_stripes(), one from scrub_raid56_parity_stripe())
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
39dc7bd9 |
| 03-Aug-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: don't go ordered workqueue for dev-replace
The workqueue fs_info->scrub_worker would go ordered workqueue if it's a device replace operation.
However the scrub is relying on multiple
btrfs: scrub: don't go ordered workqueue for dev-replace
The workqueue fs_info->scrub_worker would go ordered workqueue if it's a device replace operation.
However the scrub is relying on multiple workers to do data csum verification, and we always submit several read requests in a row.
Thus there is no need to use ordered workqueue just for dev-replace. We have extra synchronization (the main thread will always submit-and-wait for dev-replace writes) to handle it for zoned devices.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
ae76d8e3 |
| 03-Aug-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: fix grouping of read IO
[REGRESSION] There are several regression reports about the scrub performance with v6.4 kernel.
On a PCIe 3.0 device, the old v6.3 kernel can go 3GB/s scrub sp
btrfs: scrub: fix grouping of read IO
[REGRESSION] There are several regression reports about the scrub performance with v6.4 kernel.
On a PCIe 3.0 device, the old v6.3 kernel can go 3GB/s scrub speed, but v6.4 can only go 1GB/s, an obvious 66% performance drop.
[CAUSE] Iostat shows a very different behavior between v6.3 and v6.4 kernel:
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 9731.00 3425544.00 17237.00 63.92 2.18 352.02 21.18 100.00 nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00
The upper one is v6.3 while the lower one is v6.4.
There are several obvious differences:
- Very few read merges This turns out to be a behavior change that we no longer do bio plug/unplug.
- Very low aqu-sz This is due to the submit-and-wait behavior of flush_scrub_stripes(), and extra extent/csum tree search.
Both behaviors are not that obvious on SATA SSDs, as SATA SSDs have NCQ to merge the reads, while SATA SSDs can not handle high queue depth well either.
[FIX] For now this patch focuses on the read speed fix. Dev-replace replace speed needs more work.
For the read part, we go two directions to fix the problems:
- Re-introduce blk plug/unplug to merge read requests This is pretty simple, and the behavior is pretty easy to observe.
This would enlarge the average read request size to 512K.
- Introduce multi-group reads and no longer wait for each group Instead of the old behavior, which submits 8 stripes and waits for them, here we would enlarge the total number of stripes to 16 * 8. Which is 8M per device, the same limit as the old scrub in-flight bios size limit.
Now every time we fill a group (8 stripes), we submit them and continue to next stripes.
Only when the full 16 * 8 stripes are all filled, we submit the remaining ones (the last group), and wait for all groups to finish. Then submit the repair writes and dev-replace writes.
This should enlarge the queue depth.
This would greatly improve the merge rate (thus read block size) and queue depth:
Before (with regression, and cached extent/csum path):
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 20666.00 1318240.00 10.00 0.05 0.08 63.79 1.63 100.00
After (with all patches applied):
nvme0n1p3 5165.00 2278304.00 30557.00 85.54 0.55 441.10 2.81 100.00
i.e. 1287 to 2224 MB/s.
CC: stable@vger.kernel.org # 6.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
3c771c19 |
| 03-Aug-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: avoid unnecessary csum tree search preparing stripes
One of the bottleneck of the new scrub code is the extra csum tree search.
The old code would only do the csum tree search for eac
btrfs: scrub: avoid unnecessary csum tree search preparing stripes
One of the bottleneck of the new scrub code is the extra csum tree search.
The old code would only do the csum tree search for each scrub bio, which can be as large as 512KiB, thus they can afford to allocate a new path each time.
But the new scrub code is doing csum tree search for each stripe, which is only 64KiB, this means we'd better re-use the same csum path during each search.
This patch would introduce a per-sctx path for csum tree search, as we don't need to re-allocate the path every time we need to do a csum tree search.
With this change we can further improve the queue depth and improve the scrub read performance:
Before (with regression and cached extent tree path):
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00
After (with both cached extent/csum tree path):
nvme0n1p3 17759.00 1133280.00 10.00 0.06 0.08 63.81 1.50 100.00
Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") CC: stable@vger.kernel.org # 6.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
1dc4888e |
| 03-Aug-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: avoid unnecessary extent tree search preparing stripes
Since commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"), scrub no longer re-use t
btrfs: scrub: avoid unnecessary extent tree search preparing stripes
Since commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"), scrub no longer re-use the same path for extent tree search.
This can lead to unnecessary extent tree search, especially for the new stripe based scrub, as we have way more stripes to prepare.
This patch would re-introduce a shared path for extent tree search, and properly release it when the block group is scrubbed.
This change alone can improve scrub performance slightly by reducing the time spend preparing the stripe thus improving the queue depth.
Before (with regression):
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz aqu-sz %util nvme0n1p3 15578.00 993616.00 5.00 0.03 0.09 63.78 1.32 100.00
After (with this patch):
nvme0n1p3 15875.00 1013328.00 12.00 0.08 0.08 63.83 1.35 100.00
Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") CC: stable@vger.kernel.org # 6.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
17353a34 |
| 14-Jun-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: remove unused btrfs_path in scrub_simple_mirror()
The @path in scrub_simple_mirror() is no longer utilized after commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scr
btrfs: scrub: remove unused btrfs_path in scrub_simple_mirror()
The @path in scrub_simple_mirror() is no longer utilized after commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure").
Before that commit, we call find_first_extent_item() directly, which needs a path and that path can be reused. But after that switch commit, the extent search is done inside queue_scrub_stripe(), which will no longer accept a path from outside.
So the @path variable can be safely removed.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ remove the stale comment ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
cf4ac2b9 |
| 21-Jun-2023 |
Colin Ian King <colin.i.king@gmail.com> |
btrfs: scrub: remove redundant division of stripe_nr
Variable stripe_nr is being divided by map->num_stripes however the result is never read. The division and assignment are redundant and can be re
btrfs: scrub: remove redundant division of stripe_nr
Variable stripe_nr is being divided by map->num_stripes however the result is never read. The division and assignment are redundant and can be removed. Cleans up clang scan build warning:
fs/btrfs/scrub.c:1264:3: warning: Value stored to 'stripe_nr' is never read [deadcode.DeadStores]
The code is a leftover from 6ded22c1bfe6 ("btrfs: reduce div64 calls by limiting the number of stripes of a chunk to u32") that converted div64 to normal division, it's the same but previous version did not trigger a warning.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
b471965f |
| 28-Jul-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: fix replace/scrub failure with metadata_uuid
Fstests with POST_MKFS_CMD="btrfstune -m" (as in the mailing list) reported a few of the test cases failing.
The failure scenario can be summariz
btrfs: fix replace/scrub failure with metadata_uuid
Fstests with POST_MKFS_CMD="btrfstune -m" (as in the mailing list) reported a few of the test cases failing.
The failure scenario can be summarized and simplified as follows:
$ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb1 /dev/sdb2 :0 $ btrfstune -m /dev/sdb1 :0 $ wipefs -a /dev/sdb1 :0 $ mount -o degraded /dev/sdb2 /btrfs :0 $ btrfs replace start -B -f -r 1 /dev/sdb1 /btrfs :1 STDERR: ERROR: ioctl(DEV_REPLACE_START) failed on "/btrfs": Input/output error
[11290.583502] BTRFS warning (device sdb2): tree block 22036480 mirror 2 has bad fsid, has 99835c32-49f0-4668-9e66-dc277a96b4a6 want da40350c-33ac-4872-92a8-4948ed8c04d0 [11290.586580] BTRFS error (device sdb2): unable to fix up (regular) error at logical 22020096 on dev /dev/sdb8 physical 1048576
As above, the replace is failing because we are verifying the header with fs_devices::fsid instead of fs_devices::metadata_uuid, despite the metadata_uuid actually being present.
To fix this, use fs_devices::metadata_uuid. We copy fsid into fs_devices::metadata_uuid if there is no metadata_uuid, so its fine.
Fixes: a3ddbaebc7c9 ("btrfs: scrub: introduce a helper to verify one metadata block") CC: stable@vger.kernel.org # 6.4+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
cb091225 |
| 22-Jun-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: fix remaining u32 overflows when left shifting stripe_nr
There was regression caused by a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by a729
btrfs: fix remaining u32 overflows when left shifting stripe_nr
There was regression caused by a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr"). To avoid code churn the fix was open coding the type casts but unfortunately missed one which was still possible to hit [1].
The missing place was assignment of bioc->full_stripe_logical inside btrfs_map_block().
Fix it by adding a helper that does the safe calculation of the offset and use it everywhere even though it may not be strictly necessary due to already using u64 types. This replaces all remaining "<< BTRFS_STRIPE_LEN_SHIFT" calls.
[1] https://lore.kernel.org/linux-btrfs/20230622065438.86402-1-wqu@suse.com/
Fixes: a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
81db6ae8 |
| 12-Jun-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers
Since the scrub rework introduced by commit 2af2aaf98205 ("btrfs: scrub: introduce structure for new BTRFS_STRIPE_LEN based interface"
btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers
Since the scrub rework introduced by commit 2af2aaf98205 ("btrfs: scrub: introduce structure for new BTRFS_STRIPE_LEN based interface") and later commits, scrub only needs one single workqueue, fs_info::scrub_worker.
That scrub_wr_completion_workers is initially to handle the delay work after write bios finished. But the new scrub code goes submit-and-wait for write bios, thus all the work are done inside the scrub_worker.
The last user of fs_info::scrub_wr_completion_workers is removed in commit 16f93993498b ("btrfs: scrub: remove the old writeback infrastructure"), so we can safely remove the workqueue.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
c2bbc0ba |
| 12-Jun-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: remove scrub_ctx::csum_list member
Since the rework of scrub introduced by commit 2af2aaf98205 ("btrfs: scrub: introduce structure for new BTRFS_STRIPE_LEN based interface") and later
btrfs: scrub: remove scrub_ctx::csum_list member
Since the rework of scrub introduced by commit 2af2aaf98205 ("btrfs: scrub: introduce structure for new BTRFS_STRIPE_LEN based interface") and later commits, scrub no longer keeps its data checksum inside sctx.
Instead we have scrub_stripe::csums for the checksum of the stripe. Thus we can remove the unused scrub_ctx::csum_list member.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.33, v6.1.32 |
|
#
723b8bb1 |
| 30-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: open code btrfs_map_sblock
btrfs_map_sblock just hard codes three arguments and calls btrfs_map_sblock. Remove it as it doesn't provide any real value, but makes following the btrfs_map_bloc
btrfs: open code btrfs_map_sblock
btrfs_map_sblock just hard codes three arguments and calls btrfs_map_sblock. Remove it as it doesn't provide any real value, but makes following the btrfs_map_block call chains harder.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.31 |
|
#
58e814fc |
| 25-May-2023 |
Tejun Heo <tj@kernel.org> |
btrfs: use alloc_ordered_workqueue() to create ordered workqueues
BACKGROUND ==========
When multiple work items are queued to a workqueue, their execution order doesn't match the queueing order. T
btrfs: use alloc_ordered_workqueue() to create ordered workqueues
BACKGROUND ==========
When multiple work items are queued to a workqueue, their execution order doesn't match the queueing order. They may get executed in any order and simultaneously. When fully serialized execution - one by one in the queueing order - is needed, an ordered workqueue should be used which can be created with alloc_ordered_workqueue().
However, alloc_ordered_workqueue() was a later addition. Before it, an ordered workqueue could be obtained by creating an UNBOUND workqueue with @max_active==1. This originally was an implementation side-effect which was broken by 4c16bd327c74 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered"). Because there were users that depended on the ordered execution, 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") made workqueue allocation path to implicitly promote UNBOUND workqueues w/ @max_active==1 to ordered workqueues.
While this has worked okay, overloading the UNBOUND allocation interface this way creates other issues. It's difficult to tell whether a given workqueue actually needs to be ordered and users that legitimately want a min concurrency level wq unexpectedly gets an ordered one instead. With planned UNBOUND workqueue updates to improve execution locality and more prevalence of chiplet designs which can benefit from such improvements, this isn't a state we wanna be in forever.
This patch series audits all call sites that create an UNBOUND workqueue w/ @max_active==1 and converts them to alloc_ordered_workqueue() as necessary.
BTRFS =====
* fs_info->scrub_workers initialized in scrub_workers_get() was setting @max_active to 1 when @is_dev_replace is set and it seems that the workqueue actually needs to be ordered if @is_dev_replace. Update the code so that alloc_ordered_workqueue() is used if @is_dev_replace.
* fs_info->discard_ctl.discard_workers initialized in btrfs_init_workqueues() was directly using alloc_workqueue() w/ @max_active==1. Converted to alloc_ordered_workqueue().
* fs_info->fixup_workers and fs_info->qgroup_rescan_workers initialized in btrfs_queue_work() use the btrfs's workqueue wrapper, btrfs_workqueue, which are allocated with btrfs_alloc_workqueue().
btrfs_workqueue implements automatic @max_active adjustment which is disabled when the specified max limit is below a certain threshold, so calling btrfs_alloc_workqueue() with @limit_active==1 yields an ordered workqueue whose @max_active won't be changed as the auto-tuning is disabled.
This is rather brittle in that nothing clearly indicates that the two workqueues should be ordered or btrfs_alloc_workqueue() must disable auto-tuning when @limit_active==1.
This patch factors out the common btrfs_workqueue init code into btrfs_init_workqueue() and add explicit btrfs_alloc_ordered_workqueue(). The two workqueues are converted to use the new ordered allocation interface.
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.30, v6.1.29 |
|
#
b9cb105e |
| 12-May-2023 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
btrfs: scrub: remove more unused functions
These functions are defined in the scrub.c file, but last callers were removed in e9255d6c4054 ("btrfs: scrub: remove the old scrub recheck code").
fs/btr
btrfs: scrub: remove more unused functions
These functions are defined in the scrub.c file, but last callers were removed in e9255d6c4054 ("btrfs: scrub: remove the old scrub recheck code").
fs/btrfs/scrub.c:553:20: warning: unused function 'scrub_stripe_index_and_offset'. fs/btrfs/scrub.c:543:19: warning: unused function 'scrub_nr_raid_mirrors'.
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4937 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.28 |
|
#
b7f9945a |
| 11-May-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: handle tree backref walk error properly
[BUG] Smatch reports the following errors related to commit ("btrfs: output affected files when relocation fails"):
fs/btrfs/inode.c:283 print_data_r
btrfs: handle tree backref walk error properly
[BUG] Smatch reports the following errors related to commit ("btrfs: output affected files when relocation fails"):
fs/btrfs/inode.c:283 print_data_reloc_error() error: uninitialized symbol 'ref_level'.
[CAUSE] That part of code is mostly copied from scrub, but unfortunately scrub code from the beginning is not doing the error handling properly.
The offending code looks like this:
do { ret = tree_backref_for_extent(); btrfs_warn_rl(); } while (ret != 1);
There are several problems involved:
- No error handling If that tree_backref_for_extent() failed, we would output the same error again and again, never really exit as it requires ret == 1 to exit.
- Always do one extra output As tree_backref_for_extent() only return > 0 if there is no more backref item. This means after the last item we hit, we would output an invalid error message for ret > 0 case.
[FIX] Fix the old code by:
- Move @ref_root and @ref_level into the if branch And do not initialize them, so we can catch such uninitialized values just like what we do in the inode.c
- Explicitly check the return value of tree_backref_for_extent() And handle ret < 0 and ret > 0 cases properly.
- No more do {} while () loop Instead go while (true) {} loop since we will handle @ret manually.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24 |
|
#
94ead93e |
| 13-Apr-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: use recovered data stripes as cache to avoid unnecessary read
For P/Q stripe scrub, we have quite some duplicated read IO:
- Data stripes read for verification This is triggered by
btrfs: scrub: use recovered data stripes as cache to avoid unnecessary read
For P/Q stripe scrub, we have quite some duplicated read IO:
- Data stripes read for verification This is triggered by the scrub_submit_initial_read() inside scrub_raid56_parity_stripe().
- Data stripes read (again) for P/Q stripe verification This is triggered by scrub_assemble_read_bios() from scrub_rbio().
Although we can have hit rbio cache and avoid unnecessary read, the chance is very low, as scrub would easily flush the whole rbio cache.
This means, even we're just scrubbing a single P/Q stripe, we would read the data stripes twice for the best case scenario. If we need to recover some data stripes, it would cause more reads on the same data stripes, again and again.
However before we call raid56_parity_submit_scrub_rbio() we already have all data stripes repaired and their contents ready to use. But RAID56 cache is unaware about the scrub cache, thus RAID56 layer itself still needs to re-read the data stripes.
To avoid such cache miss, this patch would:
- Introduce a new helper, raid56_parity_cache_data_pages() This function would grab the pages from an array, and copy the content to the rbio, marking all the involved sectors uptodate.
The page copy is unavoidable because of the cache pages of rbio are all self managed, thus can not utilize outside pages without screwing up the lifespan.
- Use the repaired data stripes as cache inside scrub_raid56_parity_stripe()
By this, we ensure all the data sectors of the scrub rbio are already uptodate, and no need to read them again from disk.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
b50f2d04 |
| 14-Jun-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: fix a return value overwrite in scrub_stripe()
[RETURN VALUE OVERWRITE] Inside scrub_stripe(), we would submit all the remaining stripes after iterating all extents.
But since flush_s
btrfs: scrub: fix a return value overwrite in scrub_stripe()
[RETURN VALUE OVERWRITE] Inside scrub_stripe(), we would submit all the remaining stripes after iterating all extents.
But since flush_scrub_stripes() can return error, we need to avoid overwriting the existing @ret if there is any error.
However the existing check is doing the wrong check:
ret2 = flush_scrub_stripes(); if (!ret2) ret = ret2;
This would overwrite the existing @ret to 0 as long as the final flush detects no critical errors.
[FIX] We should check @ret other than @ret2 in that case.
Fixes: 8eb3dd17eadd ("btrfs: dev-replace: error out if we have unrepaired metadata error during") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
79b8ee70 |
| 06-Jun-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: scrub: also report errors hit during the initial read
[BUG] After the recent scrub rework introduced in commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infras
btrfs: scrub: also report errors hit during the initial read
[BUG] After the recent scrub rework introduced in commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"), btrfs scrub no longer reports repaired errors any more:
# mkfs.btrfs -f $dev -d DUP # mount $dev $mnt # xfs_io -f -d -c "pwrite -b 64K -S 0xaa 0 64" $mnt/file # umount $dev # xfs_io -f -c "pwrite -S 0xff $phy1 64K" $dev # Corrupt the first mirror # mount $dev $mnt # btrfs scrub start -BR $mnt scrub done for 725e7cb7-8a4a-4c77-9f2a-86943619e218 Scrub started: Tue Jun 6 14:56:50 2023 Status: finished Duration: 0:00:00 data_extents_scrubbed: 2 tree_extents_scrubbed: 18 data_bytes_scrubbed: 131072 tree_bytes_scrubbed: 294912 read_errors: 0 csum_errors: 0 <<< No errors here verify_errors: 0 [...] uncorrectable_errors: 0 unverified_errors: 0 corrected_errors: 16 <<< Only corrected errors last_physical: 2723151872
This can confuse btrfs-progs, as it relies on the csum_errors to determine if there is anything wrong.
While on v6.3.x kernels, the report is different:
csum_errors: 16 <<< verify_errors: 0 [...] uncorrectable_errors: 0 unverified_errors: 0 corrected_errors: 16 <<<
[CAUSE] In the reworked scrub, we update the scrub progress inside scrub_stripe_report_errors(), using various bitmaps to update the result.
For example for csum_errors, we use bitmap_weight() of stripe->csum_error_bitmap.
Unfortunately at that stage, all error bitmaps (except init_error_bitmap) are the result of the latest repair attempt, thus if the stripe is fully repaired, those error bitmaps will all be empty, resulting the above output mismatch.
To fix this, record the number of errors into stripe->init_nr_*_errors. Since we don't really care about where those errors are, we only need to record the number of errors.
Then in scrub_stripe_report_errors(), use those initial numbers to update the progress other than using the latest error bitmaps.
Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|