#
cb3e217b |
| 12-Nov-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: use btrfs_dev_name() helper to handle missing devices better
[BUG] If dev-replace failed to re-construct its data/metadata, the kernel message would be incorrect for the missing device:
BTR
btrfs: use btrfs_dev_name() helper to handle missing devices better
[BUG] If dev-replace failed to re-construct its data/metadata, the kernel message would be incorrect for the missing device:
BTRFS info (device dm-1): dev_replace from <missing disk> (devid 2) to /dev/mapper/test-scratch2 started BTRFS error (device dm-1): failed to rebuild valid logical 38862848 for dev (efault)
Note the above "dev (efault)" of the second line. While the first line is properly reporting "<missing disk>".
[CAUSE] Although dev-replace is using btrfs_dev_name(), the heavy lifting work is still done by scrub (scrub is reused by both dev-replace and regular scrub).
Unfortunately scrub code never uses btrfs_dev_name() helper, as it's only declared locally inside dev-replace.c.
[FIX] Fix the output by:
- Move the btrfs_dev_name() helper to volumes.h
- Use btrfs_dev_name() to replace open-coded rcu_str_deref() calls Only zoned code is not touched, as I'm not familiar with degraded zoned code.
- Constify return value and parameter
Now the output looks pretty sane:
BTRFS info (device dm-1): dev_replace from <missing disk> (devid 2) to /dev/mapper/test-scratch2 started BTRFS error (device dm-1): failed to rebuild valid logical 38862848 for dev <missing disk>
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
bb21e302 |
| 07-Nov-2022 |
Anand Jain <anand.jain@oracle.com> |
btrfs: move device->name RCU allocation and assign to btrfs_alloc_device()
There is a repeating code section in the parent function after calling btrfs_alloc_device(), as below:
name = rcu_st
btrfs: move device->name RCU allocation and assign to btrfs_alloc_device()
There is a repeating code section in the parent function after calling btrfs_alloc_device(), as below:
name = rcu_string_strdup(path, GFP_...); if (!name) { btrfs_free_device(device); return ERR_PTR(-ENOMEM); } rcu_assign_pointer(device->name, name);
Except in add_missing_dev() for obvious reasons.
This patch consolidates that repeating code into the btrfs_alloc_device() itself so that the parent function doesn't have to duplicate code. This consolidation also helps to review issues regarding RCU lock violation with device->name.
Parent function device_list_add() and add_missing_dev() use GFP_NOFS for the allocation, whereas the rest of the parent functions use GFP_KERNEL, so bring the NOFS allocation context using memalloc_nofs_save() in the function device_list_add() and add_missing_dev() is already doing it.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
35da5a7e |
| 27-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: drop private_data parameter from extent_io_tree_init
All callers except one pass NULL, so the parameter can be dropped and the inode::io_tree initialization can be open coded.
Reviewed-by: A
btrfs: drop private_data parameter from extent_io_tree_init
All callers except one pass NULL, so the parameter can be dropped and the inode::io_tree initialization can be open coded.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
428c8e03 |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: simplify percent calculation helpers, rename div_factor
The div_factor* helpers calculate fraction or percentage fraction. The name is a bit confusing, we use it only for percentage calculati
btrfs: simplify percent calculation helpers, rename div_factor
The div_factor* helpers calculate fraction or percentage fraction. The name is a bit confusing, we use it only for percentage calculations and there are two helpers.
There's a helper mult_frac that's for general fractions, that tries to be accurate but we multiply and divide by small numbers so we can use the div_u64 helper.
Rename the div_factor* helpers and use 1..100 percentage range, also drop the case checking for percentage == 100, it's never hit.
The conversions:
* div_factor calculates tenths and the numbers need to be adjusted * div_factor_fine is direct replacement
Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
7f0add25 |
| 26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move super_block specific helpers into super.h
This will make syncing fs.h to user space a little easier if we can pull the super block specific helpers out of fs.h and put them in super.h.
btrfs: move super_block specific helpers into super.h
This will make syncing fs.h to user space a little easier if we can pull the super block specific helpers out of fs.h and put them in super.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
2fc6822c |
| 26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move scrub prototypes into scrub.h
Move these out of ctree.h into scrub.h to cut down on code in ctree.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Ba
btrfs: move scrub prototypes into scrub.h
Move these out of ctree.h into scrub.h to cut down on code in ctree.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
67707479 |
| 26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move relocation prototypes into relocation.h
Move these out of ctree.h into relocation.h to cut down on code in ctree.h
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-of
btrfs: move relocation prototypes into relocation.h
Move these out of ctree.h into relocation.h to cut down on code in ctree.h
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
7572dec8 |
| 26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move ioctl prototypes into ioctl.h
Move these out of ctree.h into ioctl.h to cut down on code in ctree.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Ba
btrfs: move ioctl prototypes into ioctl.h
Move these out of ctree.h into ioctl.h to cut down on code in ctree.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
c7a03b52 |
| 26-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move uuid tree prototypes to uuid-tree.h
Move these out of ctree.h into uuid-tree.h to cut down on the code in ctree.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-of
btrfs: move uuid tree prototypes to uuid-tree.h
Move these out of ctree.h into uuid-tree.h to cut down on the code in ctree.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
43dd529a |
| 27-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: update function comments
Update, reformat or reword function comments. This also removes the kdoc marker so we don't get reports when the function name is missing.
Changes made:
- remove kd
btrfs: update function comments
Update, reformat or reword function comments. This also removes the kdoc marker so we don't get reports when the function name is missing.
Changes made:
- remove kdoc markers - reformat the brief description to be a proper sentence - reword to imperative voice - align parameter list - fix typos
Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.3 |
|
#
07e81dc9 |
| 19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move accessor helpers into accessors.h
This is a large patch, but because they're all macros it's impossible to split up. Simply copy all of the item accessors in ctree.h and paste them in a
btrfs: move accessor helpers into accessors.h
This is a large patch, but because they're all macros it's impossible to split up. Simply copy all of the item accessors in ctree.h and paste them in accessors.h, and then update any files to include the header so everything compiles.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ reformat comments, style fixups ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
c7f13d42 |
| 19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move fs wide helpers out of ctree.h
We have several fs wide related helpers in ctree.h. The bulk of these are the incompat flag test helpers, but there are things such as btrfs_fs_closing()
btrfs: move fs wide helpers out of ctree.h
We have several fs wide related helpers in ctree.h. The bulk of these are the incompat flag test helpers, but there are things such as btrfs_fs_closing() and the read only helpers that also aren't directly related to the ctree code. Move these into a fs.h header, which will serve as the location for file system wide related helpers.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58 |
|
#
63a7cb13 |
| 26-Jul-2022 |
David Sterba <dsterba@suse.com> |
btrfs: auto enable discard=async when possible
There's a request to automatically enable async discard for capable devices. We can do that, the async mode is designed to wait for larger freed extent
btrfs: auto enable discard=async when possible
There's a request to automatically enable async discard for capable devices. We can do that, the async mode is designed to wait for larger freed extents and is not intrusive, with limits to iops, kbps or latency.
The status and tunables will be exported in /sys/fs/btrfs/FSID/discard .
The automatic selection is done if there's at least one discard capable device in the filesystem (not capable devices are skipped). Mounting with any other discard option will honor that option, notably mounting with nodiscard will keep it disabled.
Link: https://lore.kernel.org/linux-btrfs/CAEg-Je_b1YtdsCR0zS5XZ_SbvJgN70ezwvRwLiCZgDGLbeMB=w@mail.gmail.com/ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
a8d1b164 |
| 04-Nov-2022 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
btrfs: zoned: initialize device's zone info for seeding
When performing seeding on a zoned filesystem it is necessary to initialize each zoned device's btrfs_zoned_device_info structure, otherwise m
btrfs: zoned: initialize device's zone info for seeding
When performing seeding on a zoned filesystem it is necessary to initialize each zoned device's btrfs_zoned_device_info structure, otherwise mounting the filesystem will cause a NULL pointer dereference.
This was uncovered by fstests' testcase btrfs/163.
CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
21e61ec6 |
| 04-Nov-2022 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
btrfs: zoned: clone zoned device info when cloning a device
When cloning a btrfs_device, we're not cloning the associated btrfs_zoned_device_info structure of the device in case of a zoned filesyste
btrfs: zoned: clone zoned device info when cloning a device
When cloning a btrfs_device, we're not cloning the associated btrfs_zoned_device_info structure of the device in case of a zoned filesystem.
Later on this leads to a NULL pointer dereference when accessing the device's zone_info for instance when setting a zone as active.
This was uncovered by fstests' testcase btrfs/161.
CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
0fca385d |
| 03-Nov-2022 |
Liu Shixin <liushixin2@huawei.com> |
btrfs: fix match incorrectly in dev_args_match_device
syzkaller found a failed assertion:
assertion failed: (args->devid != (u64)-1) || args->missing, in fs/btrfs/volumes.c:6921
This can be trig
btrfs: fix match incorrectly in dev_args_match_device
syzkaller found a failed assertion:
assertion failed: (args->devid != (u64)-1) || args->missing, in fs/btrfs/volumes.c:6921
This can be triggered when we set devid to (u64)-1 by ioctl. In this case, the match of devid will be skipped and the match of device may succeed incorrectly.
Patch 562d7b1512f7 introduced this function which is used to match device. This function contains two matching scenarios, we can distinguish them by checking the value of args->missing rather than check whether args->devid and args->uuid is default value.
Reported-by: syzbot+031687116258450f9853@syzkaller.appspotmail.com Fixes: 562d7b1512f7 ("btrfs: handle device lookup with btrfs_dev_lookup_args") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
76a66ba1 |
| 20-Oct-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: don't use btrfs_chunk::sub_stripes from disk
[BUG] There are two reports (the earliest one from LKP, a more recent one from kernel bugzilla) that we can have some chunks with 0 as sub_stripes
btrfs: don't use btrfs_chunk::sub_stripes from disk
[BUG] There are two reports (the earliest one from LKP, a more recent one from kernel bugzilla) that we can have some chunks with 0 as sub_stripes.
This will cause divide-by-zero errors at btrfs_rmap_block, which is introduced by a recent kernel patch ac0677348f3c ("btrfs: merge calculations for simple striped profiles in btrfs_rmap_block"):
if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) { stripe_nr = stripe_nr * map->num_stripes + i; stripe_nr = div_u64(stripe_nr, map->sub_stripes); <<< }
[CAUSE] From the more recent report, it has been proven that we have some chunks with 0 as sub_stripes, mostly caused by older mkfs.
It turns out that the mkfs.btrfs fix is only introduced in 6718ab4d33aa ("btrfs-progs: Initialize sub_stripes to 1 in btrfs_alloc_data_chunk") which is included in v5.4 btrfs-progs release.
So there would be quite some old filesystems with such 0 sub_stripes.
[FIX] Just don't trust the sub_stripes values from disk.
We have a trusted btrfs_raid_array[] to fetch the correct sub_stripes numbers for each profile and that are fixed.
By this, we can keep the compatibility with older filesystems while still avoid divide-by-zero bugs.
Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Viktor Kuzmin <kvaster@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216559 Fixes: ac0677348f3c ("btrfs: merge calculations for simple striped profiles in btrfs_rmap_block") CC: stable@vger.kernel.org # 6.0 Reviewed-by: Su Yue <glass@fydeos.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
a05d3c91 |
| 24-Aug-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: check superblock to ensure the fs was not modified at thaw time
[BACKGROUND] There is an incident report that, one user hibernated the system, with one btrfs on removable device still mounted
btrfs: check superblock to ensure the fs was not modified at thaw time
[BACKGROUND] There is an incident report that, one user hibernated the system, with one btrfs on removable device still mounted.
Then by some incident, the btrfs got mounted and modified by another system/OS, then back to the hibernated system.
After resuming from the hibernation, new write happened into the victim btrfs.
Now the fs is completely broken, since the underlying btrfs is no longer the same one before the hibernation, and the user lost their data due to various transid mismatch.
[REPRODUCER] We can emulate the situation using the following small script:
truncate -s 1G $dev mkfs.btrfs -f $dev mount $dev $mnt fsstress -w -d $mnt -n 500 sync xfs_freeze -f $mnt cp $dev $dev.backup
# There is no way to mount the same cloned fs on the same system, # as the conflicting fsid will be rejected by btrfs. # Thus here we have to wipe the fs using a different btrfs. mkfs.btrfs -f $dev.backup
dd if=$dev.backup of=$dev bs=1M xfs_freeze -u $mnt fsstress -w -d $mnt -n 20 umount $mnt btrfs check $dev
The final fsck will fail due to some tree blocks has incorrect fsid.
This is enough to emulate the problem hit by the unfortunate user.
[ENHANCEMENT] Although such case should not be that common, it can still happen from time to time.
From the view of btrfs, we can detect any unexpected super block change, and if there is any unexpected change, we just mark the fs read-only, and thaw the fs.
By this we can limit the damage to minimal, and I hope no one would lose their data by this anymore.
Suggested-by: Goffredo Baroncelli <kreijack@libero.it> Link: https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
928ff3be |
| 06-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: stop allocation a btrfs_io_context for simple I/O
The I/O context structure is only used to pass the btrfs_device to the end I/O handler for I/Os that go to a single device.
Stop allocating
btrfs: stop allocation a btrfs_io_context for simple I/O
The I/O context structure is only used to pass the btrfs_device to the end I/O handler for I/Os that go to a single device.
Stop allocating the I/O context for these cases by passing the optional btrfs_io_stripe argument to __btrfs_map_block to query the mapping information and then using a fast path submission and I/O completion handler. As the old btrfs_io_context based I/O submission path is only used for mirrored writes, rename the functions to make that clear and stop setting the btrfs_bio device and mirror_num field that is only used for reads.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
03793cbb |
| 06-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: add fast path for single device io in __btrfs_map_block
There is no need for most of the btrfs_io_context when doing I/O to a single device. To support such I/O without the extra btrfs_io_co
btrfs: add fast path for single device io in __btrfs_map_block
There is no need for most of the btrfs_io_context when doing I/O to a single device. To support such I/O without the extra btrfs_io_context allocation, turn the mirror_num argument into a pointer so that it can be used to output the selected mirror number, and add an optional argument that points to a btrfs_io_stripe structure, which will be filled with a single extent if provided by the caller.
In that case the btrfs_io_context allocation can be skipped as all information for the single device I/O is provided in the mirror_num argument and the on-stack btrfs_io_stripe. A caller that makes use of this new argument will be added in the next commit.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
28793b19 |
| 06-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: decide bio cloning inside submit_stripe_bio
Remove the orig_bio argument as it can be derived from the bioc, and the clone argument as it can be calculated from bioc and dev_nr.
Reviewed-by:
btrfs: decide bio cloning inside submit_stripe_bio
Remove the orig_bio argument as it can be derived from the bioc, and the clone argument as it can be calculated from bioc and dev_nr.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
32747c44 |
| 06-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: factor out low-level bio setup from submit_stripe_bio
Split out a low-level btrfs_submit_dev_bio helper that just submits the bio without any cloning decisions or setting up the end I/O handl
btrfs: factor out low-level bio setup from submit_stripe_bio
Split out a low-level btrfs_submit_dev_bio helper that just submits the bio without any cloning decisions or setting up the end I/O handler for future reuse by a different caller.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
917f32a2 |
| 06-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: give struct btrfs_bio a real end_io handler
Currently btrfs_bio end I/O handling is a bit of a mess. The bi_end_io handler and bi_private pointer of the embedded struct bio are both used to
btrfs: give struct btrfs_bio a real end_io handler
Currently btrfs_bio end I/O handling is a bit of a mess. The bi_end_io handler and bi_private pointer of the embedded struct bio are both used to handle the completion of the high-level btrfs_bio and for the I/O completion for the low-level device that the embedded bio ends up being sent to.
To support this bi_end_io and bi_private are saved into the btrfs_io_context structure and then restored after the bio sent to the underlying device has completed the actual I/O.
Untangle this by adding an end I/O handler and private data to struct btrfs_bio for the high-level btrfs_bio based completions, and leave the actual bio bi_end_io handler and bi_private pointer entirely to the low-level device I/O.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
f1c29379 |
| 06-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: properly abstract the parity raid bio handling
The parity raid write/recover functionality is currently not very well abstracted from the bio submission and completion handling in volumes.c:
btrfs: properly abstract the parity raid bio handling
The parity raid write/recover functionality is currently not very well abstracted from the bio submission and completion handling in volumes.c:
- the raid56 code directly completes the original btrfs_bio fed into btrfs_submit_bio instead of dispatching back to volumes.c - the raid56 code consumes the bioc and bio_counter references taken by volumes.c, which also leads to special casing of the calls from the scrub code into the raid56 code
To fix this up supply a bi_end_io handler that calls back into the volumes.c machinery, which then puts the bioc, decrements the bio_counter and completes the original bio, and updates the scrub code to also take ownership of the bioc and bio_counter in all cases.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
c3a62baf |
| 06-Aug-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: use chained bios when cloning
The stripes_pending in the btrfs_io_context counts number of inflight low-level bios for an upper btrfs_bio. For reads this is generally one as reads are never
btrfs: use chained bios when cloning
The stripes_pending in the btrfs_io_context counts number of inflight low-level bios for an upper btrfs_bio. For reads this is generally one as reads are never cloned, while for writes we can trivially use the bio remaining mechanisms that is used for chained bios.
To be able to make use of that mechanism, split out a separate trivial end_io handler for the cloned bios that does a minimal amount of error tracking and which then calls bio_endio on the original bio to transfer control to that, with the remaining counter making sure it is completed last. This then allows to merge btrfs_end_bioc into the original bio bi_end_io handler.
To make this all work all error handling needs to happen through the bi_end_io handler, which requires a small amount of reshuffling in submit_stripe_bio so that the bio is cloned already by the time the suitability of the device is checked.
This reduces the size of the btrfs_io_context and prepares splitting the btrfs_bio at the stripe boundary.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|