Revision tags: v6.6.25, v6.6.24, v6.6.23 |
|
#
f6d4d29a |
| 19-Feb-2024 |
Filipe Manana <fdmanana@suse.com> |
btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve
[ Upstream commit c7bb26b847e5b97814f522686068c5628e2b3646 ]
At btrfs_use_block_rsv() we read the size of a block reserve
btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve
[ Upstream commit c7bb26b847e5b97814f522686068c5628e2b3646 ]
At btrfs_use_block_rsv() we read the size of a block reserve without locking its spinlock, which makes KCSAN complain because the size of a block reserve is always updated while holding its spinlock. The report from KCSAN is the following:
[653.313148] BUG: KCSAN: data-race in btrfs_update_delayed_refs_rsv [btrfs] / btrfs_use_block_rsv [btrfs]
[653.314755] read to 0x000000017f5871b8 of 8 bytes by task 7519 on cpu 0: [653.314779] btrfs_use_block_rsv+0xe4/0x2f8 [btrfs] [653.315606] btrfs_alloc_tree_block+0xdc/0x998 [btrfs] [653.316421] btrfs_force_cow_block+0x220/0xe38 [btrfs] [653.317242] btrfs_cow_block+0x1ac/0x568 [btrfs] [653.318060] btrfs_search_slot+0xda2/0x19b8 [btrfs] [653.318879] btrfs_del_csums+0x1dc/0x798 [btrfs] [653.319702] __btrfs_free_extent.isra.0+0xc24/0x2028 [btrfs] [653.320538] __btrfs_run_delayed_refs+0xd3c/0x2390 [btrfs] [653.321340] btrfs_run_delayed_refs+0xae/0x290 [btrfs] [653.322140] flush_space+0x5e4/0x718 [btrfs] [653.322958] btrfs_preempt_reclaim_metadata_space+0x102/0x2f8 [btrfs] [653.323781] process_one_work+0x3b6/0x838 [653.323800] worker_thread+0x75e/0xb10 [653.323817] kthread+0x21a/0x230 [653.323836] __ret_from_fork+0x6c/0xb8 [653.323855] ret_from_fork+0xa/0x30
[653.323887] write to 0x000000017f5871b8 of 8 bytes by task 576 on cpu 3: [653.323906] btrfs_update_delayed_refs_rsv+0x1a4/0x250 [btrfs] [653.324699] btrfs_add_delayed_data_ref+0x468/0x6d8 [btrfs] [653.325494] btrfs_free_extent+0x76/0x120 [btrfs] [653.326280] __btrfs_mod_ref+0x6a8/0x6b8 [btrfs] [653.327064] btrfs_dec_ref+0x50/0x70 [btrfs] [653.327849] walk_up_proc+0x236/0xa50 [btrfs] [653.328633] walk_up_tree+0x21c/0x448 [btrfs] [653.329418] btrfs_drop_snapshot+0x802/0x1328 [btrfs] [653.330205] btrfs_clean_one_deleted_snapshot+0x184/0x238 [btrfs] [653.330995] cleaner_kthread+0x2b0/0x2f0 [btrfs] [653.331781] kthread+0x21a/0x230 [653.331800] __ret_from_fork+0x6c/0xb8 [653.331818] ret_from_fork+0xa/0x30
So add a helper to get the size of a block reserve while holding the lock. Reading the field while holding the lock instead of using the data_race() annotation is used in order to prevent load tearing.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40 |
|
#
8dbfc14f |
| 20-Jul-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: account block group tree when calculating global reserve size
When using the block group tree feature, this tree is a critical tree just like the extent, csum and free space trees, and just l
btrfs: account block group tree when calculating global reserve size
When using the block group tree feature, this tree is a critical tree just like the extent, csum and free space trees, and just like them it uses the delayed refs block reserve.
So take into account the block group tree, and its current size, when calculating the size for the global reserve.
CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27 |
|
#
54d687c1 |
| 29-Apr-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move btrfs_check_trunc_cache_free_space into block-rsv.c
This is completely related to block rsv's, move it out of the free space cache code and into block-rsv.c.
Reviewed-by: Johannes Thums
btrfs: move btrfs_check_trunc_cache_free_space into block-rsv.c
This is completely related to block rsv's, move it out of the free space cache code and into block-rsv.c.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
d246331b |
| 02-May-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: don't free qgroup space unless specified
Boris noticed in his simple quotas testing that he was getting a leak with Sweet Tea's change to subvol create that stopped doing a transaction commit
btrfs: don't free qgroup space unless specified
Boris noticed in his simple quotas testing that he was getting a leak with Sweet Tea's change to subvol create that stopped doing a transaction commit. This was just a side effect of that change.
In the delayed inode code we have an optimization that will free extra reservations if we think we can pack a dir item into an already modified leaf. Previously this wouldn't be triggered in the subvolume create case because we'd commit the transaction, it was still possible but much harder to trigger. It could actually be triggered if we did a mkdir && subvol create with qgroups enabled.
This occurs because in btrfs_insert_delayed_dir_index(), which gets called when we're adding the dir item, we do the following:
btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
if we're able to skip reserving space.
The problem here is that trans->block_rsv points at the temporary block rsv for the subvolume create, which has qgroup reservations in the block rsv.
This is a problem because btrfs_block_rsv_release() will do the following:
if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) { qgroup_to_release = block_rsv->qgroup_rsv_reserved - block_rsv->qgroup_rsv_size; block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size; }
The temporary block rsv just has ->qgroup_rsv_reserved set, ->qgroup_rsv_size == 0. The optimization in btrfs_insert_delayed_dir_index() sets ->qgroup_rsv_reserved = 0. Then later on when we call btrfs_subvolume_release_metadata() which has
btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release); btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release);
qgroup_to_release is set to 0, and we do not convert the reserved metadata space.
The problem here is that the block rsv code has been unconditionally messing with ->qgroup_rsv_reserved, because the main place this is used is delalloc, and any time we call btrfs_block_rsv_release() we do it with qgroup_to_release set, and thus do the proper accounting.
The subvolume code is the only other code that uses the qgroup reservation stuff, but it's intermingled with the above optimization, and thus was getting its reservation freed out from underneath it and thus leaking the reserved space.
The solution is to simply not mess with the qgroup reservations if we don't have qgroup_to_release set. This works with the existing code as anything that messes with the delalloc reservations always have qgroup_to_release set. This fixes the leak that Boris was observing.
Reviewed-by: Qu Wenruo <wqu@suse.com> CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21 |
|
#
f8f210dc |
| 21-Mar-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: calculate the right space for delayed refs when updating global reserve
When updating the global block reserve, we account for the 6 items needed by an unlink operation and the 6 delayed refe
btrfs: calculate the right space for delayed refs when updating global reserve
When updating the global block reserve, we account for the 6 items needed by an unlink operation and the 6 delayed references for each one of those items. However the calculation for the delayed references is not correct in case we have the free space tree enabled, as in that case we need to touch the free space tree as well and therefore need twice the number of bytes. So use the btrfs_calc_delayed_ref_bytes() helper to calculate the number of bytes need for the delayed references at btrfs_update_global_block_rsv().
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
5630e2bc |
| 21-Mar-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: use a constant for the number of metadata units needed for an unlink
Instead of hard coding the number of metadata units for an unlink operation in a couple places, define a macro and use it
btrfs: use a constant for the number of metadata units needed for an unlink
Instead of hard coding the number of metadata units for an unlink operation in a couple places, define a macro and use it instead. This eliminates the problem of one place getting out of sync with the other, such as recently fixed by the previous patch in the series ("btrfs: fix calculation of the global block reserve's size").
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
ba4ec8fb |
| 21-Mar-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: fix calculation of the global block reserve's size
At btrfs_update_global_block_rsv(), we are assuming an unlink operation uses 5 metadata units, but that's not true anymore, it uses 6 since
btrfs: fix calculation of the global block reserve's size
At btrfs_update_global_block_rsv(), we are assuming an unlink operation uses 5 metadata units, but that's not true anymore, it uses 6 since the commit bca4ad7c0b54 ("btrfs: reserve correct number of items for unlink and rmdir"). So update the code and comments to consider 6 units.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
4e8313e5 |
| 21-Mar-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: simplify variables in btrfs_block_rsv_refill()
At btrfs_block_rsv_refill(), there's no point in initializing the 'num_bytes' variable to 0 and then, after taking the block reserve's spinlock,
btrfs: simplify variables in btrfs_block_rsv_refill()
At btrfs_block_rsv_refill(), there's no point in initializing the 'num_bytes' variable to 0 and then, after taking the block reserve's spinlock, initializing it to the value of the 'min_reserved' parameter.
So just get rid of the 'num_bytes' local variable and rename the 'min_reserved' parameter to 'num_bytes'.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
b93fa4ac |
| 21-Mar-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove check for NULL block reserve at btrfs_block_rsv_check()
The block reserve passed to btrfs_block_rsv_check() is never NULL, so remove the check. In case it can ever become NULL in the f
btrfs: remove check for NULL block reserve at btrfs_block_rsv_check()
The block reserve passed to btrfs_block_rsv_check() is never NULL, so remove the check. In case it can ever become NULL in the future, then we'll get a pretty obvious and clear NULL pointer dereference crash and stack trace.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6 |
|
#
428c8e03 |
| 26-Oct-2022 |
David Sterba <dsterba@suse.com> |
btrfs: simplify percent calculation helpers, rename div_factor
The div_factor* helpers calculate fraction or percentage fraction. The name is a bit confusing, we use it only for percentage calculati
btrfs: simplify percent calculation helpers, rename div_factor
The div_factor* helpers calculate fraction or percentage fraction. The name is a bit confusing, we use it only for percentage calculations and there are two helpers.
There's a helper mult_frac that's for general fractions, that tries to be accurate but we multiply and divide by small numbers so we can use the div_u64 helper.
Rename the div_factor* helpers and use 1..100 percentage range, also drop the case checking for percentage == 100, it's never hit.
The conversions:
* div_factor calculates tenths and the numbers need to be adjusted * div_factor_fine is direct replacement
Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.5, v5.15.75, v6.0.4 |
|
#
1751850f |
| 25-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: remove unused btrfs_cond_migrate_bytes
The last user of this was removed in 7f9fe6144076 ("btrfs: improve global reserve stealing logic"), drop this code as it's no longer called by anybody.
btrfs: remove unused btrfs_cond_migrate_bytes
The last user of this was removed in 7f9fe6144076 ("btrfs: improve global reserve stealing logic"), drop this code as it's no longer called by anybody.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.3 |
|
#
07e81dc9 |
| 19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move accessor helpers into accessors.h
This is a large patch, but because they're all macros it's impossible to split up. Simply copy all of the item accessors in ctree.h and paste them in a
btrfs: move accessor helpers into accessors.h
This is a large patch, but because they're all macros it's impossible to split up. Simply copy all of the item accessors in ctree.h and paste them in accessors.h, and then update any files to include the header so everything compiles.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ reformat comments, style fixups ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
fc97a410 |
| 19-Oct-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move mount option definitions to fs.h
These are fs wide definitions and helpers, move them out of ctree.h and into fs.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed
btrfs: move mount option definitions to fs.h
These are fs wide definitions and helpers, move them out of ctree.h and into fs.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68 |
|
#
765c3fe9 |
| 09-Sep-2022 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: introduce BTRFS_RESERVE_FLUSH_EMERGENCY
Inside of FB, as well as some user reports, we've had a consistent problem of occasional ENOSPC transaction aborts. Inside FB we were seeing ~100-200
btrfs: introduce BTRFS_RESERVE_FLUSH_EMERGENCY
Inside of FB, as well as some user reports, we've had a consistent problem of occasional ENOSPC transaction aborts. Inside FB we were seeing ~100-200 ENOSPC aborts per day in the fleet, which is a really low occurrence rate given the size of our fleet, but it's not nothing.
There are two causes of this particular problem.
First is delayed allocation. The reservation system for delalloc assumes that contiguous dirty ranges will result in 1 file extent item. However if there is memory pressure that results in fragmented writeout, or there is fragmentation in the block groups, this won't necessarily be true. Consider the case where we do a single 256MiB write to a file and then close it. We will have 1 reservation for the inode update, the reservations for the checksum updates, and 1 reservation for the file extent item. At some point later we decide to write this entire range out, but we're so fragmented that we break this into 100 different file extents. Since we've already closed the file and are no longer writing to it there's nothing to trigger a refill of the delalloc block rsv to satisfy the 99 new file extent reservations we need. At this point we exhaust our delalloc reservation, and we begin to steal from the global reserve. If you have enough of these cases going in parallel you can easily exhaust the global reserve, get an ENOSPC at btrfs_alloc_tree_block() time, and then abort the transaction.
The other case is the delayed refs reserve. The delayed refs reserve updates its size based on outstanding delayed refs and dirty block groups. However we only refill this block reserve when returning excess reservations and when we call btrfs_start_transaction(root, X). We will reserve 2*X credits at transaction start time, and fill in X into the delayed refs reserve to make sure it stays topped off. Generally this works well, but clearly has downsides. If we do a particularly delayed ref heavy operation we may never catch up in our reservations. Additionally running delayed refs generates more delayed refs, and at that point we may be committing the transaction and have no way to trigger a refill of our delayed refs rsv. Then a similar thing occurs with the delalloc reserve.
Generally speaking we well over-reserve in all of our block rsvs. If we reserve 1 credit we're usually reserving around 264k of space, but we'll often not use any of that reservation, or use a few blocks of that reservation. We can be reasonably sure that as long as you were able to reserve space up front for your operation you'll be able to find space on disk for that reservation.
So introduce a new flushing state, BTRFS_RESERVE_FLUSH_EMERGENCY. This gets used in the case that we've exhausted our reserve and the global reserve. It simply forces a reservation if we have enough actual space on disk to make the reservation, which is almost always the case. This keeps us from hitting ENOSPC aborts in these odd occurrences where we've not kept up with the delayed work.
Fixing this in a complete way is going to be relatively complicated and time consuming. This patch is what I discussed with Filipe earlier this year, and what I put into our kernels inside FB. With this patch we're down to 1-2 ENOSPC aborts per week, which is a significant reduction. This is a decent stop gap until we can work out a more wholistic solution to these two corner cases.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.67, v5.15.66 |
|
#
748f553c |
| 05-Sep-2022 |
David Sterba <dsterba@suse.com> |
btrfs: add KCSAN annotations for unlocked access to block_rsv->full
KCSAN reports that there's unlocked access mixed with locked access, which is technically correct but is not a bug. To avoid fals
btrfs: add KCSAN annotations for unlocked access to block_rsv->full
KCSAN reports that there's unlocked access mixed with locked access, which is technically correct but is not a bug. To avoid false alerts at least from KCSAN, add annotation and use a wrapper whenever ->full is accessed for read outside of lock.
It is used as a fast check and only advisory. In the worst case the block reserve is found !full and becomes full in the meantime, but properly handled.
Depending on the value of ->full, btrfs_block_rsv_release decides where to return the reservation, and block_rsv_release_bytes handles a NULL pointer for block_rsv and if it's not NULL then it double checks the full status under a lock.
Link: https://lore.kernel.org/linux-btrfs/CAAwBoOJDjei5Hnem155N_cJwiEkVwJYvgN-tQrwWbZQGhFU=cA@mail.gmail.com/ Link: https://lore.kernel.org/linux-btrfs/YvHU/vsXd7uz5V6j@hungrycats.org Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60 |
|
#
14033b08 |
| 09-Aug-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: don't save block group root into super block
The extent tree v2 needs a new root for storing all block group items, the whole feature hasn't been finished yet so we can afford to do some chan
btrfs: don't save block group root into super block
The extent tree v2 needs a new root for storing all block group items, the whole feature hasn't been finished yet so we can afford to do some changes.
My initial proposal years ago just added a new tree rootid, and load it from tree root, just like what we did for quota/free space tree/uuid/extent roots.
But the extent tree v2 patches introduced a completely new way to store block group tree root into super block which is arguably wasteful.
Currently there are only 3 trees stored in super blocks, and they all have their valid reasons:
- Chunk root Needed for bootstrap.
- Tree root Really the entry point for all trees.
- Log root This is special as log root has to be updated out of existing transaction mechanism.
There is not even any reason to put block group root into super blocks, the block group tree is updated at the same time as the old extent tree, no need for extra bootstrap/out-of-transaction update.
So just move block group root from super block into tree root.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50 |
|
#
8bfc9b2c |
| 23-Jun-2022 |
David Sterba <dsterba@suse.com> |
btrfs: use enum for btrfs_block_rsv::type
The number of block group reserve types BTRFS_BLOCK_RSV_* is small and fits to u8 and there's enough left in case we want to add more. For type safety use t
btrfs: use enum for btrfs_block_rsv::type
The number of block group reserve types BTRFS_BLOCK_RSV_* is small and fits to u8 and there's enough left in case we want to add more. For type safety use the enum but make it 8 bits in the structure to save space.
The structure size is now 48 on release build, making a slight improvement in structures where it's embedded, like btrfs_fs_info or btrfs_inode.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
c70c2c5b |
| 23-Jun-2022 |
David Sterba <dsterba@suse.com> |
btrfs: switch btrfs_block_rsv::full to bool
Use simple bool type for the block reserve full status, there's short to save space as there used to be int but there's no reason for that.
Reviewed-by:
btrfs: switch btrfs_block_rsv::full to bool
Use simple bool type for the block reserve full status, there's short to save space as there used to be int but there's no reason for that.
Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7 |
|
#
c18e3235 |
| 02-Dec-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: reserve extra space for the free space tree
Filipe reported a problem where sometimes he'd get an ENOSPC abort when running delayed refs with generic/619 and the free space tree enabled. This
btrfs: reserve extra space for the free space tree
Filipe reported a problem where sometimes he'd get an ENOSPC abort when running delayed refs with generic/619 and the free space tree enabled. This is partly because we do not reserve space for modifying the free space tree, nor do we have a block rsv associated with that tree.
The delayed_refs_rsv tracks the amount of space required to run delayed refs. This means 1 modification means 1 change to the extent root. With the free space tree this turns into 2 changes, because modifying 1 extent means updating the extent tree and potentially updating the free space tree to either remove that entry or add the free space. Thus if we have the FST enabled, simply double the reservation size for our modification.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
9506f953 |
| 02-Dec-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: include the free space tree in the global rsv minimum calculation
Filipe reported a problem where generic/619 was failing with an ENOSPC abort while running delayed refs, like the following
btrfs: include the free space tree in the global rsv minimum calculation
Filipe reported a problem where generic/619 was failing with an ENOSPC abort while running delayed refs, like the following
BTRFS: Transaction aborted (error -28) WARNING: CPU: 3 PID: 522920 at fs/btrfs/free-space-tree.c:1049 add_to_free_space_tree+0xe5/0x110 [btrfs] CPU: 3 PID: 522920 Comm: kworker/u16:19 Tainted: G W 5.16.0-rc2-btrfs-next-106 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] RIP: 0010:add_to_free_space_tree+0xe5/0x110 [btrfs] RSP: 0000:ffffa65087fb7b20 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff9131eeaa RDI: 00000000ffffffff RBP: ffff8d62e26481b8 R08: ffffffff9ad97ce0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffe4 R13: ffff8d61c25fe688 R14: ffff8d61ebd88800 R15: ffff8d61ebd88a90 FS: 0000000000000000(0000) GS:ffff8d64ed400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa46a8b1000 CR3: 0000000148d18003 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __btrfs_free_extent+0x516/0x950 [btrfs] __btrfs_run_delayed_refs+0x2b1/0x1250 [btrfs] btrfs_run_delayed_refs+0x86/0x210 [btrfs] flush_space+0x403/0x630 [btrfs] ? call_rcu_tasks_generic+0x50/0x80 ? lock_release+0x223/0x4a0 ? btrfs_get_alloc_profile+0xb5/0x290 [btrfs] ? do_raw_spin_unlock+0x4b/0xa0 btrfs_async_reclaim_metadata_space+0x139/0x320 [btrfs] process_one_work+0x24c/0x5b0 worker_thread+0x55/0x3c0 ? process_one_work+0x5b0/0x5b0 kthread+0x17c/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30
There's a couple of reasons for this, but in generic/619's case the largest reason is because it is a very small file system, ad we do not reserve enough space for the global reserve.
With the free space tree we now have the free space tree that we need to modify when running delayed refs. This means we need the global reserve to take this into account when it calculates the minimum size it needs to be. This is especially important for very small file systems.
Fix this by adjusting the minimum global block rsv size math to include the size of the free space tree when calculating the size.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1 |
|
#
fc28b25e |
| 05-Nov-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: stop accessing ->csum_root directly
We are going to have multiple csum roots in the future, so convert all users of ->csum_root to btrfs_csum_root() and rename ->csum_root to ->_csum_root so
btrfs: stop accessing ->csum_root directly
We are going to have multiple csum roots in the future, so convert all users of ->csum_root to btrfs_csum_root() and rename ->csum_root to ->_csum_root so we can easily find remaining users in the future.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
29cbcf40 |
| 05-Nov-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: stop accessing ->extent_root directly
When we start having multiple extent roots we'll need to use a helper to get to the correct extent_root. Rename fs_info->extent_root to _extent_root and
btrfs: stop accessing ->extent_root directly
When we start having multiple extent roots we'll need to use a helper to get to the correct extent_root. Rename fs_info->extent_root to _extent_root and convert all of the users of the extent root to using the btrfs_extent_root() helper. This will allow us to easily clean up the remaining direct accesses in the future.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
2e608bd1 |
| 05-Nov-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: init root block_rsv at init root time
In the future we're going to have multiple csum and extent root trees, so init the roots block_rsv at setup_root time based on their root key objectid.
btrfs: init root block_rsv at init root time
In the future we're going to have multiple csum and extent root trees, so init the roots block_rsv at setup_root time based on their root key objectid.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
9270501c |
| 09-Nov-2021 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: change root to fs_info for btrfs_reserve_metadata_bytes
We used to need the root for btrfs_reserve_metadata_bytes to check the orphan cleanup state, but we no longer need that, we simply need
btrfs: change root to fs_info for btrfs_reserve_metadata_bytes
We used to need the root for btrfs_reserve_metadata_bytes to check the orphan cleanup state, but we no longer need that, we simply need the fs_info. Change btrfs_reserve_metadata_bytes() to use the fs_info, and change both btrfs_block_rsv_refill() and btrfs_block_rsv_add() to do the same as they simply call btrfs_reserve_metadata_bytes() and then manipulate the block_rsv that is being used.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16 |
|
#
42437a63 |
| 16-Oct-2020 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: introduce mount option rescue=ignorebadroots
In the face of extent root corruption, or any other core fs wide root corruption we will fail to mount the file system. This makes recovery kind
btrfs: introduce mount option rescue=ignorebadroots
In the face of extent root corruption, or any other core fs wide root corruption we will fail to mount the file system. This makes recovery kind of a pain, because you need to fall back to userspace tools to scrape off data. Instead provide a mechanism to gracefully handle bad roots, so we can at least mount read-only and possibly recover data from the file system.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|