#
25ac047c |
| 23-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: remove a pointless NULL check in btrfs_lookup_fs_root
btrfs_grab_root already checks for a NULL root itself.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Chris
btrfs: remove a pointless NULL check in btrfs_lookup_fs_root
btrfs_grab_root already checks for a NULL root itself.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
e91909aa |
| 23-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: convert btrfs_get_global_root to use a switch statement
Use a switch statement instead of an endless chain of if statements to make the code a little cleaner.
Reviewed-by: Johannes Thumshirn
btrfs: convert btrfs_get_global_root to use a switch statement
Use a switch statement instead of an endless chain of if statements to make the code a little cleaner.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
85724171 |
| 23-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: fix the btrfs_get_global_root return value
btrfs_grab_root returns either the root or NULL, and the callers of btrfs_get_global_root expect it to return the same. But all the more recently a
btrfs: fix the btrfs_get_global_root return value
btrfs_grab_root returns either the root or NULL, and the callers of btrfs_get_global_root expect it to return the same. But all the more recently added roots instead return an ERR_PTR, so fix this.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
25984a5a |
| 24-May-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: consolidate uuid comparisons in btrfs_validate_super
There are three ways the fsid is validated in btrfs_validate_super():
- verify that super_copy::fsid is the same as fs_devices::fsid
- i
btrfs: consolidate uuid comparisons in btrfs_validate_super
There are three ways the fsid is validated in btrfs_validate_super():
- verify that super_copy::fsid is the same as fs_devices::fsid
- if the metadata_uuid flag is set, verify if super_copy::metadata_uuid and fs_devices::metadata_uuid are the same.
- a few lines below, often missed out, verify if dev_item::fsid is the same as fs_devices::metadata_uuid.
The function btrfs_validate_super() contains multiple if-statements with memcmp() to check UUIDs. This patch consolidates them into a single location.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
413fb1bc |
| 24-May-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: return bool from check_tree_block_fsid instead of int
Simplify the return type of check_tree_block_fsid() from int (1 or 0) to bool. Its only user is interested in knowing the success or fail
btrfs: return bool from check_tree_block_fsid instead of int
Simplify the return type of check_tree_block_fsid() from int (1 or 0) to bool. Its only user is interested in knowing the success or failure.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
9e2aff90 |
| 03-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: stop using lock_extent in btrfs_buffer_uptodate
The only other place that locks extents on the btree inode is read_extent_buffer_subpage while reading in the partial page for a buffer. This
btrfs: stop using lock_extent in btrfs_buffer_uptodate
The only other place that locks extents on the btree inode is read_extent_buffer_subpage while reading in the partial page for a buffer. This means locking the extent in btrfs_buffer_uptodate does not synchronize with anything on non-subpage file systems, and on subpage file systems it only waits for a parallel read(-ahead) to finish, which seems to be counter to what the callers actually expect.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
31d89399 |
| 03-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: remove the extent_buffer lookup in btree block checksumming
The checksumming of btree blocks always operates on the entire extent_buffer, and because btree blocks are always allocated contigu
btrfs: remove the extent_buffer lookup in btree block checksumming
The checksumming of btree blocks always operates on the entire extent_buffer, and because btree blocks are always allocated contiguously on disk they are never split by btrfs_submit_bio.
Simplify the checksumming code by finding the extent_buffer in the btrfs_bio private data instead of trying to search through the bio_vec.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
046b562b |
| 03-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: use a separate end_io handler for read_extent_buffer
Now that we always use a single bio to read an extent_buffer, the buffer can be passed to the end_io handler as private data. This allows
btrfs: use a separate end_io handler for read_extent_buffer
Now that we always use a single bio to read an extent_buffer, the buffer can be passed to the end_io handler as private data. This allows implementing a much simplified dedicated end I/O handler for metadata reads.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
d87e6575 |
| 03-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: merge verify_parent_transid and btrfs_buffer_uptodate
verify_parent_transid is only called by btrfs_buffer_uptodate, which confusingly inverts the return value. Merge the two functions and r
btrfs: merge verify_parent_transid and btrfs_buffer_uptodate
verify_parent_transid is only called by btrfs_buffer_uptodate, which confusingly inverts the return value. Merge the two functions and reflow the parent_transid so that error handling is in a branch.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
aebcc159 |
| 03-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: move setting the buffer uptodate out of validate_extent_buffer
Setting the buffer uptodate in a function that is named as a validation helper is a it confusing. Move the call from validate_e
btrfs: move setting the buffer uptodate out of validate_extent_buffer
Setting the buffer uptodate in a function that is named as a validation helper is a it confusing. Move the call from validate_extent_buffer to the one of its two callers that didn't already have a duplicate call to set_extent_buffer_uptodate.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
f880fe6e |
| 08-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: don't hold an extra reference for redirtied buffers
When btrfs_redirty_list_add redirties a buffer, it also acquires an extra reference that is released on transaction commit. But this is no
btrfs: don't hold an extra reference for redirtied buffers
When btrfs_redirty_list_add redirties a buffer, it also acquires an extra reference that is released on transaction commit. But this is not required as buffers that are dirty or under writeback are never freed (look for calls to extent_buffer_under_io())).
Remove the extra reference and the infrastructure used to drop it again.
History behind redirty logic:
In the first place, it used releasing_list to hold all the to-be-released extent buffers, and decided which buffers to re-dirty at the commit time. Then, in a later version, the behaviour got changed to re-dirty a necessary buffer and add re-dirtied one to the list in btrfs_free_tree_block(). In short, the list was there mostly for the patch series' historical reason.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [ add Naohiro's comment regarding history ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
f18cc978 |
| 08-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: fix dirty_metadata_bytes for redirtied buffers
dirty_metadata_bytes is decremented in both places that clear the dirty bit in a buffer, but only incremented in btrfs_mark_buffer_dirty, which
btrfs: fix dirty_metadata_bytes for redirtied buffers
dirty_metadata_bytes is decremented in both places that clear the dirty bit in a buffer, but only incremented in btrfs_mark_buffer_dirty, which means that a buffer that is redirtied using btrfs_redirty_list_add won't be added to dirty_metadata_bytes, but it will be subtracted when written out, leading an inconsistency in the counter.
Move the dirty_metadata_bytes from btrfs_mark_buffer_dirty into set_extent_buffer_dirty to also account for the redirty case, and remove the now unused set_extent_buffer_dirty return value.
Fixes: d3575156f662 ("btrfs: zoned: redirty released extent buffers") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
2cac5af1 |
| 29-Apr-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: move btrfs_verify_level_key into tree-checker.c
This is more a buffer validation helper, move it into the tree-checker files where it makes more sense.
Reviewed-by: Johannes Thumshirn <johan
btrfs: move btrfs_verify_level_key into tree-checker.c
This is more a buffer validation helper, move it into the tree-checker files where it makes more sense.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
85d8a826 |
| 29-Apr-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: simplify btrfs_check_leaf_* helpers into a single helper
We have two helpers for checking leaves, because we have an extra check for debugging in btrfs_mark_buffer_dirty(), and at that stage
btrfs: simplify btrfs_check_leaf_* helpers into a single helper
We have two helpers for checking leaves, because we have an extra check for debugging in btrfs_mark_buffer_dirty(), and at that stage we may have item data that isn't consistent yet. However we can handle this case internally in the helper, if BTRFS_HEADER_FLAG_WRITTEN is set we know the buffer should be internally consistent, otherwise we need to skip checking the item data.
Simplify this helper down a single helper and handle the item data checking logic internally to the helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
8bfec2e4 |
| 03-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: remove hipri_workers workqueue
Now that btrfs_wq_submit_bio is never called for synchronous I/O, the hipri_workers workqueue is not used anymore and can be removed.
Reviewed-by: Chris Mason
btrfs: remove hipri_workers workqueue
Now that btrfs_wq_submit_bio is never called for synchronous I/O, the hipri_workers workqueue is not used anymore and can be removed.
Reviewed-by: Chris Mason <clm@fb.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
745806fb |
| 10-Jun-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: do not ASSERT() on duplicated global roots
[BUG] Syzbot reports a reproducible ASSERT() when using rescue=usebackuproot mount option on a corrupted fs.
The full report can be found here: htt
btrfs: do not ASSERT() on duplicated global roots
[BUG] Syzbot reports a reproducible ASSERT() when using rescue=usebackuproot mount option on a corrupted fs.
The full report can be found here: https://syzkaller.appspot.com/bug?extid=c4614eae20a166c25bf0
BTRFS error (device loop0: state C): failed to load root csum assertion failed: !tmp, in fs/btrfs/disk-io.c:1103 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3664! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 3608 Comm: syz-executor356 Not tainted 6.0.0-rc7-syzkaller-00029-g3800a713b607 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 RIP: 0010:assertfail+0x1a/0x1c fs/btrfs/ctree.h:3663 RSP: 0018:ffffc90003aaf250 EFLAGS: 00010246 RAX: 0000000000000032 RBX: 0000000000000000 RCX: f21c13f886638400 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: ffff888021c640a0 R08: ffffffff816bd38d R09: ffffed10173667f1 R10: ffffed10173667f1 R11: 1ffff110173667f0 R12: dffffc0000000000 R13: ffff8880229c21f7 R14: ffff888021c64060 R15: ffff8880226c0000 FS: 0000555556a73300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a2637d7a00 CR3: 00000000709c4000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> btrfs_global_root_insert+0x1a7/0x1b0 fs/btrfs/disk-io.c:1103 load_global_roots_objectid+0x482/0x8c0 fs/btrfs/disk-io.c:2467 load_global_roots fs/btrfs/disk-io.c:2501 [inline] btrfs_read_roots fs/btrfs/disk-io.c:2528 [inline] init_tree_roots+0xccb/0x203c fs/btrfs/disk-io.c:2939 open_ctree+0x1e53/0x33df fs/btrfs/disk-io.c:3574 btrfs_fill_super+0x1c6/0x2d0 fs/btrfs/super.c:1456 btrfs_mount_root+0x885/0x9a0 fs/btrfs/super.c:1824 legacy_get_tree+0xea/0x180 fs/fs_context.c:610 vfs_get_tree+0x88/0x270 fs/super.c:1530 fc_mount fs/namespace.c:1043 [inline] vfs_kern_mount+0xc9/0x160 fs/namespace.c:1073 btrfs_mount+0x3d3/0xbb0 fs/btrfs/super.c:1884
[CAUSE] Since the introduction of global roots, we handle csum/extent/free-space-tree roots as global roots, even if no extent-tree-v2 feature is enabled.
So for regular csum/extent/fst roots, we load them into fs_info::global_root_tree rb tree.
And we should not expect any conflicts in that rb tree, thus we have an ASSERT() inside btrfs_global_root_insert().
But rescue=usebackuproot can break the assumption, as we will try to load those trees again and again as long as we have bad roots and have backup roots slot remaining.
So in that case we can have conflicting roots in the rb tree, and triggering the ASSERT() crash.
[FIX] We can safely remove that ASSERT(), as the caller will properly put the offending root.
To make further debugging easier, also add two explicit error messages:
- Error message for conflicting global roots - Error message when using backup roots slot
Reported-by: syzbot+a694851c6ab28cbcfb9c@syzkaller.appspotmail.com Fixes: abed4aaae4f7 ("btrfs: track the csum, extent, and free space trees in a rb tree") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
917ac778 |
| 26-May-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: subpage: fix a crash in metadata repair path
[BUG] Test case btrfs/027 would crash with subpage (64K page size, 4K sectorsize) with the following dying messages:
debug: map_length=16384 le
btrfs: subpage: fix a crash in metadata repair path
[BUG] Test case btrfs/027 would crash with subpage (64K page size, 4K sectorsize) with the following dying messages:
debug: map_length=16384 length=65536 type=metadata|raid6(0x104) assertion failed: map_length >= length, in fs/btrfs/volumes.c:8093 ------------[ cut here ]------------ kernel BUG at fs/btrfs/messages.c:259! Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: btrfs_assertfail+0x28/0x2c [btrfs] btrfs_map_repair_block+0x150/0x2b8 [btrfs] btrfs_repair_io_failure+0xd4/0x31c [btrfs] btrfs_read_extent_buffer+0x150/0x16c [btrfs] read_tree_block+0x38/0xbc [btrfs] read_tree_root_path+0xfc/0x1bc [btrfs] btrfs_get_root_ref.part.0+0xd4/0x3a8 [btrfs] open_ctree+0xa30/0x172c [btrfs] btrfs_mount_root+0x3c4/0x4a4 [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xec vfs_kern_mount.part.0+0x90/0xd4 vfs_kern_mount+0x14/0x28 btrfs_mount+0x114/0x418 [btrfs] legacy_get_tree+0x30/0x60 vfs_get_tree+0x28/0xec path_mount+0x3e0/0xb64 __arm64_sys_mount+0x200/0x2d8 invoke_syscall+0x48/0x114 el0_svc_common.constprop.0+0x60/0x11c do_el0_svc+0x38/0x98 el0_svc+0x40/0xa8 el0t_64_sync_handler+0xf4/0x120 el0t_64_sync+0x190/0x194 Code: aa0403e2 b0fff060 91010000 959c2024 (d4210000)
[CAUSE] In btrfs/027 we test RAID6 with missing devices, in this particular case, we're repairing a metadata at the end of a data stripe.
But at btrfs_repair_io_failure(), we always pass a full PAGE for repair, and for subpage case this can cross stripe boundary and lead to the above BUG_ON().
This metadata repair code is always there, since the introduction of subpage support, but this can trigger BUG_ON() after the bio split ability at btrfs_map_bio().
[FIX] Instead of passing the old PAGE_SIZE, we calculate the correct length based on the eb size and page size for both regular and subpage cases.
CC: stable@vger.kernel.org # 6.3+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
5ad9b471 |
| 23-May-2023 |
pengfuyuan <pengfuyuan@kylinos.cn> |
btrfs: fix csum_tree_block page iteration to avoid tripping on -Werror=array-bounds
When compiling on a MIPS 64-bit machine we get these warnings:
In file included from ./arch/mips/include/asm/
btrfs: fix csum_tree_block page iteration to avoid tripping on -Werror=array-bounds
When compiling on a MIPS 64-bit machine we get these warnings:
In file included from ./arch/mips/include/asm/cacheflush.h:13, from ./include/linux/cacheflush.h:5, from ./include/linux/highmem.h:8, from ./include/linux/bvec.h:10, from ./include/linux/blk_types.h:10, from ./include/linux/blkdev.h:9, from fs/btrfs/disk-io.c:7: fs/btrfs/disk-io.c: In function ‘csum_tree_block’: fs/btrfs/disk-io.c:100:34: error: array subscript 1 is above array bounds of ‘struct page *[1]’ [-Werror=array-bounds] 100 | kaddr = page_address(buf->pages[i]); | ~~~~~~~~~~^~~ ./include/linux/mm.h:2135:48: note: in definition of macro ‘page_address’ 2135 | #define page_address(page) lowmem_page_address(page) | ^~~~ cc1: all warnings being treated as errors
We can check if i overflows to solve the problem. However, this doesn't make much sense, since i == 1 and num_pages == 1 doesn't execute the body of the loop. In addition, i < num_pages can also ensure that buf->pages[i] will not cross the boundary. Unfortunately, this doesn't help with the problem observed here: gcc still complains.
To fix this add a compile-time condition for the extent buffer page array size limit, which would eventually lead to eliminating the whole for loop.
CC: stable@vger.kernel.org # 5.10+ Signed-off-by: pengfuyuan <pengfuyuan@kylinos.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
597441b3 |
| 11-May-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: use nofs when cleaning up aborted transactions
Our CI system caught a lockdep splat:
====================================================== WARNING: possible circular locking dependency
btrfs: use nofs when cleaning up aborted transactions
Our CI system caught a lockdep splat:
====================================================== WARNING: possible circular locking dependency detected 6.3.0-rc7+ #1167 Not tainted ------------------------------------------------------ kswapd0/46 is trying to acquire lock: ffff8c6543abd650 (sb_internal#2){++++}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x5f/0x120
but task is already holding lock: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0xa5/0xe0 kmem_cache_alloc+0x31/0x2c0 alloc_extent_state+0x1d/0xd0 __clear_extent_bit+0x2e0/0x4f0 try_release_extent_mapping+0x216/0x280 btrfs_release_folio+0x2e/0x90 invalidate_inode_pages2_range+0x397/0x470 btrfs_cleanup_dirty_bgs+0x9e/0x210 btrfs_cleanup_one_transaction+0x22/0x760 btrfs_commit_transaction+0x3b7/0x13a0 create_subvol+0x59b/0x970 btrfs_mksubvol+0x435/0x4f0 __btrfs_ioctl_snap_create+0x11e/0x1b0 btrfs_ioctl_snap_create_v2+0xbf/0x140 btrfs_ioctl+0xa45/0x28f0 __x64_sys_ioctl+0x88/0xc0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc
-> #0 (sb_internal#2){++++}-{0:0}: __lock_acquire+0x1435/0x21a0 lock_acquire+0xc2/0x2b0 start_transaction+0x401/0x730 btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_evict_inode+0x292/0x3d0 evict+0xcc/0x1d0 inode_lru_isolate+0x14d/0x1e0 __list_lru_walk_one+0xbe/0x1c0 list_lru_walk_one+0x58/0x80 prune_icache_sb+0x39/0x60 super_cache_scan+0x161/0x1f0 do_shrink_slab+0x163/0x340 shrink_slab+0x1d3/0x290 shrink_node+0x300/0x720 balance_pgdat+0x35c/0x7a0 kswapd+0x205/0x410 kthread+0xf0/0x120 ret_from_fork+0x29/0x50
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(sb_internal#2); lock(fs_reclaim); lock(sb_internal#2);
*** DEADLOCK ***
3 locks held by kswapd0/46: #0: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0 #1: ffffffffabe50270 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x113/0x290 #2: ffff8c6543abd0e0 (&type->s_umount_key#44){++++}-{3:3}, at: super_cache_scan+0x38/0x1f0
stack backtrace: CPU: 0 PID: 46 Comm: kswapd0 Not tainted 6.3.0-rc7+ #1167 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x58/0x90 check_noncircular+0xd6/0x100 ? save_trace+0x3f/0x310 ? add_lock_to_list+0x97/0x120 __lock_acquire+0x1435/0x21a0 lock_acquire+0xc2/0x2b0 ? btrfs_commit_inode_delayed_inode+0x5f/0x120 start_transaction+0x401/0x730 ? btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_evict_inode+0x292/0x3d0 ? lock_release+0x134/0x270 ? __pfx_wake_bit_function+0x10/0x10 evict+0xcc/0x1d0 inode_lru_isolate+0x14d/0x1e0 __list_lru_walk_one+0xbe/0x1c0 ? __pfx_inode_lru_isolate+0x10/0x10 ? __pfx_inode_lru_isolate+0x10/0x10 list_lru_walk_one+0x58/0x80 prune_icache_sb+0x39/0x60 super_cache_scan+0x161/0x1f0 do_shrink_slab+0x163/0x340 shrink_slab+0x1d3/0x290 shrink_node+0x300/0x720 balance_pgdat+0x35c/0x7a0 kswapd+0x205/0x410 ? __pfx_autoremove_wake_function+0x10/0x10 ? __pfx_kswapd+0x10/0x10 kthread+0xf0/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK>
This happens because when we abort the transaction in the transaction commit path we call invalidate_inode_pages2_range on our block group cache inodes (if we have space cache v1) and any delalloc inodes we may have. The plain invalidate_inode_pages2_range() call passes through GFP_KERNEL, which makes sense in most cases, but not here. Wrap these two invalidate callees with memalloc_nofs_save/memalloc_nofs_restore to make sure we don't end up with the fs reclaim dependency under the transaction dependency.
CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
1d6a4fc8 |
| 28-Apr-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: make clear_cache mount option to rebuild FST without disabling it
Previously clear_cache mount option would simply disable free-space-tree feature temporarily then re-enable it to rebuild the
btrfs: make clear_cache mount option to rebuild FST without disabling it
Previously clear_cache mount option would simply disable free-space-tree feature temporarily then re-enable it to rebuild the whole free space tree.
But this is problematic for block-group-tree feature, as we have an artificial dependency on free-space-tree feature.
If we go the existing method, after clearing the free-space-tree feature, we would flip the filesystem to read-only mode, as we detect a super block write with block-group-tree but no free-space-tree feature.
This patch would change the behavior by properly rebuilding the free space tree without disabling this feature, thus allowing clear_cache mount option to work with block group tree.
Now we can mount a filesystem with block-group-tree feature and clear_mount option:
$ mkfs.btrfs -O block-group-tree /dev/test/scratch1 -f $ sudo mount /dev/test/scratch1 /mnt/btrfs -o clear_cache $ sudo dmesg -t | head -n 5 BTRFS info (device dm-1): force clearing of disk cache BTRFS info (device dm-1): using free space tree BTRFS info (device dm-1): auto enabling async discard BTRFS info (device dm-1): rebuilding free space tree BTRFS info (device dm-1): checking UUID tree
CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.22 |
|
#
7e812f20 |
| 27-Mar-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: use test_and_clear_bit() in wait_dev_flush()
The function wait_dev_flush() tests for the BTRFS_DEV_STATE_FLUSH_SENT bit and then clears it separately. Instead, use test_and_clear_bit(). Thoug
btrfs: use test_and_clear_bit() in wait_dev_flush()
The function wait_dev_flush() tests for the BTRFS_DEV_STATE_FLUSH_SENT bit and then clears it separately. Instead, use test_and_clear_bit(). Though we don't need to do the atomic test and clear, it's following a common pattern.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
1b465784 |
| 27-Mar-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: change wait_dev_flush() return type to bool
The flush error code is maintained in btrfs_device::last_flush_error, so there is no point in returning it in wait_dev_flush() when it is not being
btrfs: change wait_dev_flush() return type to bool
The flush error code is maintained in btrfs_device::last_flush_error, so there is no point in returning it in wait_dev_flush() when it is not being used. Instead, we can return a boolean value.
Note that even though btrfs_device::last_flush_error may not be used, we will keep it for now.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
de38a206 |
| 27-Mar-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: open code check_barrier_error()
check_barrier_error() is almost a single line function, and just calls btrfs_check_rw_degradable(). Instead, open code it.
Signed-off-by: Anand Jain <anand.ja
btrfs: open code check_barrier_error()
check_barrier_error() is almost a single line function, and just calls btrfs_check_rw_degradable(). Instead, open code it.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
bfd3ea94 |
| 27-Mar-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: move last_flush_error to write_dev_flush and wait_dev_flush
We parallelize the flush command across devices using our own code, write_dev_flush() sends the flush command to each device and wa
btrfs: move last_flush_error to write_dev_flush and wait_dev_flush
We parallelize the flush command across devices using our own code, write_dev_flush() sends the flush command to each device and wait_dev_flush() waits for the flush to complete on all devices. Errors from each device are recorded at device->last_flush_error and reset to BLK_STS_OK in write_dev_flush() and to the error, if any, in wait_dev_flush(). These functions are called from barrier_all_devices().
This patch consolidates the use of device->last_flush_error in write_dev_flush() and wait_dev_flush() to remove it from barrier_all_devices().
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.21 |
|
#
a8fdc051 |
| 21-Mar-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove obsolete delayed ref throttling logic when truncating items
We have this logic encapsulated in btrfs_should_throttle_delayed_refs() where we try to estimate if running the current amou
btrfs: remove obsolete delayed ref throttling logic when truncating items
We have this logic encapsulated in btrfs_should_throttle_delayed_refs() where we try to estimate if running the current amount of delayed references we have will take more than half a second, and if so, the caller btrfs_should_throttle_delayed_refs() should do something to prevent more and more delayed refs from being accumulated.
This logic was added in commit 0a2b2a844af6 ("Btrfs: throttle delayed refs better") and then further refined in commit a79b7d4b3e81 ("Btrfs: async delayed refs"). The idea back then was that the caller of btrfs_should_throttle_delayed_refs() would release its transaction handle (by calling btrfs_end_transaction()) when that function returned true, then btrfs_end_transaction() would trigger an async job to run delayed references in a workqueue, and later start/join a transaction again and do more work.
However we don't run delayed references asynchronously anymore, that was removed in commit db2462a6ad3d ("btrfs: don't run delayed refs in the end transaction logic"). That makes the logic that tries to estimate how long we will take to run our current delayed references, at btrfs_should_throttle_delayed_refs(), pointless as we don't take any action to run delayed references anymore. We do have other type of throttling, which consists of checking the size and reserved space of the delayed and global block reserves, as well as if fluhsing delayed references for the current transaction was already started, etc - this is all done by btrfs_should_end_transaction(), and the only user of btrfs_should_throttle_delayed_refs() does periodically call btrfs_should_end_transaction().
So remove btrfs_should_throttle_delayed_refs() and the infrastructure that keeps track of the average time used for running delayed references, as well as adapting btrfs_truncate_inode_items() to call btrfs_check_space_for_delayed_refs() instead.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|