Revision tags: v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22 |
|
#
8ba7d5f5 |
| 23-Mar-2023 |
Genjian Zhang <zhanggenjian@kylinos.cn> |
btrfs: fix uninitialized variable warnings
There are some warnings on older compilers (gcc 10, 7) or non-x86_64 architectures (aarch64). As btrfs wants to enable -Wmaybe-uninitialized by default, f
btrfs: fix uninitialized variable warnings
There are some warnings on older compilers (gcc 10, 7) or non-x86_64 architectures (aarch64). As btrfs wants to enable -Wmaybe-uninitialized by default, fix the warnings even though it's not necessary on recent compilers (gcc 12+).
../fs/btrfs/volumes.c: In function ‘btrfs_init_new_device’: ../fs/btrfs/volumes.c:2703:3: error: ‘seed_devices’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 2703 | btrfs_setup_sprout(fs_info, seed_devices); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../fs/btrfs/send.c: In function ‘get_cur_inode_state’: ../include/linux/compiler.h:70:32: error: ‘right_gen’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 70 | (__if_trace.miss_hit[1]++,1) : \ | ^ ../fs/btrfs/send.c:1878:6: note: ‘right_gen’ was declared here 1878 | u64 right_gen; | ^~~~~~~~~
Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.21 |
|
#
4886ff7b |
| 19-Mar-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: introduce a new helper to submit write bio for repair
Both scrub and read-repair are utilizing a special repair writes that:
- Only writes back to a single device Even for read-repair on R
btrfs: introduce a new helper to submit write bio for repair
Both scrub and read-repair are utilizing a special repair writes that:
- Only writes back to a single device Even for read-repair on RAID56, we only update the corrupted data stripe itself, not triggering the full RMW path.
- Requires a valid @mirror_num For RAID56 case, only @mirror_num == 1 is valid. For non-RAID56 cases, we need @mirror_num to locate our stripe.
- No data csum generation needed
These two call sites still have some differences though:
- Read-repair goes plain bio It doesn't need a full btrfs_bio, and goes submit_bio_wait().
- New scrub repair would go btrfs_bio To simplify both read and write path.
So here this patch would:
- Introduce a common helper, btrfs_map_repair_block() Due to the single device nature, we can use an on-stack btrfs_io_stripe to pass device and its physical bytenr.
- Introduce a new interface, btrfs_submit_repair_bio(), for later scrub code This is for the incoming scrub code.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
f0bb5474 |
| 04-Apr-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: remove redundant release of btrfs_device::alloc_state
Commit 321f69f86a0f ("btrfs: reset device back to allocation state when removing") included adding extent_io_tree_release(&device->alloc_
btrfs: remove redundant release of btrfs_device::alloc_state
Commit 321f69f86a0f ("btrfs: reset device back to allocation state when removing") included adding extent_io_tree_release(&device->alloc_state) to btrfs_close_one_device(), which had already been called in btrfs_free_device().
The alloc_state tree (IO_TREE_DEVICE_ALLOC_STATE), is created in btrfs_alloc_device() and released in btrfs_close_one_device(). Therefore, the additional call to extent_io_tree_release(&device->alloc_state) in btrfs_free_device() is unnecessary and can be removed.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
1f16033c |
| 04-Apr-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: warn for any missed cleanup at btrfs_close_one_device
During my recent search for the root cause of a reported bug, I realized that it's a good idea to issue a warning for missed cleanup inst
btrfs: warn for any missed cleanup at btrfs_close_one_device
During my recent search for the root cause of a reported bug, I realized that it's a good idea to issue a warning for missed cleanup instead of using debug-only assertions. Since most installations run with debug off, missed cleanups and premature calls to close could go unnoticed. However, these issues are serious enough to warrant reporting and fixing.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
5758d1bd |
| 21-Mar-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: remove bytes_used argument from btrfs_make_block_group()
The only caller of btrfs_make_block_group() always passes 0 as the value for the bytes_used argument, so remove it.
Reviewed-by: Jose
btrfs: remove bytes_used argument from btrfs_make_block_group()
The only caller of btrfs_make_block_group() always passes 0 as the value for the bytes_used argument, so remove it.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13 |
|
#
5f50fa91 |
| 22-Feb-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: do not use replace target device as an extra mirror
[BUG] Currently btrfs can use dev-replace device as an extra mirror for read-repair. But it can lead to NODATASUM corruption in the follow
btrfs: do not use replace target device as an extra mirror
[BUG] Currently btrfs can use dev-replace device as an extra mirror for read-repair. But it can lead to NODATASUM corruption in the following case:
There is a RAID1 data chunk, and dev-replace is running from dev2 to dev0.
|//| = Replaced data X X+1MB X+2MB Dev 2: | | | <- Source dev Dev 0: |///////| | <- Target dev
Then a read on dev 2 X+2MB happens. And something wrong happened inside devid 2, causing an -EIO.
In that case, read-repair would try the next mirror, and since we can use target device as an extra mirror, we will use that mirror instead.
But unfortunately since the read is beyond the current replace cursor, we should not trust it at all, what we get would be just uninitialized garbage.
But if this read is for NODATASUM range, then we just trust them and cause data corruption.
[CAUSE] We used to have some checks to make sure we only return such extra mirror when the range is before our left cursor.
The first commit introducing this behavior is ad6d620e2a57 ("Btrfs: allow repair code to include target disk when searching mirrors").
But later a fix, 22ab04e814f4 ("Btrfs: fix race between device replace and chunk allocation") changed the behavior, to always let btrfs_map_block() include the extra mirror to address a race in dev-replace which can cause missing writes to target device.
This means, we lose the tracking of cursor for the extra mirror, thus can lead to above corruption.
[FIX] The extra mirror is never a reliable one, at the beginning of dev-replace, the reliability is zero, while only at the end of the replace it's a fully reliable mirror.
We either do the complex tracking, or never trust it.
IMHO it's much easier to maintain if we don't trust it at all, and the extra mirror can only benefit for a limited period of time (during replace).
Thus this patch would completely remove the ability to use target device as an extra mirror.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.2 |
|
#
18d758a2 |
| 16-Feb-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: replace btrfs_io_context::raid_map with a fixed u64 value
In btrfs_io_context structure, we have a pointer raid_map, which indicates the logical bytenr for each stripe.
But considering we al
btrfs: replace btrfs_io_context::raid_map with a fixed u64 value
In btrfs_io_context structure, we have a pointer raid_map, which indicates the logical bytenr for each stripe.
But considering we always call sort_parity_stripes(), the result raid_map[] is always sorted, thus raid_map[0] is always the logical bytenr of the full stripe.
So why we waste the space and time (for sorting) for raid_map?
This patch will replace btrfs_io_context::raid_map with a single u64 number, full_stripe_start, by:
- Replace btrfs_io_context::raid_map with full_stripe_start
- Replace call sites using raid_map[0] to use full_stripe_start
- Replace call sites using raid_map[i] to compare with nr_data_stripes.
The benefits are:
- Less memory wasted on raid_map It's sizeof(u64) * num_stripes vs sizeof(u64). It'll always save at least one u64, and the benefit grows larger with num_stripes.
- No more weird alloc_btrfs_io_context() behavior As there is only one fixed size + one variable length array.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.12, v6.1.11 |
|
#
1faf3885 |
| 06-Feb-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: use an efficient way to represent source of duplicated stripes
For btrfs dev-replace, we have to duplicate writes to the source device into the target device.
For non-RAID56, all writes into
btrfs: use an efficient way to represent source of duplicated stripes
For btrfs dev-replace, we have to duplicate writes to the source device into the target device.
For non-RAID56, all writes into the same mapped ranges are sharing the same content, thus they don't really need to bother anything. (E.g. in btrfs_submit_bio() for non-RAID56 range we just submit the same write to all involved devices).
But for RAID56, all stripes contain different content, thus we must have a clear mapping of which stripe is duplicated from which original stripe.
Currently we use a complex way using tgtdev_map[] array, e.g:
num_tgtdevs = 1 tgtdev_map[0] = 0 <- Means stripes[0] is not involved in replace. tgtdev_map[1] = 3 <- Means stripes[1] is involved in replace, and it's duplicated to stripes[3]. tgtdev_map[2] = 0 <- Means stripes[2] is not involved in replace.
But this is wasting some space, and ignores one important thing for dev-replace, there is at most one running replace.
Thus we can change it to a fixed array to represent the mapping:
replace_nr_stripes = 1 replace_stripe_src = 1 <- Means stripes[1] is involved in replace. thus the extra stripe is a copy of stripes[1]
By this we can save some space for bioc on RAID56 chunks with many devices. And we get rid of one variable sized array from bioc.
Thus the patch involves the following changes:
- Replace @num_tgtdevs and @tgtdev_map[] with @replace_nr_stripes and @replace_stripe_src.
@num_tgtdevs is just renamed to @replace_nr_stripes. While the mapping is completely changed.
- Add extra ASSERT()s for RAID56 code
- Only add two more extra stripes for dev-replace cases. As we have an upper limit on how many dev-replace stripes we can have.
- Unify the behavior of handle_ops_on_dev_replace() Previously handle_ops_on_dev_replace() go two different paths for WRITE and GET_READ_MIRRORS. Now unify them by always going the WRITE path first (with at most 2 replace stripes), then if we're doing GET_READ_MIRRORS and we have 2 extra stripes, just drop one stripe.
- Remove the @real_stripes argument from alloc_btrfs_io_context() As we don't need the old variable length array any more.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
4ced85f8 |
| 06-Feb-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: reduce type width of btrfs_io_contexts
That structure is our ultimate object for all __btrfs_map_block() related functions. We have some hard to understand members, like tgtdev_map, but with
btrfs: reduce type width of btrfs_io_contexts
That structure is our ultimate object for all __btrfs_map_block() related functions. We have some hard to understand members, like tgtdev_map, but without any comments.
This patch will improve the situation:
- Add extra comments for num_stripes, mirror_num, num_tgtdevs and tgtdev_map[] Especially for the last two members, add a dedicated (thus very long) comments for them, with example to explain it.
- Shrink those int members to u16. In fact our on-disk format is only using u16 for num_stripes, thus no need to use int at all.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
be5c7edb |
| 06-Feb-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: simplify the bioc argument for handle_ops_on_dev_replace()
There is no memory re-allocation for handle_ops_on_dev_replace(), thus we don't need to pass a btrfs_io_context pointer.
Reviewed-b
btrfs: simplify the bioc argument for handle_ops_on_dev_replace()
There is no memory re-allocation for handle_ops_on_dev_replace(), thus we don't need to pass a btrfs_io_context pointer.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
6ded22c1 |
| 16-Feb-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: reduce div64 calls by limiting the number of stripes of a chunk to u32
There are quite some div64 calls inside btrfs_map_block() and its variants.
Such calls are for @stripe_nr, where @strip
btrfs: reduce div64 calls by limiting the number of stripes of a chunk to u32
There are quite some div64 calls inside btrfs_map_block() and its variants.
Such calls are for @stripe_nr, where @stripe_nr is the number of stripes before our logical bytenr inside a chunk.
However we can eliminate such div64 calls by just reducing the width of @stripe_nr from 64 to 32.
This can be done because our chunk size limit is already 10G, with fixed stripe length 64K. Thus a U32 is definitely enough to contain the number of stripes.
With such width reduction, we can get rid of slower div64, and extra warning for certain 32bit arch.
This patch would do:
- Add a new tree-checker chunk validation on chunk length Make sure no chunk can reach 256G, which can also act as a bitflip checker.
- Reduce the width from u64 to u32 for @stripe_nr variables
- Replace unnecessary div64 calls with regular modulo and division 32bit division and modulo are much faster than 64bit operations, and we are finally free of the div64 fear at least in those involved functions.
Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
a97699d1 |
| 16-Feb-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN
Currently btrfs doesn't support stripe lengths other than 64KiB. This is already set in the tree-checker.
There is really no meaning to rec
btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN
Currently btrfs doesn't support stripe lengths other than 64KiB. This is already set in the tree-checker.
There is really no meaning to record that fixed value in map_lookup for now, and can all be replaced with BTRFS_STRIPE_LEN.
Furthermore we can use the fix stripe length to do the following optimization:
- Use BTRFS_STRIPE_LEN_SHIFT to replace some 64bit division Now we only need to do a right shift.
And the value of BTRFS_STRIPE_LEN itself is already too large for bit shift, thus if we accidentally use BTRFS_STRIPE_LEN to do bit shift, a compiler warning would be triggered.
Thus this bit shift optimization would be safe.
- Use BTRFS_STRIPE_LEN_MASK to calculate the offset inside a stripe
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
2d82a40a |
| 22-Mar-2023 |
Filipe Manana <fdmanana@suse.com> |
btrfs: fix deadlock when aborting transaction during relocation with scrub
Before relocating a block group we pause scrub, then do the relocation and then unpause scrub. The relocation process requi
btrfs: fix deadlock when aborting transaction during relocation with scrub
Before relocating a block group we pause scrub, then do the relocation and then unpause scrub. The relocation process requires starting and committing a transaction, and if we have a failure in the critical section of the transaction commit path (transaction state >= TRANS_STATE_COMMIT_START), we will deadlock if there is a paused scrub.
That results in stack traces like the following:
[42.479] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6 [42.936] BTRFS warning (device sdc): Skipping commit of aborted transaction. [42.936] ------------[ cut here ]------------ [42.936] BTRFS: Transaction aborted (error -28) [42.936] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Modules linked in: dm_flakey dm_mod loop btrfs (...) [42.936] CPU: 11 PID: 346822 Comm: btrfs Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [42.936] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [42.936] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Code: ff ff 45 8b (...) [42.936] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282 [42.936] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000 [42.936] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff [42.936] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8 [42.936] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00 [42.936] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0 [42.936] FS: 00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000 [42.936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [42.936] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0 [42.936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [42.936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [42.936] Call Trace: [42.936] <TASK> [42.936] ? start_transaction+0xcb/0x610 [btrfs] [42.936] prepare_to_relocate+0x111/0x1a0 [btrfs] [42.936] relocate_block_group+0x57/0x5d0 [btrfs] [42.936] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [42.936] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [42.936] ? __pfx_autoremove_wake_function+0x10/0x10 [42.936] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [42.936] btrfs_balance+0x8ff/0x11d0 [btrfs] [42.936] ? __kmem_cache_alloc_node+0x14a/0x410 [42.936] btrfs_ioctl+0x2334/0x32c0 [btrfs] [42.937] ? mod_objcg_state+0xd2/0x360 [42.937] ? refill_obj_stock+0xb0/0x160 [42.937] ? seq_release+0x25/0x30 [42.937] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [42.937] ? percpu_counter_add_batch+0x2e/0xa0 [42.937] ? __x64_sys_ioctl+0x88/0xc0 [42.937] __x64_sys_ioctl+0x88/0xc0 [42.937] do_syscall_64+0x38/0x90 [42.937] entry_SYSCALL_64_after_hwframe+0x72/0xdc [42.937] RIP: 0033:0x7f381a6ffe9b [42.937] Code: 00 48 89 44 24 (...) [42.937] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [42.937] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [42.937] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [42.937] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [42.937] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [42.937] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [42.937] </TASK> [42.937] ---[ end trace 0000000000000000 ]--- [42.937] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left [59.196] INFO: task btrfs:346772 blocked for more than 120 seconds. [59.196] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.196] task:btrfs state:D stack:0 pid:346772 ppid:1 flags:0x00004002 [59.196] Call Trace: [59.196] <TASK> [59.196] __schedule+0x392/0xa70 [59.196] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.196] schedule+0x5d/0xd0 [59.196] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.197] ? __pfx_autoremove_wake_function+0x10/0x10 [59.197] scrub_pause_off+0x21/0x50 [btrfs] [59.197] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.197] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.198] ? __pfx_autoremove_wake_function+0x10/0x10 [59.198] scrub_stripe+0x20d/0x740 [btrfs] [59.198] scrub_chunk+0xc4/0x130 [btrfs] [59.198] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.198] ? __pfx_autoremove_wake_function+0x10/0x10 [59.198] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.199] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.199] ? _copy_from_user+0x7b/0x80 [59.199] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.199] ? refill_stock+0x33/0x50 [59.199] ? should_failslab+0xa/0x20 [59.199] ? kmem_cache_alloc_node+0x151/0x460 [59.199] ? alloc_io_context+0x1b/0x80 [59.199] ? preempt_count_add+0x70/0xa0 [59.199] ? __x64_sys_ioctl+0x88/0xc0 [59.199] __x64_sys_ioctl+0x88/0xc0 [59.199] do_syscall_64+0x38/0x90 [59.199] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.199] RIP: 0033:0x7f82ffaffe9b [59.199] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.199] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b [59.199] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003 [59.199] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.199] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640 [59.199] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.199] </TASK> [59.199] INFO: task btrfs:346773 blocked for more than 120 seconds. [59.200] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.201] task:btrfs state:D stack:0 pid:346773 ppid:1 flags:0x00004002 [59.201] Call Trace: [59.201] <TASK> [59.201] __schedule+0x392/0xa70 [59.201] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.201] schedule+0x5d/0xd0 [59.201] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.201] ? __pfx_autoremove_wake_function+0x10/0x10 [59.201] scrub_pause_off+0x21/0x50 [btrfs] [59.202] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.202] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.202] ? __pfx_autoremove_wake_function+0x10/0x10 [59.202] scrub_stripe+0x20d/0x740 [btrfs] [59.202] scrub_chunk+0xc4/0x130 [btrfs] [59.203] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.203] ? __pfx_autoremove_wake_function+0x10/0x10 [59.203] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.203] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.203] ? _copy_from_user+0x7b/0x80 [59.203] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.204] ? should_failslab+0xa/0x20 [59.204] ? kmem_cache_alloc_node+0x151/0x460 [59.204] ? alloc_io_context+0x1b/0x80 [59.204] ? preempt_count_add+0x70/0xa0 [59.204] ? __x64_sys_ioctl+0x88/0xc0 [59.204] __x64_sys_ioctl+0x88/0xc0 [59.204] do_syscall_64+0x38/0x90 [59.204] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.204] RIP: 0033:0x7f82ffaffe9b [59.204] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.204] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b [59.204] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003 [59.204] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.204] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640 [59.204] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.204] </TASK> [59.204] INFO: task btrfs:346774 blocked for more than 120 seconds. [59.205] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.206] task:btrfs state:D stack:0 pid:346774 ppid:1 flags:0x00004002 [59.206] Call Trace: [59.206] <TASK> [59.206] __schedule+0x392/0xa70 [59.206] schedule+0x5d/0xd0 [59.206] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.206] ? __pfx_autoremove_wake_function+0x10/0x10 [59.206] scrub_pause_off+0x21/0x50 [btrfs] [59.207] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.207] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.207] ? __pfx_autoremove_wake_function+0x10/0x10 [59.207] scrub_stripe+0x20d/0x740 [btrfs] [59.208] scrub_chunk+0xc4/0x130 [btrfs] [59.208] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.208] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120 [59.208] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.208] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.209] ? _copy_from_user+0x7b/0x80 [59.209] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.209] ? should_failslab+0xa/0x20 [59.209] ? kmem_cache_alloc_node+0x151/0x460 [59.209] ? alloc_io_context+0x1b/0x80 [59.209] ? preempt_count_add+0x70/0xa0 [59.209] ? __x64_sys_ioctl+0x88/0xc0 [59.209] __x64_sys_ioctl+0x88/0xc0 [59.209] do_syscall_64+0x38/0x90 [59.209] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.209] RIP: 0033:0x7f82ffaffe9b [59.209] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.209] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b [59.209] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003 [59.209] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.209] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640 [59.209] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.209] </TASK> [59.209] INFO: task btrfs:346775 blocked for more than 120 seconds. [59.210] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.211] task:btrfs state:D stack:0 pid:346775 ppid:1 flags:0x00004002 [59.211] Call Trace: [59.211] <TASK> [59.211] __schedule+0x392/0xa70 [59.211] schedule+0x5d/0xd0 [59.211] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.211] ? __pfx_autoremove_wake_function+0x10/0x10 [59.211] scrub_pause_off+0x21/0x50 [btrfs] [59.212] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.212] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.212] ? __pfx_autoremove_wake_function+0x10/0x10 [59.212] scrub_stripe+0x20d/0x740 [btrfs] [59.213] scrub_chunk+0xc4/0x130 [btrfs] [59.213] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.213] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120 [59.213] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.213] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.214] ? _copy_from_user+0x7b/0x80 [59.214] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.214] ? should_failslab+0xa/0x20 [59.214] ? kmem_cache_alloc_node+0x151/0x460 [59.214] ? alloc_io_context+0x1b/0x80 [59.214] ? preempt_count_add+0x70/0xa0 [59.214] ? __x64_sys_ioctl+0x88/0xc0 [59.214] __x64_sys_ioctl+0x88/0xc0 [59.214] do_syscall_64+0x38/0x90 [59.214] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.214] RIP: 0033:0x7f82ffaffe9b [59.214] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.214] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b [59.214] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003 [59.214] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.214] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640 [59.214] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.214] </TASK> [59.214] INFO: task btrfs:346776 blocked for more than 120 seconds. [59.215] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.217] task:btrfs state:D stack:0 pid:346776 ppid:1 flags:0x00004002 [59.217] Call Trace: [59.217] <TASK> [59.217] __schedule+0x392/0xa70 [59.217] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.217] schedule+0x5d/0xd0 [59.217] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.217] ? __pfx_autoremove_wake_function+0x10/0x10 [59.217] scrub_pause_off+0x21/0x50 [btrfs] [59.217] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.217] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.218] ? __pfx_autoremove_wake_function+0x10/0x10 [59.218] scrub_stripe+0x20d/0x740 [btrfs] [59.218] scrub_chunk+0xc4/0x130 [btrfs] [59.218] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.219] ? __pfx_autoremove_wake_function+0x10/0x10 [59.219] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.219] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.219] ? _copy_from_user+0x7b/0x80 [59.219] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.219] ? should_failslab+0xa/0x20 [59.219] ? kmem_cache_alloc_node+0x151/0x460 [59.219] ? alloc_io_context+0x1b/0x80 [59.219] ? preempt_count_add+0x70/0xa0 [59.219] ? __x64_sys_ioctl+0x88/0xc0 [59.219] __x64_sys_ioctl+0x88/0xc0 [59.219] do_syscall_64+0x38/0x90 [59.219] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.219] RIP: 0033:0x7f82ffaffe9b [59.219] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.219] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b [59.219] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003 [59.219] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.219] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640 [59.219] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.219] </TASK> [59.219] INFO: task btrfs:346822 blocked for more than 120 seconds. [59.220] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.221] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.222] task:btrfs state:D stack:0 pid:346822 ppid:1 flags:0x00004002 [59.222] Call Trace: [59.222] <TASK> [59.222] __schedule+0x392/0xa70 [59.222] schedule+0x5d/0xd0 [59.222] btrfs_scrub_cancel+0x91/0x100 [btrfs] [59.222] ? __pfx_autoremove_wake_function+0x10/0x10 [59.222] btrfs_commit_transaction+0x572/0xeb0 [btrfs] [59.223] ? start_transaction+0xcb/0x610 [btrfs] [59.223] prepare_to_relocate+0x111/0x1a0 [btrfs] [59.223] relocate_block_group+0x57/0x5d0 [btrfs] [59.223] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [59.223] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [59.224] ? __pfx_autoremove_wake_function+0x10/0x10 [59.224] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [59.224] btrfs_balance+0x8ff/0x11d0 [btrfs] [59.224] ? __kmem_cache_alloc_node+0x14a/0x410 [59.224] btrfs_ioctl+0x2334/0x32c0 [btrfs] [59.225] ? mod_objcg_state+0xd2/0x360 [59.225] ? refill_obj_stock+0xb0/0x160 [59.225] ? seq_release+0x25/0x30 [59.225] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [59.225] ? percpu_counter_add_batch+0x2e/0xa0 [59.225] ? __x64_sys_ioctl+0x88/0xc0 [59.225] __x64_sys_ioctl+0x88/0xc0 [59.225] do_syscall_64+0x38/0x90 [59.225] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.225] RIP: 0033:0x7f381a6ffe9b [59.225] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.225] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [59.225] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [59.225] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [59.225] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [59.225] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [59.225] </TASK>
What happens is the following:
1) A scrub is running, so fs_info->scrubs_running is 1;
2) Task A starts block group relocation, and at btrfs_relocate_chunk() it pauses scrub by calling btrfs_scrub_pause(). That increments fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to pause (for fs_info->scrubs_paused to be == to fs_info->scrubs_running);
3) The scrub task pauses at scrub_pause_off(), waiting for fs_info->scrub_pause_req to decrease to 0;
4) Task A then enters btrfs_relocate_block_group(), and down that call chain we start a transaction and then attempt to commit it;
5) When task A calls btrfs_commit_transaction(), it either will do the commit itself or wait for some other task that already started the commit of the transaction - it doesn't matter which case;
6) The transaction commit enters state TRANS_STATE_COMMIT_START;
7) An error happens during the transaction commit, like -ENOSPC when running delayed refs or delayed items for example;
8) This results in calling transaction.c:cleanup_transaction(), where we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req from 0 to 1, and blocking this task waiting for fs_info->scrubs_running to decrease to 0;
9) From this point on, both the transaction commit and the scrub task hang forever:
1) The transaction commit is waiting for fs_info->scrubs_running to be decreased to 0;
2) The scrub task is at scrub_pause_off() waiting for fs_info->scrub_pause_req to decrease to 0 - so it can not proceed to stop the scrub and decrement fs_info->scrubs_running from 0 to 1.
Therefore resulting in a deadlock.
Fix this by having cleanup_transaction(), called if a transaction commit fails, not call btrfs_scrub_cancel() if relocation is in progress, and having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if the relocation failed and a transaction abort happened.
This was triggered with btrfs/061 from fstests.
Fixes: 55e3a601c81c ("btrfs: Fix data checksum error cause by replace with io-load.") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
50d281fc |
| 23-Mar-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: scan device in non-exclusive mode
This fixes mkfs/mount/check failures due to race with systemd-udevd scan.
During the device scan initiated by systemd-udevd, other user space EXCL operation
btrfs: scan device in non-exclusive mode
This fixes mkfs/mount/check failures due to race with systemd-udevd scan.
During the device scan initiated by systemd-udevd, other user space EXCL operations such as mkfs, mount, or check may get blocked and result in a "Device or resource busy" error. This is because the device scan process opens the device with the EXCL flag in the kernel.
Two reports were received:
- btrfs/179 test case, where the fsck command failed with the -EBUSY error
- LTP pwritev03 test case, where mkfs.vfs failed with the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem on the device.
In both cases, fsck and mkfs (respectively) were racing with a systemd-udevd device scan, and systemd-udevd won, resulting in the -EBUSY error for fsck and mkfs.
Reproducing the problem has been difficult because there is a very small window during which these userspace threads can race to acquire the exclusive device open. Even on the system where the problem was observed, the problem occurrences were anywhere between 10 to 400 iterations and chances of reproducing decreases with debug printk()s.
However, an exclusive device open is unnecessary for the scan process, as there are no write operations on the device during scan. Furthermore, during the mount process, the superblock is re-read in the below function call chain:
btrfs_mount_root btrfs_open_devices open_fs_devices btrfs_open_one_device btrfs_get_bdev_and_sb
So, to fix this issue, removes the FMODE_EXCL flag from the scan operation, and add a comment.
The case where mkfs may still write to the device and a scan is running, the btrfs signature is not written at that time so scan will not recognize such device.
Reported-by: Sherry Yang <sherry.yang@oracle.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
1c3ab6df |
| 01-Mar-2023 |
Qu Wenruo <wqu@suse.com> |
btrfs: handle missing chunk mapping more gracefully
[BUG] During my scrub rework, I did a stupid thing like this:
bio->bi_iter.bi_sector = stripe->logical; btrfs_submit_bio(fs_info,
btrfs: handle missing chunk mapping more gracefully
[BUG] During my scrub rework, I did a stupid thing like this:
bio->bi_iter.bi_sector = stripe->logical; btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
Above bi_sector assignment is using logical address directly, which lacks ">> SECTOR_SHIFT".
This results a read on a range which has no chunk mapping.
This results the following crash:
BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536 assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
Sure this is all my fault, but this shows a possible problem in real world, that some bit flip in file extents/tree block can point to unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT is not configured, cause invalid pointer access.
[PROBLEMS] In the above call chain, we just don't handle the possible error from btrfs_get_chunk_map() inside __btrfs_map_block().
[FIX] The fix is straightforward, replace the ASSERT() with proper error handling (callers handle errors already).
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.10, v6.1.9, v6.1.8 |
|
#
f8a02dc6 |
| 21-Jan-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: remove struct btrfs_io_geometry
Now that btrfs_get_io_geometry has a single caller, we can massage it into a form that is more suitable for that caller and remove the marshalling into and out
btrfs: remove struct btrfs_io_geometry
Now that btrfs_get_io_geometry has a single caller, we can massage it into a form that is more suitable for that caller and remove the marshalling into and out of struct btrfs_io_geometry.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.7 |
|
#
67da05b3 |
| 17-Jan-2023 |
Colin Ian King <colin.i.king@gmail.com> |
btrfs: fix spelling mistakes found using codespell
There quite a few spelling mistakes as found using codespell. Fix them.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David
btrfs: fix spelling mistakes found using codespell
There quite a few spelling mistakes as found using codespell. Fix them.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
5f58d783 |
| 20-Jan-2023 |
Anand Jain <anand.jain@oracle.com> |
btrfs: free device in btrfs_close_devices for a single device filesystem
We have this check to make sure we don't accidentally add older devices that may have disappeared and re-appeared with an old
btrfs: free device in btrfs_close_devices for a single device filesystem
We have this check to make sure we don't accidentally add older devices that may have disappeared and re-appeared with an older generation from being added to an fs_devices (such as a replace source device). This makes sense, we don't want stale disks in our file system. However for single disks this doesn't really make sense.
I've seen this in testing, but I was provided a reproducer from a project that builds btrfs images on loopback devices. The loopback device gets cached with the new generation, and then if it is re-used to generate a new file system we'll fail to mount it because the new fs is "older" than what we have in cache.
Fix this by freeing the cache when closing the device for a single device filesystem. This will ensure that the mount command passed device path is scanned successfully during the next mount.
CC: stable@vger.kernel.org # 5.10+ Reported-by: Daan De Meyer <daandemeyer@fb.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
3c538de0 |
| 18-Jan-2023 |
Josef Bacik <josef@toxicpanda.com> |
btrfs: limit device extents to the device size
There was a recent regression in btrfs/177 that started happening with the size class patches ("btrfs: introduce size class to block group allocator").
btrfs: limit device extents to the device size
There was a recent regression in btrfs/177 that started happening with the size class patches ("btrfs: introduce size class to block group allocator"). This however isn't a regression introduced by those patches, but rather the bug was uncovered by a change in behavior in these patches. The patches triggered more chunk allocations in the ^free-space-tree case, which uncovered a race with device shrink.
The problem is we will set the device total size to the new size, and use this to find a hole for a device extent. However during shrink we may have device extents allocated past this range, so we could potentially find a hole in a range past our new shrink size. We don't actually limit our found extent to the device size anywhere, we assume that we will not find a hole past our device size. This isn't true with shrink as we're relocating block groups and thus creating holes past the device size.
Fix this by making sure we do not search past the new device size, and if we wander into any device extents that start after our device size simply break from the loop and use whatever hole we've already found.
CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80 |
|
#
26ecf243 |
| 21-Nov-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: stop using write_one_page in btrfs_scratch_superblock
write_one_page is an awkward interface that expects the page locked and ->writepage to be implemented. Replace that by zeroing the signa
btrfs: stop using write_one_page in btrfs_scratch_superblock
write_one_page is an awkward interface that expects the page locked and ->writepage to be implemented. Replace that by zeroing the signature bytes and synchronize the block device page using the proper bdev helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
0e0078f7 |
| 21-Nov-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: factor out scratching of one regular super block
btrfs_scratch_superblocks open codes scratching super block of a non-zoned super block. Split the code to read, zero and write the superblock
btrfs: factor out scratching of one regular super block
btrfs_scratch_superblocks open codes scratching super block of a non-zoned super block. Split the code to read, zero and write the superblock for regular devices into a separate helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
ed02363f |
| 11-Dec-2022 |
Qu Wenruo <wqu@suse.com> |
btrfs: add extra error messages to cover non-ENOMEM errors from device_add_list()
[BUG] When test case btrfs/219 (aka, mount a registered device but with a lower generation) failed, there is not any
btrfs: add extra error messages to cover non-ENOMEM errors from device_add_list()
[BUG] When test case btrfs/219 (aka, mount a registered device but with a lower generation) failed, there is not any useful information for the end user to find out what's going wrong.
The mount failure just looks like this:
# mount -o loop /tmp/219.img2 /mnt/btrfs/ mount: /mnt/btrfs: mount(2) system call failed: File exists. dmesg(1) may have more information after failed mount system call.
While the dmesg contains nothing but the loop device change:
loop1: detected capacity change from 0 to 524288
[CAUSE] In device_list_add() we have a lot of extra checks to reject invalid cases.
That function also contains the regular device scan result like the following prompt:
BTRFS: device fsid 6222333e-f9f1-47e6-b306-55ddd4dcaef4 devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (3027)
But unfortunately not all errors have their own error messages, thus if we hit something wrong in device_add_list(), there may be no error messages at all.
[FIX] Add errors message for all non-ENOMEM errors.
For ENOMEM, I'd say we're in a much worse situation, and there should be some OOM messages way before our call sites.
CC: stable@vger.kernel.org # 6.0+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
#
1742e1c9 |
| 23-Nov-2022 |
void0red <void0red@gmail.com> |
btrfs: fix extent map use-after-free when handling missing device in read_one_chunk
Store the error code before freeing the extent_map. Though it's reference counted structure, in that function it's
btrfs: fix extent map use-after-free when handling missing device in read_one_chunk
Store the error code before freeing the extent_map. Though it's reference counted structure, in that function it's the first and last allocation so this would lead to a potential use-after-free.
The error can happen eg. when chunk is stored on a missing device and the degraded mount option is missing.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216721 Reported-by: eriri <1527030098@qq.com> Fixes: adfb69af7d8c ("btrfs: add_missing_dev() should return the actual error") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: void0red <void0red@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.9, v5.15.79 |
|
#
103c1972 |
| 15-Nov-2022 |
Christoph Hellwig <hch@lst.de> |
btrfs: split the bio submission path into a separate file
The code used by btrfs_submit_bio only interacts with the rest of volumes.c through __btrfs_map_block (which itself is a more generic versio
btrfs: split the bio submission path into a separate file
The code used by btrfs_submit_bio only interacts with the rest of volumes.c through __btrfs_map_block (which itself is a more generic version of two exported helpers) and does not really have anything to do with volumes.c. Create a new bio.c file and a bio.h header going along with it for the btrfs_bio-based storage layer, which will grow even more going forward.
Also update the file with my copyright notice given that a large part of the moved code was written or rewritten by me.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
Revision tags: v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4 |
|
#
9f0eac07 |
| 25-Oct-2022 |
Li zeming <zeming@nfschina.com> |
btrfs: allocate btrfs_io_context without GFP_NOFAIL
The __GFP_NOFAIL flag could loop indefinitely when allocation memory in alloc_btrfs_io_context. The callers starting from __btrfs_map_block alread
btrfs: allocate btrfs_io_context without GFP_NOFAIL
The __GFP_NOFAIL flag could loop indefinitely when allocation memory in alloc_btrfs_io_context. The callers starting from __btrfs_map_block already handle errors so it's safe to drop the flag.
Signed-off-by: Li zeming <zeming@nfschina.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|