Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9

6f74989f | 21-Dec-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: fix lock ordering in btrfs_zone_activate()
commit b18f3b60b35a8c01c9a2a0f0d6424c6d73971dc3 upstream.
The btrfs CI reported a lockdep warning as follows by running generic/129.
WARNING: possible circular locking dependency detected
6.7.0-rc5+ #1 Not tainted
------------------------------------------------------
kworker/u5:5/793427 is trying to acquire lock:
ffff88813256d028 (&cache->lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x5e/0x130

but task is already holding lock:
ffff88810a23a318 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x34/0x130

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}:
       ...

-> #0 (&cache->lock){+.+.}-{2:2}:
       ...
This is because we take fs_info->zone_active_bgs_lock after a block_group's lock in btrfs_zone_activate() while doing the opposite in other places.
Fix the issue by expanding the fs_info->zone_active_bgs_lock's critical section and taking it before a block_group's lock.
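As a minimal sketch of the corrected ordering (the activation bookkeeping is elided; only the lock nesting is the point here):

    spin_lock(&fs_info->zone_active_bgs_lock);   /* outer lock, now taken first */
    spin_lock(&block_group->lock);               /* per block group lock nests inside */

    /* ... check and mark the block group active ... */

    spin_unlock(&block_group->lock);
    spin_unlock(&fs_info->zone_active_bgs_lock);

This matches the nesting already used by btrfs_zone_finish_one_bg(), which is what lockdep compared against.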
Fixes: a7e1ac7bdc5a ("btrfs: zoned: reserve zones for an active metadata/system block group") CC: stable@vger.kernel.org # 6.6 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revision tags: v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48

c02d35d8 | 18-Aug-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: skip splitting and logical rewriting on pre-alloc write
When doing a relocation, there is a chance that at the time of btrfs_reloc_clone_csums(), there is no checksum for the corresponding region.
In this case, btrfs_finish_ordered_zoned()'s sum points to an invalid item, and so the ordered_extent's logical is set to some invalid value. Then, btrfs_lookup_block_group() in btrfs_zone_finish_endio() fails to find a block group and hits an assertion failure or a NULL pointer dereference as follows.
This can be reproduced by running btrfs/028 several times (e.g., 4 to 16 times) with a null_blk setup. The device's zone size and capacity are set to 32 MB and the storage size is set to 5 GB on my setup.
KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
CPU: 6 PID: 3105720 Comm: kworker/u16:13 Tainted: G W 6.5.0-rc6-kts+ #1
Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015
Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
Code: 41 54 49 89 fc 55 48 89 f5 53 e8 57 7d fc ff 48 8d b8 88 00 00 00 48 89 c3 48 b8 00 00 00 00 00 > 3c 02 00 0f 85 02 01 00 00 f6 83 88 00 00 00 01 0f 84 a8 00 00
RSP: 0018:ffff88833cf87b08 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000011 RSI: 0000000000000004 RDI: 0000000000000088
RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed102877b827
R10: ffff888143bdc13b R11: ffff888125b1cbc0 R12: ffff888143bdc000
R13: 0000000000007000 R14: ffff888125b1cba8 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88881e500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3ed85223d5 CR3: 00000001519b4005 CR4: 00000000001706e0
Call Trace:
 <TASK>
 ? die_addr+0x3c/0xa0
 ? exc_general_protection+0x148/0x220
 ? asm_exc_general_protection+0x22/0x30
 ? btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
 ? btrfs_zone_finish_endio.part.0+0x19/0x160 [btrfs]
 btrfs_finish_one_ordered+0x7b8/0x1de0 [btrfs]
 ? rcu_is_watching+0x11/0xb0
 ? lock_release+0x47a/0x620
 ? btrfs_finish_ordered_zoned+0x59b/0x800 [btrfs]
 ? __pfx_btrfs_finish_one_ordered+0x10/0x10 [btrfs]
 ? btrfs_finish_ordered_zoned+0x358/0x800 [btrfs]
 ? __smp_call_single_queue+0x124/0x350
 ? rcu_is_watching+0x11/0xb0
 btrfs_work_helper+0x19f/0xc60 [btrfs]
 ? __pfx_try_to_wake_up+0x10/0x10
 ? _raw_spin_unlock_irq+0x24/0x50
 ? rcu_is_watching+0x11/0xb0
 process_one_work+0x8c1/0x1430
 ? __pfx_lock_acquire+0x10/0x10
 ? __pfx_process_one_work+0x10/0x10
 ? __pfx_do_raw_spin_lock+0x10/0x10
 ? _raw_spin_lock_irq+0x52/0x60
 worker_thread+0x100/0x12c0
 ? __kthread_parkme+0xc1/0x1f0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2ea/0x3c0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
In zoned mode, writing to a pre-allocated region means a data relocation write. Such writes always use the regular WRITE command, so there is no need for splitting and rewriting the logical address. Thus, we can just skip the function for this case.
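A minimal sketch of such an early return in the zoned completion path; the predicate name is a placeholder for illustration, not the exact upstream helper:

    static void finish_ordered_zoned_sketch(struct btrfs_ordered_extent *ordered)
    {
            /* Pre-allocated (data relocation) writes used a regular WRITE command,
             * so the logical address is already final: nothing to split or rewrite. */
            if (is_data_relocation_write(ordered))        /* hypothetical predicate */
                    return;

            /* ... split the ordered extent and rewrite the logical address ... */
    }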
Fixes: cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes") Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>

Revision tags: v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40

332581bd | 21-Jul-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: do not zone finish data relocation block group
When multiple writes happen at once, we may need to sacrifice a currently active block group to be zone finished for a new allocation. We choose a block group with the least free space left, and zone finish it.
To do the finishing, we need to send IOs for the already allocated region and wait for them as well as for on-going IOs. Otherwise, these IOs fail because the zone is already finished by the time they reach the device.
However, if a block group dedicated to data relocation is zone finished, there is a chance that we finish it before an ongoing write IO reaches the device. That is because there is a timing gap between when the allocation is done (block_group->reservations == 0, as pre-allocation is done) and when the ordered extent is created as the relocation IO starts. Thus, if we finish the zone in between, we can fail the IOs.
We cannot simply use "fs_info->data_reloc_bg == block_group->start" to avoid the zone finishing, because the data_reloc_bg may have already switched to a new block group while there are still ongoing write IOs to the old data_reloc_bg.
So, this patch reworks the BLOCK_GROUP_FLAG_ZONED_DATA_RELOC bit to indicate that there is a data relocation allocation and/or an ongoing write to the block group. The bit is set on allocation and cleared in the end_io function of the last IO for the currently allocated region.
Changing the timing of setting the bit also solves the issue of the bit being left set even after there is no IO going on. With the current code, if the data_reloc_bg switches after the last IO to the current data_reloc_bg, the bit is set at that point and there is nobody to clear it. As a result, that block group is kept unallocatable for anything.
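A rough sketch of the reworked flag lifetime (the flag name is from the patch; the helper on the clearing side is a placeholder):

    /* On allocation from the data relocation block group: */
    spin_lock(&block_group->lock);
    set_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags);
    spin_unlock(&block_group->lock);

    /* In the end_io path, once the last IO for the currently allocated region
     * has completed: */
    if (last_io_for_allocated_region(block_group))        /* hypothetical check */
            clear_bit(BLOCK_GROUP_FLAG_ZONED_DATA_RELOC, &block_group->runtime_flags);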
Fixes: 343d8a30851c ("btrfs: zoned: prevent allocation from previous data relocation BG") Fixes: 74e91b12b115 ("btrfs: zoned: zone finish unused block group") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>

6a8ebc77 | 07-Aug-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: no longer count fresh BG region as zone unusable
Now that we have switched to write-time activation, we no longer need to (and must not) count the fresh region as zone unusable. This commit is similar to a revert of commit fa2068d7e922b434eb ("btrfs: zoned: count fresh BG region as zone unusable").
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>

13bb483d | 07-Aug-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: activate metadata block group on write time
In the current implementation, block groups are activated at reservation time to ensure that all reserved bytes can be written to an active metadata block group. However, this approach has proven to be less efficient, as it activates block groups more frequently than necessary, putting pressure on the active zone resource and leading to potential issues such as early ENOSPC or hung_task.
Another drawback of the current method is that it hampers metadata over-commit, and necessitates additional flush operations and block group allocations, resulting in decreased overall performance.
To address these issues, this commit introduces write-time activation of metadata and system block groups. This involves reserving at least one active block group specifically for metadata and system block groups.
Since metadata write-out is always allocated sequentially, when we need to write to a non-active block group, we can wait for the ongoing IOs to complete, activate a new block group, and then proceed with writing to the new block group.
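A simplified sketch of that write-time flow (the waiting helper is a placeholder; error handling is elided):

    /* Called when the next extent buffer sits in a not-yet-active block group. */
    static int activate_bg_for_metadata_write(struct btrfs_block_group *block_group)
    {
            /* Metadata is allocated sequentially, so every earlier buffer is already
             * submitted: wait for those IOs to complete ... */
            wait_for_ongoing_metadata_io(block_group);    /* hypothetical */

            /* ... then activate the new block group, consuming one of the zones
             * reserved for metadata/system block groups. */
            if (!btrfs_zone_activate(block_group))
                    return -ENOSPC;
            return 0;
    }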
Fixes: b09315139136 ("btrfs: zoned: activate metadata block group on flush_space") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>

a7e1ac7b | 07-Aug-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: reserve zones for an active metadata/system block group
Ensure a metadata and system block group can be activated at write time, by leaving a certain number of active zones available when trying to activate a data block group.
Zones for two metadata block groups (normal and tree-log) and one system block group are reserved, according to the profile type: two zones per block group on the DUP profile and one zone per block group otherwise.
The reservation must be freed once a non-data block group is allocated. If not, we over-reserve the active zones and data block group activation will suffer. For the dynamic reservation count, we need to manage the reservation count per device.
The reservation count variable is protected by fs_info->zone_active_bgs_lock.
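As a back-of-the-envelope sketch of that per-device reservation count (the function and parameters are illustrative, not the upstream variables):

    /* Zones reserved per device: two metadata block groups (normal and tree-log)
     * plus one system block group; DUP needs two zones per block group. */
    static unsigned int reserved_active_zones(bool meta_is_dup, bool sys_is_dup)
    {
            unsigned int zones = 0;

            zones += 2 * (meta_is_dup ? 2 : 1);    /* metadata + tree-log block groups */
            zones += 1 * (sys_is_dup ? 2 : 1);     /* system block group */
            return zones;                          /* e.g. 6 with DUP, 3 otherwise */
    }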
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>

c1c3c2bc | 07-Aug-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: update meta write pointer on zone finish
On finishing a zone, the meta_write_pointer should be set to the end of the zone to reflect the actual write pointer position.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

0356ad41 | 07-Aug-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: defer advancing meta write pointer
We currently advance the meta_write_pointer in btrfs_check_meta_write_pointer(). That makes it necessary to revert it when locking the buffer failed. Instead, we can advance it just before sending the buffer.
Also, this is necessary for the following commit, which needs to release the zoned_meta_io_lock to allow IOs to come in and wait for them to fill the currently active block group. If we advance the meta_write_pointer before locking the extent buffer, the following extent buffer can pass the meta_write_pointer check, resulting in an unaligned write failure.
Advancing the pointer is still thread-safe as the extent buffer is locked.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

2ad8c051 | 07-Aug-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: return int from btrfs_check_meta_write_pointer
Now that we have writeback_control passed to btrfs_check_meta_write_pointer(), we can move the wbc condition in submit_eb_page() to btrfs_check_meta_write_pointer() and return int.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

7db94301 | 07-Aug-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: introduce block group context to btrfs_eb_write_context
For metadata write out on the zoned mode, we call btrfs_check_meta_write_pointer() to check if an extent buffer to be written is aligned to the write pointer.
We look up a block group containing the extent buffer for every extent buffer, which takes unnecessary effort as the writing extent buffers are mostly contiguous.
Introduce "zoned_bg" to cache the block group working on. Also, while at it, rename "cache" to "block_group".
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Revision tags: v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4

07a3bb95 | 23-Jun-2023 | Julia Lawall <Julia.Lawall@inria.fr>
btrfs: zoned: use vcalloc instead of vzalloc in btrfs_get_dev_zone_info
Use vcalloc, which checks for potential multiplication overflows. The changes were done using a Coccinelle semantic patch.
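For illustration, the pattern such a semantic patch targets looks roughly like this (the variable names are placeholders, not the exact ones in btrfs_get_dev_zone_info):

    /* Before: the multiplication can overflow silently. */
    zones = vzalloc(nr_zones * sizeof(struct blk_zone));

    /* After: vcalloc() checks the n * size multiplication for overflow and
     * returns NULL instead of allocating a too-small buffer. */
    zones = vcalloc(nr_zones, sizeof(struct blk_zone));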
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

95ca6599 | 29-Jun-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: do not enable async discard
The zoned mode needs to reset a zone before using it. We rely on btrfs's original discard functionality (discarding the unused block group range) to do the resetting.

While commit 63a7cb130718 ("btrfs: auto enable discard=async when possible") made the discard run in an async manner, a zone reset does not need to be async, as it is fast enough.

Even worse, delaying zone resets prevents using those zones again. So, let's disable async discard in zoned mode.
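A minimal sketch of the kind of guard this implies in the auto-enable decision (assumed placement and function name, not the literal upstream hunk):

    static bool can_auto_enable_async_discard(struct btrfs_fs_info *fs_info)
    {
            /* Async discard only delays zone resets, and therefore zone reuse,
             * so never auto-enable it on zoned filesystems. */
            if (btrfs_is_zoned(fs_info))
                    return false;

            /* ... the original auto-enable conditions ... */
            return true;
    }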
Fixes: 63a7cb130718 ("btrfs: auto enable discard=async when possible") CC: stable@vger.kernel.org # 6.3+ Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update message text ] Signed-off-by: David Sterba <dsterba@suse.com>

Revision tags: v6.1.35, v6.1.34, v6.1.33, v6.1.32

723b8bb1 | 30-May-2023 | Christoph Hellwig <hch@lst.de>
btrfs: open code btrfs_map_sblock
btrfs_map_sblock just hard codes three arguments and calls btrfs_map_block. Remove it as it doesn't provide any real value, but makes following the btrfs_map_block call chains harder.
Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Revision tags: v6.1.31, v6.1.30

f000bc6f | 24-May-2023 | Christoph Hellwig <hch@lst.de>
btrfs: pass the new logical address to split_extent_map
split_extent_map splits off the first chunk of an extent map into a new one. One of the two users is the zoned I/O completion code that wants to rewrite the logical block start address right after this split. Pass in the logical address to be set in the split off first extent_map as an argument to avoid an extra extent tree lookup for this case.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>

71df088c | 24-May-2023 | Christoph Hellwig <hch@lst.de>
btrfs: defer splitting of ordered extents until I/O completion
The btrfs zoned completion code currently needs an ordered_extent and extent_map per bio so that it can account for the non-predictable write location from Zone Append. To achieve that it currently splits the ordered_extent and extent_map at I/O submission time, and then records the actual physical address in the ->physical field of the ordered_extent.
This patch instead switches to record the "original" physical address that the btrfs allocator assigned in spare space in the btrfs_bio, and then rewrites the logical address in the btrfs_ordered_sum structure at I/O completion time. This allows the ordered extent completion handler to simply walk the list of ordered csums and split the ordered extent as needed. This removes an extra ordered extent and extent_map lookup and manipulation during the I/O submission path, and instead batches it in the I/O completion path where we need to touch these anyway.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>

cbfce4c7 | 24-May-2023 | Christoph Hellwig <hch@lst.de>
btrfs: optimize the logical to physical mapping for zoned writes
The current code to store the final logical to physical mapping for a zone append write in the extent tree is rather inefficient. It first has to split the ordered extent so that there is one ordered extent per bio, so that it can look up the ordered extent on I/O completion in btrfs_record_physical_zoned and store the physical LBA returned by the block driver in the ordered extent.
btrfs_rewrite_logical_zoned then has to do a lookup in the chunk tree to see what physical address the logical address for this bio / ordered extent is mapped to, and then rewrite it in the extent tree.
To optimize this process, we can store the physical address assigned in the chunk tree to the original logical address and a pointer to the btrfs_ordered_sum structure in the btrfs_bio structure, and then use this information to rewrite the logical address in the btrfs_ordered_sum structure directly at I/O completion time in btrfs_record_physical_zoned. btrfs_rewrite_logical_zoned then simply updates the logical address in the extent tree and the ordered_extent itself.
The code in btrfs_rewrite_logical_zoned now runs for all data I/O completions in zoned file systems, which is fine as there is no remapping to do for non-append writes to conventional zones or for relocation, and the overhead for quickly breaking out of the loop is very low.
Because zoned file systems now need the ordered_sums structure to record the actual write location returned by zone append, allocate dummy structures without the csum array for them when the I/O doesn't use checksums, and free them when completing the ordered_extent.
Note that the btrfs_bio doesn't grow as the new fields are placed into a union that is so far not used for data writes and has plenty of space left in it.
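A rough sketch of the completion-time rewrite described above (the btrfs_bio field names are assumptions based on this description, not a verbatim excerpt):

    /* At I/O completion: shift the recorded logical address by however far the
     * actual zone-append write location ended up from the assigned one. */
    static void record_physical_zoned_sketch(struct btrfs_bio *bbio)
    {
            const u64 physical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
            struct btrfs_ordered_sum *sum = bbio->sums;        /* assumed field */

            sum->logical += physical - bbio->orig_physical;    /* assumed field */
    }

Unsigned wrap-around makes the single addition correct even when the device wrote below the originally assigned address.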
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

5cfe76f8 | 24-May-2023 | Christoph Hellwig <hch@lst.de>
btrfs: rename the bytenr field in struct btrfs_ordered_sum to logical
btrfs_ordered_sum::bytenr stores a logical address. Make that clear by renaming it to ->logical.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

1d126800 | 24-May-2023 | David Sterba <dsterba@suse.com>
btrfs: drop gfp from parameter extent state helpers
Now that all extent state bit helpers effectively take the GFP_NOFS mask (and GFP_NOWAIT is encoded in the bits) we can remove the parameter. This reduces stack consumption in many functions and simplifies a lot of code.
Net effect on module on a release build:
   text    data     bss     dec     hex filename
1250432   20985   16088 1287505  13a551 pre/btrfs.ko
1247074   20985   16088 1284147  139833 post/btrfs.ko
DELTA: -3358
Signed-off-by: David Sterba <dsterba@suse.com>

62bc6047 | 24-May-2023 | David Sterba <dsterba@suse.com>
btrfs: pass NOWAIT for set/clear extent bits as another bit
The only flags we now pass to set_extent_bit/__clear_extent_bit are GFP_NOFS and GFP_NOWAIT (a few functions handling mappings). This requires an extra parameter to be passed everywhere but is almost always the same.
Encode the GFP_NOWAIT as an artificial extent bit and extract the real bits and gfp mask in the lowest level helpers. Now the passed gfp mask is not actually used and can be removed.
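A minimal sketch of how such an artificial bit can be peeled off in the lowest-level helper (treat the exact identifiers and bit value as assumptions):

    /* An extra bit outside the range of real extent state bits. */
    #define EXTENT_NOWAIT   (1U << 30)

    static int set_extent_bit_sketch(struct extent_io_tree *tree,
                                     u64 start, u64 end, u32 bits)
    {
            /* Recover the gfp mask from the artificial bit, then drop it so that
             * only real extent state bits remain. */
            gfp_t mask = (bits & EXTENT_NOWAIT) ? GFP_NOWAIT : GFP_NOFS;

            bits &= ~EXTENT_NOWAIT;
            /* ... allocate extent states with @mask and apply @bits as before ... */
            return 0;
    }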
Signed-off-by: David Sterba <dsterba@suse.com>

e85de967 | 24-May-2023 | David Sterba <dsterba@suse.com>
btrfs: open code set_extent_bits_nowait
The helper only passes GFP_NOWAIT as gfp flags and is used two times.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Revision tags: v6.1.29, v6.1.28

f880fe6e | 08-May-2023 | Christoph Hellwig <hch@lst.de>
btrfs: don't hold an extra reference for redirtied buffers
When btrfs_redirty_list_add redirties a buffer, it also acquires an extra reference that is released on transaction commit. But this is not required, as buffers that are dirty or under writeback are never freed (look for calls to extent_buffer_under_io()).
Remove the extra reference and the infrastructure used to drop it again.
History behind redirty logic:
In the first place, it used releasing_list to hold all the to-be-released extent buffers, and decided which buffers to re-dirty at commit time. Then, in a later version, the behaviour was changed to re-dirty a necessary buffer and add the re-dirtied one to the list in btrfs_free_tree_block(). In short, the list was there mostly for the patch series' historical reasons.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [ add Naohiro's comment regarding history ] Signed-off-by: David Sterba <dsterba@suse.com>

Revision tags: v6.1.27, v6.1.26

b5345d6c | 25-Apr-2023 | Naohiro Aota <naota@elisp.net>
btrfs: export bitmap_test_range_all_{set,zero}
bitmap_test_range_all_{set,zero} defined in subpage.c are useful for other components. Move them to misc.h and use them in zoned.c. Also, as find_next{,_zero}_bit take/return "unsigned long" instead of "unsigned int", convert the type to "unsigned long".
While at it, also rewrite the "if (...) return true; else return false;" pattern and add const to the input bitmap.
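For reference, a minimal sketch of what such range helpers boil down to on top of find_next_bit()/find_next_zero_bit() (a sketch of the idea, not a verbatim copy of the moved code):

    /* True iff every bit in [start, start + nbits) is set. */
    static inline bool bitmap_test_range_all_set(const unsigned long *addr,
                                                 unsigned long start,
                                                 unsigned long nbits)
    {
            /* No zero bit may appear before the end of the range. */
            return find_next_zero_bit(addr, start + nbits, start) >= start + nbits;
    }

    /* True iff every bit in [start, start + nbits) is clear. */
    static inline bool bitmap_test_range_all_zero(const unsigned long *addr,
                                                  unsigned long start,
                                                  unsigned long nbits)
    {
            return find_next_bit(addr, start + nbits, start) >= start + nbits;
    }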
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

c83b56d1 | 08-May-2023 | Christoph Hellwig <hch@lst.de>
btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
btrfs_redirty_list_add zeroes the buffer data and sets the EXTENT_BUFFER_NO_CHECK to make sure writeback is fine with a bogus header. But it does that after already marking the buffer dirty, which means that writeback could already be looking at the buffer.
Switch the order of operations around so that the buffer is only marked dirty when we're ready to write it.
Fixes: d3575156f662 ("btrfs: zoned: redirty released extent buffers") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

02ca9e6f | 09-May-2023 | Naohiro Aota <Naohiro.Aota@wdc.com>
btrfs: zoned: fix full zone super block reading on ZNS
When both of the superblock zones are full, we need to check which superblock is newer. The calculation of the last superblock position is wrong, as it does not consider zone_capacity and uses the zone length instead.
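Roughly, the last superblock copy in a full zone sits at the end of the writable capacity, not at the end of the zone (a sketch using blk_zone's sector-based fields; the helper name is illustrative):

    /* Byte offset of the last superblock copy written into a full zone. */
    static u64 last_super_block_pos(const struct blk_zone *zone)
    {
            /* zone->start and zone->capacity are in 512-byte sectors; only the
             * first @capacity sectors of a zone are writable. */
            return ((zone->start + zone->capacity) << SECTOR_SHIFT) -
                    BTRFS_SUPER_INFO_SIZE;
    }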
Fixes: 9658b72ef300 ("btrfs: zoned: locate superblock position using zone capacity") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>

Revision tags: v6.3, v6.1.25

631003e2 | 18-Apr-2023 | Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: fix wrong use of bitops API in btrfs_ensure_empty_zones
find_next_bit and find_next_zero_bit take @size as the second parameter and @offset as the third parameter. They are specified in the opposite order in btrfs_ensure_empty_zones(). Thanks to the later loop, it never failed to detect the empty zones. Fix the argument order and (maybe) return the result a bit faster.
Note: the naming is a bit confusing, size has two meanings here, bitmap and our range size.
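To make the argument order concrete (a standalone illustration, not the btrfs code itself):

    /* Correct: scan the whole @nbits-sized bitmap starting at @begin. */
    pos = find_next_zero_bit(bitmap, nbits, begin);

    /* The buggy pattern fixed here had @begin and @nbits swapped, so the scan
     * stopped at @begin instead of covering the intended range:
     * pos = find_next_zero_bit(bitmap, begin, nbits); */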
Fixes: 1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>