Revision tags: v6.6.67, v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51 |
|
#
f3c3091b |
| 09-Sep-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.49' into for/openbmc/dev-6.6
This is the 6.6.49 stable release
|
Revision tags: v6.6.50, v6.6.49, v6.6.48, v6.6.47 |
|
#
51722b99 |
| 17-Aug-2024 |
Qu Wenruo <wqu@suse.com> |
btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk()
commit 10d9d8c3512f16cad47b2ff81ec6fc4b27d8ee10 upstream.
[BUG] There is an internal report that KASAN is reporting use-after-free, with the following backtrace:
BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0xa68/0xb70 [btrfs]
Read of size 4 at addr ffff8881117cec28 by task kworker/u16:2/45
CPU: 1 UID: 0 PID: 45 Comm: kworker/u16:2 Not tainted 6.11.0-rc2-next-20240805-default+ #76
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
Call Trace:
 dump_stack_lvl+0x61/0x80
 print_address_description.constprop.0+0x5e/0x2f0
 print_report+0x118/0x216
 kasan_report+0x11d/0x1f0
 btrfs_check_read_bio+0xa68/0xb70 [btrfs]
 process_one_work+0xce0/0x12a0
 worker_thread+0x717/0x1250
 kthread+0x2e3/0x3c0
 ret_from_fork+0x2d/0x70
 ret_from_fork_asm+0x11/0x20
Allocated by task 20917:
 kasan_save_stack+0x37/0x60
 kasan_save_track+0x10/0x30
 __kasan_slab_alloc+0x7d/0x80
 kmem_cache_alloc_noprof+0x16e/0x3e0
 mempool_alloc_noprof+0x12e/0x310
 bio_alloc_bioset+0x3f0/0x7a0
 btrfs_bio_alloc+0x2e/0x50 [btrfs]
 submit_extent_page+0x4d1/0xdb0 [btrfs]
 btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
 btrfs_readahead+0x29a/0x430 [btrfs]
 read_pages+0x1a7/0xc60
 page_cache_ra_unbounded+0x2ad/0x560
 filemap_get_pages+0x629/0xa20
 filemap_read+0x335/0xbf0
 vfs_read+0x790/0xcb0
 ksys_read+0xfd/0x1d0
 do_syscall_64+0x6d/0x140
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
Freed by task 20917:
 kasan_save_stack+0x37/0x60
 kasan_save_track+0x10/0x30
 kasan_save_free_info+0x37/0x50
 __kasan_slab_free+0x4b/0x60
 kmem_cache_free+0x214/0x5d0
 bio_free+0xed/0x180
 end_bbio_data_read+0x1cc/0x580 [btrfs]
 btrfs_submit_chunk+0x98d/0x1880 [btrfs]
 btrfs_submit_bio+0x33/0x70 [btrfs]
 submit_one_bio+0xd4/0x130 [btrfs]
 submit_extent_page+0x3ea/0xdb0 [btrfs]
 btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
 btrfs_readahead+0x29a/0x430 [btrfs]
 read_pages+0x1a7/0xc60
 page_cache_ra_unbounded+0x2ad/0x560
 filemap_get_pages+0x629/0xa20
 filemap_read+0x335/0xbf0
 vfs_read+0x790/0xcb0
 ksys_read+0xfd/0x1d0
 do_syscall_64+0x6d/0x140
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
[CAUSE] Although I cannot reproduce the error, the report itself is good enough to pin down the cause.
The call trace is the regular endio workqueue context, but the free-by-task trace shows that during btrfs_submit_chunk() we had already hit a critical error and were calling btrfs_bio_end_io() to error out. The original endio function then called bio_put() to free the whole bio.
This means a double free, which in turn causes the use-after-free, e.g.:
1. Enter btrfs_submit_bio() with a read bio
   The read bio length is 128K, crossing two 64K stripes.
2. The first run of btrfs_submit_chunk()
   2.1 Call btrfs_map_block(), which returns 64K
   2.2 Call btrfs_split_bio()
       Now there are two bios, one referring to the first 64K, the other referring to the second 64K.
   2.3 The first half is submitted.
3. The second run of btrfs_submit_chunk()
   3.1 Call btrfs_map_block(), which somehow fails
       Now we call btrfs_bio_end_io() to handle the error
   3.2 btrfs_bio_end_io() calls the original endio function
       Which is end_bbio_data_read(), and it calls bio_put() for the original bio.
       Now the original bio is freed.
4. The submitted first 64K bio finishes
   Now we call into btrfs_check_read_bio() and try to advance the bio iter. But since the original bio (thus its iter) is already freed, we trigger the above use-after-free.
   And even if the memory is not poisoned/corrupted, we will later call the original endio function, causing a double free.
[FIX] Instead of calling btrfs_bio_end_io(), call btrfs_orig_bbio_end_io(), which has the extra check on split bios and does the proper refcounting for cloned bios.
Furthermore, there is already one extra btrfs_cleanup_bio() call, but it duplicates what the btrfs_orig_bbio_end_io() call already does, so remove that label completely.
Reported-by: David Sterba <dsterba@suse.com>
Fixes: 852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
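The split-bio scenario in the commit above can be replayed with a toy model. This is only a sketch in Python, not kernel code: the class `OrigBio`, its `pending_ios` counter, and every function name below are hypothetical stand-ins chosen for illustration. It assumes the original bio is released once by its endio function and that each split half keeps the original pinned until it completes.

```python
class OrigBio:
    """Toy stand-in for the original read bio (all names are hypothetical)."""

    def __init__(self):
        self.pending_ios = 1    # the original bio counts as one pending I/O
        self.freed = False
        self.free_calls = 0

    def endio(self):
        # Models the original endio function dropping the bio via bio_put().
        self.freed = True
        self.free_calls += 1


def split_off_first_half(orig):
    # Step 2.2: the split bio keeps the original pinned until it completes.
    orig.pending_ios += 1


def end_directly(orig):
    # Pre-fix error path: run the original endio without checking for splits.
    orig.endio()


def finish_one_io(orig):
    # Post-fix path: drop one pending I/O, run endio only for the last one.
    orig.pending_ios -= 1
    if orig.pending_ios == 0:
        orig.endio()


def complete_first_half(orig):
    # Step 4: completion of the submitted half reads the original bio.
    touched_freed_bio = orig.freed
    finish_one_io(orig)
    return touched_freed_bio


def simulate(error_handler):
    orig = OrigBio()
    split_off_first_half(orig)
    error_handler(orig)                 # step 3: the second mapping fails
    use_after_free = complete_first_half(orig)
    return use_after_free, orig.free_calls
```

Calling `simulate(end_directly)` reports that the completion touched an already freed bio, while `simulate(finish_one_io)` frees the bio exactly once, after the last outstanding I/O has finished.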
|
Revision tags: v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37 |
|
#
57904291 |
| 27-Jun-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.36' into dev-6.6
This is the 6.6.36 stable release
|
Revision tags: v6.6.36, v6.6.35, v6.6.34, v6.6.33 |
|
#
082b3d4e |
| 07-Jun-2024 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
btrfs: zoned: allocate dummy checksums for zoned NODATASUM writes
[ Upstream commit cebae292e0c32a228e8f2219c270a7237be24a6a ]
Shin'ichiro reported that when he's running fstests' test-case btrfs/167 on emulated zoned devices, he's seeing the following NULL pointer dereference in 'btrfs_zone_finish_endio()':
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000011: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f]
CPU: 4 PID: 2332440 Comm: kworker/u80:15 Tainted: G W 6.10.0-rc2-kts+ #4
Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
RSP: 0018:ffff88867f107a90 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff893e5534
RDX: 0000000000000011 RSI: 0000000000000004 RDI: 0000000000000088
RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed1081696028
R10: ffff88840b4b0143 R11: ffff88834dfff600 R12: ffff88840b4b0000
R13: 0000000000020000 R14: 0000000000000000 R15: ffff888530ad5210
FS: 0000000000000000(0000) GS:ffff888e3f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f87223fff38 CR3: 00000007a7c6a002 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body.cold+0x19/0x27
 ? die_addr+0x46/0x70
 ? exc_general_protection+0x14f/0x250
 ? asm_exc_general_protection+0x26/0x30
 ? do_raw_read_unlock+0x44/0x70
 ? btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs]
 btrfs_finish_one_ordered+0x5d9/0x19a0 [btrfs]
 ? __pfx_lock_release+0x10/0x10
 ? do_raw_write_lock+0x90/0x260
 ? __pfx_do_raw_write_lock+0x10/0x10
 ? __pfx_btrfs_finish_one_ordered+0x10/0x10 [btrfs]
 ? _raw_write_unlock+0x23/0x40
 ? btrfs_finish_ordered_zoned+0x5a9/0x850 [btrfs]
 ? lock_acquire+0x435/0x500
 btrfs_work_helper+0x1b1/0xa70 [btrfs]
 ? __schedule+0x10a8/0x60b0
 ? __pfx___might_resched+0x10/0x10
 process_one_work+0x862/0x1410
 ? __pfx_lock_acquire+0x10/0x10
 ? __pfx_process_one_work+0x10/0x10
 ? assign_work+0x16c/0x240
 worker_thread+0x5e6/0x1010
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2c3/0x3a0
 ? trace_irq_enable.constprop.0+0xce/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
Enabling CONFIG_BTRFS_ASSERT revealed that the following assertion triggers:
assertion failed: !list_empty(&ordered->list), in fs/btrfs/zoned.c:1815
This indicates that we're missing the checksums list on the ordered_extent. As btrfs/167 is doing a NOCOW write, this is to be expected.
Further analysis with drgn confirmed the assumption:
  >>> inode = prog.crashed_thread().stack_trace()[11]['ordered'].inode
  >>> btrfs_inode = drgn.container_of(inode, "struct btrfs_inode", \
  ...                                 "vfs_inode")
  >>> print(btrfs_inode.flags)
  (u32)1
As zoned emulation mode simulates conventional zones on regular devices, we cannot use zone-append for writing. But we're only attaching dummy checksums if we're doing a zone-append write.
So for NOCOW zoned data writes on conventional zones, also attach a dummy checksum.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes")
CC: Naohiro Aota <Naohiro.Aota@wdc.com> # 6.6+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
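The condition change described in the commit above can be sketched as two predicates. This is an illustrative Python model, not the kernel's code: the function names and boolean parameters are hypothetical, and the only value taken from the btrfs on-disk format is the `BTRFS_INODE_NODATASUM` bit (`1 << 0`), which is why the drgn output `(u32)1` means the inode has checksums disabled.

```python
BTRFS_INODE_NODATASUM = 1 << 0   # inode flag bit from the btrfs on-disk format


def has_nodatasum(inode_flags):
    # Decode the flag value printed by drgn, e.g. (u32)1.
    return bool(inode_flags & BTRFS_INODE_NODATASUM)


def dummy_sum_before_fix(zoned, nodatasum, zone_append):
    # Assumed pre-fix rule: dummy checksums only accompany zone-append writes.
    return zoned and nodatasum and zone_append


def dummy_sum_after_fix(zoned, nodatasum, zone_append):
    # Assumed post-fix rule: any zoned NODATASUM data write gets one,
    # including writes to conventional zones that cannot use zone append.
    return zoned and nodatasum
```

For a NODATASUM write to an emulated (conventional) zone, the first predicate is false and the second is true, which corresponds to the missing checksums list that the assertion caught.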
|
Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1 |
|
#
1ac731c5 |
| 30-Aug-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.6 merge window.
|
Revision tags: v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44 |
|
#
2612e3bb |
| 07-Aug-2023 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catching-up with drm-next and drm-intel-gt-next. It will unblock a code refactor around the platform definitions (names vs acronyms).
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
#
9f771739 |
| 07-Aug-2023 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as a dependency for https://patchwork.freedesktop.org/series/121735/
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
Revision tags: v6.1.43, v6.1.42, v6.1.41 |
|
#
61b73694 |
| 24-Jul-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging to get v6.5-rc2.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.1.40, v6.1.39 |
|
#
50501936 |
| 17-Jul-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.4' into next
Sync up with mainline to bring in updates to shared infrastructure.
|
#
0791faeb |
| 17-Jul-2023 |
Mark Brown <broonie@kernel.org> |
ASoC: Merge v6.5-rc2
Get a similar baseline to my other branches, and fixes for people using the branch.
|
#
2f98e686 |
| 11-Jul-2023 |
Maxime Ripard <mripard@kernel.org> |
Merge v6.5-rc1 into drm-misc-fixes
Boris needs 6.5-rc1 in drm-misc-fixes to prevent a conflict.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
Revision tags: v6.1.38, v6.1.37 |
|
#
44f10dbe |
| 30-Jun-2023 |
Andrew Morton <akpm@linux-foundation.org> |
Merge branch 'master' into mm-hotfixes-stable
|
#
0a30901b |
| 30-Jun-2023 |
Andrew Morton <akpm@linux-foundation.org> |
Merge branch 'master' into mm-hotfixes-stable
|
Revision tags: v6.1.36 |
|
#
e80b5003 |
| 27-Jun-2023 |
Jiri Kosina <jkosina@suse.cz> |
Merge branch 'for-6.5/apple' into for-linus
- improved support for Keychron K8 keyboard (Lasse Brun)
|
#
5f004bca |
| 27-Jun-2023 |
Jason Gunthorpe <jgg@nvidia.com> |
Merge tag 'v6.4' into rdma.git for-next
Linux 6.4
Resolve conflicts between rdma rc and next in rxe_cq matching linux-next:
drivers/infiniband/sw/rxe/rxe_cq.c: https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
cc423f63 |
| 26-Jun-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'for-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba: "Mainly core changes, refactoring and optimizations.
Performance is improved in some areas, overall there may be a cumulative improvement due to refactoring that removed lookups in the IO path or simplified IO submission tracking.
Core:
- submit IO synchronously for fast checksums (crc32c and xxhash), remove high priority worker kthread
- read extent buffer in one go, simplify IO tracking, bio submission and locking
- remove additional tracking of redirtied extent buffers, originally added for zoned mode but actually not needed
- track ordered extent pointer in bio to avoid rbtree lookups during IO
- scrub, use recovered data stripes as cache to avoid unnecessary read
- in zoned mode, optimize logical to physical mappings of extents
- remove PageError handling, not set by VFS nor writeback
- cleanups, refactoring, better structure packing
- lots of error handling improvements
- more assertions, lockdep annotations
- print assertion failure with the exact line where it happens
- tracepoint updates
- more debugging prints
Performance:
- speedup in fsync(), better tracking of inode logged status can avoid transaction commit
- IO path structures track logical offsets in data structures and do not need to look them up
User visible changes:
- don't commit transaction for every created subvolume, this can reduce time when many subvolumes are created in a batch
- print affected files when relocation fails
- trigger orphan file cleanup during START_SYNC ioctl
Notable fixes:
- fix crash when disabling quota and relocation
- fix crashes when removing roots from dirty list
- fix transaction abort during relocation when converting from newer profiles not covered by fallback
- in zoned mode, stop reclaiming block groups if filesystem becomes read-only
- fix rare race condition in tree mod log rewind that can miss some btree node slots
- with enabled fsverity, drop up-to-date page bit in case the verification fails"
* tag 'for-6.5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (194 commits)
  btrfs: fix race between quota disable and relocation
  btrfs: add comment to struct btrfs_fs_info::dirty_cowonly_roots
  btrfs: fix race when deleting free space root from the dirty cow roots list
  btrfs: fix race when deleting quota root from the dirty cow roots list
  btrfs: tracepoints: also show actual number of the outstanding extents
  btrfs: update i_version in update_dev_time
  btrfs: make btrfs_compressed_bioset static
  btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile
  btrfs: scrub: remove btrfs_fs_info::scrub_wr_completion_workers
  btrfs: scrub: remove scrub_ctx::csum_list member
  btrfs: do not BUG_ON after failure to migrate space during truncation
  btrfs: do not BUG_ON on failure to get dir index for new snapshot
  btrfs: send: do not BUG_ON() on unexpected symlink data extent
  btrfs: do not BUG_ON() when dropping inode items from log root
  btrfs: replace BUG_ON() at split_item() with proper error handling
  btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()
  btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
  btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
  btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
  btrfs: abort transaction at update_ref_for_cow() when ref count is zero
  ...
|
Revision tags: v6.4, v6.1.35 |
|
#
de8a334f |
| 19-Jun-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging into drm-misc-next to get commit 2c1c7ba457d4 ("drm/amdgpu: support partition drm devices"), which is required to fix commit 0adec22702d4 ("drm: Remove struct drm_driver.gem_prime_mmap").
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.1.34, v6.1.33, v6.1.32 |
|
#
ec63b84d |
| 31-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: add an ordered_extent pointer to struct btrfs_bio
Add a pointer to the ordered_extent to the existing union in struct btrfs_bio, so all code dealing with data write bios can just use a pointer dereference to retrieve the ordered_extent instead of doing multiple rbtree lookups per I/O.
The reference to this ordered_extent is dropped at end I/O time, which implies that an extra one must be acquired when the bio is split. This also requires moving the btrfs_extract_ordered_extent call into btrfs_split_bio so that the invariant of always having a valid ordered_extent reference for the btrfs_bio is kept.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
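The reference rule stated in the commit above (one ordered_extent reference per bio, dropped at end I/O, so a split must take an extra one) can be sketched as follows. This is a toy Python model with hypothetical names (`OrderedExtentModel`, `WriteBioModel`, `split_write_bio`, `end_write_io`); the counter tracks only the references held by bios, not any other holders the real structure may have.

```python
class OrderedExtentModel:
    def __init__(self):
        self.bio_refs = 0        # references held by in-flight bios only


class WriteBioModel:
    def __init__(self, ordered, length):
        ordered.bio_refs += 1    # each bio holds one reference
        self.ordered = ordered
        self.length = length


def split_write_bio(orig, split_len):
    # The split-off bio shares the ordered extent and takes its own
    # reference, so the pointer stays valid for both bios.
    orig.length -= split_len
    return WriteBioModel(orig.ordered, split_len)


def end_write_io(bio):
    # The reference is dropped at end I/O time.
    bio.ordered.bio_refs -= 1
    bio.ordered = None
```

Splitting a 128K bio in this model leaves two references outstanding, and the count returns to zero only after both halves have completed.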
|
#
fbe96087 |
| 31-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: add a is_data_bbio helper
Add a helper to check that a btrfs_bio has a valid inode and that it is a data inode, to key off all the special handling for data path checksumming. Note that this uses is_data_inode instead of REQ_META, as REQ_META is only set directly before submission in submit_one_bio, and we'll also want to use this helper for error handling where REQ_META isn't set yet.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
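The check described in the commit above has two parts: the bio must carry an inode, and that inode must be a data inode. A minimal Python sketch of that shape follows. The dataclasses `InodeModel` and `BioModel` are hypothetical; only the helper name `is_data_bbio` comes from the commit message, and the real helper is C code operating on kernel structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InodeModel:
    is_data: bool                 # stands in for the is_data_inode() result


@dataclass
class BioModel:
    inode: Optional[InodeModel] = None   # some bios carry no inode at all


def is_data_bbio(bbio):
    # True only when an inode is attached and it is a data inode. Nothing
    # here depends on request flags, so the check also works on error
    # paths that run before the bio is submitted.
    return bbio.inode is not None and bbio.inode.is_data
```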
|
#
a39da514 |
| 31-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: limit write bios to a single ordered extent
Currently buffered writeback bios are allowed to span multiple ordered_extents, although that basically never actually happens since commit 4a445b7b6178 ("btrfs: don't merge pages into bio if their page offset is not contiguous").
Supporting bios that span ordered_extents complicates the file checksumming code, and prevents us from adding an ordered_extent pointer to the btrfs_bio structure. Use the existing code that limits a bio to a single ordered_extent for zoned device writes for all writes.
This allows removing the REQ_BTRFS_ONE_ORDERED flag, and the handling of multiple ordered_extents in btrfs_csum_one_bio.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
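Limiting a write bio to a single ordered extent amounts to clamping its length at the end of the extent that contains its start offset. The Python sketch below illustrates that arithmetic only; the function name and parameters are hypothetical and do not correspond to a kernel helper named in the commit.

```python
def clamp_to_ordered_extent(file_offset, length, oe_start, oe_len):
    """Return how many bytes of the write fit in the ordered extent.

    file_offset/length describe the pending write; oe_start/oe_len describe
    the ordered extent that contains file_offset (all values in bytes).
    """
    oe_end = oe_start + oe_len
    if not oe_start <= file_offset < oe_end:
        raise ValueError("write does not start inside the ordered extent")
    # Anything past oe_end belongs to the next ordered extent and must go
    # into a separate bio.
    return min(length, oe_end - file_offset)
```

A 16K write starting 4K before the end of a 64K ordered extent is cut to 4K in this model; the remaining 12K would be submitted in a second bio.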
|
#
c731cd0b |
| 31-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: fix file_offset for REQ_BTRFS_ONE_ORDERED bios that get split
If a bio gets split, it needs to have a proper file_offset for checksum validation and repair to work properly.
Based on feedback from Josef, commit 852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios") skipped this adjustment for ONE_ORDERED bios. But if we actually ever need to split a ONE_ORDERED read bio, this will lead to a wrong file offset in the repair code. Right now the only user of the file_offset is logging of an error message so this is mostly harmless, but the wrong offset might be more problematic for additional users in the future.
Fixes: 852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
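The file_offset adjustment described in the commit above can be illustrated with a small model: the split-off front part keeps the original offset, and the remainder advances by the split length. This Python sketch uses hypothetical names (`ReadBioModel`, `split_front`) and assumes the front of the bio is the part that is split off.

```python
from dataclasses import dataclass


@dataclass
class ReadBioModel:
    file_offset: int   # offset in the file where this bio's data starts
    length: int        # number of bytes covered by the bio


def split_front(bio, split_len):
    # The new bio covers the first split_len bytes at the old offset.
    front = ReadBioModel(bio.file_offset, split_len)
    # The remainder now starts split_len bytes further into the file, so
    # checksum validation and repair report the right location.
    bio.file_offset += split_len
    bio.length -= split_len
    return front
```

Skipping the `file_offset` update for the remainder is the case the commit corrects: the second half would still claim to start at the first half's offset.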
|
#
cd4efd21 |
| 30-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: rename __btrfs_map_block to btrfs_map_block
Now that the old btrfs_map_block is gone, drop the leading underscores from __btrfs_map_block.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
Revision tags: v6.1.31, v6.1.30 |
|
#
71df088c |
| 24-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: defer splitting of ordered extents until I/O completion
The btrfs zoned completion code currently needs an ordered_extent and extent_map per bio so that it can account for the non-predictable write location from Zone Append. To achieve that it currently splits the ordered_extent and extent_map at I/O submission time, and then records the actual physical address in the ->physical field of the ordered_extent.
This patch instead switches to record the "original" physical address that the btrfs allocator assigned in spare space in the btrfs_bio, and then rewrites the logical address in the btrfs_ordered_sum structure at I/O completion time. This allows the ordered extent completion handler to simply walk the list of ordered csums and split the ordered extent as needed. This removes an extra ordered extent and extent_map lookup and manipulation during the I/O submission path, and instead batches it in the I/O completion path where we need to touch these anyway.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
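Walking the list of ordered csums at completion time and splitting "as needed" can be pictured as grouping the entries into logically contiguous runs, with one piece of the ordered extent per run. The Python sketch below shows only that grouping. It assumes each entry is a `(logical, length)` pair whose logical address has already been rewritten to the actual write location; the function name is hypothetical.

```python
def contiguous_runs(sums):
    """Merge (logical, length) entries, given in list order, into runs.

    Each returned (logical, length) pair is one stretch that landed
    contiguously on disk; a new run starts wherever the next entry does not
    begin exactly where the previous one ended.
    """
    runs = []
    for logical, length in sums:
        if runs and runs[-1][0] + runs[-1][1] == logical:
            start, run_len = runs[-1]
            runs[-1] = (start, run_len + length)   # extend the current run
        else:
            runs.append((logical, length))         # split point found
    return runs
```

If every bio landed back to back, the result is a single run and no split is needed; each gap in the logical addresses yields one additional piece.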
|
#
3887653c |
| 09-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: record orig_physical only for the original bio
btrfs_submit_dev_bio is also called for clone bios that aren't embedded into a btrfs_bio structure, but previous commit "btrfs: optimize the logical to physical mapping for zoned writes" added code to assign btrfs_bio.orig_physical in it.
This is harmless right now as only the single data profile can be used on zoned devices, but will blow up when the RAID stripe tree is added. Move it out into the single I/O specific branch in the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
|
#
cbfce4c7 |
| 24-May-2023 |
Christoph Hellwig <hch@lst.de> |
btrfs: optimize the logical to physical mapping for zoned writes
The current code to store the final logical to physical mapping for a zone append write in the extent tree is rather inefficient. It first has to split the ordered extent so that there is one ordered extent per bio, so that it can look up the ordered extent on I/O completion in btrfs_record_physical_zoned and store the physical LBA returned by the block driver in the ordered extent.
btrfs_rewrite_logical_zoned then has to do a lookup in the chunk tree to see what physical address the logical address for this bio / ordered extent is mapped to, and then rewrite it in the extent tree.
To optimize this process, we can store the physical address assigned in the chunk tree to the original logical address and a pointer to the btrfs_ordered_sum structure in the btrfs_bio structure, and then use this information to rewrite the logical address in the btrfs_ordered_sum structure directly at I/O completion time in btrfs_record_physical_zoned. btrfs_rewrite_logical_zoned then simply updates the logical address in the extent tree and the ordered_extent itself.
The code in btrfs_rewrite_logical_zoned now runs for all data I/O completions in zoned file systems, which is fine as there is no remapping to do for non-append writes to conventional zones or for relocation, and the overhead for quickly breaking out of the loop is very low.
Because zoned file systems now need the ordered_sums structure to record the actual write location returned by zone append, allocate dummy structures without the csum array for them when the I/O doesn't use checksums, and free them when completing the ordered_extent.
Note that the btrfs_bio doesn't grow as the new fields are placed into a union that is so far not used for data writes and has plenty of space left in it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
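The rewrite at I/O completion described in the commit above is a shift of the logical address by the distance between the physical address the allocator assigned and the one zone append actually used. The Python sketch below shows that arithmetic under this assumption; the function and parameter names are hypothetical, and the mapping between logical and physical space is taken to be linear over the range involved.

```python
def rewritten_logical(sum_logical, orig_physical, actual_physical):
    """Shift a logical address by the physical displacement of the write.

    sum_logical     -- logical address recorded at submission time
    orig_physical   -- physical address the allocator originally assigned
    actual_physical -- physical address reported back after zone append
    """
    # Python integers are signed and unbounded, so a write that landed
    # before the assigned location simply produces a negative delta.
    return sum_logical + (actual_physical - orig_physical)
```

When the device wrote the data exactly where the allocator expected, the delta is zero and the logical address is unchanged, which matches the commit's note that there is nothing to remap for non-append writes.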
|