Revision tags: v6.6.25, v6.6.24 |
|
#
7bc086bb |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: fix an off-by-one error in xreap_agextent_binval
commit c0e37f07d2bd3c1ee3fb5a650da7d8673557ed16 upstream.
Overall, this function tries to find and invalidate all buffers for a given extent of
xfs: fix an off-by-one error in xreap_agextent_binval
commit c0e37f07d2bd3c1ee3fb5a650da7d8673557ed16 upstream.
Overall, this function tries to find and invalidate all buffers for a given extent of space on the data device. The inner for loop in this function tries to find all xfs_bufs for a given daddr. The lengths of all possible cached buffers range from 1 fsblock to the largest needed to contain a 64k xattr value (~17fsb). The scan is capped to avoid looking at anything buffer going past the given extent.
Unfortunately, the loop continuation test is wrong -- max_fsbs is the largest size we want to scan, not one past that. Put another way, this loop is actually 1-indexed, not 0-indexed. Therefore, the continuation test should use <=, not <.
As a result, online repairs of btree blocks fails to stale any buffers for btrees that are being torn down, which causes later assertions in the buffer cache when another thread creates a different-sized buffer. This happens in xfs/709 when allocating an inode cluster buffer:
------------[ cut here ]------------ WARNING: CPU: 0 PID: 3346128 at fs/xfs/xfs_message.c:104 assfail+0x3a/0x40 [xfs] CPU: 0 PID: 3346128 Comm: fsstress Not tainted 6.7.0-rc4-djwx #rc4 RIP: 0010:assfail+0x3a/0x40 [xfs] Call Trace: <TASK> _xfs_buf_obj_cmp+0x4a/0x50 xfs_buf_get_map+0x191/0xba0 xfs_trans_get_buf_map+0x136/0x280 xfs_ialloc_inode_init+0x186/0x340 xfs_ialloc_ag_alloc+0x254/0x720 xfs_dialloc+0x21f/0x870 xfs_create_tmpfile+0x1a9/0x2f0 xfs_rename+0x369/0xfd0 xfs_vn_rename+0xfa/0x170 vfs_rename+0x5fb/0xc30 do_renameat2+0x52d/0x6e0 __x64_sys_renameat2+0x4b/0x60 do_syscall_64+0x3b/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e
A later refactoring patch in the online repair series fixed this by accident, which is why I didn't notice this until I started testing only the patches that are likely to end up in 6.8.
Fixes: 1c7ce115e521 ("xfs: reap large AG metadata extents when possible") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.25, v6.6.24 |
|
#
7bc086bb |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: fix an off-by-one error in xreap_agextent_binval
commit c0e37f07d2bd3c1ee3fb5a650da7d8673557ed16 upstream.
Overall, this function tries to find and invalidate all buffers for a given extent of
xfs: fix an off-by-one error in xreap_agextent_binval
commit c0e37f07d2bd3c1ee3fb5a650da7d8673557ed16 upstream.
Overall, this function tries to find and invalidate all buffers for a given extent of space on the data device. The inner for loop in this function tries to find all xfs_bufs for a given daddr. The lengths of all possible cached buffers range from 1 fsblock to the largest needed to contain a 64k xattr value (~17fsb). The scan is capped to avoid looking at anything buffer going past the given extent.
Unfortunately, the loop continuation test is wrong -- max_fsbs is the largest size we want to scan, not one past that. Put another way, this loop is actually 1-indexed, not 0-indexed. Therefore, the continuation test should use <=, not <.
As a result, online repairs of btree blocks fails to stale any buffers for btrees that are being torn down, which causes later assertions in the buffer cache when another thread creates a different-sized buffer. This happens in xfs/709 when allocating an inode cluster buffer:
------------[ cut here ]------------ WARNING: CPU: 0 PID: 3346128 at fs/xfs/xfs_message.c:104 assfail+0x3a/0x40 [xfs] CPU: 0 PID: 3346128 Comm: fsstress Not tainted 6.7.0-rc4-djwx #rc4 RIP: 0010:assfail+0x3a/0x40 [xfs] Call Trace: <TASK> _xfs_buf_obj_cmp+0x4a/0x50 xfs_buf_get_map+0x191/0xba0 xfs_trans_get_buf_map+0x136/0x280 xfs_ialloc_inode_init+0x186/0x340 xfs_ialloc_ag_alloc+0x254/0x720 xfs_dialloc+0x21f/0x870 xfs_create_tmpfile+0x1a9/0x2f0 xfs_rename+0x369/0xfd0 xfs_vn_rename+0xfa/0x170 vfs_rename+0x5fb/0xc30 do_renameat2+0x52d/0x6e0 __x64_sys_renameat2+0x4b/0x60 do_syscall_64+0x3b/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e
A later refactoring patch in the online repair series fixed this by accident, which is why I didn't notice this until I started testing only the patches that are likely to end up in 6.8.
Fixes: 1c7ce115e521 ("xfs: reap large AG metadata extents when possible") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45 |
|
#
014ad537 |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use per-AG bitmaps to reap unused AG metadata blocks during repair
The AGFL repair code uses a series of bitmaps to figure out where there are OWN_AG blocks that are not claimed by the free spa
xfs: use per-AG bitmaps to reap unused AG metadata blocks during repair
The AGFL repair code uses a series of bitmaps to figure out where there are OWN_AG blocks that are not claimed by the free space and rmap btrees. These blocks become the new AGFL, and any overflow is reaped. The bitmaps current track xfs_fsblock_t even though we already know the AG number.
In the last patch, we introduced a new bitmap "type" for tracking xfs_agblock_t extents. Port the reaping code and the AGFL repair to use this new type, which makes it very obvious what we're tracking. This also eliminates a bunch of unnecessary agblock <-> fsblock conversions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
1c7ce115 |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: reap large AG metadata extents when possible
When we're freeing extents that have been set in a bitmap, break the bitmap extent into multiple sub-extents organized by fate, and reap the extents
xfs: reap large AG metadata extents when possible
When we're freeing extents that have been set in a bitmap, break the bitmap extent into multiple sub-extents organized by fate, and reap the extents. This enables us to dispose of old resources more efficiently than doing them block by block.
While we're at it, rename the reaping functions to make it clear that they're reaping per-AG extents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
9ed851f6 |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: allow scanning ranges of the buffer cache for live buffers
After an online repair, we need to invalidate buffers representing the blocks from the old metadata that we're replacing. It's possib
xfs: allow scanning ranges of the buffer cache for live buffers
After an online repair, we need to invalidate buffers representing the blocks from the old metadata that we're replacing. It's possible that parts of a tree that were previously cached in memory are no longer accessible due to media failure or other corruption on interior nodes, so repair figures out the old blocks from the reverse mapping data and scans the buffer cache directly.
In other words, online fsck needs to find all the live (i.e. non-stale) buffers for a range of fsblocks so that it can invalidate them.
Unfortunately, the current buffer cache code triggers asserts if the rhashtable lookup finds a non-stale buffer of a different length than the key we searched for. For regular operation this is desirable, but for this repair procedure, we don't care since we're going to forcibly stale the buffer anyway. Add an internal lookup flag to avoid the assert. Skip buffers that are already XBF_STALE.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
77a1396f |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: rearrange xrep_reap_block to make future code flow easier
Rearrange the logic inside xrep_reap_block to make it more obvious that crosslinked metadata blocks are handled differently. Add a cou
xfs: rearrange xrep_reap_block to make future code flow easier
Rearrange the logic inside xrep_reap_block to make it more obvious that crosslinked metadata blocks are handled differently. Add a couple of tracepoints so that we can tell what's going on at the end of a btree rebuild operation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
5fee784e |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use deferred frees to reap old btree blocks
Use deferred frees (EFIs) to reap the blocks of a btree that we just replaced. This helps us to shrink the window in which those old blocks could be
xfs: use deferred frees to reap old btree blocks
Use deferred frees (EFIs) to reap the blocks of a btree that we just replaced. This helps us to shrink the window in which those old blocks could be lost due to a system crash, though we try to flush the EFIs every few hundred blocks so that we don't also overflow the transaction reservations during and after we commit the new btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
a55e0730 |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: only allow reaping of per-AG blocks in xrep_reap_extents
Now that we've refactored btree cursors to require the caller to pass in a perag structure, there are numerous problems in xrep_reap_ext
xfs: only allow reaping of per-AG blocks in xrep_reap_extents
Now that we've refactored btree cursors to require the caller to pass in a perag structure, there are numerous problems in xrep_reap_extents if it's being called to reap extents for an inode metadata repair. We don't have any repair functions that can do that, so drop the support for now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
8e54e06b |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: only invalidate blocks if we're going to free them
When we're discarding old btree blocks after a repair, only invalidate the buffers for the ones that we're freeing -- if the metadata was cros
xfs: only invalidate blocks if we're going to free them
When we're discarding old btree blocks after a repair, only invalidate the buffers for the ones that we're freeing -- if the metadata was crosslinked with another data structure, we don't want to touch it.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
e06ef14b |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: move the post-repair block reaping code to a separate file
Reaping blocks after a repair is a complicated affair involving a lot of rmap btree lookups and figuring out if we're going to unmap o
xfs: move the post-repair block reaping code to a separate file
Reaping blocks after a repair is a complicated affair involving a lot of rmap btree lookups and figuring out if we're going to unmap or free old metadata blocks that might be crosslinked. Eventually, we will need to be able to reap per-AG metadata blocks, bmbt blocks from inode forks, garbage CoW staging extents, and (even later) blocks from btrees rooted in inodes. This results in a lot of reaping code, so we might as well split that off while it's easy.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|