Revision tags: v6.6.25, v6.6.24 |
|
#
1a48327c |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: force all buffers to be written during btree bulk load
commit 13ae04d8d45227c2ba51e188daf9fc13d08a1b12 upstream.
While stress-testing online repair of btrees, I noticed periodic assertion failures from the buffer cache about buffers with incorrect DELWRI_Q state. Looking further, I observed this race between the AIL trying to write out a btree block and repair zapping a btree block after the fact:
    AIL:                          Repair0:

    pin buffer X
    delwri_queue:
    set DELWRI_Q
    add to delwri list
                                  stale buf X:
                                  clear DELWRI_Q
                                  does not clear b_list
                                  free space X
                                  commit
                                  delwri_submit   # oops
Worse yet, I discovered that running the same repair over and over in a tight loop can result in a second race that causes data integrity problems with the repair:
    AIL:                          Repair0:        Repair1:

    pin buffer X
    delwri_queue:
    set DELWRI_Q
    add to delwri list
                                  stale buf X:
                                  clear DELWRI_Q
                                  does not clear b_list
                                  free space X
                                  commit
                                                  find free space X
                                                  get buffer
                                                  rewrite buffer
                                                  delwri_queue:
                                                  set DELWRI_Q
                                                  already on a list, do not add
                                                  commit

                                  BAD: committed tree root before all blocks written

                                  delwri_submit   # too late now
I traced this to my own misunderstanding of how the delwri lists work, particularly with regard to the AIL's buffer list. If a buffer is logged and committed, the buffer can end up on that AIL buffer list. If btree repairs are run twice in rapid succession, it's possible that the first repair will invalidate the buffer and free it before the next time the AIL wakes up. Marking the buffer stale clears DELWRI_Q from the buffer state without removing the buffer from its delwri list. The buffer doesn't know which list it's on, so it cannot know which lock to take to protect the list for a removal.
If the second repair allocates the same block, it will then recycle the buffer to start writing the new btree block. Meanwhile, if the AIL wakes up and walks the buffer list, it will ignore the buffer because it can't lock it, and go back to sleep.
When the second repair calls delwri_queue to put the buffer on the list of buffers to write before committing the new btree, it will set DELWRI_Q again, but since the buffer hasn't been removed from the AIL's buffer list, it won't add it to the bulkload buffer's list.
This is incorrect, because the bulkload caller relies on delwri_submit to ensure that all the buffers have been sent to disk /before/ committing the new btree root pointer. This ordering is required for data consistency.
Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally drop it, so the next thread to walk through the btree will trip over a debug assertion on that flag.
To fix this, create a new function that waits for the buffer to be removed from any other delwri lists before adding the buffer to the caller's delwri list. By waiting for the buffer to clear both the delwri list and any potential delwri wait list, we can be sure that repair will initiate writes of all buffers and report all write errors back to userspace instead of committing the new structure.
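A minimal user-space sketch of the wait-then-queue idea described above, using pthreads; the structure and function names are illustrative only and are not the actual kernel helper:

    #include <pthread.h>
    #include <stdbool.h>

    struct buf {
            pthread_mutex_t lock;
            pthread_cond_t  off_list;        /* signalled when the buffer leaves a list */
            bool            on_delwri_list;  /* currently queued on *some* delwri list */
    };

    /* Called by whoever owns the old list (e.g. the AIL) when it drops the buffer. */
    static void buf_list_remove(struct buf *bp)
    {
            pthread_mutex_lock(&bp->lock);
            bp->on_delwri_list = false;
            pthread_cond_broadcast(&bp->off_list);
            pthread_mutex_unlock(&bp->lock);
    }

    /*
     * Queue the buffer on the caller's delwri list, but only after any previous
     * list owner has dropped it, so the caller's delwri_submit really writes the
     * buffer before the new btree root is committed.
     */
    static void buf_delwri_queue_here(struct buf *bp)
    {
            pthread_mutex_lock(&bp->lock);
            while (bp->on_delwri_list)
                    pthread_cond_wait(&bp->off_list, &bp->lock);
            bp->on_delwri_list = true;       /* the buffer is now on our list only */
            pthread_mutex_unlock(&bp->lock);
    }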
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
Revision tags: v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45 |
|
#
9ed851f6 |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: allow scanning ranges of the buffer cache for live buffers
After an online repair, we need to invalidate buffers representing the blocks from the old metadata that we're replacing. It's possible that parts of a tree that were previously cached in memory are no longer accessible due to media failure or other corruption on interior nodes, so repair figures out the old blocks from the reverse mapping data and scans the buffer cache directly.
In other words, online fsck needs to find all the live (i.e. non-stale) buffers for a range of fsblocks so that it can invalidate them.
Unfortunately, the current buffer cache code triggers asserts if the rhashtable lookup finds a non-stale buffer of a different length than the key we searched for. For regular operation this is desirable, but for this repair procedure, we don't care since we're going to forcibly stale the buffer anyway. Add an internal lookup flag to avoid the assert. Skip buffers that are already XBF_STALE.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
Revision tags: v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56 |
|
#
231f91ab |
| 18-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: xfs_buf cache destroy isn't RCU safe
Darrick and Sachin Sant reported that xfs/435 and xfs/436 would report a non-empty xfs_buf slab on module remove. This isn't easy to reproduce, but is clearly a side effect of converting the buffer cache to RCU freeing and lockless lookups. Sachin bisected and Darrick hit it when testing the patchset directly.
Turns out that the xfs_buf slab is not destroyed when all the other XFS slab caches are destroyed. Instead, it's got its own little wrapper function that gets called separately, and so it lacks the rcu_barrier() call that is needed to drain all the rcu callbacks before the slab is destroyed.
Fix it by removing the xfs_buf_init/terminate wrappers that just allocate and destroy the xfs_buf slab, and move them to the same place that all the other slab caches are set up and destroyed.
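The ordering requirement boils down to roughly the following kernel-style sketch (illustrative only, not the literal patch): drain outstanding RCU callbacks before destroying the slab cache that backs the RCU-freed objects.

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    static void buf_cache_destroy(struct kmem_cache *buf_cache)
    {
            rcu_barrier();                  /* wait for all pending call_rcu() frees */
            kmem_cache_destroy(buf_cache);  /* only now is the slab really empty */
    }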
Reported-and-tested-by: Sachin Sant <sachinp@linux.ibm.com> Fixes: 298f34224506 ("xfs: lockless buffer lookup") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
Revision tags: v5.15.55 |
|
#
298f3422 |
| 13-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: lockless buffer lookup
Now that we have a standalone fast path for buffer lookup, we can easily convert it to use rcu lookups. When we continually hammer the buffer cache with trylock lookups, we end up with a huge amount of lock contention on the per-ag buffer hash locks:
    -   92.71%     0.05%  [kernel]  [k] xfs_inodegc_worker
       - 92.67% xfs_inodegc_worker
          - 92.13% xfs_inode_unlink
             - 91.52% xfs_inactive_ifree
                - 85.63% xfs_read_agi
                   - 85.61% xfs_trans_read_buf_map
                      - 85.59% xfs_buf_read_map
                         - xfs_buf_get_map
                            - 85.55% xfs_buf_find
                               - 72.87% _raw_spin_lock
                                  - do_raw_spin_lock
                                       71.86% __pv_queued_spin_lock_slowpath
                               - 8.74% xfs_buf_rele
                                  - 7.88% _raw_spin_lock
                                     - 7.88% do_raw_spin_lock
                                          7.63% __pv_queued_spin_lock_slowpath
                               - 1.70% xfs_buf_trylock
                                  - 1.68% down_trylock
                                     - 1.41% _raw_spin_lock_irqsave
                                        - 1.39% do_raw_spin_lock
                                             __pv_queued_spin_lock_slowpath
                               - 0.76% _raw_spin_unlock
                                    0.75% do_raw_spin_unlock
This is basically hammering the pag->pag_buf_lock from lots of CPUs doing trylocks at the same time. Most of the buffer trylock operations ultimately fail after we've done the lookup, so we're really hammering the buf hash lock whilst making no progress.
We can also see significant spinlock traffic on the same lock just under normal operation when lots of tasks are accessing metadata from the same AG, so let's avoid all this by converting the lookup fast path to leverage the rhashtable's ability to do rcu protected lookups.
We avoid races with the buffer release path by using atomic_inc_not_zero() on the buffer hold count. Any buffer that is in the LRU will have a non-zero count, thereby allowing the lockless fast path to be taken in most cache hit situations. If the buffer hold count is zero, then it is likely going through the release path so in that case we fall back to the existing lookup miss slow path.
The slow path will then do an atomic lookup and insert under the buffer hash lock and hence serialise correctly against buffer release freeing the buffer.
The use of rcu protected lookups means that buffer handles now need to be freed by RCU callbacks (same as inodes). We still free the buffer pages before the RCU callback - we won't be trying to access them at all on a buffer that has zero references - but we need the buffer handle itself to be present for the entire rcu protected read side to detect a zero hold count correctly.
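A user-space sketch of the hold-count fast path described above, written with C11 atomics; names are illustrative and this is not the kernel implementation:

    #include <stdatomic.h>
    #include <stdbool.h>

    struct buf {
            atomic_int hold;        /* reference count; 0 means the buffer is being released */
    };

    /* Lockless equivalent of atomic_inc_not_zero(): take a hold only if hold > 0. */
    static bool buf_try_hold(struct buf *bp)
    {
            int old = atomic_load(&bp->hold);

            while (old > 0) {
                    if (atomic_compare_exchange_weak(&bp->hold, &old, old + 1))
                            return true;    /* fast path: reference taken without a lock */
                    /* CAS failed: 'old' now holds the current count, loop re-checks it */
            }
            return false;   /* count hit zero; caller falls back to the locked slow path */
    }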
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
Revision tags: v5.15.54, v5.15.53 |
|
#
85c73bf7 |
| 07-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: rework xfs_buf_incore() API
Make it consistent with the other buffer APIs by returning an error and placing the buffer in an out parameter.
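The API shape change reads roughly as below; this is a self-contained sketch with made-up names, not the actual xfs_buf_incore() code:

    #include <errno.h>
    #include <stddef.h>

    struct buf { int unused; };

    /* New style: return 0 and set *bpp on success, or a negative errno. */
    static int buf_incore(unsigned long long blkno, int len, struct buf **bpp)
    {
            (void)blkno; (void)len;
            *bpp = NULL;            /* stub: pretend nothing is cached */
            return -ENOENT;
    }

    static int example_caller(unsigned long long blkno, int len)
    {
            struct buf *bp;
            int error = buf_incore(blkno, len, &bp);

            if (error)
                    return error;   /* callers see a real error code, not just NULL */
            /* ... use bp ... */
            return 0;
    }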
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
Revision tags: v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36 |
|
#
b9b3fe15 |
| 20-Apr-2022 |
Dave Chinner <david@fromorbit.com> |
xfs: convert buffer flags to unsigned.
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. This manifests as a compiler error such as:
    /kisskb/src/fs/xfs/./xfs_trace.h:432:2: note: in expansion of macro 'TP_printk'
      TP_printk("dev %d:%d daddr 0x%llx bbcount 0x%x hold %d pincount %d "
      ^
    /kisskb/src/fs/xfs/./xfs_trace.h:440:5: note: in expansion of macro '__print_flags'
         __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
         ^
    /kisskb/src/fs/xfs/xfs_buf.h:67:4: note: in expansion of macro 'XBF_UNMAPPED'
        { XBF_UNMAPPED,  "UNMAPPED" }
        ^
    /kisskb/src/fs/xfs/./xfs_trace.h:440:40: note: in expansion of macro 'XFS_BUF_FLAGS'
         __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
                                            ^
    /kisskb/src/fs/xfs/./xfs_trace.h: In function 'trace_raw_output_xfs_buf_flags_class':
    /kisskb/src/fs/xfs/xfs_buf.h:46:23: error: initializer element is not constant
     #define XBF_UNMAPPED  (1 << 31)/* do not map the buffer */
as __print_flags assigns XFS_BUF_FLAGS to a structure that uses an unsigned long for the flag. Since this results in the value of XBF_UNMAPPED causing a signed integer overflow, the result is technically undefined behavior, which gcc-5 does not accept as an integer constant.
This is based on a patch from Arnd Bergmann <arnd@arndb.de>.
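A minimal illustration of the constant-expression problem (the value matches the quoted error, but this is a standalone example, not the XFS header):

    /* 1 << 31 overflows a signed int, so it is not a valid integer constant
     * expression for older compilers such as gcc-5 under -std=gnu11: */
    /* #define XBF_UNMAPPED  (1 << 31) */

    /* An unsigned literal keeps the value in range and well defined: */
    #define XBF_UNMAPPED    (1u << 31)      /* do not map the buffer */

    static unsigned int example_flags = XBF_UNMAPPED;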
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
Revision tags: v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6 |
|
#
cd913c76 |
| 29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
dax: return the partition offset from fs_dax_get_by_bdev
Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
5b5abbef |
| 29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: move dax device handling into xfs_{alloc,free}_buftarg
Hide the DAX device lookup from the xfs_super.c code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-22-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
Revision tags: v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61 |
|
#
4c7f65ae |
| 18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: rename buffer cache index variable b_bn
To stop external users from using b_bn as the disk address of the buffer, rename it to b_rhash_key to indicate that it is the buffer cache index, not the block number of the buffer. Code that needs the disk address should use xfs_buf_daddr() to obtain it.
Do the rename and clean up any of the remaining internal b_bn users. Also clean up any remaining b_bn cruft that is now unused.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
04fcad80 |
| 18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xfs_buf_daddr()
Introduce a helper function xfs_buf_daddr() to extract the disk address of the buffer from the struct xfs_buf. This will replace direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as the XFS_BUF_ADDR() macro.
This patch introduces the helper function and replaces all uses of XFS_BUF_ADDR() as this is just a simple sed replacement.
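The helper's shape is roughly the following (simplified, stand-in types; not the exact kernel definition): a trivial inline accessor so callers never reach into the buffer's indexing fields directly.

    typedef long long daddr_sketch_t;               /* stand-in for xfs_daddr_t */

    struct buf_map { daddr_sketch_t bm_bn; };
    struct buf     { struct buf_map *b_maps; };

    static inline daddr_sketch_t buf_daddr(const struct buf *bp)
    {
            return bp->b_maps[0].bm_bn;             /* disk address of the first map */
    }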
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
8cf07f3d |
| 18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: sb verifier doesn't handle uncached sb buffer
The verifier checks explicitly for bp->b_bn == XFS_SB_DADDR to match the primary superblock buffer, but the primary superblock is an uncached buffer and so bp->b_bn is always -1ULL. Hence this never matches and the CRC error reporting is wholly dependent on the mount superblock already being populated so CRC feature checks pass and allow CRC errors to be reported.
Fix this so that the primary superblock CRC error reporting is not dependent on already having read the superblock into memory.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
Revision tags: v5.10.60 |
|
#
98fe2c3c |
| 09-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove kmem_alloc_io()
Since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), the core slab code now guarantees slab alignment in all situations sufficient for IO purposes (i.e. minimum of 512 byte alignment of >= 512 byte sized heap allocations), so we no longer need the workaround in the XFS code to provide this guarantee.
Replace the use of kmem_alloc_io() with kmem_alloc() or kmem_alloc_large() appropriately, and remove the kmem_alloc_io() interface altogether.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
Revision tags: v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43 |
|
#
54cd3aa6 |
| 06-Jun-2021 |
Christoph Hellwig <hch@lst.de> |
xfs: remove ->b_offset handling for page backed buffers
->b_offset can only be non-zero for _XBF_KMEM backed buffers, so remove all code dealing with it for page backed buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de> [dgc: modified to fit this patchset] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
Revision tags: v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14 |
|
#
8321ddb2 |
| 22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: don't drain buffer lru on freeze and read-only remount
xfs_buftarg_drain() is called from xfs_log_quiesce() to ensure the buffer cache is reclaimed during unmount. xfs_log_quiesce() is also called from xfs_quiesce_attr(), however, which means that cache state is completely drained for filesystem freeze and read-only remount. While technically harmless, this is unnecessarily heavyweight. Both freeze and read-only mounts allow reads and thus allow population of the buffer cache. Therefore, the transitional sequence in either case really only needs to quiesce outstanding writes to return the filesystem in a generally read-only state.
Additionally, some users have reported that attempts to freeze a filesystem concurrent with a read-heavy workload causes the freeze process to stall for a significant amount of time. This occurs because, as mentioned above, the read workload repopulates the buffer LRU while the freeze task attempts to drain it.
To improve this situation, replace the drain in xfs_log_quiesce() with a buffer I/O quiesce and lift the drain into the unmount path. This removes buffer LRU reclaim from freeze and read-only [re]mount, but ensures the LRU is still drained before the filesystem unmounts.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
10fb9ac1 |
| 22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: rename xfs_wait_buftarg() to xfs_buftarg_drain()
xfs_wait_buftarg() is vaguely named and somewhat overloaded. Its primary purpose is to reclaim all buffers from the provided buffer target LRU. In preparation to refactor xfs_wait_buftarg() into serialization and LRU draining components, rename the function and associated helpers to something more descriptive. This patch has no functional changes with the minor exception of renaming a tracepoint.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
e8222613 |
| 16-Dec-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove xfs_buf_t typedef
Prepare for kernel xfs_buf alignment by getting rid of the xfs_buf_t typedef from userspace.
[darrick: This patch is a port of a userspace patch removing the xfs_buf_t typedef in preparation to make the userspace xfs_buf code behave more like its kernel counterpart.]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
Revision tags: v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62 |
|
#
26e32875 |
| 01-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: reuse _xfs_buf_read for re-reading the superblock
Instead of poking deeply into buffer cache internals when re-reading the superblock during log recovery just generalize _xfs_buf_read and use it there. Note that we don't have to explicitly set up the ops as they must be set from the initial read.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6a7584b1 |
| 01-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: fold xfs_buf_ioend_finish into xfs_ioend
No need to keep a separate helper for this logic.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
76b2d323 |
| 01-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: mark xfs_buf_ioend static
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
Revision tags: v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7 |
|
#
b01d1461 |
| 29-Jun-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: call xfs_buf_iodone directly
All unmarked dirty buffers should be in the AIL and have log items attached to them. Hence when they are written, we will run a callback to remove the item from the AIL if appropriate. Now that we've handled inode and dquot buffers, all remaining calls are to xfs_buf_iodone() and so we can hard code this rather than use an indirect call.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9fe5c77c |
| 29-Jun-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: mark log recovery buffers for completion
Log recovery has its own buffer write completion handler for buffers that it directly recovers. Convert these to direct calls by flagging these buffers as being log recovery buffers. The flag will get cleared by the log recovery IO completion routine, so it will never leak out of log recovery.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0c7e5afb |
| 29-Jun-2020 |
Dave Chinner <david@fromorbit.com> |
xfs: mark dquot buffers in cache
dquot buffers always have write IO callbacks, so by marking them directly we can avoid needing to attach ->b_iodone functions to them. This avoids an indirect call, and makes future modifications much simpler.
This is largely a rearrangement of the code at this point - no IO completion functionality changes, just how the code is run is modified.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f593bf14 |
| 29-Jun-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: mark inode buffers in cache
Inode buffers always have write IO callbacks, so by marking them directly we can avoid needing to attach ->b_iodone functions to them. This avoids an indirect call, and makes future modifications much simpler.
While this is largely a refactor of existing functionality, we broaden the scope of the flag beyond where inodes are explicitly attached because future changes need to know what type of log items are attached to the buffer. Adding this buffer flag may invoke the inode iodone callback in cases where it wouldn't have been previously, but this is not a functional change because the callback is identical to the normal buffer write iodone callback when inodes are not attached.
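A generic sketch of the flag-marking idea used across this series (illustrative names only, not the kernel code): a flag test on the buffer replaces an indirect ->b_iodone function-pointer call.

    #define BUF_INODE_BUFFER        (1u << 0)
    #define BUF_DQUOT_BUFFER        (1u << 1)

    struct buf { unsigned int flags; };

    static void inode_buf_iodone(struct buf *bp)   { (void)bp; /* ... */ }
    static void dquot_buf_iodone(struct buf *bp)   { (void)bp; /* ... */ }
    static void default_buf_iodone(struct buf *bp) { (void)bp; /* ... */ }

    static void buf_write_iodone(struct buf *bp)
    {
            if (bp->flags & BUF_INODE_BUFFER)
                    inode_buf_iodone(bp);           /* direct call, no pointer chase */
            else if (bp->flags & BUF_DQUOT_BUFFER)
                    dquot_buf_iodone(bp);
            else
                    default_buf_iodone(bp);
    }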
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
Revision tags: v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40 |
|
#
f9bccfcc |
| 06-May-2020 |
Brian Foster <bfoster@redhat.com> |
xfs: refactor ratelimited buffer error messages into helper
XFS has some inconsistent log message rate limiting with respect to buffer alerts. The metadata I/O error notification uses the generic ratelimited alert, the buffer push code uses a custom rate limit and the similar quiesce time failure checks are not rate limited at all (when they should be).
The custom rate limit defined in the buf item code is specifically crafted for buffer alerts. It is more aggressive than generic rate limiting code because it must accommodate a high frequency of I/O error events in a relatively short timeframe.
Factor out the custom rate limit state from the buf item code into a per-buftarg rate limit so various alerts are limited based on the target. Define a buffer alert helper function and use it for the buffer alerts that are already ratelimited.
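A kernel-style sketch of the per-target rate limit described above (illustrative structure, field names, and numbers; not the actual XFS helper), built on the generic ratelimit_state machinery:

    #include <linux/ratelimit.h>
    #include <linux/printk.h>

    struct buftarg_sketch {
            struct ratelimit_state  bt_ioerror_rl;  /* one rate limit per buffer target */
    };

    static void buftarg_alert_init(struct buftarg_sketch *btp)
    {
            /* interval/burst values here are made up for the example */
            ratelimit_state_init(&btp->bt_ioerror_rl, 30 * HZ, 10);
    }

    static void buftarg_alert(struct buftarg_sketch *btp, const char *msg)
    {
            if (__ratelimit(&btp->bt_ioerror_rl))
                    pr_alert("%s\n", msg);
    }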
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|