Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3 |
|
#
49813a21 |
| 11-Sep-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: make inode unlinked bucket recovery work with quotacheck
Teach quotacheck to reload the unlinked inode lists when walking the inode table. This requires extra state handling, since it's possib
xfs: make inode unlinked bucket recovery work with quotacheck
Teach quotacheck to reload the unlinked inode lists when walking the inode table. This requires extra state handling, since it's possible that a reloaded inode will get inactivated before quotacheck tries to scan it; in this case, we need to ensure that the reloaded inode does not have dquots attached when it is freed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
#
f5bfa695 |
| 11-Sep-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: remove the all-mounts list
Revert commit 0ed17f01c8540 ("xfs: introduce all-mounts list for cpu hotplug notifications") because the cpu hotplug hooks are now pointless, so we don't need this li
xfs: remove the all-mounts list
Revert commit 0ed17f01c8540 ("xfs: introduce all-mounts list for cpu hotplug notifications") because the cpu hotplug hooks are now pointless, so we don't need this list anymore.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
62334fab |
| 11-Sep-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use per-mount cpumask to track nonempty percpu inodegc lists
Directly track which CPUs have contributed to the inodegc percpu lists instead of trusting the cpu online mask. This eliminates a t
xfs: use per-mount cpumask to track nonempty percpu inodegc lists
Directly track which CPUs have contributed to the inodegc percpu lists instead of trusting the cpu online mask. This eliminates a theoretical problem where the inodegc flush functions might fail to flush a CPU's inodes if that CPU happened to be dying at exactly the same time. Most likely nobody's noticed this because the CPU dead hook moves the percpu inodegc list to another CPU and schedules that worker immediately. But it's quite possible that this is a subtle race leading to UAF if the inodegc flush were part of an unmount.
Further benefits: This reduces the overhead of the inodegc flush code slightly by allowing us to ignore CPUs that have empty lists. Better yet, it reduces our dependence on the cpu online masks, which have been the cause of confusion and drama lately.
Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
Revision tags: v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45 |
|
#
d7a74cad |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: track usage statistics of online fsck
Track the usage, outcomes, and run times of the online fsck code, and report these values via debugfs. The columns in the file are:
* scrubber name
*
xfs: track usage statistics of online fsck
Track the usage, outcomes, and run times of the online fsck code, and report these values via debugfs. The columns in the file are:
* scrubber name
* number of scrub invocations * clean objects found * corruptions found * optimizations found * cross referencing failures * inconsistencies found during cross referencing * incomplete scrubs * warnings * number of time scrub had to retry * cumulative amount of time spent scrubbing (microseconds)
* number of repair inovcations * successfully repaired objects * cumuluative amount of time spent repairing (microseconds)
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
a76dba3b |
| 10-Aug-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: create scaffolding for creating debugfs entries
Set up debugfs directories for xfs as a whole, and a subdirectory for each mounted filesystem. This will enable the creation of debugfs files in
xfs: create scaffolding for creating debugfs entries
Set up debugfs directories for xfs as a whole, and a subdirectory for each mounted filesystem. This will enable the creation of debugfs files in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
Revision tags: v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32 |
|
#
e7caa877 |
| 01-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
xfs: wire up sops->shutdown
Wire up the shutdown method to shut down the file system when the underlying block device is marked dead. Add a new message to clearly distinguish this shutdown reason f
xfs: wire up sops->shutdown
Wire up the shutdown method to shut down the file system when the underlying block device is marked dead. Add a new message to clearly distinguish this shutdown reason from other shutdowns.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
d4d12c02 |
| 04-Jun-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: collect errors from inodegc for unlinked inode recovery
Unlinked list recovery requires errors removing the inode the from the unlinked list get fed back to the main recovery loop. Now that we
xfs: collect errors from inodegc for unlinked inode recovery
Unlinked list recovery requires errors removing the inode the from the unlinked list get fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list.
This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down.
Fix this by collecting inodegc worker errors and feed them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error.
In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain.
In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
Revision tags: v6.1.31, v6.1.30, v6.1.29, v6.1.28 |
|
#
b37c4c83 |
| 01-May-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: check that per-cpu inodegc workers actually run on that cpu
Now that we've allegedly worked out the problem of the per-cpu inodegc workers being scheduled on the wrong cpu, let's put in a debug
xfs: check that per-cpu inodegc workers actually run on that cpu
Now that we've allegedly worked out the problem of the per-cpu inodegc workers being scheduled on the wrong cpu, let's put in a debugging knob to let us know if a worker ever gets mis-scheduled again.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
Revision tags: v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12 |
|
#
20a5eab4 |
| 12-Feb-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert xfs_ialloc_next_ag() to an atomic
This is currently a spinlock lock protected rotor which can be implemented with a single atomic operation. Change it to be more efficient and get rid o
xfs: convert xfs_ialloc_next_ag() to an atomic
This is currently a spinlock lock protected rotor which can be implemented with a single atomic operation. Change it to be more efficient and get rid of the m_agirotor_lock. Noticed while converting the inode allocation AG selection loop to active perag references.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45 |
|
#
6f643c57 |
| 03-Jun-2022 |
Shiyang Ruan <ruansy.fnst@fujitsu.com> |
xfs: implement ->notify_failure() for XFS
Introduce xfs_notify_failure.c to handle failure related works, such as implement ->notify_failure(), register/unregister dax holder in xfs, and so on.
If
xfs: implement ->notify_failure() for XFS
Introduce xfs_notify_failure.c to handle failure related works, such as implement ->notify_failure(), register/unregister dax holder in xfs, and so on.
If the rmap feature of XFS enabled, we can query it to find files and metadata which are associated with the corrupt data. For now all we do is kill processes with that file mapped into their address spaces, but future patches could actually do something about corrupt metadata.
After that, the memory failure needs to notify the processes who are using those files.
Link: https://lkml.kernel.org/r/20220603053738.1218681-7-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.wiliams@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Cc: Jane Chu <jane.chu@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
7cf2b0f9 |
| 16-Jun-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: bound maximum wait time for inodegc work
Currently inodegc work can sit queued on the per-cpu queue until the workqueue is either flushed of the queue reaches a depth that triggers work queuing
xfs: bound maximum wait time for inodegc work
Currently inodegc work can sit queued on the per-cpu queue until the workqueue is either flushed of the queue reaches a depth that triggers work queuing (and later throttling). This means that we could queue work that waits for a long time for some other event to trigger flushing.
Hence instead of just queueing work at a specific depth, use a delayed work that queues the work at a bound time. We can still schedule the work immediately at a given depth, but we no long need to worry about leaving a number of items on the list that won't get processed until external events prevail.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v5.15.44 |
|
#
202865cc |
| 26-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: warn about LARP once per mount
Since LARP is an experimental debug-only feature, we should try to warn about it being in use once per mount, not once per reboot.
Signed-off-by: Darrick J. Wong
xfs: warn about LARP once per mount
Since LARP is an experimental debug-only feature, we should try to warn about it being in use once per mount, not once per reboot.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
df5660cf |
| 26-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: implement per-mount warnings for scrub and shrink usage
Currently, we don't have a consistent story around logging when an EXPERIMENTAL feature gets turned on at runtime -- online fsck and shri
xfs: implement per-mount warnings for scrub and shrink usage
Currently, we don't have a consistent story around logging when an EXPERIMENTAL feature gets turned on at runtime -- online fsck and shrink log a message once per day across all mounts, and the recently merged LARP mode only ever does it once per insmod cycle or reboot.
Because EXPERIMENTAL tags are supposed to go away eventually, convert the existing daily warnings into state flags that travel with the mount, and warn once per mount. Making this an opstate flag means that we'll be able to capture the experimental usage in the ftrace output too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
Revision tags: v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36 |
|
#
2eb7550d |
| 20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert shutdown reasons to unsigned.
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by:
xfs: convert shutdown reasons to unsigned.
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
Revision tags: v5.15.35, v5.15.34 |
|
#
2229276c |
| 11-Apr-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use a separate frextents counter for rt extent reservations
As mentioned in the previous commit, the kernel misuses sb_frextents in the incore mount to reflect both incore reservations made by
xfs: use a separate frextents counter for rt extent reservations
As mentioned in the previous commit, the kernel misuses sb_frextents in the incore mount to reflect both incore reservations made by running transactions as well as the actual count of free rt extents on disk. This results in the superblock being written to the log with an underestimate of the number of rt extents that are marked free in the rtbitmap.
Teaching XFS to recompute frextents after log recovery avoids operational problems in the current mount, but it doesn't solve the problem of us writing undercounted frextents which are then recovered by an older kernel that doesn't have that fix.
Create an incore percpu counter to mirror the ondisk frextents. This new counter will track transaction reservations and the only time we will touch the incore super counter (i.e the one that gets logged) is when those transactions commit updates to the rt bitmap. This is in contrast to the lazysbcount counters (e.g. fdblocks), where we know that log recovery will always fix any incorrect counter that we log. As a bonus, we only take m_sb_lock at transaction commit time.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
Revision tags: v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3 |
|
#
919819f5 |
| 16-Nov-2021 |
Chandan Babu R <chandan.babu@oracle.com> |
xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit
XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on filesystems which support large per-inode extent counte
xfs: Introduce XFS_SB_FEAT_INCOMPAT_NREXT64 and associated per-fs feature bit
XFS_SB_FEAT_INCOMPAT_NREXT64 incompat feature bit will be set on filesystems which support large per-inode extent counters. This commit defines the new incompat feature bit and the corresponding per-fs feature bit (along with inline functions to work on it).
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
show more ...
|
#
c8c56825 |
| 16-Mar-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't include bnobt blocks when reserving free block pool
xfs_reserve_blocks controls the size of the user-visible free space reserve pool. Given the difference between the current and request
xfs: don't include bnobt blocks when reserving free block pool
xfs_reserve_blocks controls the size of the user-visible free space reserve pool. Given the difference between the current and requested pool sizes, it will try to reserve free space from fdblocks. However, the amount requested from fdblocks is also constrained by the amount of space that we think xfs_mod_fdblocks will give us. If we forget to subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will return ENOSPC and we'll hang the kernel at mount due to the infinite loop.
In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand out the "free space" used by the free space btrees, because some portion of the free space btrees hold in reserve space for future btree expansion. Unfortunately, xfs_reserve_blocks' estimation of the number of blocks that it could request from xfs_mod_fdblocks was not updated to include m_allocbt_blks, so if space is extremely low, the caller hangs.
Fix this by creating a function to estimate the number of blocks that can be reserved from fdblocks, which needs to exclude the set-aside and m_allocbt_blks.
Found by running xfs/306 (which formats a single-AG 20MB filesystem) with an fstests configuration that specifies a 1k blocksize and a specially crafted log size that will consume 7/8 of the space (17920 blocks, specifically) in that AG.
Cc: Brian Foster <bfoster@redhat.com> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
Revision tags: v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67 |
|
#
b74e15d7 |
| 16-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: compute maximum AG btree height for critical reservation calculation
Compute the actual maximum AG btree height for deciding if a per-AG block reservation is critically low. This only affects
xfs: compute maximum AG btree height for critical reservation calculation
Compute the actual maximum AG btree height for deciding if a per-AG block reservation is critically low. This only affects the sanity check condition, since we /generally/ will trigger on the 10% threshold. This is a long-winded way of saying that we're removing one more usage of XFS_BTREE_MAXLEVELS.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
7cb3efb4 |
| 13-Oct-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: rename m_ag_maxlevels to m_allocbt_maxlevels
Years ago when XFS was thought to be much more simple, we introduced m_ag_maxlevels to specify the maximum btree height of per-AG btrees for a given
xfs: rename m_ag_maxlevels to m_allocbt_maxlevels
Years ago when XFS was thought to be much more simple, we introduced m_ag_maxlevels to specify the maximum btree height of per-AG btrees for a given filesystem mount. Then we observed that inode btrees don't actually have the same height and split that off; and now we have rmap and refcount btrees with much different geometries and separate maxlevels variables.
The 'ag' part of the name doesn't make much sense anymore, so rename this to m_alloc_maxlevels to reinforce that this is the maximum height of the *free space* btrees. This sets us up for the next patch, which will add a variable to track the maximum height of all AG btrees.
(Also take the opportunity to improve adjacent comments and fix minor style problems.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
88beb994 |
| 14-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reser
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reserve pool. Given the difference between the current and requested pool sizes, it will try to reserve free space from fdblocks. However, the amount requested from fdblocks is also constrained by the amount of space that we think xfs_mod_fdblocks will give us. If we forget to subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will return ENOSPC and we'll hang the kernel at mount due to the infinite loop.
In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand out the "free space" used by the free space btrees, because some portion of the free space btrees hold in reserve space for future btree expansion. Unfortunately, xfs_reserve_blocks' estimation of the number of blocks that it could request from xfs_mod_fdblocks was not updated to include m_allocbt_blks, so if space is extremely low, the caller hangs.
Fix this by creating a function to estimate the number of blocks that can be reserved from fdblocks, which needs to exclude the set-aside and m_allocbt_blks.
Found by running xfs/306 (which formats a single-AG 20MB filesystem) with an fstests configuration that specifies a 1k blocksize and a specially crafted log size that will consume 7/8 of the space (17920 blocks, specifically) in that AG.
Cc: Brian Foster <bfoster@redhat.com> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
88beb994 |
| 14-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reser
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reserve pool. Given the difference between the current and requested pool sizes, it will try to reserve free space from fdblocks. However, the amount requested from fdblocks is also constrained by the amount of space that we think xfs_mod_fdblocks will give us. If we forget to subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will return ENOSPC and we'll hang the kernel at mount due to the infinite loop.
In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand out the "free space" used by the free space btrees, because some portion of the free space btrees hold in reserve space for future btree expansion. Unfortunately, xfs_reserve_blocks' estimation of the number of blocks that it could request from xfs_mod_fdblocks was not updated to include m_allocbt_blks, so if space is extremely low, the caller hangs.
Fix this by creating a function to estimate the number of blocks that can be reserved from fdblocks, which needs to exclude the set-aside and m_allocbt_blks.
Found by running xfs/306 (which formats a single-AG 20MB filesystem) with an fstests configuration that specifies a 1k blocksize and a specially crafted log size that will consume 7/8 of the space (17920 blocks, specifically) in that AG.
Cc: Brian Foster <bfoster@redhat.com> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
88beb994 |
| 14-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reser
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reserve pool. Given the difference between the current and requested pool sizes, it will try to reserve free space from fdblocks. However, the amount requested from fdblocks is also constrained by the amount of space that we think xfs_mod_fdblocks will give us. If we forget to subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will return ENOSPC and we'll hang the kernel at mount due to the infinite loop.
In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand out the "free space" used by the free space btrees, because some portion of the free space btrees hold in reserve space for future btree expansion. Unfortunately, xfs_reserve_blocks' estimation of the number of blocks that it could request from xfs_mod_fdblocks was not updated to include m_allocbt_blks, so if space is extremely low, the caller hangs.
Fix this by creating a function to estimate the number of blocks that can be reserved from fdblocks, which needs to exclude the set-aside and m_allocbt_blks.
Found by running xfs/306 (which formats a single-AG 20MB filesystem) with an fstests configuration that specifies a 1k blocksize and a specially crafted log size that will consume 7/8 of the space (17920 blocks, specifically) in that AG.
Cc: Brian Foster <bfoster@redhat.com> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
88beb994 |
| 14-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reser
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reserve pool. Given the difference between the current and requested pool sizes, it will try to reserve free space from fdblocks. However, the amount requested from fdblocks is also constrained by the amount of space that we think xfs_mod_fdblocks will give us. If we forget to subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will return ENOSPC and we'll hang the kernel at mount due to the infinite loop.
In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand out the "free space" used by the free space btrees, because some portion of the free space btrees hold in reserve space for future btree expansion. Unfortunately, xfs_reserve_blocks' estimation of the number of blocks that it could request from xfs_mod_fdblocks was not updated to include m_allocbt_blks, so if space is extremely low, the caller hangs.
Fix this by creating a function to estimate the number of blocks that can be reserved from fdblocks, which needs to exclude the set-aside and m_allocbt_blks.
Found by running xfs/306 (which formats a single-AG 20MB filesystem) with an fstests configuration that specifies a 1k blocksize and a specially crafted log size that will consume 7/8 of the space (17920 blocks, specifically) in that AG.
Cc: Brian Foster <bfoster@redhat.com> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
88beb994 |
| 14-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reser
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reserve pool. Given the difference between the current and requested pool sizes, it will try to reserve free space from fdblocks. However, the amount requested from fdblocks is also constrained by the amount of space that we think xfs_mod_fdblocks will give us. If we forget to subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will return ENOSPC and we'll hang the kernel at mount due to the infinite loop.
In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand out the "free space" used by the free space btrees, because some portion of the free space btrees hold in reserve space for future btree expansion. Unfortunately, xfs_reserve_blocks' estimation of the number of blocks that it could request from xfs_mod_fdblocks was not updated to include m_allocbt_blks, so if space is extremely low, the caller hangs.
Fix this by creating a function to estimate the number of blocks that can be reserved from fdblocks, which needs to exclude the set-aside and m_allocbt_blks.
Found by running xfs/306 (which formats a single-AG 20MB filesystem) with an fstests configuration that specifies a 1k blocksize and a specially crafted log size that will consume 7/8 of the space (17920 blocks, specifically) in that AG.
Cc: Brian Foster <bfoster@redhat.com> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
88beb994 |
| 14-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reser
xfs: don't include bnobt blocks when reserving free block pool
[ Upstream commit c8c568259772751a14e969b7230990508de73d9d ]
xfs_reserve_blocks controls the size of the user-visible free space reserve pool. Given the difference between the current and requested pool sizes, it will try to reserve free space from fdblocks. However, the amount requested from fdblocks is also constrained by the amount of space that we think xfs_mod_fdblocks will give us. If we forget to subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will return ENOSPC and we'll hang the kernel at mount due to the infinite loop.
In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand out the "free space" used by the free space btrees, because some portion of the free space btrees hold in reserve space for future btree expansion. Unfortunately, xfs_reserve_blocks' estimation of the number of blocks that it could request from xfs_mod_fdblocks was not updated to include m_allocbt_blks, so if space is extremely low, the caller hangs.
Fix this by creating a function to estimate the number of blocks that can be reserved from fdblocks, which needs to exclude the set-aside and m_allocbt_blks.
Found by running xfs/306 (which formats a single-AG 20MB filesystem) with an fstests configuration that specifies a 1k blocksize and a specially crafted log size that will consume 7/8 of the space (17920 blocks, specifically) in that AG.
Cc: Brian Foster <bfoster@redhat.com> Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|