Revision tags: v6.6.25, v6.6.24 |
|
#
680776e5 |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: transfer recovered intent item ownership in ->iop_recover
commit deb4cd8ba87f17b12c72b3827820d9c703e9fd95 upstream.
Now that we pass the xfs_defer_pending object into the intent item recovery
xfs: transfer recovered intent item ownership in ->iop_recover
commit deb4cd8ba87f17b12c72b3827820d9c703e9fd95 upstream.
Now that we pass the xfs_defer_pending object into the intent item recovery functions, we know exactly when ownership of the sole refcount passes from the recovery context to the intent done item. At that point, we need to null out dfp_intent so that the recovery mechanism won't release it. This should fix the UAF problem reported by Long Li.
Note that we still want to recreate the full deferred work state. That will be addressed in the next patches.
Fixes: 2e76f188fd90 ("xfs: cancel intents immediately if process_intents fails") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
87db24c8 |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: pass the xfs_defer_pending object to iop_recover
commit a050acdfa8003a44eae4558fddafc7afb1aef458 upstream.
Now that log intent item recovery recreates the xfs_defer_pending state, we should pa
xfs: pass the xfs_defer_pending object to iop_recover
commit a050acdfa8003a44eae4558fddafc7afb1aef458 upstream.
Now that log intent item recovery recreates the xfs_defer_pending state, we should pass that into the ->iop_recover routines so that the intent item can finish the recreation work.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
cd3c2cf3 |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use xfs_defer_pending objects to recover intent items
commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 upstream.
One thing I never quite got around to doing is porting the log intent item recov
xfs: use xfs_defer_pending objects to recover intent items
commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 upstream.
One thing I never quite got around to doing is porting the log intent item recovery code to reconstruct the deferred pending work state. As a result, each intent item open codes xfs_defer_finish_one in its recovery method, because that's what the EFI code did before xfs_defer.c even existed.
This is a gross thing to have left unfixed -- if an EFI cannot proceed due to busy extents, we end up creating separate new EFIs for each unfinished work item, which is a change in behavior from what runtime would have done.
Worse yet, Long Li pointed out that there's a UAF in the recovery code. The ->commit_pass2 function adds the intent item to the AIL and drops the refcount. The one remaining refcount is now owned by the recovery mechanism (aka the log intent items in the AIL) with the intent of giving the refcount to the intent done item in the ->iop_recover function.
However, if something fails later in recovery, xlog_recover_finish will walk the recovered intent items in the AIL and release them. If the CIL hasn't been pushed before that point (which is possible since we don't force the log until later) then the intent done release will try to free its associated intent, which has already been freed.
This patch starts to address this mess by having the ->commit_pass2 functions recreate the xfs_defer_pending state. The next few patches will fix the recovery functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
c0231292 |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't leak recovered attri intent items
commit 07bcbdf020c9fd3c14bec51c50225a2a02707b94 upstream.
If recovery finds an xattr log intent item calling for the removal of an attribute and the fil
xfs: don't leak recovered attri intent items
commit 07bcbdf020c9fd3c14bec51c50225a2a02707b94 upstream.
If recovery finds an xattr log intent item calling for the removal of an attribute and the file doesn't even have an attr fork, we know that the removal is trivially complete. However, we can't just exit the recovery function without doing something about the recovered log intent item -- it's still on the AIL, and not logging an attrd item means it stays there forever.
This has likely not been seen in practice because few people use LARP and the runtime code won't log the attri for a no-attrfork removexattr operation. But let's fix this anyway.
Also we shouldn't really be testing the attr fork presence until we've taken the ILOCK, though this doesn't matter much in recovery, which is single threaded.
Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.25, v6.6.24 |
|
#
680776e5 |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: transfer recovered intent item ownership in ->iop_recover
commit deb4cd8ba87f17b12c72b3827820d9c703e9fd95 upstream.
Now that we pass the xfs_defer_pending object into the intent item recovery
xfs: transfer recovered intent item ownership in ->iop_recover
commit deb4cd8ba87f17b12c72b3827820d9c703e9fd95 upstream.
Now that we pass the xfs_defer_pending object into the intent item recovery functions, we know exactly when ownership of the sole refcount passes from the recovery context to the intent done item. At that point, we need to null out dfp_intent so that the recovery mechanism won't release it. This should fix the UAF problem reported by Long Li.
Note that we still want to recreate the full deferred work state. That will be addressed in the next patches.
Fixes: 2e76f188fd90 ("xfs: cancel intents immediately if process_intents fails") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
87db24c8 |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: pass the xfs_defer_pending object to iop_recover
commit a050acdfa8003a44eae4558fddafc7afb1aef458 upstream.
Now that log intent item recovery recreates the xfs_defer_pending state, we should pa
xfs: pass the xfs_defer_pending object to iop_recover
commit a050acdfa8003a44eae4558fddafc7afb1aef458 upstream.
Now that log intent item recovery recreates the xfs_defer_pending state, we should pass that into the ->iop_recover routines so that the intent item can finish the recreation work.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
cd3c2cf3 |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use xfs_defer_pending objects to recover intent items
commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 upstream.
One thing I never quite got around to doing is porting the log intent item recov
xfs: use xfs_defer_pending objects to recover intent items
commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 upstream.
One thing I never quite got around to doing is porting the log intent item recovery code to reconstruct the deferred pending work state. As a result, each intent item open codes xfs_defer_finish_one in its recovery method, because that's what the EFI code did before xfs_defer.c even existed.
This is a gross thing to have left unfixed -- if an EFI cannot proceed due to busy extents, we end up creating separate new EFIs for each unfinished work item, which is a change in behavior from what runtime would have done.
Worse yet, Long Li pointed out that there's a UAF in the recovery code. The ->commit_pass2 function adds the intent item to the AIL and drops the refcount. The one remaining refcount is now owned by the recovery mechanism (aka the log intent items in the AIL) with the intent of giving the refcount to the intent done item in the ->iop_recover function.
However, if something fails later in recovery, xlog_recover_finish will walk the recovered intent items in the AIL and release them. If the CIL hasn't been pushed before that point (which is possible since we don't force the log until later) then the intent done release will try to free its associated intent, which has already been freed.
This patch starts to address this mess by having the ->commit_pass2 functions recreate the xfs_defer_pending state. The next few patches will fix the recovery functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
c0231292 |
| 26-Mar-2024 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't leak recovered attri intent items
commit 07bcbdf020c9fd3c14bec51c50225a2a02707b94 upstream.
If recovery finds an xattr log intent item calling for the removal of an attribute and the fil
xfs: don't leak recovered attri intent items
commit 07bcbdf020c9fd3c14bec51c50225a2a02707b94 upstream.
If recovery finds an xattr log intent item calling for the removal of an attribute and the file doesn't even have an attr fork, we know that the removal is trivially complete. However, we can't just exit the recovery function without doing something about the recovered log intent item -- it's still on the AIL, and not logging an attrd item means it stays there forever.
This has likely not been seen in practice because few people use LARP and the runtime code won't log the attri for a no-attrfork removexattr operation. But let's fix this anyway.
Also we shouldn't really be testing the attr fork presence until we've taken the ILOCK, though this doesn't matter much in recovery, which is single threaded.
Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3 |
|
#
3c919b09 |
| 11-Sep-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: reserve less log space when recovering log intent items
Wengang Wang reports that a customer's system was running a number of truncate operations on a filesystem with a very small log. Content
xfs: reserve less log space when recovering log intent items
Wengang Wang reports that a customer's system was running a number of truncate operations on a filesystem with a very small log. Contention on the reserve heads lead to other threads stalling on smaller updates (e.g. mtime updates) long enough to result in the node being rebooted on account of the lack of responsivenes. The node failed to recover because log recovery of an EFI became stuck waiting for a grant of reserve space. From Wengang's report:
"For the file deletion, log bytes are reserved basing on xfs_mount->tr_itruncate which is:
tr_logres = 175488, tr_logcount = 2, tr_logflags = XFS_TRANS_PERM_LOG_RES,
"You see it's a permanent log reservation with two log operations (two transactions in rolling mode). After calculation (xlog_calc_unit_res() adds space for various log headers), the final log space needed per transaction changes from 175488 to 180208 bytes. So the total log space needed is 360416 bytes (180208 * 2). [That quantity] of log space (360416 bytes) needs to be reserved for both run time inode removing (xfs_inactive_truncate()) and EFI recover (xfs_efi_item_recover())."
In other words, runtime pre-reserves 360K of space in anticipation of running a chain of two transactions in which each transaction gets a 180K reservation.
Now that we've allocated the transaction, we delete the bmap mapping, log an EFI to free the space, and roll the transaction as part of finishing the deferops chain. Rolling creates a new xfs_trans which shares its ticket with the old transaction. Next, xfs_trans_roll calls __xfs_trans_commit with regrant == true, which calls xlog_cil_commit with the same regrant parameter.
xlog_cil_commit calls xfs_log_ticket_regrant, which decrements t_cnt and subtracts t_curr_res from the reservation and write heads.
If the filesystem is fresh and the first transaction only used (say) 20K, then t_curr_res will be 160K, and we give that much reservation back to the reservation head. Or if the file is really fragmented and the first transaction actually uses 170K, then t_curr_res will be 10K, and that's what we give back to the reservation.
Having done that, we're now headed into the second transaction with an EFI and 180K of reservation. Other threads apparently consumed all the reservation for smaller transactions, such as timestamp updates.
Now let's say the first transaction gets written to disk and we crash without ever completing the second transaction. Now we remount the fs, log recovery finds the unfinished EFI, and calls xfs_efi_recover to finish the EFI. However, xfs_efi_recover starts a new tr_itruncate tranasction, which asks for 360K log reservation. This is a lot more than the 180K that we had reserved at the time of the crash. If the first EFI to be recovered is also pinning the tail of the log, we will be unable to free any space in the log, and recovery livelocks.
Wengang confirmed this:
"Now we have the second transaction which has 180208 log bytes reserved too. The second transaction is supposed to process intents including extent freeing. With my hacking patch, I blocked the extent freeing 5 hours. So in that 5 hours, 180208 (NOT 360416) log bytes are reserved.
"With my test case, other transactions (update timestamps) then happen. As my hacking patch pins the journal tail, those timestamp-updating transactions finally use up (almost) all the left available log space (in memory in on disk). And finally the on disk (and in memory) available log space goes down near to 180208 bytes. Those 180208 bytes are reserved by [the] second (extent-free) transaction [in the chain]."
Wengang and I noticed that EFI recovery starts a transaction, completes one step of the chain, and commits the transaction without completing any other steps of the chain. Those subsequent steps are completed by xlog_finish_defer_ops, which allocates yet another transaction to finish the rest of the chain. That transaction gets the same tr_logres as the head transaction, but with tr_logcount = 1 to force regranting with every roll to avoid livelocks.
In other words, we already figured this out in commit 929b92f64048d ("xfs: xfs_defer_capture should absorb remaining transaction reservation"), but should have applied that logic to each intent item's recovery function. For Wengang's case, the xfs_trans_alloc call in the EFI recovery function should only be asking for a single transaction's worth of log reservation -- 180K, not 360K.
Quoting Wengang again:
"With log recovery, during EFI recovery, we use tr_itruncate again to reserve two transactions that needs 360416 log bytes. Reserving 360416 bytes fails [stalls] because we now only have about 180208 available.
"Actually during the EFI recover, we only need one transaction to free the extents just like the 2nd transaction at RUNTIME. So it only needs to reserve 180208 rather than 360416 bytes. We have (a bit) more than 180208 available log bytes on disk, so [if we decrease the reservation to 180K] the reservation goes and the recovery [finishes]. That is to say: we can fix the log recover part to fix the issue. We can introduce a new xfs_trans_res xfs_mount->tr_ext_free
{ tr_logres = 175488, tr_logcount = 0, tr_logflags = 0, }
"and use tr_ext_free instead of tr_itruncate in EFI recover."
However, I don't think it quite makes sense to create an entirely new transaction reservation type to handle single-stepping during log recovery. Instead, we should copy the transaction reservation information in the xfs_mount, change tr_logcount to 1, and pass that into xfs_trans_alloc. We know this won't risk changing the min log size computation since we always ask for a fraction of the reservation for all known transaction types.
This looks like it's been lurking in the codebase since commit 3d3c8b5222b92, which changed the xfs_trans_reserve call in xlog_recover_process_efi to use the tr_logcount in tr_itruncate. That changed the EFI recovery transaction from making a non-XFS_TRANS_PERM_LOG_RES request for one transaction's worth of log space to a XFS_TRANS_PERM_LOG_RES request for two transactions worth.
Fixes: 3d3c8b5222b92 ("xfs: refactor xfs_trans_reserve() interface") Complements: 929b92f64048d ("xfs: xfs_defer_capture should absorb remaining transaction reservation") Suggested-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Srikanth C S <srikanth.c.s@oracle.com> [djwong: apply the same transformation to all log intent recovery] Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
Revision tags: v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4 |
|
#
950f0d50 |
| 25-Oct-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: dump corrupt recovered log intent items to dmesg consistently
If log recovery decides that an intent item is corrupt and wants to abort the mount, capture a hexdump of the corrupt log item in t
xfs: dump corrupt recovered log intent items to dmesg consistently
If log recovery decides that an intent item is corrupt and wants to abort the mount, capture a hexdump of the corrupt log item in the kernel log for further analysis. Some of the log item code already did this, so we're fixing the rest to do it consistently.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
Revision tags: v6.0.3 |
|
#
59da7ff4 |
| 20-Oct-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: fix validation in attr log item recovery
Before we start fixing all the complaints about memcpy'ing log items around, let's fix some inadequate validation in the xattr log item recovery code an
xfs: fix validation in attr log item recovery
Before we start fixing all the complaints about memcpy'ing log items around, let's fix some inadequate validation in the xattr log item recovery code and get rid of the (now trivial) copy_format function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
Revision tags: v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69 |
|
#
dc256418 |
| 18-Sep-2022 |
Zhiqiang Liu <liuzhiqiang26@huawei.com> |
xfs: do not need to check return value of xlog_kvmalloc()
In xfs_attri_log_nameval_alloc(), xlog_kvmalloc() is called to alloc memory, which will always return successfully, so we donot need to chec
xfs: do not need to check return value of xlog_kvmalloc()
In xfs_attri_log_nameval_alloc(), xlog_kvmalloc() is called to alloc memory, which will always return successfully, so we donot need to check return value.
Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
Revision tags: v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51 |
|
#
e53bcffa |
| 25-Jun-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't hold xattr leaf buffers across transaction rolls
Now that we've established (again!) that empty xattr leaf buffers are ok, we no longer need to bhold them to transactions when we're creat
xfs: don't hold xattr leaf buffers across transaction rolls
Now that we've established (again!) that empty xattr leaf buffers are ok, we no longer need to bhold them to transactions when we're creating new leaf blocks. Get rid of the entire mechanism, which should simplify the xattr code quite a bit.
The original justification for using bhold here was to prevent the AIL from trying to write the empty leaf block into the fs during the brief time that we release the buffer lock. The reason for /that/ was to prevent recovery from tripping over the empty ondisk block.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
show more ...
|
Revision tags: v5.15.50 |
|
#
f94e08b6 |
| 23-Jun-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: clean up the end of xfs_attri_item_recover
The end of this function could use some cleanup -- the EAGAIN conditionals make it harder to figure out what's going on with the disposal of xattri_le
xfs: clean up the end of xfs_attri_item_recover
The end of this function could use some cleanup -- the EAGAIN conditionals make it harder to figure out what's going on with the disposal of xattri_leaf_bp, and the dual error/ret variables aren't needed. Turn the EAGAIN case into a separate block documenting all the subtleties of recovering in the middle of an xattr update chain, which makes the rest of the prologue much simpler.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
#
b822ea17 |
| 22-Jun-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: always free xattri_leaf_bp when cancelling a deferred op
While running the following fstest with logged xattrs DISabled, I noticed the following:
# FSSTRESS_AVOID="-z -f unlink=1 -f rmdir=1 -f
xfs: always free xattri_leaf_bp when cancelling a deferred op
While running the following fstest with logged xattrs DISabled, I noticed the following:
# FSSTRESS_AVOID="-z -f unlink=1 -f rmdir=1 -f creat=2 -f mkdir=2 -f getfattr=3 -f listfattr=3 -f attr_remove=4 -f removefattr=4 -f setfattr=20 -f attr_set=60" ./check generic/475
INFO: task u9:1:40 blocked for more than 61 seconds. Tainted: G O 5.19.0-rc2-djwx #rc2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:u9:1 state:D stack:12872 pid: 40 ppid: 2 flags:0x00004000 Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs] Call Trace: <TASK> __schedule+0x2db/0x1110 schedule+0x58/0xc0 schedule_timeout+0x115/0x160 __down_common+0x126/0x210 down+0x54/0x70 xfs_buf_lock+0x2d/0xe0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xfs_buf_item_unpin+0x227/0x3a0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xfs_trans_committed_bulk+0x18e/0x320 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xlog_cil_committed+0x2ea/0x360 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] xlog_cil_push_work+0x60f/0x690 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73] process_one_work+0x1df/0x3c0 worker_thread+0x53/0x3b0 kthread+0xea/0x110 ret_from_fork+0x1f/0x30 </TASK>
This appears to be the result of shortform_to_leaf creating a new leaf buffer as part of adding an xattr to a file. The new leaf buffer is held and attached to the xfs_attr_intent structure, but then the filesystem shuts down. Instead of the usual path (which adds the attr to the held leaf buffer which releases the hold), we instead cancel the entire deferred operation.
Unfortunately, xfs_attr_cancel_item doesn't release any attached leaf buffers, so we leak the locked buffer. The CIL cannot do anything about that, and hangs. Fix this by teaching it to release leaf buffers, and make XFS a little more careful about not leaving a dangling reference.
The prologue of xfs_attri_item_recover is (in this author's opinion) a little hard to figure out, so I'll clean that up in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
Revision tags: v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45 |
|
#
f4288f01 |
| 05-Jun-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: fix TOCTOU race involving the new logged xattrs control knob
I found a race involving the larp control knob, aka the debugging knob that lets developers enable logging of extended attribute upd
xfs: fix TOCTOU race involving the new logged xattrs control knob
I found a race involving the larp control knob, aka the debugging knob that lets developers enable logging of extended attribute updates:
Thread 1 Thread 2
echo 0 > /sys/fs/xfs/debug/larp setxattr(REPLACE) xfs_has_larp (returns false) xfs_attr_set
echo 1 > /sys/fs/xfs/debug/larp
xfs_attr_defer_replace xfs_attr_init_replace_state xfs_has_larp (returns true) xfs_attr_init_remove_state
<oops, wrong DAS state!>
This isn't a particularly severe problem right now because xattr logging is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know what they're doing.
However, the eventual intent is that callers should be able to ask for the assistance of the log in persisting xattr updates. This capability might not be required for /all/ callers, which means that dynamic control must work correctly. Once an xattr update has decided whether or not to use logged xattrs, it needs to stay in that mode until the end of the operation regardless of what subsequent parallel operations might do.
Therefore, it is an error to continue sampling xfs_globals.larp once xfs_attr_change has made a decision about larp, and it was not correct for me to have told Allison that ->create_intent functions can sample the global log incompat feature bitfield to decide to elide a log item.
Instead, create a new op flag for the xfs_da_args structure, and convert all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within the attr update state machine to look for the operations flag.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
show more ...
|
Revision tags: v5.15.44, v5.15.43, v5.15.42 |
|
#
4183e4f2 |
| 22-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: share xattr name and value buffers when logging xattr updates
While running xfs/297 and generic/642, I noticed a crash in xfs_attri_item_relog when it tries to copy the attr name to the new xat
xfs: share xattr name and value buffers when logging xattr updates
While running xfs/297 and generic/642, I noticed a crash in xfs_attri_item_relog when it tries to copy the attr name to the new xattri log item. I think what happened here was that we called ->iop_commit on the old attri item (which nulls out the pointers) as part of a log force at the same time that a chained attr operation was ongoing. The system was busy enough that at some later point, the defer ops operation decided it was necessary to relog the attri log item, but as we've detached the name buffer from the old attri log item, we can't copy it to the new one, and kaboom.
I think there's a broader refcounting problem with LARP mode -- the setxattr code can return to userspace before the CIL actually formats and commits the log item, which results in a UAF bug. Therefore, the xattr log item needs to be able to retain a reference to the name and value buffers until the log items have completely cleared the log. Furthermore, each time we create an intent log item, we allocate new memory and (re)copy the contents; sharing here would be very useful.
Solve the UAF and the unnecessary memory allocations by having the log code create a single refcounted buffer to contain the name and value contents. This buffer can be passed from old to new during a relog operation, and the logging code can (optionally) attach it to the xfs_attr_item for reuse when LARP mode is enabled.
This also fixes a problem where the xfs_attri_log_item objects weren't being freed back to the same cache where they came from.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
Revision tags: v5.18 |
|
#
73c348d4 |
| 22-May-2022 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
xfs: Remove duplicate include
Clean up the following includecheck warning:
./fs/xfs/xfs_attr_item.c: xfs_inode.h is included more than once.
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Sign
xfs: Remove duplicate include
Clean up the following includecheck warning:
./fs/xfs/xfs_attr_item.c: xfs_inode.h is included more than once.
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
e3c5de22 |
| 22-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: rename struct xfs_attr_item to xfs_attr_intent
Everywhere else in XFS, structures that capture the state of an ongoing deferred work item all have names that end with "_intent". The new extend
xfs: rename struct xfs_attr_item to xfs_attr_intent
Everywhere else in XFS, structures that capture the state of an ongoing deferred work item all have names that end with "_intent". The new extended attribute deferred work items are not named as such, so fix it to follow the naming convention used elsewhere.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
4136e38a |
| 22-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: put attr[id] log item cache init with the others
Initialize and destroy the xattr log item caches in the same places that we do all the other log item caches.
Signed-off-by: Darrick J. Wong <d
xfs: put attr[id] log item cache init with the others
Initialize and destroy the xattr log item caches in the same places that we do all the other log item caches.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
e2c78949 |
| 22-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use a separate slab cache for deferred xattr work state
Create a separate slab cache for struct xfs_attr_item objects, since we can pack the (104-byte) intent items more tightly than we can wit
xfs: use a separate slab cache for deferred xattr work state
Create a separate slab cache for struct xfs_attr_item objects, since we can pack the (104-byte) intent items more tightly than we can with the general slab cache objects. On x86, this means 39 intents per memory page instead of 32.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
b53d212b |
| 22-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: put the xattr intent item op flags in their own namespace
The flags that are stored in the extended attr intent log item really should have a separate namespace from the rest of the XFS_ATTR_*
xfs: put the xattr intent item op flags in their own namespace
The flags that are stored in the extended attr intent log item really should have a separate namespace from the rest of the XFS_ATTR_* flags. Give them one to make it a little more obvious that they're intent item flags.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
2fe3ffcf |
| 19-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: free xfs_attrd_log_items correctly
Technically speaking, objects allocated out of a specific slab cache are supposed to be freed to that slab cache. The popular slab backends will take care of
xfs: free xfs_attrd_log_items correctly
Technically speaking, objects allocated out of a specific slab cache are supposed to be freed to that slab cache. The popular slab backends will take care of this for us, but SLOB famously doesn't. Fix this, even if slob + xfs are not that common of a combination.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
25b1e9dc |
| 19-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: validate xattr name earlier in recovery
When we're validating a recovered xattr log item during log recovery, we should check the name before starting to allocate resources. This isn't strictl
xfs: validate xattr name earlier in recovery
When we're validating a recovered xattr log item during log recovery, we should check the name before starting to allocate resources. This isn't strictly necessary on its own, but it means that we won't bother with huge memory allocations during recovery if the attr name is garbage, which will simplify the changes in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
85d76aec |
| 19-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: reject unknown xattri log item filter flags during recovery
Make sure we screen the "attr flags" field of recovered xattr intent log items to reject flag bits that we don't know about. This is
xfs: reject unknown xattri log item filter flags during recovery
Make sure we screen the "attr flags" field of recovered xattr intent log items to reject flag bits that we don't know about. This is really the attr *filter* field from xfs_da_args, so rename the field and create a mask to make checking for invalid bits easier.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|