Revision tags: v6.6.25, v6.6.24, v6.6.23 |
|
#
e25447c3 |
| 01-Mar-2024 |
Josef Bacik <josef@toxicpanda.com> |
nfs: fix UAF in direct writes
[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]
In production we have been hitting the following warning consistently
------------[ cut here ]-----------
nfs: fix UAF in direct writes
[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]
In production we have been hitting the following warning consistently
------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0 Workqueue: nfsiod nfs_direct_write_schedule_work [nfs] RIP: 0010:refcount_warn_saturate+0x9c/0xe0 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x9f/0x130 ? refcount_warn_saturate+0x9c/0xe0 ? report_bug+0xcc/0x150 ? handle_bug+0x3d/0x70 ? exc_invalid_op+0x16/0x40 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0x9c/0xe0 nfs_direct_write_schedule_work+0x237/0x250 [nfs] process_one_work+0x12f/0x4a0 worker_thread+0x14e/0x3b0 ? ZSTD_getCParams_internal+0x220/0x220 kthread+0xdc/0x120 ? __btf_name_valid+0xa0/0xa0 ret_from_fork+0x1f/0x30
This is because we're completing the nfs_direct_request twice in a row.
The source of this is when we have our commit requests to submit, we process them and send them off, and then in the completion path for the commit requests we have
if (nfs_commit_end(cinfo.mds)) nfs_direct_write_complete(dreq);
However since we're submitting asynchronous requests we sometimes have one that completes before we submit the next one, so we end up calling complete on the nfs_direct_request twice.
The only other place we use nfs_generic_commit_list() is in __nfs_commit_inode, which wraps this call in a
nfs_commit_begin(); nfs_commit_end();
Which is a common pattern for this style of completion handling, one that is also repeated in the direct code with get_dreq()/put_dreq() calls around where we process events as well as in the completion paths.
Fix this by using the same pattern for the commit requests.
Before with my 200 node rocksdb stress running this warning would pop every 10ish minutes. With my patch the stress test has been running for several hours without popping.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.25, v6.6.24, v6.6.23 |
|
#
e25447c3 |
| 01-Mar-2024 |
Josef Bacik <josef@toxicpanda.com> |
nfs: fix UAF in direct writes
[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]
In production we have been hitting the following warning consistently
------------[ cut here ]-----------
nfs: fix UAF in direct writes
[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]
In production we have been hitting the following warning consistently
------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0 Workqueue: nfsiod nfs_direct_write_schedule_work [nfs] RIP: 0010:refcount_warn_saturate+0x9c/0xe0 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x9f/0x130 ? refcount_warn_saturate+0x9c/0xe0 ? report_bug+0xcc/0x150 ? handle_bug+0x3d/0x70 ? exc_invalid_op+0x16/0x40 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0x9c/0xe0 nfs_direct_write_schedule_work+0x237/0x250 [nfs] process_one_work+0x12f/0x4a0 worker_thread+0x14e/0x3b0 ? ZSTD_getCParams_internal+0x220/0x220 kthread+0xdc/0x120 ? __btf_name_valid+0xa0/0xa0 ret_from_fork+0x1f/0x30
This is because we're completing the nfs_direct_request twice in a row.
The source of this is when we have our commit requests to submit, we process them and send them off, and then in the completion path for the commit requests we have
if (nfs_commit_end(cinfo.mds)) nfs_direct_write_complete(dreq);
However since we're submitting asynchronous requests we sometimes have one that completes before we submit the next one, so we end up calling complete on the nfs_direct_request twice.
The only other place we use nfs_generic_commit_list() is in __nfs_commit_inode, which wraps this call in a
nfs_commit_begin(); nfs_commit_end();
Which is a common pattern for this style of completion handling, one that is also repeated in the direct code with get_dreq()/put_dreq() calls around where we process events as well as in the completion paths.
Fix this by using the same pattern for the commit requests.
Before with my 200 node rocksdb stress running this warning would pop every 10ish minutes. With my patch the stress test has been running for several hours without popping.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2 |
|
#
75aa038d |
| 17-Nov-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
pNFS: Fix the pnfs block driver's calculation of layoutget size
[ Upstream commit 8a6291bf3b0eae1bf26621e6419a91682f2d6227 ]
Instead of relying on the value of the 'bytes_left' field, we should cal
pNFS: Fix the pnfs block driver's calculation of layoutget size
[ Upstream commit 8a6291bf3b0eae1bf26621e6419a91682f2d6227 ]
Instead of relying on the value of the 'bytes_left' field, we should calculate the layout size based on the offset of the request that is being written out.
Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling") Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2 |
|
#
b11243f7 |
| 04-Sep-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: More fixes for nfs_direct_write_reschedule_io()
Ensure that all requests are put back onto the commit list so that they can be rescheduled.
Fixes: 4daaeba93822 ("NFS: Fix nfs_direct_write_resc
NFS: More fixes for nfs_direct_write_reschedule_io()
Ensure that all requests are put back onto the commit list so that they can be rescheduled.
Fixes: 4daaeba93822 ("NFS: Fix nfs_direct_write_reschedule_io()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
b193a78d |
| 04-Sep-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Use the correct commit info in nfs_join_page_group()
Ensure that nfs_clear_request_commit() updates the correct counters when it removes them from the commit list.
Fixes: ed5d588fe47f ("NFS: T
NFS: Use the correct commit info in nfs_join_page_group()
Ensure that nfs_clear_request_commit() updates the correct counters when it removes them from the commit list.
Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
8982f7af |
| 04-Sep-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: More O_DIRECT accounting fixes for error paths
If we hit a fatal error when retransmitting, we do need to record the removal of the request from the count of written bytes.
Fixes: 031d73ed768a
NFS: More O_DIRECT accounting fixes for error paths
If we hit a fatal error when retransmitting, we do need to record the removal of the request from the count of written bytes.
Fixes: 031d73ed768a ("NFS: Fix O_DIRECT accounting of number of bytes read/written") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
7c633932 |
| 04-Sep-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix O_DIRECT locking issues
The dreq fields are protected by the dreq->lock.
Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling") Signed-off-by: Trond Myklebust <trond
NFS: Fix O_DIRECT locking issues
The dreq fields are protected by the dreq->lock.
Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
954998b6 |
| 04-Sep-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix error handling for O_DIRECT write scheduling
If we fail to schedule a request for transmission, there are 2 possibilities: 1) Either we hit a fatal error, and we just want to drop the remai
NFS: Fix error handling for O_DIRECT write scheduling
If we fail to schedule a request for transmission, there are 2 possibilities: 1) Either we hit a fatal error, and we just want to drop the remaining requests on the floor. 2) We were asked to try again, in which case we should allow the outstanding RPC calls to complete, so that we can recoalesce requests and try again.
Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
Revision tags: v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
88975a55 |
| 19-Aug-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix a potential data corruption
We must ensure that the subrequests are joined back into the head before we can retransmit a request. If the head was not on the commit lists, because the server
NFS: Fix a potential data corruption
We must ensure that the subrequests are joined back into the head before we can retransmit a request. If the head was not on the commit lists, because the server wrote it synchronously, we still need to add it back to the retransmission list. Add a call that mirrors the effect of nfs_cancel_remove_inode() for O_DIRECT.
Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
Revision tags: v6.1.46, v6.1.45 |
|
#
be2fd156 |
| 08-Aug-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix a use after free in nfs_direct_join_group()
Be more careful when tearing down the subrequests of an O_DIRECT write as part of a retransmission.
Reported-by: Chris Mason <clm@fb.com> Fixes:
NFS: Fix a use after free in nfs_direct_join_group()
Be more careful when tearing down the subrequests of an O_DIRECT write as part of a retransmission.
Reported-by: Chris Mason <clm@fb.com> Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8 |
|
#
70e9db69 |
| 19-Jan-2023 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Clean up O_DIRECT request allocation
Rather than adjusting the index+offset after the call to nfs_create_request(), add a function nfs_page_create_from_page() that takes an offset.
Signed-off-
NFS: Clean up O_DIRECT request allocation
Rather than adjusting the index+offset after the call to nfs_create_request(), add a function nfs_page_create_from_page() that takes an offset.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
Revision tags: v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47 |
|
#
1ef255e2 |
| 09-Jun-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned.
Provide inline wrapp
iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned.
Provide inline wrappers doing that, convert trivial open-coded uses of those.
BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday...
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
Revision tags: v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18 |
|
#
fcb14cb1 |
| 22-May-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
new iov_iter flavour - ITER_UBUF
Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones.
We are going to expose t
new iov_iter flavour - ITER_UBUF
Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones.
We are going to expose the things like ->write_iter() et.al. to those in subsequent commits.
New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied.
DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
69d96651 |
| 22-Jul-2022 |
Jeff Layton <jlayton@kernel.org> |
nfs: only issue commit in DIO codepath if we have uncommitted data
Currently, we try to determine whether to issue a commit based on nfs_write_need_commit which looks at the current verifier. In the
nfs: only issue commit in DIO codepath if we have uncommitted data
Currently, we try to determine whether to issue a commit based on nfs_write_need_commit which looks at the current verifier. In the case where we got a short write and then tried to follow it up with one that failed, the verifier can't be trusted.
What we really want to know is whether the pgio request had any successful writes that came back as UNSTABLE. Add a new flag to the pgio request, and use that to indicate that we've had a successful unstable write. Only issue a commit if that flag is set.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
#
55051c0c |
| 22-Jul-2022 |
Jeff Layton <jlayton@kernel.org> |
nfs: always check dreq->error after a commit
When the client gets back a short DIO write, it will then attempt to issue another write to finish the DIO request. If that write then fails (as is often
nfs: always check dreq->error after a commit
When the client gets back a short DIO write, it will then attempt to issue another write to finish the DIO request. If that write then fails (as is often the case in an -ENOSPC situation), then we still may need to issue a COMMIT if the earlier short write was unstable. If that COMMIT then succeeds, then we don't want the client to reschedule the write requests, and to instead just return a short write. Otherwise, we can end up looping over the same DIO write forever.
Always consult dreq->error after a successful RPC, even when the flag state is not NFS_ODIRECT_DONE.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2028370 Reported-by: Boyang Xue <bxue@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
#
8efc4bbe |
| 22-Jul-2022 |
Jeff Layton <jlayton@kernel.org> |
nfs: add new nfs_direct_req tracepoint events
Add some new tracepoints to the DIO write code.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammers
nfs: add new nfs_direct_req tracepoint events
Add some new tracepoints to the DIO write code.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v5.15.41, v5.15.40, v5.15.39 |
|
#
eb79f3af |
| 09-May-2022 |
NeilBrown <neilb@suse.de> |
nfs: rename nfs_direct_IO and use as ->swap_rw
The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a while. We now need a ->swap_rw function which behaves slightly differently, ret
nfs: rename nfs_direct_IO and use as ->swap_rw
The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a while. We now need a ->swap_rw function which behaves slightly differently, returning zero for success rather than a byte count.
So modify nfs_direct_IO accordingly, rename it, and use it as the ->swap_rw function.
Link: https://lkml.kernel.org/r/165119301493.15698.7491285551903597618.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> (on Renesas RSK+RZA1 with 32 MiB of SDRAM) Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27 |
|
#
c265de25 |
| 06-Mar-2022 |
NeilBrown <neilb@suse.de> |
NFS: swap-out must always use STABLE writes.
The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely w
NFS: swap-out must always use STABLE writes.
The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads.
swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change.
For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE.
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
#
64158668 |
| 06-Mar-2022 |
NeilBrown <neilb@suse.de> |
NFS: swap IO handling is slightly different for O_DIRECT IO
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I b
NFS: swap IO handling is slightly different for O_DIRECT IO
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted.
We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw()
2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw().
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10 |
|
#
a6b5a28e |
| 14-Nov-2020 |
Dave Wysochanski <dwysocha@redhat.com> |
nfs: Convert to new fscache volume/cookie API
Change the nfs filesystem to support fscache's indexing rewrite and reenable caching in nfs.
The following changes have been made:
(1) The fscache_ne
nfs: Convert to new fscache volume/cookie API
Change the nfs filesystem to support fscache's indexing rewrite and reenable caching in nfs.
The following changes have been made:
(1) The fscache_netfs struct is no more, and there's no need to register the filesystem as a whole.
(2) The session cookie is now an fscache_volume cookie, allocated with fscache_acquire_volume(). That takes three parameters: a string representing the "volume" in the index, a string naming the cache to use (or NULL) and a u64 that conveys coherency metadata for the volume.
For nfs, I've made it render the volume name string as:
"nfs,<ver>,<family>,<address>,<port>,<fsidH>,<fsidL>*<,param>[,<uniq>]"
(3) The fscache_cookie_def is no more and needed information is passed directly to fscache_acquire_cookie(). The cache no longer calls back into the filesystem, but rather metadata changes are indicated at other times.
fscache_acquire_cookie() is passed the same keying and coherency information as before.
(4) fscache_enable/disable_cookie() have been removed.
Call fscache_use_cookie() and fscache_unuse_cookie() when a file is opened or closed to prevent a cache file from being culled and to keep resources to hand that are needed to do I/O.
If a file is opened for writing, we invalidate it with FSCACHE_INVAL_DIO_WRITE in lieu of doing writeback to the cache, thereby making it cease caching until all currently open files are closed. This should give the same behaviour as the uptream code. Making the cache store local modifications isn't straightforward for NFS, so that's left for future patches.
(5) fscache_invalidate() now needs to be given uptodate auxiliary data and a file size. It also takes a flag to indicate if this was due to a DIO write.
(6) Call nfs_fscache_invalidate() with FSCACHE_INVAL_DIO_WRITE on a file to which a DIO write is made.
(7) Call fscache_note_page_release() from nfs_release_page().
(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for PG_fscache to be cleared.
(9) The functions to read and write data to/from the cache are stubbed out pending a conversion to use netfslib.
Changes ======= ver #3: - Added missing =n fallback for nfs_fscache_release_file()[1][2].
ver #2: - Use gfpflags_allow_blocking() rather than using flag directly. - fscache_acquire_volume() now returns errors. - Remove NFS_INO_FSCACHE as it's no longer used. - Need to unuse a cookie on file-release, not inode-clear.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna.schumaker@netapp.com> cc: linux-nfs@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/202112100804.nksO8K4u-lkp@intel.com/ [1] Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/ [2] Link: https://lore.kernel.org/r/163819668938.215744.14448852181937731615.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906979003.143852.2601189243864854724.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967182112.1823006.7791504655391213379.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021575950.640689.12069642327533368467.stgit@warthog.procyon.org.uk/ # v4
show more ...
|
#
6b19b766 |
| 21-Oct-2021 |
Jens Axboe <axboe@kernel.dk> |
fs: get rid of the res2 iocb->ki_complete argument
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up
fs: get rid of the res2 iocb->ki_complete argument
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value.
Now that everybody is passing in zero, kill off the extra argument.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
133a48ab |
| 04-Oct-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up commit deadlocks
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensure that O_DIRECT calls nfs_commit_end().
Fixes: 723c921e7dfc
NFS: Fix up commit deadlocks
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensure that O_DIRECT calls nfs_commit_end().
Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
#
6bb22702 |
| 06-Mar-2022 |
NeilBrown <neilb@suse.de> |
NFS: swap-out must always use STABLE writes.
[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ]
The commit handling code is not safe against memory-pressure deadlocks when writing to swap
NFS: swap-out must always use STABLE writes.
[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ]
The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads.
swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change.
For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE.
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
24d28d9b |
| 06-Mar-2022 |
NeilBrown <neilb@suse.de> |
NFS: swap IO handling is slightly different for O_DIRECT IO
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ]
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding pos
NFS: swap IO handling is slightly different for O_DIRECT IO
[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ]
1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted.
We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw()
2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw().
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
9443fcc2 |
| 04-Oct-2021 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up commit deadlocks
[ Upstream commit 133a48abf6ecc535d7eddc6da1c3e4c972445882 ]
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensu
NFS: Fix up commit deadlocks
[ Upstream commit 133a48abf6ecc535d7eddc6da1c3e4c972445882 ]
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensure that O_DIRECT calls nfs_commit_end().
Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|