History log of /openbmc/linux/fs/nfs/direct.c (Results 1 – 25 of 554)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.25, v6.6.24, v6.6.23
# e25447c3 01-Mar-2024 Josef Bacik <josef@toxicpanda.com>

nfs: fix UAF in direct writes

[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]

In production we have been hitting the following warning consistently

------------[ cut here ]-----------

nfs: fix UAF in direct writes

[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]

In production we have been hitting the following warning consistently

------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0
Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]
RIP: 0010:refcount_warn_saturate+0x9c/0xe0
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x9f/0x130
? refcount_warn_saturate+0x9c/0xe0
? report_bug+0xcc/0x150
? handle_bug+0x3d/0x70
? exc_invalid_op+0x16/0x40
? asm_exc_invalid_op+0x16/0x20
? refcount_warn_saturate+0x9c/0xe0
nfs_direct_write_schedule_work+0x237/0x250 [nfs]
process_one_work+0x12f/0x4a0
worker_thread+0x14e/0x3b0
? ZSTD_getCParams_internal+0x220/0x220
kthread+0xdc/0x120
? __btf_name_valid+0xa0/0xa0
ret_from_fork+0x1f/0x30

This is because we're completing the nfs_direct_request twice in a row.

The source of this is when we have our commit requests to submit, we
process them and send them off, and then in the completion path for the
commit requests we have

if (nfs_commit_end(cinfo.mds))
nfs_direct_write_complete(dreq);

However since we're submitting asynchronous requests we sometimes have
one that completes before we submit the next one, so we end up calling
complete on the nfs_direct_request twice.

The only other place we use nfs_generic_commit_list() is in
__nfs_commit_inode, which wraps this call in a

nfs_commit_begin();
nfs_commit_end();

Which is a common pattern for this style of completion handling, one
that is also repeated in the direct code with get_dreq()/put_dreq()
calls around where we process events as well as in the completion paths.

Fix this by using the same pattern for the commit requests.

Before with my 200 node rocksdb stress running this warning would pop
every 10ish minutes. With my patch the stress test has been running for
several hours without popping.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.25, v6.6.24, v6.6.23
# e25447c3 01-Mar-2024 Josef Bacik <josef@toxicpanda.com>

nfs: fix UAF in direct writes

[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]

In production we have been hitting the following warning consistently

------------[ cut here ]-----------

nfs: fix UAF in direct writes

[ Upstream commit 17f46b803d4f23c66cacce81db35fef3adb8f2af ]

In production we have been hitting the following warning consistently

------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0
Workqueue: nfsiod nfs_direct_write_schedule_work [nfs]
RIP: 0010:refcount_warn_saturate+0x9c/0xe0
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x9f/0x130
? refcount_warn_saturate+0x9c/0xe0
? report_bug+0xcc/0x150
? handle_bug+0x3d/0x70
? exc_invalid_op+0x16/0x40
? asm_exc_invalid_op+0x16/0x20
? refcount_warn_saturate+0x9c/0xe0
nfs_direct_write_schedule_work+0x237/0x250 [nfs]
process_one_work+0x12f/0x4a0
worker_thread+0x14e/0x3b0
? ZSTD_getCParams_internal+0x220/0x220
kthread+0xdc/0x120
? __btf_name_valid+0xa0/0xa0
ret_from_fork+0x1f/0x30

This is because we're completing the nfs_direct_request twice in a row.

The source of this is when we have our commit requests to submit, we
process them and send them off, and then in the completion path for the
commit requests we have

if (nfs_commit_end(cinfo.mds))
nfs_direct_write_complete(dreq);

However since we're submitting asynchronous requests we sometimes have
one that completes before we submit the next one, so we end up calling
complete on the nfs_direct_request twice.

The only other place we use nfs_generic_commit_list() is in
__nfs_commit_inode, which wraps this call in a

nfs_commit_begin();
nfs_commit_end();

Which is a common pattern for this style of completion handling, one
that is also repeated in the direct code with get_dreq()/put_dreq()
calls around where we process events as well as in the completion paths.

Fix this by using the same pattern for the commit requests.

Before with my 200 node rocksdb stress running this warning would pop
every 10ish minutes. With my patch the stress test has been running for
several hours without popping.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2
# 75aa038d 17-Nov-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS: Fix the pnfs block driver's calculation of layoutget size

[ Upstream commit 8a6291bf3b0eae1bf26621e6419a91682f2d6227 ]

Instead of relying on the value of the 'bytes_left' field, we should
cal

pNFS: Fix the pnfs block driver's calculation of layoutget size

[ Upstream commit 8a6291bf3b0eae1bf26621e6419a91682f2d6227 ]

Instead of relying on the value of the 'bytes_left' field, we should
calculate the layout size based on the offset of the request that is
being written out.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2
# b11243f7 04-Sep-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: More fixes for nfs_direct_write_reschedule_io()

Ensure that all requests are put back onto the commit list so that they
can be rescheduled.

Fixes: 4daaeba93822 ("NFS: Fix nfs_direct_write_resc

NFS: More fixes for nfs_direct_write_reschedule_io()

Ensure that all requests are put back onto the commit list so that they
can be rescheduled.

Fixes: 4daaeba93822 ("NFS: Fix nfs_direct_write_reschedule_io()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

show more ...


# b193a78d 04-Sep-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Use the correct commit info in nfs_join_page_group()

Ensure that nfs_clear_request_commit() updates the correct counters when
it removes them from the commit list.

Fixes: ed5d588fe47f ("NFS: T

NFS: Use the correct commit info in nfs_join_page_group()

Ensure that nfs_clear_request_commit() updates the correct counters when
it removes them from the commit list.

Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

show more ...


# 8982f7af 04-Sep-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: More O_DIRECT accounting fixes for error paths

If we hit a fatal error when retransmitting, we do need to record the
removal of the request from the count of written bytes.

Fixes: 031d73ed768a

NFS: More O_DIRECT accounting fixes for error paths

If we hit a fatal error when retransmitting, we do need to record the
removal of the request from the count of written bytes.

Fixes: 031d73ed768a ("NFS: Fix O_DIRECT accounting of number of bytes read/written")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

show more ...


# 7c633932 04-Sep-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix O_DIRECT locking issues

The dreq fields are protected by the dreq->lock.

Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling")
Signed-off-by: Trond Myklebust <trond

NFS: Fix O_DIRECT locking issues

The dreq fields are protected by the dreq->lock.

Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

show more ...


# 954998b6 04-Sep-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix error handling for O_DIRECT write scheduling

If we fail to schedule a request for transmission, there are 2
possibilities:
1) Either we hit a fatal error, and we just want to drop the remai

NFS: Fix error handling for O_DIRECT write scheduling

If we fail to schedule a request for transmission, there are 2
possibilities:
1) Either we hit a fatal error, and we just want to drop the remaining
requests on the floor.
2) We were asked to try again, in which case we should allow the
outstanding RPC calls to complete, so that we can recoalesce requests
and try again.

Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

show more ...


Revision tags: v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48
# 88975a55 19-Aug-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix a potential data corruption

We must ensure that the subrequests are joined back into the head before
we can retransmit a request. If the head was not on the commit lists,
because the server

NFS: Fix a potential data corruption

We must ensure that the subrequests are joined back into the head before
we can retransmit a request. If the head was not on the commit lists,
because the server wrote it synchronously, we still need to add it back
to the retransmission list.
Add a call that mirrors the effect of nfs_cancel_remove_inode() for
O_DIRECT.

Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

show more ...


Revision tags: v6.1.46, v6.1.45
# be2fd156 08-Aug-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix a use after free in nfs_direct_join_group()

Be more careful when tearing down the subrequests of an O_DIRECT write
as part of a retransmission.

Reported-by: Chris Mason <clm@fb.com>
Fixes:

NFS: Fix a use after free in nfs_direct_join_group()

Be more careful when tearing down the subrequests of an O_DIRECT write
as part of a retransmission.

Reported-by: Chris Mason <clm@fb.com>
Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

show more ...


Revision tags: v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8
# 70e9db69 19-Jan-2023 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Clean up O_DIRECT request allocation

Rather than adjusting the index+offset after the call to
nfs_create_request(), add a function nfs_page_create_from_page() that
takes an offset.

Signed-off-

NFS: Clean up O_DIRECT request allocation

Rather than adjusting the index+offset after the call to
nfs_create_request(), add a function nfs_page_create_from_page() that
takes an offset.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

show more ...


Revision tags: v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47
# 1ef255e2 09-Jun-2022 Al Viro <viro@zeniv.linux.org.uk>

iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()

Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.

Provide inline wrapp

iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()

Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.

Provide inline wrappers doing that, convert trivial open-coded
uses of those.

BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

show more ...


Revision tags: v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18
# fcb14cb1 22-May-2022 Al Viro <viro@zeniv.linux.org.uk>

new iov_iter flavour - ITER_UBUF

Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
ones.

We are going to expose t

new iov_iter flavour - ITER_UBUF

Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
ones.

We are going to expose the things like ->write_iter() et.al. to those
in subsequent commits.

New predicate (user_backed_iter()) that is true for ITER_IOVEC and
ITER_UBUF; places like direct-IO handling should use that for
checking that pages we modify after getting them from iov_iter_get_pages()
would need to be dirtied.

DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
will solve all problems - there's code that uses iter_is_iovec() to
decide how to poke around in iov_iter guts and for that the predicate
replacement obviously won't suffice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

show more ...


# 69d96651 22-Jul-2022 Jeff Layton <jlayton@kernel.org>

nfs: only issue commit in DIO codepath if we have uncommitted data

Currently, we try to determine whether to issue a commit based on
nfs_write_need_commit which looks at the current verifier. In the

nfs: only issue commit in DIO codepath if we have uncommitted data

Currently, we try to determine whether to issue a commit based on
nfs_write_need_commit which looks at the current verifier. In the case
where we got a short write and then tried to follow it up with one that
failed, the verifier can't be trusted.

What we really want to know is whether the pgio request had any
successful writes that came back as UNSTABLE. Add a new flag to the pgio
request, and use that to indicate that we've had a successful unstable
write. Only issue a commit if that flag is set.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

show more ...


# 55051c0c 22-Jul-2022 Jeff Layton <jlayton@kernel.org>

nfs: always check dreq->error after a commit

When the client gets back a short DIO write, it will then attempt to
issue another write to finish the DIO request. If that write then fails
(as is often

nfs: always check dreq->error after a commit

When the client gets back a short DIO write, it will then attempt to
issue another write to finish the DIO request. If that write then fails
(as is often the case in an -ENOSPC situation), then we still may need
to issue a COMMIT if the earlier short write was unstable. If that COMMIT
then succeeds, then we don't want the client to reschedule the write
requests, and to instead just return a short write. Otherwise, we can
end up looping over the same DIO write forever.

Always consult dreq->error after a successful RPC, even when the flag
state is not NFS_ODIRECT_DONE.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2028370
Reported-by: Boyang Xue <bxue@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

show more ...


# 8efc4bbe 22-Jul-2022 Jeff Layton <jlayton@kernel.org>

nfs: add new nfs_direct_req tracepoint events

Add some new tracepoints to the DIO write code.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammers

nfs: add new nfs_direct_req tracepoint events

Add some new tracepoints to the DIO write code.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

show more ...


Revision tags: v5.15.41, v5.15.40, v5.15.39
# eb79f3af 09-May-2022 NeilBrown <neilb@suse.de>

nfs: rename nfs_direct_IO and use as ->swap_rw

The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a
while. We now need a ->swap_rw function which behaves slightly
differently, ret

nfs: rename nfs_direct_IO and use as ->swap_rw

The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a
while. We now need a ->swap_rw function which behaves slightly
differently, returning zero for success rather than a byte count.

So modify nfs_direct_IO accordingly, rename it, and use it as the
->swap_rw function.

Link: https://lkml.kernel.org/r/165119301493.15698.7491285551903597618.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> (on Renesas RSK+RZA1 with 32 MiB of SDRAM)
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

show more ...


Revision tags: v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27
# c265de25 06-Mar-2022 NeilBrown <neilb@suse.de>

NFS: swap-out must always use STABLE writes.

The commit handling code is not safe against memory-pressure deadlocks
when writing to swap. In particular, nfs_commitdata_alloc() blocks
indefinitely w

NFS: swap-out must always use STABLE writes.

The commit handling code is not safe against memory-pressure deadlocks
when writing to swap. In particular, nfs_commitdata_alloc() blocks
indefinitely waiting for memory, and this can consume all available
workqueue threads.

swap-out most likely uses STABLE writes anyway as COND_STABLE indicates
that a stable write should be used if the write fits in a single
request, and it normally does. However if we ever swap with a small
wsize, or gather unusually large numbers of pages for a single write,
this might change.

For safety, make it explicit in the code that direct writes used for swap
must always use FLUSH_STABLE.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

show more ...


# 64158668 06-Mar-2022 NeilBrown <neilb@suse.de>

NFS: swap IO handling is slightly different for O_DIRECT IO

1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
possible deadlocks with "fs_reclaim". These deadlocks could, I b

NFS: swap IO handling is slightly different for O_DIRECT IO

1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
possible deadlocks with "fs_reclaim". These deadlocks could, I believe,
eventuate if a buffered read on the swapfile was attempted.

We don't need coherence with the page cache for a swap file, and
buffered writes are forbidden anyway. There is no other need for
i_rwsem during direct IO. So never take it for swap_rw()

2/ generic_write_checks() explicitly forbids writes to swap, and
performs checks that are not needed for swap. So bypass it
for swap_rw().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

show more ...


Revision tags: v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10
# a6b5a28e 14-Nov-2020 Dave Wysochanski <dwysocha@redhat.com>

nfs: Convert to new fscache volume/cookie API

Change the nfs filesystem to support fscache's indexing rewrite and
reenable caching in nfs.

The following changes have been made:

(1) The fscache_ne

nfs: Convert to new fscache volume/cookie API

Change the nfs filesystem to support fscache's indexing rewrite and
reenable caching in nfs.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.

(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.

For nfs, I've made it render the volume name string as:

"nfs,<ver>,<family>,<address>,<port>,<fsidH>,<fsidL>*<,param>[,<uniq>]"

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before.

(4) fscache_enable/disable_cookie() have been removed.

Call fscache_use_cookie() and fscache_unuse_cookie() when a file is
opened or closed to prevent a cache file from being culled and to keep
resources to hand that are needed to do I/O.

If a file is opened for writing, we invalidate it with
FSCACHE_INVAL_DIO_WRITE in lieu of doing writeback to the cache,
thereby making it cease caching until all currently open files are
closed. This should give the same behaviour as the uptream code.
Making the cache store local modifications isn't straightforward for
NFS, so that's left for future patches.

(5) fscache_invalidate() now needs to be given uptodate auxiliary data and
a file size. It also takes a flag to indicate if this was due to a
DIO write.

(6) Call nfs_fscache_invalidate() with FSCACHE_INVAL_DIO_WRITE on a file
to which a DIO write is made.

(7) Call fscache_note_page_release() from nfs_release_page().

(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.

(9) The functions to read and write data to/from the cache are stubbed out
pending a conversion to use netfslib.

Changes
=======
ver #3:
- Added missing =n fallback for nfs_fscache_release_file()[1][2].

ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.
- Remove NFS_INO_FSCACHE as it's no longer used.
- Need to unuse a cookie on file-release, not inode-clear.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna.schumaker@netapp.com>
cc: linux-nfs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/202112100804.nksO8K4u-lkp@intel.com/ [1]
Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/ [2]
Link: https://lore.kernel.org/r/163819668938.215744.14448852181937731615.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906979003.143852.2601189243864854724.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967182112.1823006.7791504655391213379.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021575950.640689.12069642327533368467.stgit@warthog.procyon.org.uk/ # v4

show more ...


# 6b19b766 21-Oct-2021 Jens Axboe <axboe@kernel.dk>

fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up

fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 133a48ab 04-Oct-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix up commit deadlocks

If O_DIRECT bumps the commit_info rpcs_out field, then that could lead
to fsync() hangs. The fix is to ensure that O_DIRECT calls
nfs_commit_end().

Fixes: 723c921e7dfc

NFS: Fix up commit deadlocks

If O_DIRECT bumps the commit_info rpcs_out field, then that could lead
to fsync() hangs. The fix is to ensure that O_DIRECT calls
nfs_commit_end().

Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

show more ...


# 6bb22702 06-Mar-2022 NeilBrown <neilb@suse.de>

NFS: swap-out must always use STABLE writes.

[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ]

The commit handling code is not safe against memory-pressure deadlocks
when writing to swap

NFS: swap-out must always use STABLE writes.

[ Upstream commit c265de257f558a05c1859ee9e3fed04883b9ec0e ]

The commit handling code is not safe against memory-pressure deadlocks
when writing to swap. In particular, nfs_commitdata_alloc() blocks
indefinitely waiting for memory, and this can consume all available
workqueue threads.

swap-out most likely uses STABLE writes anyway as COND_STABLE indicates
that a stable write should be used if the write fits in a single
request, and it normally does. However if we ever swap with a small
wsize, or gather unusually large numbers of pages for a single write,
this might change.

For safety, make it explicit in the code that direct writes used for swap
must always use FLUSH_STABLE.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# 24d28d9b 06-Mar-2022 NeilBrown <neilb@suse.de>

NFS: swap IO handling is slightly different for O_DIRECT IO

[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ]

1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
pos

NFS: swap IO handling is slightly different for O_DIRECT IO

[ Upstream commit 64158668ac8b31626a8ce48db4cad08496eb8340 ]

1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
possible deadlocks with "fs_reclaim". These deadlocks could, I believe,
eventuate if a buffered read on the swapfile was attempted.

We don't need coherence with the page cache for a swap file, and
buffered writes are forbidden anyway. There is no other need for
i_rwsem during direct IO. So never take it for swap_rw()

2/ generic_write_checks() explicitly forbids writes to swap, and
performs checks that are not needed for swap. So bypass it
for swap_rw().

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# 9443fcc2 04-Oct-2021 Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix up commit deadlocks

[ Upstream commit 133a48abf6ecc535d7eddc6da1c3e4c972445882 ]

If O_DIRECT bumps the commit_info rpcs_out field, then that could lead
to fsync() hangs. The fix is to ensu

NFS: Fix up commit deadlocks

[ Upstream commit 133a48abf6ecc535d7eddc6da1c3e4c972445882 ]

If O_DIRECT bumps the commit_info rpcs_out field, then that could lead
to fsync() hangs. The fix is to ensure that O_DIRECT calls
nfs_commit_end().

Fixes: 723c921e7dfc ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


12345678910>>...23