#
7bf94c6b |
| 06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Update the boot verifier on stable writes too.
We don't know if the error returned by the fsync() call is exclusive to the data written by the stable write, so play it safe.
Signed-off-by: Tr
nfsd: Update the boot verifier on stable writes too.
We don't know if the error returned by the fsync() call is exclusive to the data written by the stable write, so play it safe.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
5011af4c |
| 06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Fix stable writes
Strictly speaking, a stable write error needs to reflect the write + the commit of that write (and only that write). To ensure that we don't pick up the write errors from oth
nfsd: Fix stable writes
Strictly speaking, a stable write error needs to reflect the write + the commit of that write (and only that write). To ensure that we don't pick up the write errors from other writebacks, add a rw_semaphore to provide exclusion.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
16f8f894 |
| 06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Allow nfsd_vfs_write() to take the nfsd_file as an argument
Needed in order to fix stable writes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fiel
nfsd: Allow nfsd_vfs_write() to take the nfsd_file as an argument
Needed in order to fix stable writes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
384a7cca |
| 24-Dec-2019 |
zhengbin <zhengbin13@huawei.com> |
nfsd: use true,false for bool variable in vfs.c
Fixes coccicheck warning:
fs/nfsd/vfs.c:1389:5-13: WARNING: Assignment of 0/1 to bool variable fs/nfsd/vfs.c:1398:5-13: WARNING: Assignment of 0/1 to
nfsd: use true,false for bool variable in vfs.c
Fixes coccicheck warning:
fs/nfsd/vfs.c:1389:5-13: WARNING: Assignment of 0/1 to bool variable fs/nfsd/vfs.c:1398:5-13: WARNING: Assignment of 0/1 to bool variable fs/nfsd/vfs.c:1415:2-10: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
2a1aa489 |
| 03-Nov-2019 |
Arnd Bergmann <arnd@arndb.de> |
nfsd: pass a 64-bit guardtime to nfsd_setattr()
Guardtime handling in nfs3 differs between 32-bit and 64-bit architectures, and uses the deprecated time_t type.
Change it to using time64_t, which b
nfsd: pass a 64-bit guardtime to nfsd_setattr()
Guardtime handling in nfs3 differs between 32-bit and 64-bit architectures, and uses the deprecated time_t type.
Change it to using time64_t, which behaves the same way on 64-bit and 32-bit architectures, treating the number as an unsigned 32-bit entity with a range of year 1970 to 2106 consistently, and avoiding the y2038 overflow.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
57f64034 |
| 18-Dec-2019 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Clone should commit src file metadata too
vfs_clone_file_range() can modify the metadata on the source file too, so we need to commit that to stable storage as well.
Reported-by: Dave Chinner
nfsd: Clone should commit src file metadata too
vfs_clone_file_range() can modify the metadata on the source file too, so we need to commit that to stable storage as well.
Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Acked-by: Dave Chinner <david@fromorbit.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
09a80f2a |
| 17-Dec-2019 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Return the correct number of bytes written to the file
We must allow for the fact that iov_iter_write() could have returned a short write (e.g. if there was an ENOSPC issue).
Fixes: d890be159
nfsd: Return the correct number of bytes written to the file
We must allow for the fact that iov_iter_write() could have returned a short write (e.g. if there was an ENOSPC issue).
Fixes: d890be159a71 "nfsd: Add I/O trace points in the NFSv4 write path" Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
466e16f0 |
| 27-Nov-2019 |
NeilBrown <neilb@suse.de> |
nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
vfs_rmdir and vfs_unlink can return -EBUSY if the target is a mountpoint. This currently gets passed to nfserrno() by nfsd_unlink(), and that results
nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
vfs_rmdir and vfs_unlink can return -EBUSY if the target is a mountpoint. This currently gets passed to nfserrno() by nfsd_unlink(), and that results in a WARNing, which is not user-friendly.
Possibly the best NFSv4 error is NFS4ERR_FILE_OPEN, because there is a sense in which the object is currently in use by some other task. The Linux NFSv4 client will map this back to EBUSY, which is an added benefit.
For NFSv3, the best we can do is probably NFS3ERR_ACCES, which isn't true, but is not less true than the other options.
Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
a25e3726 |
| 27-Nov-2019 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Ensure CLONE persists data and metadata changes to the target file
The NFSv4.2 CLONE operation has implicit persistence requirements on the target file, since there is no protocol requirement
nfsd: Ensure CLONE persists data and metadata changes to the target file
The NFSv4.2 CLONE operation has implicit persistence requirements on the target file, since there is no protocol requirement that the client issue a separate operation to persist data. For that reason, we should call vfs_fsync_range() on the destination file after a successful call to vfs_clone_file_range().
Fixes: ffa0160a1039 ("nfsd: implement the NFSv4.2 CLONE operation") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
83a63072 |
| 26-Aug-2019 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: fix nfs read eof detection
Currently, the knfsd server assumes that a short read indicates an end of file. That assumption is incorrect. The short read means that either we've hit the end of f
nfsd: fix nfs read eof detection
Currently, the knfsd server assumes that a short read indicates an end of file. That assumption is incorrect. The short read means that either we've hit the end of file, or we've hit a read error.
In the case of a read error, the client may want to retry (as per the implementation recommendations in RFC1813 and RFC7530), but currently it is being told that it hit an eof.
Move the code to detect eof from version specific code into the generic nfsd read.
Report eof only in the two following cases: 1) read() returns a zero length short read with no error. 2) the offset+length of the read is >= the file size.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
bbf2f098 |
| 02-Sep-2019 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Reset the boot verifier on all write I/O errors
If multiple clients are writing to the same file, then due to the fact we share a single file descriptor between all NFSv3 clients writing to th
nfsd: Reset the boot verifier on all write I/O errors
If multiple clients are writing to the same file, then due to the fact we share a single file descriptor between all NFSv3 clients writing to the file, we have a situation where clients can miss the fact that their file data was not persisted. While this should be rare, it could cause silent data loss in situations where multiple clients are using NLM locking or O_DIRECT to write to the same file. Unfortunately, the stateless nature of NFSv3 and the fact that we can only identify clients by their IP address means that we cannot trivially cache errors; we would not know when it is safe to release them from the cache.
So the solution is to declare a reboot. We understand that this should be a rare occurrence, since disks are usually stable. The most frequent occurrence is likely to be ENOSPC, at which point all writes to the given filesystem are likely to fail anyway.
So the expectation is that clients will be forced to retry their writes until they hit the fatal error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
7775ec57 |
| 18-Aug-2019 |
Jeff Layton <jeff.layton@primarydata.com> |
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
It's not uncommon for some workloads to do a bunch of I/O to a file and delete it just afterward. If knfsd has a cached
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
It's not uncommon for some workloads to do a bunch of I/O to a file and delete it just afterward. If knfsd has a cached open file however, then the file may still be open when the dentry is unlinked. If the underlying filesystem is nfs, then that could trigger it to do a sillyrename.
On a REMOVE or RENAME scan the nfsd_file cache for open files that correspond to the inode, and proactively unhash and put their references. This should prevent any delete-on-last-close activity from occurring, solely due to knfsd's open file cache.
This must be done synchronously though so we use the variants that call flush_delayed_fput. There are deadlock possibilities if you call flush_delayed_fput while holding locks, however. In the case of nfsd_rename, we don't even do the lookups of the dentries to be renamed until we've locked for rename.
Once we've figured out what the target dentry is for a rename, check to see whether there are cached open files associated with it. If there are, then unwind all of the locking, close them all, and then reattempt the rename.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
501cb184 |
| 18-Aug-2019 |
Jeff Layton <jeff.layton@primarydata.com> |
nfsd: rip out the raparms cache
The raparms cache was set up in order to ensure that we carry readahead information forward from one RPC call to the next. In other words, it was set up because each
nfsd: rip out the raparms cache
The raparms cache was set up in order to ensure that we carry readahead information forward from one RPC call to the next. In other words, it was set up because each RPC call was forced to open a struct file, then close it, causing the loss of readahead information that is normally cached in that struct file, and used to keep the page cache filled when a user calls read() multiple times on the same file descriptor.
Now that we cache the struct file, and reuse it for all the I/O calls to a given file by a given user, we no longer have to keep a separate readahead cache.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
5920afa3 |
| 18-Aug-2019 |
Jeff Layton <jeff.layton@primarydata.com> |
nfsd: hook nfsd_commit up to the nfsd_file cache
Use cached filps if possible instead of opening a new one every time.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond
nfsd: hook nfsd_commit up to the nfsd_file cache
Use cached filps if possible instead of opening a new one every time.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
48cd7b51 |
| 18-Aug-2019 |
Jeff Layton <jeff.layton@primarydata.com> |
nfsd: hook up nfsd_read to the nfsd_file cache
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebus
nfsd: hook up nfsd_read to the nfsd_file cache
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
b4935239 |
| 18-Aug-2019 |
Jeff Layton <jeff.layton@primarydata.com> |
nfsd: hook up nfsd_write to the new nfsd_file cache
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myk
nfsd: hook up nfsd_write to the new nfsd_file cache
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
65294c1f |
| 18-Aug-2019 |
Jeff Layton <jeff.layton@primarydata.com> |
nfsd: add a new struct file caching facility to nfsd
Currently, NFSv2/3 reads and writes have to open a file, do the read or write and then close it again for each RPC. This is highly inefficient, e
nfsd: add a new struct file caching facility to nfsd
Currently, NFSv2/3 reads and writes have to open a file, do the read or write and then close it again for each RPC. This is highly inefficient, especially when the underlying filesystem has a relatively slow open routine.
This patch adds a new open file cache to knfsd. Rather than doing an open for each RPC, the read/write handlers can call into this cache to see if there is one already there for the correct filehandle and NFS_MAY_READ/WRITE flags.
If there isn't an entry, then we create a new one and attempt to perform the open. If there is, then we wait until the entry is fully instantiated and return it if it is at the end of the wait. If it's not, then we attempt to take over construction.
Since the main goal is to speed up NFSv2/3 I/O, we don't want to close these files on last put of these objects. We need to keep them around for a little while since we never know when the next READ/WRITE will come in.
Cache entries have a hardcoded 1s timeout, and we have a recurring workqueue job that walks the cache and purges any entries that have expired.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Richard Sharpe <richard.sharpe@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
e977cc83 |
| 27-May-2019 |
Geert Uytterhoeven <geert+renesas@glider.be> |
nfsd: Spelling s/EACCESS/EACCES/
The correct spelling is EACCES:
include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */
Signed-off-by: Geert Uytterhoeven <geert+renesas@gl
nfsd: Spelling s/EACCESS/EACCES/
The correct spelling is EACCES:
include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
0ca0c9d7 |
| 12-Apr-2019 |
J. Bruce Fields <bfields@redhat.com> |
nfsd: fh_drop_write in nfsd_unlink
fh_want_write() can now be called twice, but I'm also fixing up the callers not to do that.
Other cases include setattr and create.
Signed-off-by: J. Bruce Field
nfsd: fh_drop_write in nfsd_unlink
fh_want_write() can now be called twice, but I'm also fixing up the callers not to do that.
Other cases include setattr and create.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
e3fdc89c |
| 21-Jan-2019 |
Trond Myklebust <trondmy@gmail.com> |
nfsd: Fix error return values for nfsd4_clone_file_range()
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will currently clobber all errors returned by vfs_clone_file_range() and rep
nfsd: Fix error return values for nfsd4_clone_file_range()
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will currently clobber all errors returned by vfs_clone_file_range() and replace them with EINVAL.
Fixes: 42ec3d4c0218 ("vfs: make remap_file_range functions take and...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
255fbca6 |
| 30-Nov-2018 |
zhengbin <zhengbin13@huawei.com> |
nfsd: Return EPERM, not EACCES, in some SETATTR cases
As the man(2) page for utime/utimes states, EPERM is returned when the second parameter of utime or utimes is not NULL, the caller's effective U
nfsd: Return EPERM, not EACCES, in some SETATTR cases
As the man(2) page for utime/utimes states, EPERM is returned when the second parameter of utime or utimes is not NULL, the caller's effective UID does not match the owner of the file, and the caller is not privileged.
However, in a NFS directory mounted from knfsd, it will return EACCES (from nfsd_setattr-> fh_verify->nfsd_permission). This patch fixes that.
Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
452ce659 |
| 29-Oct-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
vfs: plumb remap flags through the vfs clone functions
Plumb a remap_flags argument through the {do,vfs}_clone_file_range functions so that clone can take advantage of it.
Signed-off-by: Darrick J.
vfs: plumb remap flags through the vfs clone functions
Plumb a remap_flags argument through the {do,vfs}_clone_file_range functions so that clone can take advantage of it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
42ec3d4c |
| 29-Oct-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
vfs: make remap_file_range functions take and return bytes completed
Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on.
vfs: make remap_file_range functions take and return bytes completed
Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on. This is a requirement for allowing fs implementations to return short clone/dedupe results to the user, which will enable us to obey resource limits in a graceful manner.
A subsequent patch will enable copy_file_range to signal to the ->clone_file_range implementation that it can handle a short length, which will be returned in the function's return value. For now the short return is not implemented anywhere so the behavior won't change -- either copy_file_range manages to clone the entire range or it tries an alternative.
Neither clone ioctl can take advantage of this, alas.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
show more ...
|
#
0ac203cb |
| 02-Oct-2018 |
Gustavo A. R. Silva <gustavo@embeddedor.com> |
nfsd: fix fall-through annotations
Replace "fallthru" with a proper "fall through" annotation.
Also, add an annotation were it is expected to fall through.
These fixes are part of the ongoing effo
nfsd: fix fall-through annotations
Replace "fallthru" with a proper "fall through" annotation.
Also, add an annotation were it is expected to fall through.
These fixes are part of the ongoing efforts to enabling -Wimplicit-fallthrough
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
aa563d7b |
| 19-Oct-2018 |
David Howells <dhowells@redhat.com> |
iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most pla
iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most places.
Convert a bunch of places to use switch-statements to access them rather then chains of bitwise-AND statements. This makes it easier to add further iterator types. Also, this can be more efficient as to implement a switch of small contiguous integers, the compiler can use ~50% fewer compare instructions than it has to use bitwise-and instructions.
Further, cease passing the iterator type into the iterator setup function. The iterator function can set that itself. Only the direction is required.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|