#
efeda80d |
| 05-Feb-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFSv4: Fix revalidation of dentries with delegations
If a dentry was not initially looked up while we were holding a delegation, then we do still need to revalidate that it still holds the same name
NFSv4: Fix revalidation of dentries with delegations
If a dentry was not initially looked up while we were holding a delegation, then we do still need to revalidate that it still holds the same name. If there are multiple hard links to the same file, then all the hard links need validation.
Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> [Anna: Put nfs_unset_verifier_delegated() under CONFIG_NFS_V4] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
a1147b82 |
| 05-Feb-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Fix up directory verifier races
In order to avoid having our dentry revalidation race with an update of the directory on the server, we need to store the verifier before the RPC calls to LOOKUP
NFS: Fix up directory verifier races
In order to avoid having our dentry revalidation race with an update of the directory on the server, we need to store the verifier before the RPC calls to LOOKUP and READDIR.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@gmail.com> Tested-by: Benjamin Coddington <bcodding@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
227823d2 |
| 22-Jan-2020 |
Dai Ngo <dai.ngo@oracle.com> |
nfs: optimise readdir cache page invalidation
When the directory is large and it's being modified by one client while another client is doing the 'ls -l' on the same directory then the cache page in
nfs: optimise readdir cache page invalidation
When the directory is large and it's being modified by one client while another client is doing the 'ls -l' on the same directory then the cache page invalidation from nfs_force_use_readdirplus causes the reading client to keep restarting READDIRPLUS from cookie 0 which causes the 'ls -l' to take a very long time to complete, possibly never completing.
Currently when nfs_force_use_readdirplus is called to switch from READDIR to READDIRPLUS, it invalidates all the cached pages of the directory. This cache page invalidation causes the next nfs_readdir to re-read the directory content from cookie 0.
This patch is to optimise the cache invalidation in nfs_force_use_readdirplus by only truncating the cached pages from last page index accessed to the end the file. It also marks the inode to delay invalidating all the cached page of the directory until the next initial nfs_readdir of the next 'ls' instance.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> [Anna - Fix conflicts with Trond's readdir patches] [Anna - Remove redundant call to nfs_zap_mapping()] [Anna - Replace d_inode(file_dentry(desc->file)) with file_inode(desc->file)] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
93a6ab7b |
| 02-Feb-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Switch readdir to using iterate_shared()
Now that the page cache locking is repaired, we should be able to switch to using iterate_shared() for improved concurrency when doing readdir().
Signe
NFS: Switch readdir to using iterate_shared()
Now that the page cache locking is repaired, we should be able to switch to using iterate_shared() for improved concurrency when doing readdir().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
3803d672 |
| 02-Feb-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
The directory strings stored in the readdir cache may be used with printk(), so it is better to ensure they are nul-terminated.
Signed-off-by: Tron
NFS: Use kmemdup_nul() in nfs_readdir_make_qstr()
The directory strings stored in the readdir cache may be used with printk(), so it is better to ensure they are nul-terminated.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
114de382 |
| 02-Feb-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Directory page cache pages need to be locked when read
When a NFS directory page cache page is removed from the page cache, its contents are freed through a call to nfs_readdir_clear_array(). T
NFS: Directory page cache pages need to be locked when read
When a NFS directory page cache page is removed from the page cache, its contents are freed through a call to nfs_readdir_clear_array(). To prevent the removal of the page cache entry until after we've finished reading it, we must take the page lock.
Fixes: 11de3b11e08c ("NFS: Fix a memory leak in nfs_readdir") Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
4b310319 |
| 02-Feb-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Fix memory leaks and corruption in readdir
nfs_readdir_xdr_to_array() must not exit without having initialised the array, so that the page cache deletion routines can safely call nfs_readdir_cl
NFS: Fix memory leaks and corruption in readdir
nfs_readdir_xdr_to_array() must not exit without having initialised the array, so that the page cache deletion routines can safely call nfs_readdir_clear_array(). Furthermore, we should ensure that if we exit nfs_readdir_filler() with an error, we free up any page contents to prevent a leak if we try to fill the page again.
Fixes: 11de3b11e08c ("NFS: Fix a memory leak in nfs_readdir") Cc: stable@vger.kernel.org # v2.6.37+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
9a206de2 |
| 26-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
We do not need to have the rcu lookup method fail in the case where the fsuid/fsgid and supplemental groups match.
Signed-off-by: Trond Mykl
NFS: nfs_access_get_cached_rcu() should use cred_fscmp()
We do not need to have the rcu lookup method fail in the case where the fsuid/fsgid and supplemental groups match.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
f7b37b8b |
| 14-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Add softreval behaviour to nfs_lookup_revalidate()
If the server is unavaliable, we want to allow the revalidating lookup to time out, and to default to validating the cached dentry if the 'sof
NFS: Add softreval behaviour to nfs_lookup_revalidate()
If the server is unavaliable, we want to allow the revalidating lookup to time out, and to default to validating the cached dentry if the 'softreval' mount option is set.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
5c965db8 |
| 06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Trust cached access if we've already revalidated the inode once
If we've already revalidated the inode once then don't distrust the access cache unless the NFS_INO_INVALID_ACCESS flag is actual
NFS: Trust cached access if we've already revalidated the inode once
If we've already revalidated the inode once then don't distrust the access cache unless the NFS_INO_INVALID_ACCESS flag is actually set.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
e8194b7d |
| 06-Jan-2020 |
Trond Myklebust <trondmy@gmail.com> |
NFS: Improve tracing of permission calls
On exit from nfs_do_access(), record the mask representing the requested permissions, as well as the server-supplied set of access rights for this user.
Sig
NFS: Improve tracing of permission calls
On exit from nfs_do_access(), record the mask representing the requested permissions, as well as the server-supplied set of access rights for this user.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
Revision tags: v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3 |
|
#
581057c8 |
| 13-Sep-2019 |
Benjamin Coddington <bcodding@redhat.com> |
NFS: remove unused check for negative dentry
This check has been hanging out since we used to have parallel paths to add dentry in nfs_create(), but that hasn't been the case for some years.
Signed
NFS: remove unused check for negative dentry
This check has been hanging out since we used to have parallel paths to add dentry in nfs_create(), but that hasn't been the case for some years.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
406cd915 |
| 13-Sep-2019 |
Benjamin Coddington <bcodding@redhat.com> |
NFS: Refactor nfs_instantiate() for dentry referencing callers
Since commit b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases for directory inode"), nfs_instantiate() may succeed without ac
NFS: Refactor nfs_instantiate() for dentry referencing callers
Since commit b0c6108ecf64 ("nfs_instantiate(): prevent multiple aliases for directory inode"), nfs_instantiate() may succeed without actually instantiating the dentry that was passed in. That can be problematic for some callers in NFSv3, so this patch breaks things up so we can get the actual dentry obtained.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
Revision tags: v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9 |
|
#
9821421a |
| 09-Aug-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFSv4: Fix return value in nfs_finish_open()
If the file turns out to be of the wrong type after opening, we want to revalidate the path and retry, so return EOPENSTALE rather than ESTALE.
Signed-o
NFSv4: Fix return value in nfs_finish_open()
If the file turns out to be of the wrong type after opening, we want to revalidate the path and retry, so return EOPENSTALE rather than ESTALE.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1 |
|
#
db531db9 |
| 12-Jul-2019 |
Max Kellermann <mk@cm4all.com> |
Revert "NFS: readdirplus optimization by cache mechanism" (memleak)
This reverts commit be4c2d4723a4a637f0d1b4f7c66447141a4b3564.
That commit caused a severe memory leak in nfs_readdir_make_qstr().
Revert "NFS: readdirplus optimization by cache mechanism" (memleak)
This reverts commit be4c2d4723a4a637f0d1b4f7c66447141a4b3564.
That commit caused a severe memory leak in nfs_readdir_make_qstr().
When listing a directory with more than 100 files (this is how many struct nfs_cache_array_entry elements fit in one 4kB page), all allocated file name strings past those 100 leak.
The root of the leakage is that those string pointers are managed in pages which are never linked into the page cache.
fs/nfs/dir.c puts pages into the page cache by calling read_cache_page(); the callback function nfs_readdir_filler() will then fill the given page struct which was passed to it, which is already linked in the page cache (by do_read_cache_page() calling add_to_page_cache_lru()).
Commit be4c2d4723a4 added another (local) array of allocated pages, to be filled with more data, instead of discarding excess items received from the NFS server. Those additional pages can be used by the next nfs_readdir_filler() call (from within the same nfs_readdir() call).
The leak happens when some of those additional pages are never used (copied to the page cache using copy_highpage()). The pages will be freed by nfs_readdir_free_pages(), but their contents will not. The commit did not invoke nfs_readdir_clear_array() (and doing so would have been dangerous, because it did not track which of those pages were already copied to the page cache, risking double free bugs).
How to reproduce the leak:
- Use a kernel with CONFIG_SLUB_DEBUG_ON.
- Create a directory on a NFS mount with more than 100 files with names long enough to use the "kmalloc-32" slab (so we can easily look up the allocation counts):
for i in `seq 110`; do touch ${i}_0123456789abcdef; done
- Drop all caches:
echo 3 >/proc/sys/vm/drop_caches
- Check the allocation counter:
grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1
- Request a directory listing and check the allocation counters again:
ls [...] grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1
There are now 120 new allocations.
- Drop all caches and check the counters again:
echo 3 >/proc/sys/vm/drop_caches grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1
110 allocations are gone, but 10 have leaked and will never be freed.
Unhelpfully, those allocations are explicitly excluded from KMEMLEAK, that's why my initial attempts with KMEMLEAK were not successful:
/* * Avoid a kmemleak false positive. The pointer to the name is stored * in a page cache page which kmemleak does not scan. */ kmemleak_not_leak(string->name);
It would be possible to solve this bug without reverting the whole commit:
- keep track of which pages were not used, and call nfs_readdir_clear_array() on them, or - manually link those pages into the page cache
But for now I have decided to just revert the commit, because the real fix would require complex considerations, risking more dangerous (crash) bugs, which may seem unsuitable for the stable branches.
Signed-off-by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5 |
|
#
1c341b77 |
| 22-May-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Add deferred cache invalidation for close-to-open consistency violations
If the client detects that close-to-open cache consistency has been violated, and that the file or directory has been ch
NFS: Add deferred cache invalidation for close-to-open consistency violations
If the client detects that close-to-open cache consistency has been violated, and that the file or directory has been changed on the server, then do a cache invalidation when we're done working with the file. The reason we don't do an immediate cache invalidation is that we want to avoid performance problems due to false positives. Also, note that we cannot guarantee cache consistency in this situation even if we do invalidate the cache.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v5.1.4 |
|
#
457c8996 |
| 19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was use
treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11 |
|
#
a46126cc |
| 01-May-2019 |
Christoph Hellwig <hch@lst.de> |
nfs: pass the correct prototype to read_cache_page
Fix the callbacks NFS passes to read_cache_page to actually have the proper type expected. Casting around function pointers can easily hide typing
nfs: pass the correct prototype to read_cache_page
Fix the callbacks NFS passes to read_cache_page to actually have the proper type expected. Casting around function pointers can easily hide typing bugs, and defeats control flow protection.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
Revision tags: v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24 |
|
#
bf211ca1 |
| 15-Feb-2019 |
zhangliguang <zhangliguang@linux.alibaba.com> |
NFS: Fix typo in comments of nfs_readdir_alloc_pages()
This fixes the typo in comments of nfs_readdir_alloc_pages(). Because nfs_readdir_large_page and nfs_readdir_free_pagearray had been renamed.
NFS: Fix typo in comments of nfs_readdir_alloc_pages()
This fixes the typo in comments of nfs_readdir_alloc_pages(). Because nfs_readdir_large_page and nfs_readdir_free_pagearray had been renamed.
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v4.19.23, v4.19.22, v4.19.21 |
|
#
42f72cf3 |
| 11-Feb-2019 |
zhangliguang <zhangliguang@linux.alibaba.com> |
NFS: Remove redundant semicolon
This removes redundant semicolon for ending code.
Fixes: c7944ebb9ce9 ("NFSv4: Fix lookup revalidate of regular files") Signed-off-by: Liguang Zhang <zhangliguang@li
NFS: Remove redundant semicolon
This removes redundant semicolon for ending code.
Fixes: c7944ebb9ce9 ("NFSv4: Fix lookup revalidate of regular files") Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v4.19.20, v4.19.19 |
|
#
be4c2d47 |
| 29-Jan-2019 |
luanshi <zhangliguang@linux.alibaba.com> |
NFS: readdirplus optimization by cache mechanism
When listing very large directories via NFS, clients may take a long time to complete. There are about three factors involved:
First of all, ls and
NFS: readdirplus optimization by cache mechanism
When listing very large directories via NFS, clients may take a long time to complete. There are about three factors involved:
First of all, ls and practically every other method of listing a directory including python os.listdir and find rely on libc readdir(). However readdir() only reads 32K of directory entries at a time, which means that if you have a lot of files in the same directory, it is going to take an insanely long time to read all the directory entries.
Secondly, libc readdir() reads 32K of directory entries at a time, in kernel space 32K buffer split into 8 pages. One NFS readdirplus rpc will be called for one page, which introduces many readdirplus rpc calls.
Lastly, one NFS readdirplus rpc asks for 32K data (filled by nfs_dentry) to fill one page (filled by dentry), we found that nearly one third of data was wasted.
To solve above problems, pagecache mechanism was introduced. One NFS readdirplus rpc will ask for a large data (more than 32k), the data can fill more than one page, the cached pages can be used for next readdir call. This can reduce many readdirplus rpc calls and improve readdirplus performance.
TESTING: When listing very large directories(include 300 thousand files) via NFS
time ls -l /nfs_mount | wc -l
without the patch: 300001 real 1m53.524s user 0m2.314s sys 0m2.599s
with the patch: 300001 real 0m23.487s user 0m2.305s sys 0m2.558s
Improved performance: 79.6% readdirplus rpc calls decrease: 85%
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
#
302fad7b |
| 18-Feb-2019 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFS: Fix up documentation warnings
Fix up some compiler warnings about function parameters, etc not being correctly described or formatted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspa
NFS: Fix up documentation warnings
Fix up some compiler warnings about function parameters, etc not being correctly described or formatted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|
Revision tags: v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7 |
|
#
684f39b4 |
| 02-Dec-2018 |
NeilBrown <neilb@suse.com> |
NFS: struct nfs_open_dir_context: convert rpc_cred pointer to cred.
Use the common 'struct cred' to pass credentials for readdir.
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schum
NFS: struct nfs_open_dir_context: convert rpc_cred pointer to cred.
Use the common 'struct cred' to pass credentials for readdir.
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
#
b68572e0 |
| 02-Dec-2018 |
NeilBrown <neilb@suse.com> |
NFS: change access cache to use 'struct cred'.
Rather than keying the access cache with 'struct rpc_cred', use 'struct cred'. Then use cred_fscmp() to compare credentials rather than comparing the
NFS: change access cache to use 'struct cred'.
Rather than keying the access cache with 'struct rpc_cred', use 'struct cred'. Then use cred_fscmp() to compare credentials rather than comparing the raw pointer.
A benefit of this approach is that in the common case we avoid the rpc_lookup_cred_nonblock() call which can be slow when the cred cache is large. This also keeps many fewer items pinned in the rpc cred cache, so the cred cache is less likely to get large.
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
show more ...
|
Revision tags: v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14, v4.18.13, v4.18.12, v4.18.11 |
|
#
c7944ebb |
| 28-Sep-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
NFSv4: Fix lookup revalidate of regular files
If we're revalidating an existing dentry in order to open a file, we need to ensure that we check the directory has not changed before we optimise away
NFSv4: Fix lookup revalidate of regular files
If we're revalidating an existing dentry in order to open a file, we need to ensure that we check the directory has not changed before we optimise away the lookup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
show more ...
|