Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4 |
|
#
b79e3569 |
| 28-Nov-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Fix checksum mismatches in the duplicate reply cache
[ Upstream commit bf51c52a1f3c238d72c64e14d5e7702d3a245b82 ]
nfsd_cache_csum() currently assumes that the server's RPC layer has been adva
NFSD: Fix checksum mismatches in the duplicate reply cache
[ Upstream commit bf51c52a1f3c238d72c64e14d5e7702d3a245b82 ]
nfsd_cache_csum() currently assumes that the server's RPC layer has been advancing rq_arg.head[0].iov_base as it decodes an incoming request, because that's the way it used to work. On entry, it expects that buf->head[0].iov_base points to the start of the NFS header, and excludes the already-decoded RPC header.
These days however, head[0].iov_base now points to the start of the RPC header during all processing. It no longer points at the NFS Call header when execution arrives at nfsd_cache_csum().
In a retransmitted RPC the XID and the NFS header are supposed to be the same as the original message, but the contents of the retransmitted RPC header can be different. For example, for krb5, the GSS sequence number will be different between the two. Thus if the RPC header is always included in the DRC checksum computation, the checksum of the retransmitted message might not match the checksum of the original message, even though the NFS part of these messages is identical.
The result is that, even if a matching XID is found in the DRC, the checksum mismatch causes the server to execute the retransmitted RPC transaction again.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.3, v6.6.2 |
|
#
68eb0889 |
| 10-Nov-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Update nfsd_cache_append() to use xdr_stream
commit 49cecd8628a9855cd993792a0377559ea32d5e7c upstream.
When inserting a DRC-cached response into the reply buffer, ensure that the reply buffer
NFSD: Update nfsd_cache_append() to use xdr_stream
commit 49cecd8628a9855cd993792a0377559ea32d5e7c upstream.
When inserting a DRC-cached response into the reply buffer, ensure that the reply buffer's xdr_stream is updated properly. Otherwise the server will send a garbage response.
Cc: stable@vger.kernel.org # v6.3+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39 |
|
#
e7421ce7 |
| 09-Jul-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Rename struct svc_cacherep
The svc_ prefix is identified with the SunRPC layer. Although the duplicate reply cache caches RPC replies, it is only for the NFS protocol. Rename the struct to bet
NFSD: Rename struct svc_cacherep
The svc_ prefix is identified with the SunRPC layer. Although the duplicate reply cache caches RPC replies, it is only for the NFS protocol. Rename the struct to better reflect its purpose.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
cb18eca4 |
| 09-Jul-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Remove svc_rqst::rq_cacherep
Over time I'd like to see NFS-specific fields moved out of struct svc_rqst, which is an RPC layer object. These fields are layering violations.
Reviewed-by: Jeff
NFSD: Remove svc_rqst::rq_cacherep
Over time I'd like to see NFS-specific fields moved out of struct svc_rqst, which is an RPC layer object. These fields are layering violations.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
c135e126 |
| 09-Jul-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Refactor the duplicate reply cache shrinker
Avoid holding the bucket lock while freeing cache entries. This change also caps the number of entries that are freed when the shrinker calls to red
NFSD: Refactor the duplicate reply cache shrinker
Avoid holding the bucket lock while freeing cache entries. This change also caps the number of entries that are freed when the shrinker calls to reduce the shrinker's impact on the cache's effectiveness.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
a9507f6a |
| 09-Jul-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Replace nfsd_prune_bucket()
Enable nfsd_prune_bucket() to drop the bucket lock while calling kfree(). Use the same pattern that Jeff recently introduced in the NFSD filecache.
A few percpu op
NFSD: Replace nfsd_prune_bucket()
Enable nfsd_prune_bucket() to drop the bucket lock while calling kfree(). Use the same pattern that Jeff recently introduced in the NFSD filecache.
A few percpu operations are moved outside the lock since they temporarily disable local IRQs which is expensive and does not need to be done while the lock is held.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
ff0d1693 |
| 09-Jul-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Rename nfsd_reply_cache_alloc()
For readability, rename to match the other helpers.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
35308e7f |
| 09-Jul-2023 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Refactor nfsd_reply_cache_free_locked()
To reduce contention on the bucket locks, we must avoid calling kfree() while each bucket lock is held.
Start by refactoring nfsd_reply_cache_free_lock
NFSD: Refactor nfsd_reply_cache_free_locked()
To reduce contention on the bucket locks, we must avoid calling kfree() while each bucket lock is held.
Start by refactoring nfsd_reply_cache_free_locked() into a helper that removes an entry from the bucket (and must therefore run under the lock) and a second helper that frees the entry (which does not need to hold the lock).
For readability, rename the helpers nfsd_cacherep_<verb>.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35 |
|
#
ed9ab734 |
| 16-Jun-2023 |
Jeff Layton <jlayton@kernel.org> |
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net
Commit f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup") moved the initialization of the reply cache
nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net
Commit f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup") moved the initialization of the reply cache into nfsd startup, but didn't account for the stats counters, which can be accessed before nfsd is ever started. The result can be a NULL pointer dereference when someone accesses /proc/fs/nfsd/reply_cache_stats while nfsd is still shut down.
This is a regression and a user-triggerable oops in the right situation:
- non-x86_64 arch - /proc/fs/nfsd is mounted in the namespace - nfsd is not started in the namespace - unprivileged user calls "cat /proc/fs/nfsd/reply_cache_stats"
Although this is easy to trigger on some arches (like aarch64), on x86_64, calling this_cpu_ptr(NULL) evidently returns a pointer to the fixed_percpu_data. That struct looks just enough like a newly initialized percpu var to allow nfsd_reply_cache_stats_show to access it without Oopsing.
Move the initialization of the per-net+per-cpu reply-cache counters back into nfsd_init_net, while leaving the rest of the reply cache allocations to be done at nfsd startup time.
Kudos to Eirik who did most of the legwork to track this down.
Cc: stable@vger.kernel.org # v6.3+ Fixes: f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup") Reported-and-tested-by: Eirik Fuller <efuller@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2215429 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19 |
|
#
cee4db19 |
| 08-Jan-2023 |
Chuck Lever <chuck.lever@oracle.com> |
SUNRPC: Refactor RPC server dispatch method
Currently, svcauth_gss_accept() pre-reserves response buffer space for the RPC payload length and GSS sequence number before returning to the dispatcher,
SUNRPC: Refactor RPC server dispatch method
Currently, svcauth_gss_accept() pre-reserves response buffer space for the RPC payload length and GSS sequence number before returning to the dispatcher, which then adds the header's accept_stat field.
The problem is the accept_stat field is supposed to go before the length and seq_num fields. So svcauth_gss_release() has to relocate the accept_stat value (see svcauth_gss_prepare_to_wrap()).
To enable these fields to be added to the response buffer in the correct (final) order, the pointer to the accept_stat has to be made available to svcauth_gss_accept() so that it can set it before reserving space for the length and seq_num fields.
As a first step, move the pointer to the location of the accept_stat field into struct svc_rqst.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
8dd41d70 |
| 08-Jan-2023 |
Chuck Lever <chuck.lever@oracle.com> |
SUNRPC: Push svcxdr_init_encode() into svc_process_common()
Now that all vs_dispatch functions invoke svcxdr_init_encode(), it is common code and can be pushed down into the generic RPC server.
Rev
SUNRPC: Push svcxdr_init_encode() into svc_process_common()
Now that all vs_dispatch functions invoke svcxdr_init_encode(), it is common code and can be pushed down into the generic RPC server.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70 |
|
#
64776611 |
| 22-Sep-2022 |
ChenXiaoSong <chenxiaosong2@huawei.com> |
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
nfsd_net is converted from seq_file->file instead of seq_file->pri
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
nfsd_net is converted from seq_file->file instead of seq_file->private in nfsd_reply_cache_stats_show().
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> [ cel: reduce line length ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45 |
|
#
e33c267a |
| 31-May-2022 |
Roman Gushchin <roman.gushchin@linux.dev> |
mm: shrinkers: provide shrinkers with names
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g.
mm: shrinkers: provide shrinkers with names
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs.
This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name.
In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided.
The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40
[roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.44, v5.15.43, v5.15.42 |
|
#
fd5e363e |
| 23-May-2022 |
Julian Schroeder <jumaco@amazon.com> |
nfsd: destroy percpu stats counters after reply cache shutdown
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counte
nfsd: destroy percpu stats counters after reply cache shutdown
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counter is destroyed before. If any pending cache is still present, percpu_counter_add is called with a percpu counter==NULL. This causes a kernel crash. The solution is to destroy the percpu counter after the cache is freed.
Fixes: e567b98ce9a4b (“nfsd: protect concurrent access to nfsd stats counters”) Signed-off-by: Julian Schroeder <jumaco@amazon.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9 |
|
#
add1511c |
| 28-Sep-2021 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Streamline the rare "found" case
Move a rarely called function call site out of the hot path.
This is an exceptionally small improvement because the compiler inlines most of the functions tha
NFSD: Streamline the rare "found" case
Move a rarely called function call site out of the hot path.
This is an exceptionally small improvement because the compiler inlines most of the functions that nfsd_cache_lookup() calls.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
0f29ce32 |
| 28-Sep-2021 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Skip extra computation for RC_NOCACHE case
Force the compiler to skip unneeded initialization for cases that don't need those values. For example, NFSv4 COMPOUND operations are RC_NOCACHE.
Si
NFSD: Skip extra computation for RC_NOCACHE case
Force the compiler to skip unneeded initialization for cases that don't need those values. For example, NFSv4 COMPOUND operations are RC_NOCACHE.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
378a6109 |
| 30-Sep-2021 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: De-duplicate hash bucket indexing
Clean up: The details of finding the right hash bucket are exactly the same in both nfsd_cache_lookup() and nfsd_cache_update().
Signed-off-by: Chuck Lever <
NFSD: De-duplicate hash bucket indexing
Clean up: The details of finding the right hash bucket are exactly the same in both nfsd_cache_lookup() and nfsd_cache_update().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
7578b2f6 |
| 30-Sep-2021 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Remove be32_to_cpu() from DRC hash function
Commit 7142b98d9fd7 ("nfsd: Clean up drc cache in preparation for global spinlock elimination"), billed as a clean-up, added be32_to_cpu() to the DR
NFSD: Remove be32_to_cpu() from DRC hash function
Commit 7142b98d9fd7 ("nfsd: Clean up drc cache in preparation for global spinlock elimination"), billed as a clean-up, added be32_to_cpu() to the DRC hash function without explanation. That commit removed two comments that state that byte-swapping in the hash function is unnecessary without explaining whether there was a need for that change.
On some Intel CPUs, the swab32 instruction is known to cause a CPU pipeline stall. be32_to_cpu() does not add extra randomness, since the hash multiplication is done /before/ shifting to the high-order bits of the result.
As a micro-optimization, remove the unnecessary transform from the DRC hash function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v5.14.8, v5.14.7 |
|
#
8847ecc9 |
| 20-Sep-2021 |
Chuck Lever <chuck.lever@oracle.com> |
NFSD: Optimize DRC bucket pruning
DRC bucket pruning is done by nfsd_cache_lookup(), which is part of every NFSv2 and NFSv3 dispatch (ie, it's done while the client is waiting).
I added a trace_pri
NFSD: Optimize DRC bucket pruning
DRC bucket pruning is done by nfsd_cache_lookup(), which is part of every NFSv2 and NFSv3 dispatch (ie, it's done while the client is waiting).
I added a trace_printk() in prune_bucket() to see just how long it takes to prune. Here are two ends of the spectrum:
prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining ... prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining
Pruning latency is noticeable on fast transports with fast storage. By noticeable, I mean that the latency measured here in the worst case is the same order of magnitude as the round trip time for cached server operations.
We could do something like moving expired entries to an expired list and then free them later instead of freeing them right in prune_bucket(). But simply limiting the number of entries that can be pruned by a lookup is simple and retains more entries in the cache, making the DRC somewhat more effective.
Comparison with a 70/30 fio 8KB 12 thread direct I/O test:
Before:
write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets
WRITE: 1848726 ops (30%) avg bytes sent per op: 8340 avg bytes received per op: 136 backlog wait: 0.635158 RTT: 0.128525 total execute time: 0.827242 (milliseconds)
After:
write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets
WRITE: 1891144 ops (30%) avg bytes sent per op: 8340 avg bytes received per op: 136 backlog wait: 0.616114 RTT: 0.126842 total execute time: 0.805348 (milliseconds)
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
075564ed |
| 23-May-2022 |
Julian Schroeder <jumaco@amazon.com> |
nfsd: destroy percpu stats counters after reply cache shutdown
[ Upstream commit fd5e363eac77ef81542db77ddad0559fa0f9204e ]
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is track
nfsd: destroy percpu stats counters after reply cache shutdown
[ Upstream commit fd5e363eac77ef81542db77ddad0559fa0f9204e ]
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counter is destroyed before. If any pending cache is still present, percpu_counter_add is called with a percpu counter==NULL. This causes a kernel crash. The solution is to destroy the percpu counter after the cache is freed.
Fixes: e567b98ce9a4b (“nfsd: protect concurrent access to nfsd stats counters”) Signed-off-by: Julian Schroeder <jumaco@amazon.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
075564ed |
| 23-May-2022 |
Julian Schroeder <jumaco@amazon.com> |
nfsd: destroy percpu stats counters after reply cache shutdown
[ Upstream commit fd5e363eac77ef81542db77ddad0559fa0f9204e ]
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is track
nfsd: destroy percpu stats counters after reply cache shutdown
[ Upstream commit fd5e363eac77ef81542db77ddad0559fa0f9204e ]
Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counter is destroyed before. If any pending cache is still present, percpu_counter_add is called with a percpu counter==NULL. This causes a kernel crash. The solution is to destroy the percpu counter after the cache is freed.
Fixes: e567b98ce9a4b (“nfsd: protect concurrent access to nfsd stats counters”) Signed-off-by: Julian Schroeder <jumaco@amazon.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14 |
|
#
e567b98c |
| 06-Jan-2021 |
Amir Goldstein <amir73il@gmail.com> |
nfsd: protect concurrent access to nfsd stats counters
nfsd stats counters can be updated by concurrent nfsd threads without any protection.
Convert some nfsd_stats and nfsd_net struct members to u
nfsd: protect concurrent access to nfsd stats counters
nfsd stats counters can be updated by concurrent nfsd threads without any protection.
Convert some nfsd_stats and nfsd_net struct members to use percpu counters.
The longest_chain* members of struct nfsd_net remain unprotected.
Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10 |
|
#
8c38b705 |
| 14-Sep-2020 |
Rik van Riel <riel@surriel.com> |
silence nfscache allocation warnings with kvzalloc
silence nfscache allocation warnings with kvzalloc
Currently nfsd_reply_cache_init attempts hash table allocation through kmalloc, and manually fa
silence nfscache allocation warnings with kvzalloc
silence nfscache allocation warnings with kvzalloc
Currently nfsd_reply_cache_init attempts hash table allocation through kmalloc, and manually falls back to vzalloc if that fails. This makes the code a little larger than needed, and creates a significant amount of serial console spam if you have enough systems.
Switching to kvzalloc gets rid of the allocation warnings, and makes the code a little cleaner too as a side effect.
Freeing of nn->drc_hashtbl is already done using kvfree currently.
Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1 |
|
#
c25bf185 |
| 03-Jun-2020 |
J. Bruce Fields <bfields@redhat.com> |
nfsd: safer handling of corrupted c_type
This can only happen if there's a bug somewhere, so let's make it a WARN not a printk. Also, I think it's safest to ignore the corruption rather than trying
nfsd: safer handling of corrupted c_type
This can only happen if there's a bug somewhere, so let's make it a WARN not a printk. Also, I think it's safest to ignore the corruption rather than trying to fix it by removing a cache entry.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v5.4.44 |
|
#
027690c7 |
| 01-Jun-2020 |
J. Bruce Fields <bfields@redhat.com> |
nfsd4: make drc_slab global, not per-net
I made every global per-network-namespace instead. But perhaps doing that to this slab was a step too far.
The kmem_cache_create call in our net init metho
nfsd4: make drc_slab global, not per-net
I made every global per-network-namespace instead. But perhaps doing that to this slab was a step too far.
The kmem_cache_create call in our net init method also seems to be responsible for this lockdep warning:
[ 45.163710] Unable to find swap-space signature [ 45.375718] trinity-c1 (855): attempted to duplicate a private mapping with mremap. This is not supported. [ 46.055744] futex_wake_op: trinity-c1 tries to shift op by -209; fix this program [ 51.011723] [ 51.013378] ====================================================== [ 51.013875] WARNING: possible circular locking dependency detected [ 51.014378] 5.2.0-rc2 #1 Not tainted [ 51.014672] ------------------------------------------------------ [ 51.015182] trinity-c2/886 is trying to acquire lock: [ 51.015593] 000000005405f099 (slab_mutex){+.+.}, at: slab_attr_store+0xa2/0x130 [ 51.016190] [ 51.016190] but task is already holding lock: [ 51.016652] 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500 [ 51.017266] [ 51.017266] which lock already depends on the new lock. [ 51.017266] [ 51.017909] [ 51.017909] the existing dependency chain (in reverse order) is: [ 51.018497] [ 51.018497] -> #1 (kn->count#43){++++}: [ 51.018956] __lock_acquire+0x7cf/0x1a20 [ 51.019317] lock_acquire+0x17d/0x390 [ 51.019658] __kernfs_remove+0x892/0xae0 [ 51.020020] kernfs_remove_by_name_ns+0x78/0x110 [ 51.020435] sysfs_remove_link+0x55/0xb0 [ 51.020832] sysfs_slab_add+0xc1/0x3e0 [ 51.021332] __kmem_cache_create+0x155/0x200 [ 51.021720] create_cache+0xf5/0x320 [ 51.022054] kmem_cache_create_usercopy+0x179/0x320 [ 51.022486] kmem_cache_create+0x1a/0x30 [ 51.022867] nfsd_reply_cache_init+0x278/0x560 [ 51.023266] nfsd_init_net+0x20f/0x5e0 [ 51.023623] ops_init+0xcb/0x4b0 [ 51.023928] setup_net+0x2fe/0x670 [ 51.024315] copy_net_ns+0x30a/0x3f0 [ 51.024653] create_new_namespaces+0x3c5/0x820 [ 51.025257] unshare_nsproxy_namespaces+0xd1/0x240 [ 51.025881] ksys_unshare+0x506/0x9c0 [ 51.026381] __x64_sys_unshare+0x3a/0x50 [ 51.026937] do_syscall_64+0x110/0x10b0 [ 51.027509] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 51.028175] [ 51.028175] -> #0 (slab_mutex){+.+.}: [ 51.028817] validate_chain+0x1c51/0x2cc0 [ 51.029422] __lock_acquire+0x7cf/0x1a20 [ 51.029947] lock_acquire+0x17d/0x390 [ 51.030438] __mutex_lock+0x100/0xfa0 [ 51.030995] mutex_lock_nested+0x27/0x30 [ 51.031516] slab_attr_store+0xa2/0x130 [ 51.032020] sysfs_kf_write+0x11d/0x180 [ 51.032529] kernfs_fop_write+0x32a/0x500 [ 51.033056] do_loop_readv_writev+0x21d/0x310 [ 51.033627] do_iter_write+0x2e5/0x380 [ 51.034148] vfs_writev+0x170/0x310 [ 51.034616] do_pwritev+0x13e/0x160 [ 51.035100] __x64_sys_pwritev+0xa3/0x110 [ 51.035633] do_syscall_64+0x110/0x10b0 [ 51.036200] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 51.036924] [ 51.036924] other info that might help us debug this: [ 51.036924] [ 51.037876] Possible unsafe locking scenario: [ 51.037876] [ 51.038556] CPU0 CPU1 [ 51.039130] ---- ---- [ 51.039676] lock(kn->count#43); [ 51.040084] lock(slab_mutex); [ 51.040597] lock(kn->count#43); [ 51.041062] lock(slab_mutex); [ 51.041320] [ 51.041320] *** DEADLOCK *** [ 51.041320] [ 51.041793] 3 locks held by trinity-c2/886: [ 51.042128] #0: 000000001f55e152 (sb_writers#5){.+.+}, at: vfs_writev+0x2b9/0x310 [ 51.042739] #1: 00000000c7d6c034 (&of->mutex){+.+.}, at: kernfs_fop_write+0x25b/0x500 [ 51.043400] #2: 00000000ac662005 (kn->count#43){++++}, at: kernfs_fop_write+0x286/0x500
Reported-by: kernel test robot <lkp@intel.com> Fixes: 3ba75830ce17 "drc containerization" Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|