Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
91f8ce28 |
| 12-Jun-2023 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Convert "might sleep" comment into a code annotation
Try to catch incorrect calling contexts mechanically rather than by code review.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by
svcrdma: Convert "might sleep" comment into a code annotation
Try to catch incorrect calling contexts mechanically rather than by code review.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
5581cf8e |
| 12-Jun-2023 |
Chuck Lever <chuck.lever@oracle.com> |
SUNRPC: Optimize page release in svc_rdma_sendto()
Now that we have bulk page allocation and release APIs, it's more efficient to use those than it is for nfsd threads to wait for send completions.
SUNRPC: Optimize page release in svc_rdma_sendto()
Now that we have bulk page allocation and release APIs, it's more efficient to use those than it is for nfsd threads to wait for send completions. Previous patches have eliminated the calls to wait_for_completion() and complete(), in order to avoid scheduler overhead.
Now release pages-under-I/O in the send completion handler using the efficient bulk release API.
I've measured a 7% reduction in cumulative CPU utilization in svc_rdma_sendto(), svc_rdma_wc_send(), and svc_xprt_release(). In particular, using release_pages() instead of complete() cuts the time per svc_rdma_wc_send() call by two-thirds. This helps improve scalability because svc_rdma_wc_send() is single-threaded per connection.
Reviewed-by: Tom Talpey <tom@talpey.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
c4b50cdf |
| 12-Jun-2023 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Revert 2a1e4f21d841 ("svcrdma: Normalize Send page handling")
Get rid of the completion wait in svc_rdma_sendto(), and release pages in the send completion handler again. A subsequent patch
svcrdma: Revert 2a1e4f21d841 ("svcrdma: Normalize Send page handling")
Get rid of the completion wait in svc_rdma_sendto(), and release pages in the send completion handler again. A subsequent patch will handle releasing those pages more efficiently.
Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
a944209c |
| 12-Jun-2023 |
Chuck Lever <chuck.lever@oracle.com> |
SUNRPC: Revert 579900670ac7 ("svcrdma: Remove unused sc_pages field")
Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly.
Reviewed-
SUNRPC: Revert 579900670ac7 ("svcrdma: Remove unused sc_pages field")
Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
6be7afcd |
| 12-Jun-2023 |
Chuck Lever <chuck.lever@oracle.com> |
SUNRPC: Revert cc93ce9529a6 ("svcrdma: Retain the page backing rq_res.head[0].iov_base")
Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply
SUNRPC: Revert cc93ce9529a6 ("svcrdma: Retain the page backing rq_res.head[0].iov_base")
Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v6.1.33 |
|
#
ed51b426 |
| 05-Jun-2023 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Clean up allocation of svc_rdma_send_ctxt
The physical device's favored NUMA node ID is available when allocating a send_ctxt. Use that value instead of relying on the assumption that the m
svcrdma: Clean up allocation of svc_rdma_send_ctxt
The physical device's favored NUMA node ID is available when allocating a send_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10 |
|
#
eef2d8d4 |
| 04-Oct-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Split the svcrdma_wc_send() tracepoint
There are currently three separate purposes being served by a single tracepoint here. They need to be split up.
svcrdma_wc_send: - status is always
svcrdma: Split the svcrdma_wc_send() tracepoint
There are currently three separate purposes being served by a single tracepoint here. They need to be split up.
svcrdma_wc_send: - status is always zero, so there's no value in recording it. - vendor_err is meaningless unless status is not zero, so there's no value in recording it. - This tracepoint is needed only when developing modifications, so it should be left disabled most of the time.
svcrdma_wc_send_flush: - As above, needed only rarely, and not an error.
svcrdma_wc_send_err: - This tracepoint can be left persistently enabled because completion errors are run-time problems (except for FLUSHED_ERR). - Tracepoint name now ends in _err to reflect its purpose.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15 |
|
#
b6c2bfea |
| 09-Feb-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Relieve contention on sc_send_lock.
/proc/lock_stat indicates the the sc_send_lock is heavily contended when the server is under load from a single client.
To address this, convert the sen
svcrdma: Relieve contention on sc_send_lock.
/proc/lock_stat indicates the the sc_send_lock is heavily contended when the server is under load from a single client.
To address this, convert the send_ctxt free list to an llist. Returning an item to the send_ctxt cache is now waitless, which reduces the instruction path length in the single-threaded Send handler (svc_rdma_wc_send).
The goal is to enable the ib_comp_wq worker to handle a higher RPC/RDMA Send completion rate given the same CPU resources. This change reduces CPU utilization of Send completion by 2-3% on my server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
show more ...
|
#
6c8c84f5 |
| 07-Jul-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Fewer calls to wake_up() in Send completion handler
Because wake_up() takes an IRQ-safe lock, it can be expensive, especially to call inside of a single-threaded completion handler. What's
svcrdma: Fewer calls to wake_up() in Send completion handler
Because wake_up() takes an IRQ-safe lock, it can be expensive, especially to call inside of a single-threaded completion handler. What's more, the Send wait queue almost never has waiters, so most of the time, this is an expensive no-op.
As always, the goal is to reduce the average overhead of each completion, because a transport's completion handlers are single- threaded on one CPU core. This change reduces CPU utilization of the Send completion thread by 2-3% on my server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
show more ...
|
#
8727f788 |
| 11-Apr-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Pass a useful error code to the send_err tracepoint
Capture error codes in @ret, which is passed to the send_err tracepoint, so that they can be logged when something goes awry.
Signed-off
svcrdma: Pass a useful error code to the send_err tracepoint
Capture error codes in @ret, which is passed to the send_err tracepoint, so that they can be logged when something goes awry.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
c7731d5e |
| 13-Apr-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Rename goto labels in svc_rdma_sendto()
Clean up: Make the goto labels consistent with other similar functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
351461f3 |
| 13-Apr-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Don't leak send_ctxt on Send errors
Address a rare send_ctxt leak in the svc_rdma_sendto() error paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
Revision tags: v5.10.14 |
|
#
cc93ce95 |
| 01-Feb-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Retain the page backing rq_res.head[0].iov_base
svc_rdma_sendto() now waits for the NIC hardware to finish with the pages backing rq_res. We still have to release the page array in some cas
svcrdma: Retain the page backing rq_res.head[0].iov_base
svc_rdma_sendto() now waits for the NIC hardware to finish with the pages backing rq_res. We still have to release the page array in some cases, but now it's always safe to immediately re-use the page backing rq_res's head buffer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
57990067 |
| 28-Jan-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Remove unused sc_pages field
Clean up. This significantly reduces the size of struct svc_rdma_send_ctxt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
2a1e4f21 |
| 13-Jan-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Normalize Send page handling
Currently svc_rdma_sendto() migrates xdr_buf pages into a separate page list and NULLs out a bunch of entries in rq_pages while the pages are under I/O. The Sen
svcrdma: Normalize Send page handling
Currently svc_rdma_sendto() migrates xdr_buf pages into a separate page list and NULLs out a bunch of entries in rq_pages while the pages are under I/O. The Send completion handler then frees those pages later.
Instead, let's wait for the Send completion, then handle page releasing in the nfsd thread. I'd like to avoid the cost of 250+ put_page() calls in the Send completion handler, which is single- threaded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
e844d307 |
| 20-Feb-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Add a "deferred close" helper
Refactor a bit of commonly used logic so that every site that wants a close deferred to an nfsd thread does all the right things (set_bit(XPT_CLOSE) then enque
svcrdma: Add a "deferred close" helper
Refactor a bit of commonly used logic so that every site that wants a close deferred to an nfsd thread does all the right things (set_bit(XPT_CLOSE) then enqueue).
Also, once XPT_CLOSE is set on a transport, it is never cleared. If XPT_CLOSE is already set, then the close is already being handled and the enqueue can be skipped.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
072db263 |
| 20-Feb-2021 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: RPCDBG_FACILITY is no longer used
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
22df5a22 |
| 29-Dec-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter
Avoid the overhead of a memory bus lock cycle for counting a value that is hardly every used.
Signed-off-by: Chuck Lever <chuck.lever@oracl
svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter
Avoid the overhead of a memory bus lock cycle for counting a value that is hardly every used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47 |
|
#
b704be09 |
| 11-Jun-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Clean up chunk tracepoints
We already have trace_svcrdma_decode_rseg(), which records each ingress Read segment. Instead of reporting those again when they are about to be posted as RDMA Re
svcrdma: Clean up chunk tracepoints
We already have trace_svcrdma_decode_rseg(), which records each ingress Read segment. Instead of reporting those again when they are about to be posted as RDMA Reads, let's fire one tracepoint before posting each type of chunk.
So we'll get:
nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=0 position=0 192@0x013ca9ebfae14000:0xb0010b05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=1 position=0 7688@0x013ca9ebf914e000:0xb0010a05 nfsd-1998 [002] 321.666615: svcrdma_decode_rseg: cq.id=4 cid=42 segno=2 position=0 28@0x013ca9ebfae15000:0xb0010905 nfsd-1998 [002] 321.666622: svcrdma_decode_rqst: cq.id=4 cid=42 xid=0x013ca9eb vers=1 credits=128 proc=RDMA_NOMSG hdrlen=100
nfsd-1998 [002] 321.666642: svcrdma_post_read_chunk: cq.id=3 cid=112 sqecount=3
kworker/2:1H-221 [002] 321.673949: svcrdma_wc_read: cq.id=3 cid=112 status=SUCCESS (0/0x0)
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25 |
|
#
2371bcc0 |
| 09-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Support multiple Write chunks in svc_rdma_map_reply_msg()
Refactor: svc_rdma_map_reply_msg() is restructured to DMA map only the parts of rq_res that do not contain a result payload.
This
svcrdma: Support multiple Write chunks in svc_rdma_map_reply_msg()
Refactor: svc_rdma_map_reply_msg() is restructured to DMA map only the parts of rq_res that do not contain a result payload.
This change has been tested to confirm that it does not cause a regression in the no Write chunk and single Write chunk cases. Multiple Write chunks have not been tested.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
9d0b09d5 |
| 13-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Support multiple write chunks when pulling up
When counting the number of SGEs needed to construct a Send request, do not count result payloads. And, when copying the Reply message into the
svcrdma: Support multiple write chunks when pulling up
When counting the number of SGEs needed to construct a Send request, do not count result payloads. And, when copying the Reply message into the pull-up buffer, result payloads are not to be copied to the Send buffer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
6911f3e1 |
| 17-Jun-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Use parsed chunk lists to encode Reply transport headers
Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing the egress RPC Reply transport header, use t
svcrdma: Use parsed chunk lists to encode Reply transport headers
Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing the egress RPC Reply transport header, use the new parsed Write list and Reply chunk, which are version- agnostic and already XDR decoded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
7a1cbfa1 |
| 17-Jun-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Use parsed chunk lists to construct RDMA Writes
Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing RDMA Writes, use the new parsed chunk lists for the W
svcrdma: Use parsed chunk lists to construct RDMA Writes
Refactor: Instead of re-parsing the ingress RPC Call transport header when constructing RDMA Writes, use the new parsed chunk lists for the Write list and Reply chunk, which are version-agnostic and already XDR-decoded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
ded380f1 |
| 13-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Clean up svc_rdma_encode_reply_chunk()
Refactor: Match the control flow of svc_rdma_encode_write_list().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
f6ad7759 |
| 13-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Post RDMA Writes while XDR encoding replies
The only RPC/RDMA ordering requirement between RDMA Writes and RDMA Sends is that the responder must post the Writes on the Send queue before pos
svcrdma: Post RDMA Writes while XDR encoding replies
The only RPC/RDMA ordering requirement between RDMA Writes and RDMA Sends is that the responder must post the Writes on the Send queue before posting the Send that conveys the RPC Reply for that Write payload.
The Linux NFS server implementation now has a transport method that can post result Payload Writes earlier than svc_rdma_sendto:
->xpo_result_payload()
This gets RDMA Writes going earlier so they are more likely to be complete at the remote end before the Send completes.
Some care must be taken with pulled-up Replies. We don't want to push the Write chunk and then send the same payload data via Send.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|