#
1e5f4160 |
| 07-May-2018 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Simplify svc_rdma_recv_ctxt_put
Currently svc_rdma_recv_ctxt_put's callers have to know whether they want to free the ctxt's pages or not. This means the human developers have to know when
svcrdma: Simplify svc_rdma_recv_ctxt_put
Currently svc_rdma_recv_ctxt_put's callers have to know whether they want to free the ctxt's pages or not. This means the human developers have to know when and why to set that free_pages argument.
Instead, the ctxt should carry that information with it so that svc_rdma_recv_ctxt_put does the right thing no matter who is calling.
We want to keep track of the number of pages in the Receive buffer separately from the number of pages pulled over by RDMA Read. This is so that the correct number of pages can be freed properly and that number is well-documented.
So now, rc_hdr_count is the number of pages consumed by head[0] (ie., the page index where the Read chunk should start); and rc_page_count is always the number of pages that need to be released when the ctxt is put.
The @free_pages argument is no longer needed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
ecf85b23 |
| 07-May-2018 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Introduce svc_rdma_recv_ctxt
svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt free list. This eliminates the overhead of calling kmalloc / kfree, both of which grab a globa
svcrdma: Introduce svc_rdma_recv_ctxt
svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt free list. This eliminates the overhead of calling kmalloc / kfree, both of which grab a globally shared lock that disables interrupts. To reduce contention further, separate the use of these objects in the Receive and Send paths in svcrdma.
Subsequent patches will take advantage of this separation by allocating real resources which are then cached in these objects. The allocations are freed when the transport is torn down.
I've renamed the structure so that static type checking can be used to ensure that uses of op_ctxt and recv_ctxt are not confused. As an additional clean up, structure fields are renamed to conform with kernel coding conventions.
As a final clean up, helpers related to recv_ctxt are moved closer to the functions that use them.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
bd2abef3 |
| 07-May-2018 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Trace key RDMA API events
This includes: * Posting on the Send and Receive queues * Send, Receive, Read, and Write completion * Connect upcalls * QP errors
Signed-off-by: Chuck Lev
svcrdma: Trace key RDMA API events
This includes: * Posting on the Send and Receive queues * Send, Receive, Read, and Write completion * Connect upcalls * QP errors
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
98895edb |
| 07-May-2018 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Trace key RPC/RDMA protocol events
This includes: * Transport accept and tear-down * Decisions about using Write and Reply chunks * Each RDMA segment that is handled * Whenever an R
svcrdma: Trace key RPC/RDMA protocol events
This includes: * Transport accept and tear-down * Decisions about using Write and Reply chunks * Each RDMA segment that is handled * Whenever an RDMA_ERR is sent
As a clean-up, I've standardized the order of the includes, and removed some now redundant dprintk call sites.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.16 |
|
#
175e0310 |
| 02-Feb-2018 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Fix Read chunk round-up
A single NFSv4 WRITE compound can often have three operations: PUTFH, WRITE, then GETATTR.
When the WRITE payload is sent in a Read chunk, the client places the GET
svcrdma: Fix Read chunk round-up
A single NFSv4 WRITE compound can often have three operations: PUTFH, WRITE, then GETATTR.
When the WRITE payload is sent in a Read chunk, the client places the GETATTR in the inline part of the RPC/RDMA message, just after the WRITE operation (sans payload). The position value in the Read chunk enables the receiver to insert the Read chunk at the correct place in the received XDR stream; that is between the WRITE and GETATTR.
According to RFC 8166, an NFS/RDMA client does not have to add XDR round-up to the Read chunk that carries the WRITE payload. The receiver adds XDR round-up padding if it is absent and the receiver's XDR decoder requires it to be present.
Commit 193bcb7b3719 ("svcrdma: Populate tail iovec when receiving") attempted to add support for receiving such a compound so that just the WRITE payload appears in rq_arg's page list, and the trailing GETATTR is placed in rq_arg's tail iovec. (TCP just strings the whole compound into the head iovec and page list, without regard to the alignment of the WRITE payload).
The server transport logic also had to accommodate the optional XDR round-up of the Read chunk, which it did simply by lengthening the tail iovec when round-up was needed. This approach is adequate for the NFSv2 and NFSv3 WRITE decoders.
Unfortunately it is not sufficient for nfsd4_decode_write. When the Read chunk length is a couple of bytes less than PAGE_SIZE, the computation at the end of nfsd4_decode_write allows argp->pagelen to go negative, which breaks the logic in read_buf that looks for the tail iovec.
The result is that a WRITE operation whose payload length is just less than a multiple of a page succeeds, but the subsequent GETATTR in the same compound fails with NFS4ERR_OP_ILLEGAL because the XDR decoder can't find it. Clients ignore the error, but they must update their attribute cache via a separate round trip.
As nfsd4_decode_write appears to expect the payload itself to always have appropriate XDR round-up, have svc_rdma_build_normal_read_chunk add the Read chunk XDR round-up to the page_len rather than lengthening the tail iovec.
Reported-by: Olga Kornievskaia <kolga@netapp.com> Fixes: 193bcb7b3719 ("svcrdma: Populate tail iovec when receiving") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.15, v4.13.16, v4.14 |
|
#
b2441318 |
| 01-Nov-2017 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license identifiers to apply.
- when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:
SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became the concluded license(s).
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.
In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v4.13.5, v4.13 |
|
#
193bcb7b |
| 18-Aug-2017 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Populate tail iovec when receiving
So that NFS WRITE payloads can eventually be placed directly into a file's page cache, enable the RPC-over-RDMA transport to present these payloads in the
svcrdma: Populate tail iovec when receiving
So that NFS WRITE payloads can eventually be placed directly into a file's page cache, enable the RPC-over-RDMA transport to present these payloads in the xdr_buf's page list, while placing trailing content (such as a GETATTR operation) in the xdr_buf's tail.
After this change, the RPC-over-RDMA's "copy tail" hack, added by commit a97c331f9aa9 ("svcrdma: Handle additional inline content"), is no longer needed and can be removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
7075a867 |
| 01-Aug-2017 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Clean up svc_rdma_build_read_chunk()
Dan Carpenter <dan.carpenter@oracle.com> observed that the while() loop in svc_rdma_build_read_chunk() does not document the assumption that the loop in
svcrdma: Clean up svc_rdma_build_read_chunk()
Dan Carpenter <dan.carpenter@oracle.com> observed that the while() loop in svc_rdma_build_read_chunk() does not document the assumption that the loop interior is always executed at least once.
Defensive: the function now returns -EINVAL if this assumption fails.
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.12 |
|
#
35a30fc3 |
| 23-Jun-2017 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
Clean up: No need to save the I/O direction. The functions that release svc_rdma_chunk_ctxt already know what direction to use.
Signed-off-by: Chuc
svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
Clean up: No need to save the I/O direction. The functions that release svc_rdma_chunk_ctxt already know what direction to use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
91b022ec |
| 23-Jun-2017 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: use offset_in_page() macro
Clean up: Use offset_in_page() macro instead of open-coding.
Reported-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com
svcrdma: use offset_in_page() macro
Clean up: Use offset_in_page() macro instead of open-coding.
Reported-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
71641d99 |
| 23-Jun-2017 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Properly compute .len and .buflen for received RPC Calls
When an RPC-over-RDMA request is received, the Receive buffer contains a Transport Header possibly followed by an RPC message.
Even
svcrdma: Properly compute .len and .buflen for received RPC Calls
When an RPC-over-RDMA request is received, the Receive buffer contains a Transport Header possibly followed by an RPC message.
Even though rq_arg.head[0] (as passed to NFSD) does not contain the Transport Header header, currently rq_arg.len includes the size of the Transport Header.
That violates the intent of the xdr_buf API contract. .buflen should include everything, but .len should be exactly the length of the RPC message in the buffer.
The rq_arg fields are summed together at the end of svc_rdma_recvfrom to obtain the correct return value. rq_arg.len really ought to contain the correct number of bytes already, but it currently doesn't due to the above misbehavior.
Let's instead ensure that .buflen includes the length of the transport header, and that .len is always equal to head.iov_len + .page_len + tail.iov_len .
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
026d958b |
| 23-Jun-2017 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Add recvfrom helpers to svc_rdma_rw.c
svc_rdma_rw.c already contains helpers for the sendto path. Introduce helpers for the recvfrom path.
The plan is to replace the local NFSD bespoke cod
svcrdma: Add recvfrom helpers to svc_rdma_rw.c
svc_rdma_rw.c already contains helpers for the sendto path. Introduce helpers for the recvfrom path.
The plan is to replace the local NFSD bespoke code that constructs and posts RDMA Read Work Requests with calls to the rdma_rw API. This shares code with other RDMA-enabled ULPs that manages the gory details of buffer registration and posting Work Requests.
This new code also puts all RDMA_NOMSG-specific logic in one place.
Lastly, the use of rqstp->rq_arg.pages is deprecated in favor of using rqstp->rq_pages directly, for clarity.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
107c1d0a |
| 23-Jun-2017 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Avoid Send Queue overflow
Sanity case: Catch the case where more Work Requests are being posted to the Send Queue than there are Send Queue Entries.
This might happen if a client sends a c
svcrdma: Avoid Send Queue overflow
Sanity case: Catch the case where more Work Requests are being posted to the Send Queue than there are Send Queue Entries.
This might happen if a client sends a chunk with more segments than there are SQEs for the transport. The server can't send that reply, so the transport will deadlock unless the client drops the RPC.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.10.17, v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10 |
|
#
f13193f5 |
| 09-Apr-2017 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Introduce local rdma_rw API helpers
The plan is to replace the local bespoke code that constructs and posts RDMA Read and Write Work Requests with calls to the rdma_rw API. This shares code
svcrdma: Introduce local rdma_rw API helpers
The plan is to replace the local bespoke code that constructs and posts RDMA Read and Write Work Requests with calls to the rdma_rw API. This shares code with other RDMA-enabled ULPs that manages the gory details of buffer registration and posting Work Requests.
Some design notes:
o The structure of RPC-over-RDMA transport headers is flexible, allowing multiple segments per Reply with arbitrary alignment, each with a unique R_key. Write and Send WRs continue to be built and posted in separate code paths. However, one whole chunk (with one or more RDMA segments apiece) gets exactly one ib_post_send and one work completion.
o svc_xprt reference counting is modified, since a chain of rdma_rw_ctx structs generates one completion, no matter how many Write WRs are posted.
o The current code builds the transport header as it is construct- ing Write WRs. I've replaced that with marshaling of transport header data items in a separate step. This is because the exact structure of client-provided segments may not align with the components of the server's reply xdr_buf, or the pages in the page list. Thus parts of each client-provided segment may be written at different points in the send path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
1cc5213b |
| 22-Aug-2020 |
Randy Dunlap <rdunlap@infradead.org> |
net: sunrpc: delete repeated words Drop duplicate words in net/sunrpc/. Also fix "Anyone" to be "Any one". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce
net: sunrpc: delete repeated words Drop duplicate words in net/sunrpc/. Also fix "Anyone" to be "Any one". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
365e9992 |
| 30-Jun-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Remove transport reference counting Jason tells me that a ULP cannot rely on getting an ESTABLISHED and DISCONNECTED event pair for each connection, so transport reference c
svcrdma: Remove transport reference counting Jason tells me that a ULP cannot rely on getting an ESTABLISHED and DISCONNECTED event pair for each connection, so transport reference counting in the CM event handler will never be reliable. Now that we have ib_drain_qp(), svcrdma should no longer need to hold transport references while Sends and Receives are posted. So remove the get/put call sites in the CM event handlers. This eliminates a significant source of locked memory bus traffic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
6787f0be |
| 29-Apr-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Display chunk completion ID when posting a rw_ctxt Re-use the post_rw tracepoint (safely) to trace cc_info lifetime events, including completion IDs. Signed-off-by: Chu
svcrdma: Display chunk completion ID when posting a rw_ctxt Re-use the post_rw tracepoint (safely) to trace cc_info lifetime events, including completion IDs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
f60a0869 |
| 29-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Add common XDR decoders for RDMA and Read segments Clean up: De-duplicate some code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
e814eecb |
| 11-Jun-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Fix page leak in svc_rdma_recv_read_chunk() Commit 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path") moved the page saver logic so that it gets executed event when an error occ
svcrdma: Fix page leak in svc_rdma_recv_read_chunk() Commit 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path") moved the page saver logic so that it gets executed event when an error occurs. In that case, the I/O is never posted, and those pages are then leaked. Errors in this path, however, are quite rare. Fixes: 07d0ff3b0cd2 ("svcrdma: Clean up Read chunk path") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
dbc17acd |
| 20-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: trace undersized Write chunks Clean up: Replace a dprintk call site. This is the last remaining dprintk call site in svc_rdma_rw.c, so remove dprintk infrastructure as
svcrdma: trace undersized Write chunks Clean up: Replace a dprintk call site. This is the last remaining dprintk call site in svc_rdma_rw.c, so remove dprintk infrastructure as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
9d200638 |
| 20-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Trace page overruns when constructing RDMA Reads Clean up: Replace a dprintk call site with a tracepoint. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
f4e53e1c |
| 20-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Clean up handling of get_rw_ctx errors Clean up: Replace two dprintk call sites with a tracepoint. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
#
2abfbe7e |
| 20-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Clean up the tracing for rw_ctx_init errors - De-duplicate code - Rename the tracepoint with "_err" to allow enabling via glob - Report the sg_cnt for the failing rw_ctx
svcrdma: Clean up the tracing for rw_ctx_init errors - De-duplicate code - Rename the tracepoint with "_err" to allow enabling via glob - Report the sg_cnt for the failing rw_ctx - Fix a dumb signage issue Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
#
e28b4fc6 |
| 30-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Fix trace point use-after-free race I hit this while testing nfsd-5.7 with kernel memory debugging enabled on my server: Mar 30 13:21:45 klimt kernel: BUG: unable to ha
svcrdma: Fix trace point use-after-free race I hit this while testing nfsd-5.7 with kernel memory debugging enabled on my server: Mar 30 13:21:45 klimt kernel: BUG: unable to handle page fault for address: ffff8887e6c279a8 Mar 30 13:21:45 klimt kernel: #PF: supervisor read access in kernel mode Mar 30 13:21:45 klimt kernel: #PF: error_code(0x0000) - not-present page Mar 30 13:21:45 klimt kernel: PGD 3601067 P4D 3601067 PUD 87c519067 PMD 87c3e2067 PTE 800ffff8193d8060 Mar 30 13:21:45 klimt kernel: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI Mar 30 13:21:45 klimt kernel: CPU: 2 PID: 1933 Comm: nfsd Not tainted 5.6.0-rc6-00040-g881e87a3c6f9 #1591 Mar 30 13:21:45 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 Mar 30 13:21:45 klimt kernel: RIP: 0010:svc_rdma_post_chunk_ctxt+0xab/0x284 [rpcrdma] Mar 30 13:21:45 klimt kernel: Code: c1 83 34 02 00 00 29 d0 85 c0 7e 72 48 8b bb a0 02 00 00 48 8d 54 24 08 4c 89 e6 48 8b 07 48 8b 40 20 e8 5a 5c 2b e1 41 89 c6 <8b> 45 20 89 44 24 04 8b 05 02 e9 01 00 85 c0 7e 33 e9 5e 01 00 00 Mar 30 13:21:45 klimt kernel: RSP: 0018:ffffc90000dfbdd8 EFLAGS: 00010286 Mar 30 13:21:45 klimt kernel: RAX: 0000000000000000 RBX: ffff8887db8db400 RCX: 0000000000000030 Mar 30 13:21:45 klimt kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000246 Mar 30 13:21:45 klimt kernel: RBP: ffff8887e6c27988 R08: 0000000000000000 R09: 0000000000000004 Mar 30 13:21:45 klimt kernel: R10: ffffc90000dfbdd8 R11: 00c068ef00000000 R12: ffff8887eb4e4a80 Mar 30 13:21:45 klimt kernel: R13: ffff8887db8db634 R14: 0000000000000000 R15: ffff8887fc931000 Mar 30 13:21:45 klimt kernel: FS: 0000000000000000(0000) GS:ffff88885bd00000(0000) knlGS:0000000000000000 Mar 30 13:21:45 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 CR3: 000000081b72e002 CR4: 00000000001606e0 Mar 30 13:21:45 klimt kernel: Call Trace: Mar 30 13:21:45 klimt kernel: ? svc_rdma_vec_to_sg+0x7f/0x7f [rpcrdma] Mar 30 13:21:45 klimt kernel: svc_rdma_send_write_chunk+0x59/0xce [rpcrdma] Mar 30 13:21:45 klimt kernel: svc_rdma_sendto+0xf9/0x3ae [rpcrdma] Mar 30 13:21:45 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd] Mar 30 13:21:45 klimt kernel: svc_send+0x105/0x1e3 [sunrpc] Mar 30 13:21:45 klimt kernel: nfsd+0xf2/0x149 [nfsd] Mar 30 13:21:45 klimt kernel: kthread+0xf6/0xfb Mar 30 13:21:45 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74 Mar 30 13:21:45 klimt kernel: ret_from_fork+0x3a/0x50 Mar 30 13:21:45 klimt kernel: Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd pcspkr rpcrdma i2c_i801 rdma_ucm lpc_ich mfd_core ib_iser rdma_cm iw_cm ib_cm mei_me raid0 libiscsi mei sg scsi_transport_iscsi ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sd_mod sr_mod cdrom mlx4_core crc32c_intel igb nvme i2c_algo_bit ahci i2c_core libahci nvme_core dca libata t10_pi qedr dm_mirror dm_region_hash dm_log dm_mod dax qede qed crc8 ib_uverbs ib_core Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 Mar 30 13:21:45 klimt kernel: ---[ end trace 87971d2ad3429424 ]--- It's absolutely not safe to use resources pointed to by the @send_wr argument of ib_post_send() _after_ that function returns. Those resources are typically freed by the Send completion handler, which can run before ib_post_send() returns. Thus the trace points currently around ib_post_send() in the server's RPC/RDMA transport are a hazard, even when they are disabled. Rearrange them so that they touch the Work Request only _before_ ib_post_send() is invoked. Fixes: bd2abef33394 ("svcrdma: Trace key RDMA API events") Fixes: 4201c7464753 ("svcrdma: Introduce svc_rdma_send_ctxt") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|
Revision tags: v5.4.24 |
|
#
a406c563 |
| 02-Mar-2020 |
Chuck Lever <chuck.lever@oracle.com> |
svcrdma: Rename svcrdma_encode trace points in send routines These trace points are misnamed: trace_svcrdma_encode_wseg trace_svcrdma_encode_write tr
svcrdma: Rename svcrdma_encode trace points in send routines These trace points are misnamed: trace_svcrdma_encode_wseg trace_svcrdma_encode_write trace_svcrdma_encode_reply trace_svcrdma_encode_rseg trace_svcrdma_encode_read trace_svcrdma_encode_pzr Because they actually trace posting on the Send Queue. Let's rename them so that I can add trace points in the chunk list encoders that actually do trace chunk list encoding events. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
show more ...
|