/*
 * Copyright (c) 2016 Oracle. All rights reserved.
 * Copyright (c) 2014 Open Grid Computing, Inc. All rights reserved.
 * Copyright (c) 2005-2006 Network Appliance, Inc. All rights reserved.
 *
 * This software is available to you under a choice of one of two
 * licenses. You may choose to be licensed under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree, or the BSD-type
 * license below:
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 *      Redistributions of source code must retain the above copyright
 *      notice, this list of conditions and the following disclaimer.
 *
 *      Redistributions in binary form must reproduce the above
 *      copyright notice, this list of conditions and the following
 *      disclaimer in the documentation and/or other materials provided
 *      with the distribution.
 *
 *      Neither the name of the Network Appliance, Inc. nor the names of
 *      its contributors may be used to endorse or promote products
 *      derived from this software without specific prior written
 *      permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Author: Tom Tucker <tom@opengridcomputing.com>
 */

/* Operation
 *
 * The main entry point is svc_rdma_sendto. This is called by the
 * RPC server when an RPC Reply is ready to be transmitted to a client.
 *
 * The passed-in svc_rqst contains a struct xdr_buf which holds an
 * XDR-encoded RPC Reply message. sendto must construct the RPC-over-RDMA
 * transport header, post all Write WRs needed for this Reply, then post
 * a Send WR conveying the transport header and the RPC message itself to
 * the client.
 *
 * svc_rdma_sendto must fully transmit the Reply before returning, as
 * the svc_rqst will be recycled as soon as sendto returns. Remaining
 * resources referred to by the svc_rqst are also recycled at that time.
 * Therefore any resources that must remain longer must be detached
 * from the svc_rqst and released later.
 *
 * Page Management
 *
 * The I/O that performs Reply transmission is asynchronous, and may
 * complete well after sendto returns. Thus pages under I/O must be
 * removed from the svc_rqst before sendto returns.
 *
 * The logic here depends on Send Queue and completion ordering. Since
 * the Send WR is always posted last, it will always complete last. Thus
 * when it completes, it is guaranteed that all previous Write WRs have
 * also completed.
 *
 * Write WRs are constructed and posted. Each Write segment gets its own
 * svc_rdma_rw_ctxt, allowing the Write completion handler to find and
 * DMA-unmap the pages under I/O for that Write segment. The Write
 * completion handler does not release any pages.
 *
 * When the Send WR is constructed, it also gets its own svc_rdma_op_ctxt.
 * The ownership of all of the Reply's pages is transferred into that
 * ctxt, the Send WR is posted, and sendto returns.
 *
 * The svc_rdma_op_ctxt is presented when the Send WR completes. The
 * Send completion handler finally releases the Reply's pages.
 *
 * This mechanism also assumes that completions on the transport's Send
 * Completion Queue do not run in parallel. Otherwise a Write completion
 * and Send completion running at the same time could release pages that
 * are still DMA-mapped.
 *
 * Error Handling
 *
 * - If the Send WR is posted successfully, it will either complete
 *   successfully, or get flushed. Either way, the Send completion
 *   handler releases the Reply's pages.
 * - If the Send WR cannot be posted, the forward path releases
 *   the Reply's pages.
 *
 * This handles the case, without the use of page reference counting,
 * where two different Write segments send portions of the same page.
 */

#include <linux/sunrpc/debug.h>
#include <linux/sunrpc/rpc_rdma.h>
#include <linux/spinlock.h>
#include <asm/unaligned.h>
#include <rdma/ib_verbs.h>
#include <rdma/rdma_cm.h>
#include <linux/sunrpc/svc_rdma.h>

#define RPCDBG_FACILITY	RPCDBG_SVCXPRT

static u32 xdr_padsize(u32 len)
{
	return (len & 3) ? (4 - (len & 3)) : 0;
}

/* Returns length of transport header, in bytes.
 */
static unsigned int svc_rdma_reply_hdr_len(__be32 *rdma_resp)
{
	unsigned int nsegs;
	__be32 *p;

	p = rdma_resp;

	/* RPC-over-RDMA V1 replies never have a Read list. */
	p += rpcrdma_fixed_maxsz + 1;

	/* Skip Write list. */
	while (*p++ != xdr_zero) {
		nsegs = be32_to_cpup(p++);
		p += nsegs * rpcrdma_segment_maxsz;
	}

	/* Skip Reply chunk. */
	if (*p++ != xdr_zero) {
		nsegs = be32_to_cpup(p++);
		p += nsegs * rpcrdma_segment_maxsz;
	}

	return (unsigned long)p - (unsigned long)rdma_resp;
}

/* One Write chunk is copied from Call transport header to Reply
 * transport header. Each segment's length field is updated to
 * reflect number of bytes consumed in the segment.
 *
 * Returns number of segments in this chunk.
 */
static unsigned int xdr_encode_write_chunk(__be32 *dst, __be32 *src,
					   unsigned int remaining)
{
	unsigned int i, nsegs;
	u32 seg_len;

	/* Write list discriminator */
	*dst++ = *src++;

	/* number of segments in this chunk */
	nsegs = be32_to_cpup(src);
	*dst++ = *src++;

	for (i = nsegs; i; i--) {
		/* segment's RDMA handle */
		*dst++ = *src++;

		/* bytes returned in this segment */
		seg_len = be32_to_cpu(*src);
		if (remaining >= seg_len) {
			/* entire segment was consumed */
			*dst = *src;
			remaining -= seg_len;
		} else {
			/* segment only partly filled */
			*dst = cpu_to_be32(remaining);
			remaining = 0;
		}
		dst++; src++;

		/* segment's RDMA offset */
		*dst++ = *src++;
		*dst++ = *src++;
	}

	return nsegs;
}

/* The client provided a Write list in the Call message. Fill in
 * the segments in the first Write chunk in the Reply's transport
 * header with the number of bytes consumed in each segment.
 * Remaining chunks are returned unused.
 *
 * Assumptions:
 *  - Client has provided only one Write chunk
 */
static void svc_rdma_xdr_encode_write_list(__be32 *rdma_resp, __be32 *wr_ch,
					   unsigned int consumed)
{
	unsigned int nsegs;
	__be32 *p, *q;

	/* RPC-over-RDMA V1 replies never have a Read list. */
	p = rdma_resp + rpcrdma_fixed_maxsz + 1;

	q = wr_ch;
	while (*q != xdr_zero) {
		nsegs = xdr_encode_write_chunk(p, q, consumed);
		q += 2 + nsegs * rpcrdma_segment_maxsz;
		p += 2 + nsegs * rpcrdma_segment_maxsz;
		consumed = 0;
	}

	/* Terminate Write list */
	*p++ = xdr_zero;

	/* Reply chunk discriminator; may be replaced later */
	*p = xdr_zero;
}

/* The client provided a Reply chunk in the Call message. Fill in
 * the segments in the Reply chunk in the Reply message with the
 * number of bytes consumed in each segment.
 *
 * Assumptions:
 * - Reply can always fit in the provided Reply chunk
 */
static void svc_rdma_xdr_encode_reply_chunk(__be32 *rdma_resp, __be32 *rp_ch,
					    unsigned int consumed)
{
	__be32 *p;

	/* Find the Reply chunk in the Reply's xprt header.
	 * RPC-over-RDMA V1 replies never have a Read list.
	 */
	p = rdma_resp + rpcrdma_fixed_maxsz + 1;

	/* Skip past Write list */
	while (*p++ != xdr_zero)
		p += 1 + be32_to_cpup(p) * rpcrdma_segment_maxsz;

	xdr_encode_write_chunk(p, rp_ch, consumed);
}

/* Parse the RPC Call's transport header.
 */
static void svc_rdma_get_write_arrays(__be32 *rdma_argp,
				      __be32 **write, __be32 **reply)
{
	__be32 *p;

	p = rdma_argp + rpcrdma_fixed_maxsz;

	/* Read list */
	while (*p++ != xdr_zero)
		p += 5;

	/* Write list */
	if (*p != xdr_zero) {
		*write = p;
		while (*p++ != xdr_zero)
			p += 1 + be32_to_cpu(*p) * 4;
	} else {
		*write = NULL;
		p++;
	}

	/* Reply chunk */
	if (*p != xdr_zero)
		*reply = p;
	else
		*reply = NULL;
}

/* RPC-over-RDMA Version One private extension: Remote Invalidation.
 * Responder's choice: requester signals it can handle Send With
 * Invalidate, and responder chooses one rkey to invalidate.
 *
 * Find a candidate rkey to invalidate when sending a reply. Picks the
 * first R_key it finds in the chunk lists.
 *
 * Returns zero if RPC's chunk lists are empty.
 */
static u32 svc_rdma_get_inv_rkey(__be32 *rdma_argp,
				 __be32 *wr_lst, __be32 *rp_ch)
{
	__be32 *p;

	p = rdma_argp + rpcrdma_fixed_maxsz;
	if (*p != xdr_zero)
		p += 2;
	else if (wr_lst && be32_to_cpup(wr_lst + 1))
		p = wr_lst + 2;
	else if (rp_ch && be32_to_cpup(rp_ch + 1))
		p = rp_ch + 2;
	else
		return 0;
	return be32_to_cpup(p);
}

/* ib_dma_map_page() is used here because svc_rdma_dma_unmap()
 * is used during completion to DMA-unmap this memory, and
 * it uses ib_dma_unmap_page() exclusively.
 */
static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
				struct svc_rdma_op_ctxt *ctxt,
				unsigned int sge_no,
				unsigned char *base,
				unsigned int len)
{
	unsigned long offset = (unsigned long)base & ~PAGE_MASK;
	struct ib_device *dev = rdma->sc_cm_id->device;
	dma_addr_t dma_addr;

	dma_addr = ib_dma_map_page(dev, virt_to_page(base),
				   offset, len, DMA_TO_DEVICE);
	if (ib_dma_mapping_error(dev, dma_addr))
		goto out_maperr;

	ctxt->sge[sge_no].addr = dma_addr;
	ctxt->sge[sge_no].length = len;
	ctxt->sge[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
	svc_rdma_count_mappings(rdma, ctxt);
	return 0;

out_maperr:
	pr_err("svcrdma: failed to map buffer\n");
	return -EIO;
}

static int svc_rdma_dma_map_page(struct svcxprt_rdma *rdma,
				 struct svc_rdma_op_ctxt *ctxt,
				 unsigned int sge_no,
				 struct page *page,
				 unsigned int offset,
				 unsigned int len)
{
	struct ib_device *dev = rdma->sc_cm_id->device;
	dma_addr_t dma_addr;

	dma_addr = ib_dma_map_page(dev, page, offset, len, DMA_TO_DEVICE);
	if (ib_dma_mapping_error(dev, dma_addr))
		goto out_maperr;

	ctxt->sge[sge_no].addr = dma_addr;
	ctxt->sge[sge_no].length = len;
	ctxt->sge[sge_no].lkey = rdma->sc_pd->local_dma_lkey;
	svc_rdma_count_mappings(rdma, ctxt);
	return 0;

out_maperr:
	pr_err("svcrdma: failed to map page\n");
	return -EIO;
}

/**
 * svc_rdma_map_reply_hdr - DMA map the transport header buffer
 * @rdma: controlling transport
 * @ctxt: op_ctxt for the Send WR
 * @rdma_resp: buffer containing transport header
 * @len: length of transport header
 *
 * Returns:
 *	%0 if the header is DMA mapped,
 *	%-EIO if DMA mapping failed.
 */
int svc_rdma_map_reply_hdr(struct svcxprt_rdma *rdma,
			   struct svc_rdma_op_ctxt *ctxt,
			   __be32 *rdma_resp,
			   unsigned int len)
{
	ctxt->direction = DMA_TO_DEVICE;
	ctxt->pages[0] = virt_to_page(rdma_resp);
	ctxt->count = 1;
	return svc_rdma_dma_map_page(rdma, ctxt, 0, ctxt->pages[0], 0, len);
}

/* Load the xdr_buf into the ctxt's sge array, and DMA map each
 * element as it is added.
 *
 * Returns the number of sge elements loaded on success, or
 * a negative errno on failure.
 */
static int svc_rdma_map_reply_msg(struct svcxprt_rdma *rdma,
				  struct svc_rdma_op_ctxt *ctxt,
				  struct xdr_buf *xdr, __be32 *wr_lst)
{
	unsigned int len, sge_no, remaining, page_off;
	struct page **ppages;
	unsigned char *base;
	u32 xdr_pad;
	int ret;

	sge_no = 1;

	ret = svc_rdma_dma_map_buf(rdma, ctxt, sge_no++,
				   xdr->head[0].iov_base,
				   xdr->head[0].iov_len);
	if (ret < 0)
		return ret;

	/* If a Write chunk is present, the xdr_buf's page list
	 * is not included inline. However the Upper Layer may
	 * have added XDR padding in the tail buffer, and that
	 * should not be included inline.
	 */
	if (wr_lst) {
		base = xdr->tail[0].iov_base;
		len = xdr->tail[0].iov_len;
		xdr_pad = xdr_padsize(xdr->page_len);

		if (len && xdr_pad) {
			base += xdr_pad;
			len -= xdr_pad;
		}

		goto tail;
	}

	ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
	page_off = xdr->page_base & ~PAGE_MASK;
	remaining = xdr->page_len;
	while (remaining) {
		len = min_t(u32, PAGE_SIZE - page_off, remaining);

		ret = svc_rdma_dma_map_page(rdma, ctxt, sge_no++,
					    *ppages++, page_off, len);
		if (ret < 0)
			return ret;

		remaining -= len;
		page_off = 0;
	}

	base = xdr->tail[0].iov_base;
	len = xdr->tail[0].iov_len;
tail:
	if (len) {
		ret = svc_rdma_dma_map_buf(rdma, ctxt, sge_no++, base, len);
		if (ret < 0)
			return ret;
	}

	return sge_no - 1;
}

/* The svc_rqst and all resources it owns are released as soon as
 * svc_rdma_sendto returns. Transfer pages under I/O to the ctxt
 * so they are released by the Send completion handler.
 */
static void svc_rdma_save_io_pages(struct svc_rqst *rqstp,
				   struct svc_rdma_op_ctxt *ctxt)
{
	int i, pages = rqstp->rq_next_page - rqstp->rq_respages;

	ctxt->count += pages;
	for (i = 0; i < pages; i++) {
		ctxt->pages[i + 1] = rqstp->rq_respages[i];
		rqstp->rq_respages[i] = NULL;
	}
	rqstp->rq_next_page = rqstp->rq_respages + 1;
}

/**
 * svc_rdma_post_send_wr - Set up and post one Send Work Request
 * @rdma: controlling transport
 * @ctxt: op_ctxt for transmitting the Send WR
 * @num_sge: number of SGEs to send
 * @inv_rkey: R_key argument to Send With Invalidate, or zero
 *
 * Returns:
 *	%0 if the Send* was posted successfully,
 *	%-ENOTCONN if the connection was lost or dropped,
 *	%-EINVAL if there was a problem with the Send we built,
 *	%-ENOMEM if ib_post_send failed.
 */
int svc_rdma_post_send_wr(struct svcxprt_rdma *rdma,
			  struct svc_rdma_op_ctxt *ctxt, int num_sge,
			  u32 inv_rkey)
{
	struct ib_send_wr *send_wr = &ctxt->send_wr;

	dprintk("svcrdma: posting Send WR with %u sge(s)\n", num_sge);

	send_wr->next = NULL;
	ctxt->cqe.done = svc_rdma_wc_send;
	send_wr->wr_cqe = &ctxt->cqe;
	send_wr->sg_list = ctxt->sge;
	send_wr->num_sge = num_sge;
	send_wr->send_flags = IB_SEND_SIGNALED;
	if (inv_rkey) {
		send_wr->opcode = IB_WR_SEND_WITH_INV;
		send_wr->ex.invalidate_rkey = inv_rkey;
	} else {
		send_wr->opcode = IB_WR_SEND;
	}

	return svc_rdma_send(rdma, send_wr);
}

/* Prepare the portion of the RPC Reply that will be transmitted
 * via RDMA Send. The RPC-over-RDMA transport header is prepared
 * in sge[0], and the RPC xdr_buf is prepared in following sges.
 *
 * Depending on whether a Write list or Reply chunk is present,
 * the server may send all, a portion of, or none of the xdr_buf.
 * In the latter case, only the transport header (sge[0]) is
 * transmitted.
 *
 * RDMA Send is the last step of transmitting an RPC reply.
 * Pages involved in the earlier RDMA Writes are here transferred out
 * of the rqstp and into the ctxt's page array. These pages are
 * DMA unmapped by each Write completion, but the subsequent Send
 * completion finally releases these pages.
 *
 * Assumptions:
 * - The Reply's transport header will never be larger than a page.
 */
static int svc_rdma_send_reply_msg(struct svcxprt_rdma *rdma,
				   __be32 *rdma_argp, __be32 *rdma_resp,
				   struct svc_rqst *rqstp,
				   __be32 *wr_lst, __be32 *rp_ch)
{
	struct svc_rdma_op_ctxt *ctxt;
	u32 inv_rkey;
	int ret;

	dprintk("svcrdma: sending %s reply: head=%zu, pagelen=%u, tail=%zu\n",
		(rp_ch ?
"RDMA_NOMSG" : "RDMA_MSG"), 5289a6a180bSChuck Lever rqstp->rq_res.head[0].iov_len, 5299a6a180bSChuck Lever rqstp->rq_res.page_len, 5309a6a180bSChuck Lever rqstp->rq_res.tail[0].iov_len); 5319a6a180bSChuck Lever 5329ec64052SChuck Lever ctxt = svc_rdma_get_context(rdma); 533c06b540aSTom Tucker 5349a6a180bSChuck Lever ret = svc_rdma_map_reply_hdr(rdma, ctxt, rdma_resp, 5359a6a180bSChuck Lever svc_rdma_reply_hdr_len(rdma_resp)); 5369a6a180bSChuck Lever if (ret < 0) 537afd566eaSTom Tucker goto err; 538afd566eaSTom Tucker 5399a6a180bSChuck Lever if (!rp_ch) { 5409a6a180bSChuck Lever ret = svc_rdma_map_reply_msg(rdma, ctxt, 5419a6a180bSChuck Lever &rqstp->rq_res, wr_lst); 5429a6a180bSChuck Lever if (ret < 0) 5433fe04ee9SChuck Lever goto err; 5443fe04ee9SChuck Lever } 545c06b540aSTom Tucker 546c55ab070SChuck Lever svc_rdma_save_io_pages(rqstp, ctxt); 5470bf48289SSteve Wise 5489a6a180bSChuck Lever inv_rkey = 0; 5499a6a180bSChuck Lever if (rdma->sc_snd_w_inv) 5509a6a180bSChuck Lever inv_rkey = svc_rdma_get_inv_rkey(rdma_argp, wr_lst, rp_ch); 5519a6a180bSChuck Lever ret = svc_rdma_post_send_wr(rdma, ctxt, 1 + ret, inv_rkey); 552c06b540aSTom Tucker if (ret) 553afd566eaSTom Tucker goto err; 554c06b540aSTom Tucker 555afd566eaSTom Tucker return 0; 556afd566eaSTom Tucker 557afd566eaSTom Tucker err: 55821515e46SSteve Wise svc_rdma_unmap_dma(ctxt); 559afd566eaSTom Tucker svc_rdma_put_context(ctxt, 1); 5609ec64052SChuck Lever return ret; 561c06b540aSTom Tucker } 562c06b540aSTom Tucker 5634757d90bSChuck Lever /* Given the client-provided Write and Reply chunks, the server was not 5644757d90bSChuck Lever * able to form a complete reply. Return an RDMA_ERROR message so the 5654757d90bSChuck Lever * client can retire this RPC transaction. As above, the Send completion 5664757d90bSChuck Lever * routine releases payload pages that were part of a previous RDMA Write. 5674757d90bSChuck Lever * 5684757d90bSChuck Lever * Remote Invalidation is skipped for simplicity. 
 */
static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
				   __be32 *rdma_resp, struct svc_rqst *rqstp)
{
	struct svc_rdma_op_ctxt *ctxt;
	__be32 *p;
	int ret;

	ctxt = svc_rdma_get_context(rdma);

	/* Replace the original transport header with an
	 * RDMA_ERROR response. XID etc are preserved.
	 */
	p = rdma_resp + 3;
	*p++ = rdma_error;
	*p   = err_chunk;

	ret = svc_rdma_map_reply_hdr(rdma, ctxt, rdma_resp, 20);
	if (ret < 0)
		goto err;

	svc_rdma_save_io_pages(rqstp, ctxt);

	ret = svc_rdma_post_send_wr(rdma, ctxt, 1 + ret, 0);
	if (ret)
		goto err;

	return 0;

err:
	pr_err("svcrdma: failed to post Send WR (%d)\n", ret);
	svc_rdma_unmap_dma(ctxt);
	svc_rdma_put_context(ctxt, 1);
	return ret;
}

void svc_rdma_prep_reply_hdr(struct svc_rqst *rqstp)
{
}

/**
 * svc_rdma_sendto - Transmit an RPC reply
 * @rqstp: processed RPC request, reply XDR already in ::rq_res
 *
 * Any resources still associated with @rqstp are released upon return.
 * If no reply message was possible, the connection is closed.
 *
 * Returns:
 *	%0 if an RPC reply has been successfully posted,
 *	%-ENOMEM if a resource shortage occurred (connection is lost),
 *	%-ENOTCONN if posting failed (connection is lost).
 */
int svc_rdma_sendto(struct svc_rqst *rqstp)
{
	struct svc_xprt *xprt = rqstp->rq_xprt;
	struct svcxprt_rdma *rdma =
		container_of(xprt, struct svcxprt_rdma, sc_xprt);
	__be32 *p, *rdma_argp, *rdma_resp, *wr_lst, *rp_ch;
	struct xdr_buf *xdr = &rqstp->rq_res;
	struct page *res_page;
	int ret;

	/* Find the call's chunk lists to decide how to send the reply.
	 * Receive places the Call's xprt header at the start of page 0.
	 */
	rdma_argp = page_address(rqstp->rq_pages[0]);
	svc_rdma_get_write_arrays(rdma_argp, &wr_lst, &rp_ch);

	dprintk("svcrdma: preparing response for XID 0x%08x\n",
		be32_to_cpup(rdma_argp));

	/* Create the RDMA response header. xprt->xpt_mutex,
	 * acquired in svc_send(), serializes RPC replies. The
	 * code path below that inserts the credit grant value
	 * into each transport header runs only inside this
	 * critical section.
	 */
	ret = -ENOMEM;
	res_page = alloc_page(GFP_KERNEL);
	if (!res_page)
		goto err0;
	rdma_resp = page_address(res_page);

	p = rdma_resp;
	*p++ = *rdma_argp;
	*p++ = *(rdma_argp + 1);
	*p++ = rdma->sc_fc_credits;
	*p++ = rp_ch ? rdma_nomsg : rdma_msg;

	/* Start with empty chunks */
	*p++ = xdr_zero;
	*p++ = xdr_zero;
	*p   = xdr_zero;

	if (wr_lst) {
		/* XXX: Presume the client sent only one Write chunk */
		ret = svc_rdma_send_write_chunk(rdma, wr_lst, xdr);
		if (ret < 0)
			goto err2;
		svc_rdma_xdr_encode_write_list(rdma_resp, wr_lst, ret);
	}
	if (rp_ch) {
		ret = svc_rdma_send_reply_chunk(rdma, rp_ch, wr_lst, xdr);
		if (ret < 0)
			goto err2;
		svc_rdma_xdr_encode_reply_chunk(rdma_resp, rp_ch, ret);
	}

	ret = svc_rdma_post_recv(rdma, GFP_KERNEL);
	if (ret)
		goto err1;
	ret = svc_rdma_send_reply_msg(rdma, rdma_argp, rdma_resp, rqstp,
				      wr_lst, rp_ch);
	if (ret < 0)
		goto err0;
	return 0;

err2:
	if (ret != -E2BIG && ret != -EINVAL)
		goto err1;

	ret = svc_rdma_post_recv(rdma, GFP_KERNEL);
	if (ret)
		goto err1;
	ret = svc_rdma_send_error_msg(rdma, rdma_resp, rqstp);
	if (ret < 0)
		goto err0;
	return 0;

err1:
	put_page(res_page);
err0:
	pr_err("svcrdma: Could not send reply, err=%d. Closing transport.\n",
	       ret);
	set_bit(XPT_CLOSE, &xprt->xpt_flags);
	return -ENOTCONN;
}