#
7f391546 |
| 24-Dec-2018 |
Vasily Averin <vvs@virtuozzo.com> |
sunrpc: remove svc_tcp_bc_class
Remove svc_xprt_class svc_tcp_bc_class and related functions
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
a289ce53 |
| 24-Dec-2018 |
Vasily Averin <vvs@virtuozzo.com> |
sunrpc: replace svc_serv->sv_bc_xprt by boolean flag
svc_serv-> sv_bc_xprt is netns-unsafe and cannot be used as pointer. To prevent its misuse in future it is replaced by new boolean flag.
Signed-
sunrpc: replace svc_serv->sv_bc_xprt by boolean flag
svc_serv-> sv_bc_xprt is netns-unsafe and cannot be used as pointer. To prevent its misuse in future it is replaced by new boolean flag.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
d4b09acf |
| 24-Dec-2018 |
Vasily Averin <vvs@virtuozzo.com> |
sunrpc: use-after-free in svc_process_common()
if node have NFSv41+ mounts inside several net namespaces it can lead to use-after-free in svc_process_common()
svc_process_common() /* Setup
sunrpc: use-after-free in svc_process_common()
if node have NFSv41+ mounts inside several net namespaces it can lead to use-after-free in svc_process_common()
svc_process_common() /* Setup reply header */ rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
svc_process_common() can use incorrect rqstp->rq_xprt, its caller function bc_svc_process() takes it from serv->sv_bc_xprt. The problem is that serv is global structure but sv_bc_xprt is assigned per-netnamespace.
According to Trond, the whole "let's set up rqstp->rq_xprt for the back channel" is nothing but a giant hack in order to work around the fact that svc_process_common() uses it to find the xpt_ops, and perform a couple of (meaningless for the back channel) tests of xpt_flags.
All we really need in svc_process_common() is to be able to run rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
Bruce J Fields points that this xpo_prep_reply_hdr() call is an awfully roundabout way just to do "svc_putnl(resv, 0);" in the tcp case.
This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(), now it calls svc_process_common() with rqstp->rq_xprt = NULL.
To adjust reply header svc_process_common() just check rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case.
To handle rqstp->rq_xprt = NULL case in functions called from svc_process_common() patch intruduces net namespace pointer svc_rqst->rq_bc_net and adjust SVC_NET() definition. Some other function was also adopted to properly handle described case.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup") Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14, v4.18.13, v4.18.12 |
|
#
4c8e5537 |
| 01-Oct-2018 |
Trond Myklebust <trondmy@gmail.com> |
SUNRPC: Simplify TCP receive code
Use the fact that the iov iterators already have functionality for skipping a base offset.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-
SUNRPC: Simplify TCP receive code
Use the fact that the iov iterators already have functionality for skipping a base offset.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
aa563d7b |
| 19-Oct-2018 |
David Howells <dhowells@redhat.com> |
iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most pla
iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most places.
Convert a bunch of places to use switch-statements to access them rather then chains of bitwise-AND statements. This makes it easier to add further iterator types. Also, this can be more efficient as to implement a switch of small contiguous integers, the compiler can use ~50% fewer compare instructions than it has to use bitwise-and instructions.
Further, cease passing the iterator type into the iterator setup function. The iterator function can set that itself. Only the direction is required.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
Revision tags: v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6 |
|
#
75c84151 |
| 31-Aug-2018 |
Trond Myklebust <trond.myklebust@hammerspace.com> |
SUNRPC: Rename xprt->recv_lock to xprt->queue_lock
We will use the same lock to protect both the transmit and receive queues.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
Revision tags: v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1, v4.17, v4.16 |
|
#
ece200dd |
| 27-Mar-2018 |
Chuck Lever <chuck.lever@oracle.com> |
sunrpc: Save remote presentation address in svc_xprt for trace events
TP_printk defines a format string that is passed to user space for converting raw trace event records to something human-readabl
sunrpc: Save remote presentation address in svc_xprt for trace events
TP_printk defines a format string that is passed to user space for converting raw trace event records to something human-readable.
My user space's printf (Oracle Linux 7), however, does not have a %pI format specifier. The result is that what is supposed to be an IP address in the output of "trace-cmd report" is just a string that says the field couldn't be displayed.
To fix this, adopt the same approach as the client: maintain a pre- formated presentation address for occasions when %pI is not available.
The location of the trace_svc_send trace point is adjusted so that rqst->rq_xprt is not NULL when the trace event is recorded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
989f881e |
| 27-Mar-2018 |
Chuck Lever <chuck.lever@oracle.com> |
svc: Simplify ->xpo_secure_port
Clean up: Instead of returning a value that is used to set or clear a bit, just make ->xpo_secure_port mangle that bit, and return void.
Signed-off-by: Chuck Lever <
svc: Simplify ->xpo_secure_port
Clean up: Instead of returning a value that is used to set or clear a bit, just make ->xpo_secure_port mangle that bit, and return void.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
9b2c45d4 |
| 12-Feb-2018 |
Denys Vlasenko <dvlasenk@redhat.com> |
net: make getname() functions return length rather than use int* parameter
Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustr
net: make getname() functions return length rather than use int* parameter
Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c
Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success.
"int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need.
None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it.
This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error.
Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way.
Userspace API is not changed.
text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.15 |
|
#
d0945caa |
| 08-Jan-2018 |
Christoph Hellwig <hch@lst.de> |
sunrpc: remove dead code in svc_sock_setbufsize
Setting values in struct sock directly is the usual method. Remove the long dead code using set_fs() and the related comment.
Signed-off-by: Christo
sunrpc: remove dead code in svc_sock_setbufsize
Setting values in struct sock directly is the usual method. Remove the long dead code using set_fs() and the related comment.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.13.16, v4.14, v4.13.5, v4.13, v4.12 |
|
#
1e4ed1b7 |
| 01-Jul-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
svc_recvfrom(): switch to sock_recvmsg()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2412e927 |
| 01-Aug-2017 |
Chuck Lever <chuck.lever@oracle.com> |
sunrpc: Const-ify instances of struct svc_xprt_ops
Close an attack vector by moving the arrays of server-side transport methods to read-only memory.
Signed-off-by: Chuck Lever <chuck.lever@oracle.c
sunrpc: Const-ify instances of struct svc_xprt_ops
Close an attack vector by moving the arrays of server-side transport methods to read-only memory.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
eebe53e8 |
| 21-Aug-2017 |
Vadim Lomovtsev <vlomovts@redhat.com> |
net: sunrpc: svcsock: fix NULL-pointer exception
While running nfs/connectathon tests kernel NULL-pointer exception has been observed due to races in svcsock.c.
Race is appear when kernel accepts c
net: sunrpc: svcsock: fix NULL-pointer exception
While running nfs/connectathon tests kernel NULL-pointer exception has been observed due to races in svcsock.c.
Race is appear when kernel accepts connection by kernel_accept (which creates new socket) and start queuing ingress packets to new socket. This happens in ksoftirq context which could run concurrently on a different core while new socket setup is not done yet.
The fix is to re-order socket user data init sequence and add write/read barrier calls to be sure that we got proper values for callback pointers before actually calling them.
Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
Crash log: ---<-snip->--- [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 6708.647093] pgd = ffff0000094e0000 [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000 [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops [ 6708.736446] ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021] [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G W OE 4.11.0-4.el7.aarch64 #1 [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017 [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000 [ 6708.773822] PC is at 0x0 [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc] [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145 [ 6708.789248] sp : ffff810ffbad3900 [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58 [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00 [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28 [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000 [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00 [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2 [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000 [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001 [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000 [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000 [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000 [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000 [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000 [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00 [ 6708.872084] [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000) [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000) [ 6708.886075] Call trace: [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840) [ 6708.894942] 3700: ffff810012412400 0001000000000000 [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000 [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000 [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000 [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140 [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000 [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028 [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000 [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000 [ 6708.973117] [< (null)>] (null) [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58 [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254 [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4 [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8 [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84 [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8 [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf] [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf] [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464 [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308 [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174 [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4 [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190 [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20) [ 6709.092929] fde0: 0000810ff2ec0000 ffff000008c10000 [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18 [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80 [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4 [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50 [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000 [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000 [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20 [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145 [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238 [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140 [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30 [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4 [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30 [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160 [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4 [ 6709.207967] Code: bad PC value [ 6709.211061] SMP: stopping secondary CPUs [ 6709.218830] Starting crashdump kernel... [ 6709.222749] Bye! ---<-snip>---
Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
ce7c252a |
| 16-Aug-2017 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Add a separate spinlock to protect the RPC request receive list
This further reduces contention with the transport_lock, and allows us to convert to using a non-bh-safe spinlock, since the l
SUNRPC: Add a separate spinlock to protect the RPC request receive list
This further reduces contention with the transport_lock, and allows us to convert to using a non-bh-safe spinlock, since the list is now never accessed from a bh context.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
show more ...
|
Revision tags: v4.10.17, v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10, v4.10.9, v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2 |
|
#
5427290d |
| 07-Mar-2017 |
Kinglong Mee <kinglongmee@gmail.com> |
SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt
The xprt for backchannel is created separately, not in TCP/UDP code. It needs the XPT_CONG_CTRL flag set on it too--otherwise requests on the N
SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt
The xprt for backchannel is created separately, not in TCP/UDP code. It needs the XPT_CONG_CTRL flag set on it too--otherwise requests on the NFSv4.1 backchannel are rjected in svc_process_common():
1191 if (versp->vs_need_cong_ctrl && 1192 !test_bit(XPT_CONG_CTRL, &rqstp->rq_xprt->xpt_flags)) 1193 goto err_bad_vers;
Fixes: 5283b03ee5 ("nfs/nfsd/sunrpc: enforce transport...") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
5b5e0928 |
| 27-Feb-2017 |
Alexey Dobriyan <adobriyan@gmail.com> |
lib/vsprintf.c: remove %Z support
Now that %z is standartised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller.
Use BUILD_BUG_ON in a couple ATM driver
lib/vsprintf.c: remove %Z support
Now that %z is standartised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller.
Use BUILD_BUG_ON in a couple ATM drivers.
In case anyone didn't notice lib/vsprintf.o is about half of SLUB which is in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more.
Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v4.10.1 |
|
#
362142b2 |
| 24-Feb-2017 |
Jeff Layton <jlayton@redhat.com> |
sunrpc: flag transports as having congestion control
NFSv4 requires a transport protocol with congestion control in most cases.
On an IP network, that means that NFSv4 over UDP should be forbidden.
sunrpc: flag transports as having congestion control
NFSv4 requires a transport protocol with congestion control in most cases.
On an IP network, that means that NFSv4 over UDP should be forbidden.
The situation with RDMA is a bit more nuanced, but most RDMA transports are suitable for this. For now, we assume that all RDMA transports are suitable, but we may need to revise that at some point.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.10 |
|
#
2456e855 |
| 25-Dec-2016 |
Thomas Gleixner <tglx@linutronix.de> |
ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machin
ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but become completely pointless.
Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
show more ...
|
#
7c0f6ba6 |
| 24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PA
Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32 |
|
#
ea08e392 |
| 11-Nov-2016 |
Scott Mayhew <smayhew@redhat.com> |
sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports
This fixes the following panic that can occur with NFSoRDMA.
general protection fault: 0000 [#1] SMP Modules linked in:
sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports
This fixes the following panic that can occur with NFSoRDMA.
general protection fault: 0000 [#1] SMP Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core tg3 crct10dif_pclmul drm crct10dif_common ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash dm_log dm_mod CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1 Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015 Workqueue: events check_lifetime task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000 RIP: 0010:[<ffffffff8168d847>] [<ffffffff8168d847>] _raw_spin_lock_bh+0x17/0x50 RSP: 0018:ffff88031f587ba8 EFLAGS: 00010206 RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072 RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77 R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072 R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff880322a40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: 20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002 ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32 0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0 Call Trace: [<ffffffff81557830>] lock_sock_nested+0x20/0x50 [<ffffffff8155ae08>] sock_setsockopt+0x78/0x940 [<ffffffff81096acb>] ? lock_timer_base.isra.33+0x2b/0x50 [<ffffffff8155397d>] kernel_setsockopt+0x4d/0x50 [<ffffffffa0386284>] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc] [<ffffffffa03b681d>] nfsd_inetaddr_event+0x9d/0xd0 [nfsd] [<ffffffff81691ebc>] notifier_call_chain+0x4c/0x70 [<ffffffff810b687d>] __blocking_notifier_call_chain+0x4d/0x70 [<ffffffff810b68b6>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff815e8538>] __inet_del_ifa+0x168/0x2d0 [<ffffffff815e8cef>] check_lifetime+0x25f/0x270 [<ffffffff810a7f3b>] process_one_work+0x17b/0x470 [<ffffffff810a8d76>] worker_thread+0x126/0x410 [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460 [<ffffffff810b052f>] kthread+0xcf/0xe0 [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81696418>] ret_from_fork+0x58/0x90 [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140 Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 <f0> 0f c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f RIP [<ffffffff8168d847>] _raw_spin_lock_bh+0x17/0x50 RSP <ffff88031f587ba8>
Signed-off-by: Scott Mayhew <smayhew@redhat.com> Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.4.31 |
|
#
7c13f97f |
| 04-Nov-2016 |
Paolo Abeni <pabeni@redhat.com> |
udp: do fwd memory scheduling on dequeue
A new argument is added to __skb_recv_datagram to provide an explicit skb destructor, invoked under the receive queue lock. The UDP protocol uses such argume
udp: do fwd memory scheduling on dequeue
A new argument is added to __skb_recv_datagram to provide an explicit skb destructor, invoked under the receive queue lock. The UDP protocol uses such argument to perform memory reclaiming on dequeue, so that the UDP protocol does not set anymore skb->desctructor. Instead explicit memory reclaiming is performed at close() time and when skbs are removed from the receive queue. The in kernel UDP protocol users now need to call a skb_recv_udp() variant instead of skb_recv_datagram() to properly perform memory accounting on dequeue.
Overall, this allows acquiring only once the receive queue lock on dequeue.
Tested using pktgen with random src port, 64 bytes packet, wire-speed on a 10G link as sender and udp_sink as the receiver, using an l4 tuple rxhash to stress the contention, and one or more udp_sink instances with reuseport.
nr sinks vanilla patched 1 440 560 3 2150 2300 6 3650 3800 9 4450 4600 12 6250 6450
v1 -> v2: - do rmem and allocated memory scheduling under the receive lock - do bulk scheduling in first_packet_length() and in udp_destruct_sock() - avoid the typdef for the dequeue callback
Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10 |
|
#
850cbadd |
| 21-Oct-2016 |
Paolo Abeni <pabeni@redhat.com> |
udp: use it's own memory accounting schema
Completely avoid default sock memory accounting and replace it with udp-specific accounting.
Since the new memory accounting model encapsulates completely
udp: use it's own memory accounting schema
Completely avoid default sock memory accounting and replace it with udp-specific accounting.
Since the new memory accounting model encapsulates completely the required locking, remove the socket lock on both enqueue and dequeue, and avoid using the backlog on enqueue.
Be sure to clean-up rx queue memory on socket destruction, using udp its own sk_destruct.
Tested using pktgen with random src port, 64 bytes packet, wire-speed on a 10G link as sender and udp_sink as the receiver, using an l4 tuple rxhash to stress the contention, and one or more udp_sink instances with reuseport.
nr readers Kpps (vanilla) Kpps (patched) 1 170 440 3 1250 2150 6 3000 3650 9 4200 4450 12 5700 6250
v4 -> v5: - avoid unneeded test in first_packet_length
v3 -> v4: - remove useless sk_rcvqueues_full() call
v2 -> v3: - do not set the now unsed backlog_rcv callback
v1 -> v2: - add memory pressure support - fixed dropwatch accounting for ipv6
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16 |
|
#
c7995f8a |
| 26-Jul-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Detect immediate closure of accepted sockets
This modification is useful for debugging issues that happen while the socket is being initialised.
Signed-off-by: Trond Myklebust <trond.mykleb
SUNRPC: Detect immediate closure of accepted sockets
This modification is useful for debugging issues that happen while the socket is being initialised.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
b2f21f7d |
| 26-Jul-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: accept() may return sockets that are still in SYN_RECV
We're seeing traces of the following form:
[10952.396347] svc: transport ffff88042ba4a 000 dequeued, inuse=2 [10952.396351] svc: tcp
SUNRPC: accept() may return sockets that are still in SYN_RECV
We're seeing traces of the following form:
[10952.396347] svc: transport ffff88042ba4a 000 dequeued, inuse=2 [10952.396351] svc: tcp_accept ffff88042ba4 a000 sock ffff88042a6e4c80 [10952.396362] nfsd: connect from 10.2.6.1, port=187 [10952.396364] svc: svc_setup_socket ffff8800b99bcf00 [10952.396368] setting up TCP socket for reading [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800) [10952.396373] svc: transport ffff8803eb10a000 put into queue [10952.396375] svc: transport ffff88042ba4a000 put into queue [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000) [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2 [10952.396381] svc_recv: found XPT_CLOSE [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000) [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000) [10952.396399] svc: svc_sock_detach(ffff8803eb10a000) [10952.396412] svc: svc_sock_free(ffff8803eb10a000)
i.e. an immediate close of the socket after initialisation.
The culprit appears to be the test at the end of svc_tcp_init, which checks if the newly created socket is in the TCP_ESTABLISHED state, and immediately closes it if not. The evidence appears to suggest that the socket might still be in the SYN_RECV state at this time.
The fix is to check for both states, and then to add a check in svc_tcp_state_change() to ensure we don't close the socket when it transitions into TCP_ESTABLISHED.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
Revision tags: v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14 |
|
#
637600f3 |
| 24-Jun-2016 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Change TCP socket space reservation
The current server rpc tcp code attempts to predict how much writeable socket space will be available to a given RPC call before accepting it for processi
SUNRPC: Change TCP socket space reservation
The current server rpc tcp code attempts to predict how much writeable socket space will be available to a given RPC call before accepting it for processing. On a 40GigE network, we've found this throttles individual clients long before the network or disk is saturated. The server may handle more clients easily, but the bandwidth of individual clients is still artificially limited.
Instead of trying (and failing) to predict how much writeable socket space will be available to the RPC call, just fall back to the simple model of deferring processing until the socket is uncongested.
This may increase the risk of fast clients starving slower clients; in such cases, the previous patch allows setting a hard per-connection limit.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|