#
505936f5 |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
9cbc94fb |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Remove TCP socket linger code
Now that we no longer use the partial shutdown code when closing the socket, we no longer need to worry about the TCP linger2 state.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
4efdd92c |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Remove TCP client connection reset hack
Instead we rely on SO_REUSEPORT to provide the reconnection semantics that we need for NFSv2/v3.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
de84d890 |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: TCP/UDP always close the old socket before reconnecting
It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket() or xs_tcp_setup_socket(), since they do not own the correct locks. Instead, do it in xs_connect().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
718ba5b8 |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Add helpers to prevent socket create from racing
The socket lock is currently held by the task that is requesting the connection be established. While that is efficient in the case where the connection happens quickly, it is racy in the case where it doesn't. What we really want is for the connect helper to be able to block access to the socket while it is being set up.
This patch does so by arranging to transfer the socket lock from the task that is requesting the connect attempt, and then releasing that lock once everything is done. This scheme also gives us automatic protection against collisions with the RPC close code, so we can kill the cancel_delayed_work_sync() call in xs_close().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
6cc7e908 |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Ensure xs_reset_transport() resets the close connection flags
Otherwise, we may end up looping.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
76698b23 |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Do not clear the source port in xs_reset_transport
Now that we can reuse bound ports after a close, we never really want to clear the transport's source port after it has been set. Doing so really messes up the NFSv3 DRC on the server.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3913c78c |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Handle EADDRINUSE on connect
Now that we're setting SO_REUSEPORT, we still need to handle the case where a connect() is attempted, but the old socket is still lingering. Essentially, all we want to do here is handle the error by waiting a few seconds and then retrying.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
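The "wait a few seconds and then retry" handling described in 3913c78c can be sketched in userspace C. This is an illustrative sketch only, not the kernel change itself: `connect_with_retry`, `connect_fn`, and `fake_connect` are hypothetical names, and the connect attempt is passed in as a callback so the retry policy can be exercised without a network.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*connect_fn)(void *ctx);

/*
 * Call try_connect() up to max_tries times. On -EADDRINUSE, sleep for
 * delay_secs and try again; any other result is returned immediately.
 */
int connect_with_retry(connect_fn try_connect, void *ctx,
                       int max_tries, unsigned int delay_secs)
{
    int err = -EADDRINUSE;

    assert(try_connect != NULL);
    for (int i = 0; i < max_tries; i++) {
        err = try_connect(ctx);
        if (err != -EADDRINUSE)
            return err;
        if (i + 1 < max_tries)
            sleep(delay_secs);      /* old socket may still be lingering */
    }
    return err;
}

/* Test double: fails with -EADDRINUSE until *remaining reaches zero. */
int fake_connect(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EADDRINUSE;
    }
    return 0;
}
```

A caller would pass a real connect wrapper and a delay of a few seconds; the test double above only simulates a port that stays busy for a fixed number of attempts.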
#
4dda9c8a |
| 08-Feb-2015 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Set SO_REUSEPORT socket option for TCP connections
When using TCP, we need the ability to reuse port numbers after a disconnection, so that the NFSv3 server knows that we're the same client. Currently we use a hack to work around the TCP socket's TIME_WAIT: we send an RST instead of closing, which doesn't always work... The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple TCP connections to the same source address+port combination, and thus to use ordinary TCP close() instead of the current hack.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
f895b252 |
| 17-Nov-2014 |
Jeff Layton <jlayton@primarydata.com> |
sunrpc: eliminate RPC_DEBUG
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1a867a08 |
| 28-Oct-2014 |
Jeff Layton <jlayton@primarydata.com> |
sunrpc: add tracepoints in xs_tcp_data_recv
Add tracepoints inside the main loop on xs_tcp_data_recv that allow us to keep an eye on what's happening during each phase of it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3705ad64 |
| 28-Oct-2014 |
Jeff Layton <jlayton@primarydata.com> |
sunrpc: add new tracepoints in xprt handling code
...so we can keep track of when calls are sent and replies received.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
1aff5256 |
| 23-Sep-2014 |
NeilBrown <neilb@suse.de> |
NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
Now that nfs_release_page() doesn't block indefinitely, other deadlock avoidance mechanisms aren't needed.
- it doesn't hurt for kswapd to block occasionally. If it doesn't want to block it would clear __GFP_WAIT. The current_is_kswapd() was only added to avoid deadlocks and we have a new approach for that.
- memory allocation in the SUNRPC layer can very rarely try to ->releasepage() a page it is trying to handle. The deadlock is removed as nfs_release_page() doesn't block indefinitely.
So we don't need to set PF_FSTRANS for sunrpc network operations any more.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
3dedbb5c |
| 24-Sep-2014 |
Jason Baron <jbaron@akamai.com> |
rpc: Add -EPERM processing for xs_udp_send_request()
If an iptables drop rule is added for an nfs server, the client can end up in a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM is ignored since the prior bits of the packet may have been successfully queued and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request() thinks that because some bits were queued it should return -EAGAIN. We then try the request again and again, resulting in cpu spinning. Reproducer:
1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d <nfs server ip> -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d <nfs server ip> -j DROP
The softlockup occurs in step 4 above.
The previous patch allows xs_sendpages() to return both a sent count and any error values that may have occurred. Thus, if we get an -EPERM, return that to the higher level code.
With this patch in place we can successfully abort the above sequence and avoid the softlockup.
I also tried the above test case on an nfs mount on tcp and although the system does not softlockup, I still ended up with the 'hung_task' firing after 120 seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix, since -EPERM appears to get ignored much lower down in the stack and does not propagate up to xs_sendpages(). This case is not quite as insidious as the softlockup and it is not addressed here.
Reported-by: Yigong Lou <ylou@akamai.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
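The decision described in 3dedbb5c, passing -EPERM up instead of turning every partial send into -EAGAIN, can be illustrated with a small status-mapping function. This is a simplified sketch under assumed semantics, not the code of xs_udp_send_request(); `send_status` and its parameters are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Map the (err, sent) pair from a send attempt to the status the caller
 * acts on. -EPERM is passed up unchanged even if some bytes were
 * queued; a partial send without a hard error becomes -EAGAIN.
 */
int send_status(int err, size_t sent, size_t total)
{
    assert(sent <= total);
    if (err == -EPERM)
        return err;        /* e.g. packet dropped by a firewall rule */
    if (err < 0 && sent == 0)
        return err;        /* nothing queued: report the error as-is */
    if (sent >= total)
        return 0;          /* whole request queued */
    return -EAGAIN;        /* partial send: retry the remainder */
}
```

Without the first check, a request that keeps hitting -EPERM after a partial send would be retried indefinitely, which matches the spinning behaviour the commit message describes.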
#
f279cd00 |
| 24-Sep-2014 |
Jason Baron <jbaron@akamai.com> |
rpc: return sent and err from xs_sendpages()
If an error is returned after the first bits of a packet have already been successfully queued, xs_sendpages() will return a positive 'int' value indicating success. Callers seem to treat this as -EAGAIN.
However, there are cases where it's not a question of waiting for the write queue to drain. For example, when there is an iptables rule dropping packets to the destination, the lower level code can return -EPERM only after parts of the packet have been successfully queued. In this case, we can end up continuously retrying, resulting in a kernel softlockup.
This patch is intended to make no changes in behavior but is in preparation for subsequent patches that can make decisions based on both the number of bytes sent by xs_sendpages() and any errors that may have been returned.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
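The interface change in f279cd00, reporting the error and the byte count separately, follows a common C pattern: return the error code and pass the count back through an out-parameter. The sketch below illustrates that pattern in userspace; `send_chunks`, `send_fn`, and `fake_send` are hypothetical names and do not reproduce the signature of xs_sendpages().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical low-level sender: bytes queued (>= 0) or negative errno. */
typedef int (*send_fn)(const char *buf, size_t len, void *ctx);

/*
 * Send every chunk in turn. The error code is the return value and the
 * byte count is reported separately through *sent, so a late failure is
 * not hidden behind a positive count.
 */
int send_chunks(send_fn do_send, void *ctx, const char *const *chunks,
                const size_t *lens, size_t nchunks, size_t *sent)
{
    assert(do_send != NULL && sent != NULL);
    *sent = 0;
    for (size_t i = 0; i < nchunks; i++) {
        int ret = do_send(chunks[i], lens[i], ctx);

        if (ret < 0)
            return ret;          /* *sent still holds the earlier bytes */
        *sent += (size_t)ret;
        if ((size_t)ret < lens[i])
            break;               /* short write: stop and report progress */
    }
    return 0;
}

/* Test double: accepts *ok_calls chunks in full, then fails with -EPERM. */
int fake_send(const char *buf, size_t len, void *ctx)
{
    int *ok_calls = ctx;

    (void)buf;
    if (*ok_calls <= 0)
        return -EPERM;
    (*ok_calls)--;
    return (int)len;
}
```

With a single combined return value, the caller in the failing case would see only "4 bytes sent"; with the split interface it sees both the 4 bytes and the -EPERM.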
#
a743419f |
| 23-Sep-2014 |
Benjamin Coddington <bcodding@redhat.com> |
SUNRPC: Don't wake tasks during connection abort
When aborting a connection to preserve source ports, don't wake the task in xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the connection needs to be re-established since it preserves the task's status instead of setting it to the status of the aborting kernel_connect().
This may also avoid a potential conflict on the socket's lock.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
0f7a622c |
| 05-Sep-2014 |
Chris Perl <chris.perl@gmail.com> |
rpc: xs_bind - do not bind when requesting a random ephemeral port
When attempting to establish a local ephemeral endpoint for a TCP or UDP socket, do not explicitly call bind; instead, let it happen implicitly when the socket is first used.
The main motivating factor for this change is when TCP runs out of unique ephemeral ports (i.e. cannot find any ephemeral ports which are not a part of *any* TCP connection). In this situation if you explicitly call bind, then the call will fail with EADDRINUSE. However, if you allow the allocation of an ephemeral port to happen implicitly as part of connect (or other functions), then ephemeral ports can be reused, so long as the combination of (local_ip, local_port, remote_ip, remote_port) is unique for TCP sockets on the system.
This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP sockets the same.
This can allow mount.nfs(8) to continue to function successfully, even in the face of misbehaving applications which are creating a large number of TCP connections.
Signed-off-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
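The behaviour described in 0f7a622c, skipping bind() when no specific source port is requested, can be sketched in userspace as follows. The helpers `bind_if_port_requested` and `local_port` are hypothetical and use IPv4 only for brevity; the kernel's xs_bind() works on its own transport structures.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind only when the caller asked for a specific source port. With
 * port 0 the socket is left unbound, so the kernel picks an ephemeral
 * port later, when the socket is first used.
 * Returns 0 on success or -1 with errno set by bind().
 */
int bind_if_port_requested(int fd, unsigned short port)
{
    struct sockaddr_in addr;

    assert(fd >= 0);
    if (port == 0)
        return 0;                      /* defer to implicit bind */

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    return bind(fd, (struct sockaddr *)&addr, sizeof(addr));
}

/* Local port of fd in host byte order (0 if unbound), or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

Leaving the socket unbound lets the port be chosen at connect() time, where only the full (local_ip, local_port, remote_ip, remote_port) tuple has to be unique.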
#
00cfaa94 |
| 21-Jun-2014 |
Daniel Walter <dwalter@google.com> |
replace strict_strto calls
Replace obsolete strict_strto calls with appropriate kstrto calls
Signed-off-by: Daniel Walter <dwalter@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
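The kstrto family referred to in 00cfaa94 is kernel-only, but the idea of strict parsing (the whole string must be a number, and the status is returned separately from the value) can be shown with a userspace analogue built on strtoul(). The function `parse_uint_strict` is hypothetical and is not a reimplementation of any kernel helper; its exact acceptance rules are an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Strictly parse a base-10 unsigned int: the whole string must be
 * digits, with at most one trailing newline. Returns 0 and stores the
 * value in *out, or returns -EINVAL / -ERANGE and leaves *out untouched.
 */
int parse_uint_strict(const char *s, unsigned int *out)
{
    char *end;
    unsigned long val;

    assert(s != NULL && out != NULL);
    if (*s < '0' || *s > '9')
        return -EINVAL;               /* rejects "", signs, whitespace */
    errno = 0;
    val = strtoul(s, &end, 10);
    if (errno == ERANGE || val > UINT_MAX)
        return -ERANGE;
    if (*end == '\n')
        end++;                        /* tolerate one trailing newline */
    if (*end != '\0')
        return -EINVAL;               /* trailing garbage such as "20x" */
    *out = (unsigned int)val;
    return 0;
}
```

Returning the status separately from the parsed value keeps a malformed input from being silently read as a truncated number.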
#
3601c4a9 |
| 30-Jun-2014 |
Trond Myklebust <trond.myklebust@primarydata.com> |
SUNRPC: Ensure that we handle ENOBUFS errors correctly.
Currently, an ENOBUFS error will result in a fatal error for the RPC call. Normally, we will just want to wait and then retry.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
0f8066bd |
| 23-May-2014 |
Tom Herbert <therbert@google.com> |
sunrpc: Remove sk_no_check setting
Setting sk_no_check to UDP_CSUM_NORCV seems to have no effect.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e857c58 |
| 17-Mar-2014 |
Peter Zijlstra <peterz@infradead.org> |
arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
676d2369 |
| 11-Apr-2014 |
David S. Miller <davem@davemloft.net> |
net: Fix use after free by removing length arg from sk_data_ready callbacks.
Several spots in the kernel perform a sequence like:
	skb_queue_tail(&sk->sk_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);
But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially a read of freed memory.
Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate.
And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about its value, lots of call sites pass arbitrary values in such as '0' and even '1'.
So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect.
Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
642aab58 |
| 23-Mar-2014 |
Kinglong Mee <kinglongmee@gmail.com> |
SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
Don't move the assignment of args->bc_xprt->xpt_bc_xprt out of xs_setup_bc_tcp, because rpc_ping (which is in rpc_create) will use it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
d531c008 |
| 23-Mar-2014 |
Kinglong Mee <kinglongmee@gmail.com> |
NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
Besides checking rpc_xprt out of xs_setup_bc_tcp, increase its reference (it's important).
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
#
47f72efa |
| 23-Mar-2014 |
Kinglong Mee <kinglongmee@gmail.com> |
NFSD: Free backchannel xprt in bc_destroy
Backchannel xprt isn't freed right now. Free it in bc_destroy, and put the reference of THIS_MODULE.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|