#
9faaff59 |
| 24-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls
Provide a different lockdep key for rxrpc_call::user_mutex when the call is made on a kernel socket, such as by the AFS f
rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls
Provide a different lockdep key for rxrpc_call::user_mutex when the call is made on a kernel socket, such as by the AFS filesystem.
The problem is that lockdep registers a false positive between userspace calling the sendmsg syscall on a user socket where call->user_mutex is held whilst userspace memory is accessed whereas the AFS filesystem may perform operations with mmap_sem held by the caller.
In such a case, the following warning is produced.
====================================================== WARNING: possible circular locking dependency detected 4.14.0-fscache+ #243 Tainted: G E ------------------------------------------------------ modpost/16701 is trying to acquire lock: (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]
but task is already holding lock: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&mm->mmap_sem){++++}: __might_fault+0x61/0x89 _copy_from_iter_full+0x40/0x1fa rxrpc_send_data+0x8dc/0xff3 rxrpc_do_sendmsg+0x62f/0x6a1 rxrpc_sendmsg+0x166/0x1b7 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1ad/0x22b __sys_sendmsg+0x41/0x62 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75
-> #2 (&call->user_mutex){+.+.}: __mutex_lock+0x86/0x7d2 rxrpc_new_client_call+0x378/0x80e rxrpc_kernel_begin_call+0xf3/0x154 afs_make_call+0x195/0x454 [kafs] afs_vl_get_capabilities+0x193/0x198 [kafs] afs_vl_lookup_vldb+0x5f/0x151 [kafs] afs_create_volume+0x2e/0x2f4 [kafs] afs_mount+0x56a/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75
-> #1 (k-sk_lock-AF_RXRPC){+.+.}: lock_sock_nested+0x74/0x8a rxrpc_kernel_begin_call+0x8a/0x154 afs_make_call+0x195/0x454 [kafs] afs_fs_get_capabilities+0x17a/0x17f [kafs] afs_probe_fileserver+0xf7/0x2f0 [kafs] afs_select_fileserver+0x83f/0x903 [kafs] afs_fetch_status+0x89/0x11d [kafs] afs_iget+0x16f/0x4f8 [kafs] afs_mount+0x6c6/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75
-> #0 (&vnode->io_lock){+.+.}: lock_acquire+0x174/0x19f __mutex_lock+0x86/0x7d2 afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 __clear_user+0x3d/0x60 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75
other info that might help us debug this:
Chain exists of: &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&call->user_mutex); lock(&mm->mmap_sem); lock(&vnode->io_lock);
*** DEADLOCK ***
1 lock held by modpost/16701: #0: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
stack backtrace: CPU: 0 PID: 16701 Comm: modpost Tainted: G E 4.14.0-fscache+ #243 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack+0x67/0x8e print_circular_bug+0x341/0x34f check_prev_add+0x11f/0x5d4 ? add_lock_to_list.isra.12+0x8b/0x8b ? add_lock_to_list.isra.12+0x8b/0x8b ? __lock_acquire+0xf77/0x10b4 __lock_acquire+0xf77/0x10b4 lock_acquire+0x174/0x19f ? afs_begin_vnode_operation+0x33/0x77 [kafs] __mutex_lock+0x86/0x7d2 ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba ? filemap_fault+0x179/0x54d filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 RIP: 0010:__clear_user+0x3d/0x60 RSP: 0018:ffff880071e93da0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720 RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00 R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fdb6009ee07 RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07 RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900 RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000 R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000 R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
Revision tags: v4.13.16, v4.14 |
|
#
20acbd9a |
| 02-Nov-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Lock around calling a kernel service Rx notification
Place a spinlock around the invocation of call->notify_rx() for a kernel service call and lock again when ending the call and replace the
rxrpc: Lock around calling a kernel service Rx notification
Place a spinlock around the invocation of call->notify_rx() for a kernel service call and lock again when ending the call and replace the notification pointer with a pointer to a dummy function.
This is required because it's possible for rxrpc_notify_socket() to be called after the call has been ended by the kernel service if called from the asynchronous work function rxrpc_process_call().
However, rxrpc_notify_socket() currently only holds the RCU read lock when invoking ->notify_rx(), which means that the afs_call struct would need to be disposed of by call_rcu() rather than by kfree().
But we shouldn't see any notifications from a call after calling rxrpc_kernel_end_call(), so a lock is required in rxrpc code.
Without this, we may see the call wait queue as having a corrupt spinlock:
BUG: spinlock bad magic on CPU#0, kworker/0:2/1612 general protection fault: 0000 [#1] SMP ... Workqueue: krxrpcd rxrpc_process_call task: ffff88040b83c400 task.stack: ffff88040adfc000 RIP: 0010:spin_bug+0x161/0x18f RSP: 0018:ffff88040adffcc0 EFLAGS: 00010002 RAX: 0000000000000032 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff81ab16cf RDX: ffff88041fa14c01 RSI: ffff88041fa0ccb8 RDI: ffff88041fa0ccb8 RBP: ffff88040adffcd8 R08: 00000000ffffffff R09: 00000000ffffffff R10: ffff88040adffc60 R11: 000000000000022c R12: ffff88040aca2208 R13: ffffffff81a58114 R14: 0000000000000000 R15: 0000000000000000 .... Call Trace: do_raw_spin_lock+0x1d/0x89 _raw_spin_lock_irqsave+0x3d/0x49 ? __wake_up_common_lock+0x4c/0xa7 __wake_up_common_lock+0x4c/0xa7 ? __lock_is_held+0x47/0x7a __wake_up+0xe/0x10 afs_wake_up_call_waiter+0x11b/0x122 [kafs] rxrpc_notify_socket+0x12b/0x258 rxrpc_process_call+0x18e/0x7d0 process_one_work+0x298/0x4de ? rescuer_thread+0x280/0x280 worker_thread+0x1d1/0x2ae ? rescuer_thread+0x280/0x280 kthread+0x12c/0x134 ? kthread_create_on_node+0x3a/0x3a ret_from_fork+0x27/0x40
In this case, note the corrupt data in EBX. The address of the offending afs_call is in R12, plus the offset to the spinlock.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
Revision tags: v4.13.5, v4.13 |
|
#
c038a58c |
| 29-Aug-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Allow failed client calls to be retried
Allow a client call that failed on network error to be retried, provided that the Tx queue still holds DATA packet 1. This allows an operation to be s
rxrpc: Allow failed client calls to be retried
Allow a client call that failed on network error to be retried, provided that the Tx queue still holds DATA packet 1. This allows an operation to be submitted to another server or another address for the same server without having to repackage and re-encrypt the data so far processed.
Two new functions are provided:
(1) rxrpc_kernel_check_call() - This is used to find out the completion state of a call to guess whether it can be retried and whether it should be retried.
(2) rxrpc_kernel_retry_call() - Disconnect the call from its current connection, reset the state and submit it as a new client call to a new address. The new address need not match the previous address.
A call may be retried even if all the data hasn't been loaded into it yet; a partially constructed will be retained at the same point it was at when an error condition was detected. msg_data_left() can be used to find out how much data was packaged before the error occurred.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
7b674e39 |
| 29-Aug-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix IPv6 support
Fix IPv6 support in AF_RXRPC in the following ways:
(1) When extracting the address from a received IPv4 packet, if the local transport socket is open for IPv6 then fi
rxrpc: Fix IPv6 support
Fix IPv6 support in AF_RXRPC in the following ways:
(1) When extracting the address from a received IPv4 packet, if the local transport socket is open for IPv6 then fill out the sockaddr_rxrpc struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead of an AF_INET one.
(2) When sending CHALLENGE or RESPONSE packets, the transport length needs to be set from the sockaddr_rxrpc::transport_len field rather than sizeof() on the IPv4 transport address.
(3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up the address correctly before searching for the affected peer.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
10674a03 |
| 29-Aug-2017 |
Baolin Wang <baolin.wang@linaro.org> |
net: rxrpc: Replace time_t type with time64_t type
Since the 'expiry' variable of 'struct key_preparsed_payload' has been changed to 'time64_t' type, which is year 2038 safe on 32bits system.
In ne
net: rxrpc: Replace time_t type with time64_t type
Since the 'expiry' variable of 'struct key_preparsed_payload' has been changed to 'time64_t' type, which is year 2038 safe on 32bits system.
In net/rxrpc subsystem, we need convert 'u32' type to 'time64_t' type when copying ticket expires time to 'prep->expiry', then this patch introduces two helper functions to help convert 'u32' to 'time64_t' type.
This patch also uses ktime_get_real_seconds() to get current time instead of get_seconds() which is not year 2038 safe on 32bits system.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
ddc6c70f |
| 21-Jul-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Move the packet.h include file into net/rxrpc/
Move the protocol description header file into net/rxrpc/ and rename it to protocol.h. It's no longer necessary to expose it as packets are no
rxrpc: Move the packet.h include file into net/rxrpc/
Move the protocol description header file into net/rxrpc/ and rename it to protocol.h. It's no longer necessary to expose it as packets are no longer exposed to kernel services (such as AFS) that use the facility.
The abort codes are transferred to the UAPI header instead as we pass these back to userspace and also to kernel services.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
Revision tags: v4.12 |
|
#
f7aec129 |
| 14-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Cache the congestion window setting
Cache the congestion window setting that was determined during a call's transmission phase when it finishes so that it can be used by the next call to the
rxrpc: Cache the congestion window setting
Cache the congestion window setting that was determined during a call's transmission phase when it finishes so that it can be used by the next call to the same peer, thereby shortcutting the slow-start algorithm.
The value is stored in the rxrpc_peer struct and is accessed without locking. Each call takes the value that happens to be there when it starts and just overwrites the value when it finishes.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
e754eba6 |
| 07-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Provide a cmsg to specify the amount of Tx data for a call
Provide a control message that can be specified on the first sendmsg() of a client call or the first sendmsg() of a service response
rxrpc: Provide a cmsg to specify the amount of Tx data for a call
Provide a control message that can be specified on the first sendmsg() of a client call or the first sendmsg() of a service response to indicate the total length of the data to be transmitted for that call.
Currently, because the length of the payload of an encrypted DATA packet is encrypted in front of the data, the packet cannot be encrypted until we know how much data it will hold.
By specifying the length at the beginning of the transmit phase, each DATA packet length can be set before we start loading data from userspace (where several sendmsg() calls may contribute to a particular packet).
An error will be returned if too little or too much data is presented in the Tx phase.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
4e255721 |
| 05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Add service upgrade support for client connections
Make it possible for a client to use AuriStor's service upgrade facility.
The client does this by adding an RXRPC_UPGRADE_SERVICE control m
rxrpc: Add service upgrade support for client connections
Make it possible for a client to use AuriStor's service upgrade facility.
The client does this by adding an RXRPC_UPGRADE_SERVICE control message to the first sendmsg() of a call. This takes no parameters.
When recvmsg() starts returning data from the call, the service ID field in the returned msg_name will reflect the result of the upgrade attempt. If the upgrade was ignored, srx_service will match what was set in the sendmsg(); if the upgrade happened the srx_service will be altered to indicate the service the server upgraded to.
Note that:
(1) The choice of upgrade service is up to the server
(2) Further client calls to the same server that would share a connection are blocked if an upgrade probe is in progress.
(3) This should only be used to probe the service. Clients should then use the returned service ID in all subsequent communications with that server (and not set the upgrade). Note that the kernel will not retain this information should the connection expire from its cache.
(4) If a server that supports upgrading is replaced by one that doesn't, whilst a connection is live, and if the replacement is running, say, OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement server will not respond to packets sent to the upgraded connection.
At this point, calls will time out and the server must be reprobed.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
4722974d |
| 05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Implement service upgrade
Implement AuriStor's service upgrade facility. There are three problems that this is meant to deal with:
(1) Various of the standard AFS RPC calls have IPv4 addre
rxrpc: Implement service upgrade
Implement AuriStor's service upgrade facility. There are three problems that this is meant to deal with:
(1) Various of the standard AFS RPC calls have IPv4 addresses in their requests and/or replies - but there's no room for including IPv6 addresses.
(2) Definition of IPv6-specific RPC operations in the standard operation sets has not yet been achieved.
(3) One could envision the creation a new service on the same port that as the original service. The new service could implement improved operations - and the client could try this first, falling back to the original service if it's not there.
Unfortunately, certain servers ignore packets addressed to a service they don't implement and don't respond in any way - not even with an ABORT. This means that the client must then wait for the call timeout to occur.
What service upgrade does is to see if the connection is marked as being 'upgradeable' and if so, change the service ID in the server and thus the request and reply formats. Note that the upgrade isn't mandatory - a server that supports only the original call set will ignore the upgrade request.
In the protocol, the procedure is then as follows:
(1) To request an upgrade, the first DATA packet in a new connection must have the userStatus set to 1 (this is normally 0). The userStatus value is normally ignored by the server.
(2) If the server doesn't support upgrading, the reply packets will contain the same service ID as for the first request packet.
(3) If the server does support upgrading, all future reply packets on that connection will contain the new service ID and the new service ID will be applied to *all* further calls on that connection as well.
(4) The RPC op used to probe the upgrade must take the same request data as the shadow call in the upgrade set (but may return a different reply). GetCapability RPC ops were added to all standard sets for just this purpose. Ops where the request formats differ cannot be used for probing.
(5) The client must wait for completion of the probe before sending any further RPC ops to the same destination. It should then use the service ID that recvmsg() reported back in all future calls.
(6) The shadow service must have call definitions for all the operation IDs defined by the original service.
To support service upgrading, a server should:
(1) Call bind() twice on its AF_RXRPC socket before calling listen(). Each bind() should supply a different service ID, but the transport addresses must be the same. This allows the server to receive requests with either service ID.
(2) Enable automatic upgrading by calling setsockopt(), specifying RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of unsigned shorts as the argument:
unsigned short optval[2];
This specifies a pair of service IDs. They must be different and must match the service IDs bound to the socket. Member 0 is the service ID to upgrade from and member 1 is the service ID to upgrade to.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
28036f44 |
| 05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Permit multiple service binding
Permit bind() to be called on an AF_RXRPC socket more than once (currently maximum twice) to bind multiple listening services to it. There are some restrictio
rxrpc: Permit multiple service binding
Permit bind() to be called on an AF_RXRPC socket more than once (currently maximum twice) to bind multiple listening services to it. There are some restrictions:
(1) All bind() calls involved must have a non-zero service ID.
(2) The service IDs must all be different.
(3) The rest of the address (notably the transport part) must be the same in all (a single UDP socket is shared).
(4) This must be done before listen() or sendmsg() is called.
This allows someone to connect to the service socket with different service IDs and lays the foundation for service upgrading.
The service ID used by an incoming call can be extracted from the msg_name returned by recvmsg().
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
68d6d1ae |
| 05-Jun-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Separate the connection's protocol service ID from the lookup ID
Keep the rxrpc_connection struct's idea of the service ID that is exposed in the protocol separate from the service ID that's
rxrpc: Separate the connection's protocol service ID from the lookup ID
Keep the rxrpc_connection struct's idea of the service ID that is exposed in the protocol separate from the service ID that's used as a lookup key.
This allows the protocol service ID on a client connection to get upgraded without making the connection unfindable for other client calls that also would like to use the upgraded connection.
The connection's actual service ID is then returned through recvmsg() by way of msg_name.
Whilst we're at it, we get rid of the last_service_id field from each channel. The service ID is per-connection, not per-call and an entire connection is upgraded in one go.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
2baec2c3 |
| 24-May-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Support network namespacing
Support network namespacing in AF_RXRPC with the following changes:
(1) All the local endpoint, peer and call lists, locks, counters, etc. are moved into th
rxrpc: Support network namespacing
Support network namespacing in AF_RXRPC with the following changes:
(1) All the local endpoint, peer and call lists, locks, counters, etc. are moved into the per-namespace record.
(2) All the connection tracking is moved into the per-namespace record with the exception of the client connection ID tree, which is kept global so that connection IDs are kept unique per-machine.
(3) Each namespace gets its own epoch. This allows each network namespace to pretend to be a separate client machine.
(4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and the contents reflect the namespace.
fs/afs/ should be okay with this patch as it explicitly requires the current net namespace to be init_net to permit a mount to proceed at the moment. It will, however, need updating so that cells, IP addresses and DNS records are per-namespace also.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.10.17, v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10, v4.10.9 |
|
#
fb46f6ee |
| 06-Apr-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Trace protocol errors in received packets
Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received packets. The following changes are made:
(1) Add a function, __rxrpc_abort
rxrpc: Trace protocol errors in received packets
Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received packets. The following changes are made:
(1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a call and mark the call aborted. This is wrapped by rxrpc_abort_eproto() that makes the why string usable in trace.
(2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error generation points, replacing rxrpc_abort_call() with the latter.
(3) Only send an abort packet in rxkad_verify_packet*() if we actually managed to abort the call.
Note that a trace event is also emitted if a kernel user (e.g. afs) tries to send data through a call when it's not in the transmission phase, though it's not technically a receive event.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
Revision tags: v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2 |
|
#
540b1c48 |
| 27-Feb-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix deadlock between call creation and sendmsg/recvmsg
All the routines by which rxrpc is accessed from the outside are serialised by means of the socket lock (sendmsg, recvmsg, bind, rxrpc_k
rxrpc: Fix deadlock between call creation and sendmsg/recvmsg
All the routines by which rxrpc is accessed from the outside are serialised by means of the socket lock (sendmsg, recvmsg, bind, rxrpc_kernel_begin_call(), ...) and this presents a problem:
(1) If a number of calls on the same socket are in the process of connection to the same peer, a maximum of four concurrent live calls are permitted before further calls need to wait for a slot.
(2) If a call is waiting for a slot, it is deep inside sendmsg() or rxrpc_kernel_begin_call() and the entry function is holding the socket lock.
(3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented from servicing the other calls as they need to take the socket lock to do so.
(4) The socket is stuck until a call is aborted and makes its slot available to the waiter.
Fix this by:
(1) Provide each call with a mutex ('user_mutex') that arbitrates access by the users of rxrpc separately for each specific call.
(2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as they've got a call and taken its mutex.
Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is set but someone else has the lock. Should I instead only return EWOULDBLOCK if there's nothing currently to be done on a socket, and sleep in this particular instance because there is something to be done, but we appear to be blocked by the interrupt handler doing its ping?
(3) Make rxrpc_new_client_call() unlock the socket after allocating a new call, locking its user mutex and adding it to the socket's call tree. The call is returned locked so that sendmsg() can add data to it immediately.
From the moment the call is in the socket tree, it is subject to access by sendmsg() and recvmsg() - even if it isn't connected yet.
(4) Lock new service calls in the UDP data_ready handler (in rxrpc_new_incoming_call()) because they may already be in the socket's tree and the data_ready handler makes them live immediately if a user ID has already been preassigned.
Note that the new call is locked before any notifications are sent that it is live, so doing mutex_trylock() *ought* to always succeed. Userspace is prevented from doing sendmsg() on calls that are in a too-early state in rxrpc_do_sendmsg().
(5) Make rxrpc_new_incoming_call() return the call with the user mutex held so that a ping can be scheduled immediately under it.
Note that it might be worth moving the ping call into rxrpc_new_incoming_call() and then we can drop the mutex there.
(6) Make rxrpc_accept_call() take the lock on the call it is accepting and release the socket after adding the call to the socket's tree. This is slightly tricky as we've dequeued the call by that point and have to requeue it.
Note that requeuing emits a trace event.
(7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the new mutex immediately and don't bother with the socket mutex at all.
This patch has the nice bonus that calls on the same socket are now to some extent parallelisable.
Note that we might want to move rxrpc_service_prealloc() calls out from the socket lock and give it its own lock, so that we don't hang progress in other calls because we're waiting for the allocator.
We probably also want to avoid calling rxrpc_notify_socket() from within the socket lock (rxrpc_accept_call()).
Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.c.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.10.1, v4.10 |
|
#
210f0353 |
| 05-Jan-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Allow listen(sock, 0) to be used to disable listening
Allow listen() with a backlog of 0 to be used to disable listening on an AF_RXRPC socket. This also releases any preallocation, thereby
rxrpc: Allow listen(sock, 0) to be used to disable listening
Allow listen() with a backlog of 0 to be used to disable listening on an AF_RXRPC socket. This also releases any preallocation, thereby making it easier for a kernel service to account for all allocated call structures when shutting down the service.
The socket cannot thereafter have listening reenabled, but must rather be closed and reopened.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
b54a134a |
| 05-Jan-2017 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix handling of enums-to-string translation in tracing
Fix the way enum values are translated into strings in AF_RXRPC tracepoints. The problem with just doing a lookup in a normal flat arra
rxrpc: Fix handling of enums-to-string translation in tracing
Fix the way enum values are translated into strings in AF_RXRPC tracepoints. The problem with just doing a lookup in a normal flat array of strings or chars is that external tracing infrastructure can't find it. Rather, TRACE_DEFINE_ENUM must be used.
Also sort the enums and string tables to make it easier to keep them in order so that a future patch to __print_symbolic() can be optimised to try a direct lookup into the table first before iterating over it.
A couple of _proto() macro calls are removed because they refered to tables that got moved to the tracing infrastructure. The relevant data can be found by way of tracing.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7 |
|
#
9749fd2b |
| 06-Oct-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Need to produce an ACK for service op if op takes a long time
We need to generate a DELAY ACK from the service end of an operation if we start doing the actual operation work and it takes lon
rxrpc: Need to produce an ACK for service op if op takes a long time
We need to generate a DELAY ACK from the service end of an operation if we start doing the actual operation work and it takes longer than expected. This will hard-ACK the request data and allow the client to release its resources.
To make this work:
(1) We have to set the ack timer and propose an ACK when the call moves to the RXRPC_CALL_SERVER_ACK_REQUEST and clear the pending ACK and cancel the timer when we start transmitting the reply (the first DATA packet of the reply implicitly ACKs the request phase).
(2) It must be possible to set the timer when the caller is holding call->state_lock, so split the lock-getting part of the timer function out.
(3) Add trace notes for the ACK we're requesting and the timer we clear.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
a5af7e1f |
| 06-Oct-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs
Separate the output of PING ACKs from the output of other sorts of ACK so that if we receive a PING ACK and schedule transmission of
rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs
Separate the output of PING ACKs from the output of other sorts of ACK so that if we receive a PING ACK and schedule transmission of a PING RESPONSE ACK, the response doesn't get cancelled by a PING ACK we happen to be scheduling transmission of at the same time.
If a PING RESPONSE gets lost, the other side might just sit there waiting for it and refuse to proceed otherwise.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
26cb02aa |
| 06-Oct-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix warning by splitting rxrpc_send_call_packet()
Split rxrpc_send_data_packet() to separate ACK generation (which is more complicated) from ABORT generation. This simplifies the code a bit
rxrpc: Fix warning by splitting rxrpc_send_call_packet()
Split rxrpc_send_data_packet() to separate ACK generation (which is more complicated) from ABORT generation. This simplifies the code a bit and fixes the following warning:
In file included from ../net/rxrpc/output.c:20:0: net/rxrpc/output.c: In function 'rxrpc_send_call_packet': net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized] net/rxrpc/output.c:103:24: note: 'top' was declared here net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
19c0dbd5 |
| 06-Oct-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Fix duplicate const
Remove a duplicate const keyword.
Signed-off-by: David Howells <dhowells@redhat.com>
|
Revision tags: v4.8, v4.4.23, v4.7.6 |
|
#
df0adc78 |
| 26-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Keep the call timeouts as ktimes rather than jiffies
Keep that call timeouts as ktimes rather than jiffies so that they can be expressed as functions of RTT.
Signed-off-by: David Howells <dh
rxrpc: Keep the call timeouts as ktimes rather than jiffies
Keep that call timeouts as ktimes rather than jiffies so that they can be expressed as functions of RTT.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
c31410ea |
| 30-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Remove error from struct rxrpc_skb_priv as it is unused
Remove error from struct rxrpc_skb_priv as it is no longer used.
Signed-off-by: David Howells <dhowells@redhat.com>
|
#
775e5b71 |
| 30-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: The offset field in struct rxrpc_skb_priv is unnecessary
The offset field in struct rxrpc_skb_priv is unnecessary as the value can always be calculated.
Signed-off-by: David Howells <dhowell
rxrpc: The offset field in struct rxrpc_skb_priv is unnecessary
The offset field in struct rxrpc_skb_priv is unnecessary as the value can always be calculated.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|
#
1e9e5c95 |
| 29-Sep-2016 |
David Howells <dhowells@redhat.com> |
rxrpc: Reduce the rxrpc_local::services list to a pointer
Reduce the rxrpc_local::services list to just a pointer as we don't permit multiple service endpoints to bind to a single transport endpoint
rxrpc: Reduce the rxrpc_local::services list to a pointer
Reduce the rxrpc_local::services list to just a pointer as we don't permit multiple service endpoints to bind to a single transport endpoints (this is excluded by rxrpc_lookup_local()).
The reason we don't allow this is that if you send a request to an AFS filesystem service, it will try to talk back to your cache manager on the port you sent from (this is how file change notifications are handled). To prevent someone from stealing your CM callbacks, we don't let AF_RXRPC sockets share a UDP socket if at least one of them has a service bound.
Signed-off-by: David Howells <dhowells@redhat.com>
show more ...
|