#
648845ab |
| 14-Dec-2017 |
Tonghao Zhang <xiangxia.m.yue@gmail.com> |
sock: Move the socket inuse to namespace.
In some case, we want to know how many sockets are in use in different _net_ namespaces. It's a key resource metric.
This patch add a member in struct netns_core. This is a counter for socket-inuse in the _net_ namespace. The patch will add/sub counter in the sk_alloc, sk_clone_lock and __sk_free.
This patch will not counter the socket created in kernel. It's not very useful for userspace to know how many kernel sockets we created.
The main reasons for doing this are that:
1. When linux calls the 'do_exit' for process to exit, the functions 'exit_task_namespaces' and 'exit_task_work' will be called sequentially. 'exit_task_namespaces' may have destroyed the _net_ namespace, but 'sock_release' called in 'exit_task_work' may use the _net_ namespace if we counter the socket-inuse in sock_release.
2. socket and sock are in pair. More important, sock holds the _net_ namespace. We counter the socket-inuse in sock, for avoiding holding _net_ namespace again in socket. It's a easy way to maintain the code.
Signed-off-by: Martin Zhang <zhangjunweimartin@didichuxing.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
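The accounting described in this commit can be pictured with a small userspace model. This is only a sketch under stated assumptions: `toy_net`, `toy_sock`, `toy_sk_alloc()` and `toy_sk_free()` are hypothetical names, and a plain `long` stands in for whatever counter type the kernel actually uses in struct netns_core.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and struct sock; not kernel types. */
struct toy_net {
    long sock_inuse;       /* models the counter added to struct netns_core */
};

struct toy_sock {
    struct toy_net *net;   /* the sock, not the socket, holds the namespace */
    bool kern;             /* true for sockets created inside the kernel */
};

/* Models the increment in sk_alloc()/sk_clone_lock(): kernel sockets are skipped. */
void toy_sk_alloc(struct toy_sock *sk, struct toy_net *net, bool kern)
{
    sk->net = net;
    sk->kern = kern;
    if (!kern)
        net->sock_inuse++;
}

/* Models the decrement in __sk_free(): the namespace is reached through the sock. */
void toy_sk_free(struct toy_sock *sk)
{
    if (!sk->kern)
        sk->net->sock_inuse--;
    sk->net = NULL;
}
```

Because the decrement happens where the sock is freed, the model never needs a second namespace reference held by the socket, which is the design point made in reason 2.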
|
#
8e1611e2 |
| 05-Dec-2017 |
Al Viro <viro@ZenIV.linux.org.uk> |
make sock_alloc_file() do sock_release() on failures
This changes calling conventions (and simplifies the hell out the callers). New rules: once struct socket had been passed to sock_alloc_file(), it's been consumed either by struct file or by sock_release() done by sock_alloc_file(). Either way the caller should not do sock_release() after that point.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
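The ownership rule stated in this commit can be illustrated with a toy userspace model. All names below (`toy_socket`, `toy_file`, `toy_sock_alloc_file()`, `toy_open_socket()`) are hypothetical, and returning NULL instead of an error pointer is a simplification of this sketch rather than the kernel's convention.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real structures live in the kernel. */
struct toy_socket { int released; };
struct toy_file { struct toy_socket *sock; };

void toy_sock_release(struct toy_socket *sock)
{
    sock->released = 1;
}

/*
 * Models the new calling convention: once the socket is passed in, it is
 * consumed, either by the returned file or by a release done right here.
 * The `fail` flag only exists to force the error path in this sketch.
 */
struct toy_file *toy_sock_alloc_file(struct toy_socket *sock, bool fail)
{
    struct toy_file *file = fail ? NULL : malloc(sizeof(*file));

    if (!file) {
        toy_sock_release(sock);   /* callee cleans up on failure */
        return NULL;
    }
    file->sock = sock;
    return file;
}

/* A caller under the new rule: no release on the error path. */
int toy_open_socket(struct toy_socket *sock, bool fail, struct toy_file **out)
{
    struct toy_file *file = toy_sock_alloc_file(sock, fail);

    if (!file)
        return -1;                /* socket was already consumed */
    *out = file;
    return 0;
}
```

The caller's error path shrinks to a plain return, which is the simplification the commit message refers to.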
|
#
016a266b |
| 05-Dec-2017 |
Al Viro <viro@ZenIV.linux.org.uk> |
socketpair(): allocate descriptors first
simplifies failure exits considerably...
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ade994f4 |
| 02-Jul-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
net: annotate ->poll() instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e6c8adca |
| 03-Jul-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
anntotate the places where ->poll() return values go
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
49502766 |
| 15-Nov-2017 |
Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com> |
kmemcheck: remove annotations
Patch series "kmemcheck: kill kmemcheck", v2.
As discussed at LSF/MM, kill kmemcheck.
KASan is a replacement that is able to work without the limitation of kmemcheck (single CPU, slow). KASan is already upstream.
We are also not aware of any users of kmemcheck (or users who don't consider KASan as a suitable replacement).
The only objection was that since KASAN wasn't supported by all GCC versions provided by distros at that time we should hold off for 2 years, and try again.
Now that 2 years have passed, and all distros provide gcc that supports KASAN, kill kmemcheck again for the very same reasons.
This patch (of 4):
Remove kmemcheck annotations, and calls to kmemcheck from the kernel.
[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
db5980d8 |
| 16-Aug-2017 |
John Fastabend <john.fastabend@gmail.com> |
net: fixes for skb_send_sock
A couple fixes to new skb_send_sock infrastructure. However, no users currently exist for this code (adding user in next handful of patches) so it should not be possible to trigger a panic with existing in-kernel code.
Fixes: 306b13eb3cf9 ("proto_ops: Add locked held versions of sendmsg and sendpage")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
306b13eb |
| 28-Jul-2017 |
Tom Herbert <tom@quantonium.net> |
proto_ops: Add locked held versions of sendmsg and sendpage
Add new proto_ops sendmsg_locked and sendpage_locked that can be called when the socket lock is already held. Correspondingly, add kernel_sendmsg_locked and kernel_sendpage_locked as front end functions.
These functions will be used in zero proxy so that we can take the socket lock in a ULP sendmsg/sendpage and then directly call the backend transport proto_ops functions.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
614d79c0 |
| 24-Jul-2017 |
stephen hemminger <stephen@networkplumber.org> |
socket: fix set not used warning
The variable owned_by_user is always set, but only used when kernel is configured with LOCKDEP enabled.
Get rid of the warning by moving the code to put the call to owned_by_user into the the rcu_protected call.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
864d9664 |
| 21-Jul-2017 |
Paolo Abeni <pabeni@redhat.com> |
net/socket: fix type in assignment and trim long line
The commit ffb07550c76f ("copy_msghdr_from_user(): get rid of field-by-field copyin") introduce a new sparse warning:
net/socket.c:1919:27: warning: incorrect type in assignment (different address spaces)
net/socket.c:1919:27: expected void *msg_control
net/socket.c:1919:27: got void [noderef] <asn:1>*[addressable] msg_control
and a line above 80 chars, let's fix them
Fixes: ffb07550c76f ("copy_msghdr_from_user(): get rid of field-by-field copyin")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ffb07550 |
| 27-Jun-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
copy_msghdr_from_user(): get rid of field-by-field copyin
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
393cc3f5 |
| 13-Jun-2017 |
Jiri Slaby <jslaby@suse.cz> |
fs/fcntl: f_setown, allow returning error
Allow f_setown to return an error value. We will fail in the next patch with EINVAL for bad input to f_setown, so tile the path for the later patch.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
#
241c4667 |
| 21-May-2017 |
Rosen, Rami <rami.rosen@intel.com> |
net: socket: fix a typo in sockfd_lookup().
This patch fixes a typo in sockfd_lookup() in net/socket.c.
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v4.10.17 |
|
#
b50a5c70 |
| 19-May-2017 |
Miroslav Lichvar <mlichvar@redhat.com> |
net: allow simultaneous SW and HW transmit timestamping
Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to be looped to the socket's error queue with a software timestamp even when a hardware transmit timestamp is expected to be provided by the driver.
Applications using this option will receive two separate messages from the error queue, one with a software timestamp and the other with a hardware timestamp. As the hardware timestamp is saved to the shared skb info, which may happen before the first message with software timestamp is received by the application, the hardware timestamp is copied to the SCM_TIMESTAMPING control message only when the skb has no software timestamp or it is an incoming packet.
While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as there are no other users.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
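From userspace, the option added by this commit would be requested through SO_TIMESTAMPING together with the existing transmit and reporting flags. The following is a minimal sketch, assuming kernel headers recent enough to define SOF_TIMESTAMPING_OPT_TX_SWHW; the helper names `swhw_tx_timestamp_flags()` and `enable_swhw_tx_timestamps()` are hypothetical, and whether hardware timestamps actually arrive still depends on the driver.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Builds a flag mask asking for both software and hardware TX timestamps. */
unsigned int swhw_tx_timestamp_flags(void)
{
    return SOF_TIMESTAMPING_TX_SOFTWARE |   /* generate SW timestamps on TX */
           SOF_TIMESTAMPING_TX_HARDWARE |   /* generate HW timestamps on TX */
           SOF_TIMESTAMPING_SOFTWARE |      /* report SW timestamps */
           SOF_TIMESTAMPING_RAW_HARDWARE |  /* report HW timestamps */
           SOF_TIMESTAMPING_OPT_TX_SWHW;    /* deliver both, as two messages */
}

/* Applies the mask to a socket; returns the setsockopt() result. */
int enable_swhw_tx_timestamps(int fd)
{
    unsigned int flags = swhw_tx_timestamp_flags();

    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

An application using this mask should expect two separate messages on the error queue per packet, as described above, and read them with recvmsg() and MSG_ERRQUEUE.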
|
#
aad9c8c4 |
| 19-May-2017 |
Miroslav Lichvar <mlichvar@redhat.com> |
net: add new control message for incoming HW-timestamped packets
Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message for incoming packets with hardware timestamps. It contains the index of the real interface which received the packet and the length of the packet at layer 2.
The index is useful with bonding, bridges and other interfaces, where IP_PKTINFO doesn't allow applications to determine which PHC made the timestamp. With the L2 length (and link speed) it is possible to transpose preamble timestamps to trailer timestamps, which are used in the NTP protocol.
While this information could be provided by two new socket options independently from timestamping, it doesn't look like they would be very useful. With this option any performance impact is limited to hardware timestamping.
Use dev_get_by_napi_id() to get the device and its index. On kernels with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero index will be returned in the control message.
CC: Richard Cochran <richardcochran@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v4.10.16, v4.10.15, v4.10.14, v4.10.13, v4.10.12, v4.10.11 |
|
#
57240d00 |
| 12-Apr-2017 |
R. Parameswaran <parameswaran.r7@gmail.com> |
l2tp: device MTU setup, tunnel socket needs a lock
The MTU overhead calculation in L2TP device set-up merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff needs to be adjusted to lock the tunnel socket while referencing the sub-data structures to derive the socket's IP overhead.
Reported-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v4.10.10, v4.10.9 |
|
#
113c3075 |
| 05-Apr-2017 |
R. Parameswaran <parameswaran.r7@gmail.com> |
New kernel function to get IP overhead on a socket.
A new function, kernel_sock_ip_overhead(), is provided to calculate the cumulative overhead imposed by the IP Header and IP options, if any, on a socket's payload. The new function returns an overhead of zero for sockets that do not belong to the IPv4 or IPv6 address families. This is used in the L2TP code path to compute the total outer IP overhead on the L2TP tunnel socket when calculating the default MTU for Ethernet pseudowires.
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
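The calculation this commit describes can be approximated in userspace as follows. This is a sketch, not the kernel function: `toy_sock_ip_overhead()` is a hypothetical name, the option length is passed in by the caller instead of being read from the socket, and the fixed header sizes are the standard 20 bytes for IPv4 and 40 bytes for IPv6.

```c
#include <stdint.h>
#include <sys/socket.h>

#define TOY_IPV4_HDR_LEN 20u   /* IPv4 header without options */
#define TOY_IPV6_HDR_LEN 40u   /* fixed IPv6 header */

/*
 * Returns the IP header plus option bytes for the given address family,
 * or zero for families other than IPv4 and IPv6, mirroring the behaviour
 * described for kernel_sock_ip_overhead().
 */
uint32_t toy_sock_ip_overhead(int family, uint32_t opt_len)
{
    switch (family) {
    case AF_INET:
        return TOY_IPV4_HDR_LEN + opt_len;
    case AF_INET6:
        return TOY_IPV6_HDR_LEN + opt_len;
    default:
        return 0;
    }
}
```

A caller such as the L2TP path would subtract this value, along with its own encapsulation overhead, from the underlying MTU to derive a default pseudowire MTU.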
|
Revision tags: v4.10.8, v4.10.7, v4.10.6, v4.10.5 |
|
#
4ef1b286 |
| 18-Mar-2017 |
Soheil Hassas Yeganeh <soheil@google.com> |
tcp: mark skbs with SCM_TIMESTAMPING_OPT_STATS
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled while packets are collected on the error queue. So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags is not enough to safely assume that the skb contains OPT_STATS data.
Add a bit in sock_exterr_skb to indicate whether the skb contains opt_stats data.
Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8605330a |
| 18-Mar-2017 |
Soheil Hassas Yeganeh <soheil@google.com> |
tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs
__sock_recv_timestamp can be called for both normal skbs (for receive timestamps) and for skbs on the error queue (for transmit timestamps).
Commit 1c885808e456 (tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING) assumes any skb passed to __sock_recv_timestamp are from the error queue, containing OPT_STATS in the content of the skb. This results in accessing invalid memory or generating junk data.
To fix this, set skb->pkt_type to PACKET_OUTGOING for packets on the error queue. This is safe because on the receive path on local sockets skb->pkt_type is never set to PACKET_OUTGOING. With that, copy OPT_STATS from a packet, only if its pkt_type is PACKET_OUTGOING.
Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
Reported-by: JongHwan Kim <zzoru007@gmail.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v4.10.4, v4.10.3, v4.10.2 |
|
#
cdfbabfb |
| 09-Mar-2017 |
David Howells <dhowells@redhat.com> |
net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
The theory lockdep comes up with is as follows:
(1) If the pagefault handler decides it needs to read pages from AFS, it calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but creating a call requires the socket lock:
mmap_sem must be taken before sk_lock-AF_RXRPC
(2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind() binds the underlying UDP socket whilst holding its socket lock. inet_bind() takes its own socket lock:
sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
(3) Reading from a TCP socket into a userspace buffer might cause a fault and thus cause the kernel to take the mmap_sem, but the TCP socket is locked whilst doing this:
sk_lock-AF_INET must be taken before mmap_sem
However, lockdep's theory is wrong in this instance because it deals only with lock classes and not individual locks. The AF_INET lock in (2) isn't really equivalent to the AF_INET lock in (3) as the former deals with a socket entirely internal to the kernel that never sees userspace. This is a limitation in the design of lockdep.
Fix the general case by:
(1) Double up all the locking keys used in sockets so that one set are used if the socket is created by userspace and the other set is used if the socket is created by the kernel.
(2) Store the kern parameter passed to sk_alloc() in a variable in the sock struct (sk_kern_sock). This informs sock_lock_init(), sock_init_data() and sk_clone_lock() as to the lock keys to be used.
Note that the child created by sk_clone_lock() inherits the parent's kern setting.
(3) Add a 'kern' parameter to ->accept() that is analogous to the one passed in to ->create() that distinguishes whether kernel_accept() or sys_accept4() was the caller and can be passed to sk_alloc().
Note that a lot of accept functions merely dequeue an already allocated socket. I haven't touched these as the new socket already exists before we get the parameter.
Note also that there are a couple of places where I've made the accepted socket unconditionally kernel-based:
irda_accept()
rds_rcp_accept_one()
tcp_accept_from_sock()
because they follow a sock_create_kern() and accept off of that.
Whilst creating this, I noticed that lustre and ocfs don't create sockets through sock_create_kern() and thus they aren't marked as for-kernel, though they appear to be internal. I wonder if these should do that so that they use the new set of lock keys.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f138fa6 |
| 08-Mar-2017 |
Alexander Potapenko <glider@google.com> |
net: initialize msg.msg_flags in recvfrom
KMSAN reports a use of uninitialized memory in put_cmsg() because msg.msg_flags in recvfrom haven't been initialized properly. The flag values don't affect the result on this path, but it's still a good idea to initialize them explicitly.
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v4.10.1 |
|
#
e623a9e9 |
| 21-Feb-2017 |
Maxime Jayat <maxime.jayat@mobile-devices.fr> |
net: socket: fix recvmmsg not returning error from sock_error
Commit 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path"), changed the exit path of recvmmsg to always return the datagrams variable and modified the error paths to set the variable to the error code returned by recvmsg if necessary.
However in the case sock_error returned an error, the error code was then ignored, and recvmmsg returned 0.
Change the error path of recvmmsg to correctly return the error code of sock_error.
The bug was triggered by using recvmmsg on a CAN interface which was not up. Linux 4.6 and later return 0 in this case while earlier releases returned -ENETDOWN.
Fixes: 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path")
Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
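The return-value behaviour fixed here can be reduced to a small model. This sketch is not the kernel code: `toy_recvmmsg()` is a hypothetical function in which `pending_err` stands in for the result of sock_error() (zero or a negative errno) and `available` for the number of queued datagrams.

```c
#include <errno.h>

/*
 * Models the exit path of recvmmsg after the fix: the function always
 * returns the `datagrams` variable, so a pending socket error has to be
 * stored in it before leaving.
 */
int toy_recvmmsg(int pending_err, int available, int vlen)
{
    int datagrams = 0;

    if (pending_err) {
        /* Before the fix this path left datagrams at 0, hiding the error. */
        datagrams = pending_err;
        goto out;
    }

    /* Receive up to vlen datagrams, limited by what is queued. */
    while (datagrams < vlen && datagrams < available)
        datagrams++;

out:
    return datagrams;
}
```

With a CAN interface that is down, the model returns -ENETDOWN instead of 0, matching the behaviour of releases before Linux 4.6 noted in the commit message.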
|
Revision tags: v4.10 |
|
#
dc647ec8 |
| 10-Jan-2017 |
Tobias Klauser <tklauser@distanz.ch> |
net: socket: Make unnecessarily global sockfs_setattr() static
Make sockfs_setattr() static as it is not used outside of net/socket.c
This fixes the following GCC warning:
net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]
Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e911632 |
| 07-Jan-2017 |
yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> |
net: change init_inodecache() return void
sock_init() call it but not check it's return value, so change it to void return and add an internal BUG_ON() check.
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac4340fc |
| 04-Jan-2017 |
David S. Miller <davem@davemloft.net> |
net: Assert at build time the assumptions we make about the CMSG header.
It must always be the case that CMSG_ALIGN(sizeof(hdr)) == sizeof(hdr).
Otherwise there are missing adjustments in the various calculations that parse and build these things.
Signed-off-by: David S. Miller <davem@davemloft.net>
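The same invariant can be checked from userspace with a C11 static assertion. This is a sketch of the idea only: the kernel uses its own build-time check, and `cmsg_header_is_aligned()` is a hypothetical helper added here so the condition can also be inspected at run time.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdbool.h>
#include <sys/socket.h>

/* Fails the build if the cmsg header size is not already aligned. */
_Static_assert(CMSG_ALIGN(sizeof(struct cmsghdr)) == sizeof(struct cmsghdr),
               "cmsg header size must equal its aligned size");

/* Run-time view of the same condition, useful in a test. */
bool cmsg_header_is_aligned(void)
{
    return CMSG_ALIGN(sizeof(struct cmsghdr)) == sizeof(struct cmsghdr);
}
```

Checking at build time means code that computes control-message offsets can add sizeof(struct cmsghdr) directly without a separate alignment step.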
|