Revision tags: v4.10.15, v4.10.14, v4.10.13 |
|
#
fd2c83b3 |
| 25-Apr-2017 |
Alexander Potapenko <glider@google.com> |
net/packet: check length in getsockopt() called with PACKET_HDRLEN
In the case getsockopt() is called with PACKET_HDRLEN and optlen < 4 |val| remains uninitialized and the syscall may behave differe
net/packet: check length in getsockopt() called with PACKET_HDRLEN
In the case getsockopt() is called with PACKET_HDRLEN and optlen < 4 |val| remains uninitialized and the syscall may behave differently depending on its value, and even copy garbage to userspace on certain architectures. To fix this we now return -EINVAL if optlen is too small.
This bug has been detected with KMSAN.
Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
4a69a864 |
| 21-Apr-2017 |
Mike Maloney <maloney@google.com> |
packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.
Fanout uses a per net global namespace. A process that intends to create a new fanout group can accidentally join an existing g
packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.
Fanout uses a per net global namespace. A process that intends to create a new fanout group can accidentally join an existing group. It is not possible to detect this.
Add socket option PACKET_FANOUT_FLAG_UNIQUEID. When specified the supplied fanout group id must be set to 0, and the kernel chooses an id that is not already in use. This is an ephemeral flag so that other sockets can be added to this group using setsockopt, but NOT specifying this flag. The current getsockopt(..., PACKET_FANOUT, ...) can be used to retrieve the new group id.
We assume that there are not a lot of fanout groups and that this is not a high frequency call.
The method assigns ids starting at zero and increases until it finds an unused id. It keeps track of the last assigned id, and uses it as a starting point to find new ids.
Signed-off-by: Mike Maloney <maloney@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.10.12, v4.10.11, v4.10.10, v4.10.9, v4.10.8, v4.10.7 |
|
#
bcc5364b |
| 29-Mar-2017 |
Andrey Konovalov <andreyknvl@google.com> |
net/packet: fix overflow in check for tp_reserve
When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.
Fix by checking that tp_reserve <= INT_MAX on assign.
Signed-off-by: Andre
net/packet: fix overflow in check for tp_reserve
When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.
Fix by checking that tp_reserve <= INT_MAX on assign.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
8f8d28e4 |
| 29-Mar-2017 |
Andrey Konovalov <andreyknvl@google.com> |
net/packet: fix overflow in check for tp_frame_nr
When calculating rb->frames_per_block * req->tp_block_nr the result can overflow.
Add a check that tp_block_size * tp_block_nr <= UINT_MAX.
Since
net/packet: fix overflow in check for tp_frame_nr
When calculating rb->frames_per_block * req->tp_block_nr the result can overflow.
Add a check that tp_block_size * tp_block_nr <= UINT_MAX.
Since frames_per_block <= tp_block_size, the expression would never overflow.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
2b6867c2 |
| 29-Mar-2017 |
Andrey Konovalov <andreyknvl@google.com> |
net/packet: fix overflow in check for priv area size
Subtracting tp_sizeof_priv from tp_block_size and casting to int to check whether one is less then the other doesn't always work (both of them ar
net/packet: fix overflow in check for priv area size
Subtracting tp_sizeof_priv from tp_block_size and casting to int to check whether one is less then the other doesn't always work (both of them are unsigned ints).
Compare them as is instead.
Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as it can overflow inside BLK_PLUS_PRIV otherwise.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2 |
|
#
540e2894 |
| 01-Mar-2017 |
Alexander Potapenko <glider@google.com> |
net: don't call strlen() on the user buffer in packet_bind_spkt()
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of uninitialized memory in packet_bind_spkt(): Acked-by: Eric
net: don't call strlen() on the user buffer in packet_bind_spkt()
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of uninitialized memory in packet_bind_spkt(): Acked-by: Eric Dumazet <edumazet@google.com>
================================================================== BUG: KMSAN: use of unitialized memory CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48 ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550 0000000000000000 0000000000000092 00000000ec400911 0000000000000002 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51 [<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003 [<ffffffff818a783b>] __msan_warning+0x5b/0xb0 mm/kmsan/kmsan_instr.c:424 [< inline >] strlen lib/string.c:484 [<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144 [<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230 net/packet/af_packet.c:3132 [<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370 [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356 [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:? chained origin: 00000000eba00911 [<ffffffff810bb787>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67 [< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322 [< inline >] kmsan_save_stack mm/kmsan/kmsan.c:334 [<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0 mm/kmsan/kmsan.c:527 [<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130 mm/kmsan/kmsan_instr.c:380 [<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356 [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356 [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f arch/x86/entry/entry_64.o:? origin description: ----address@SYSC_bind (origin=00000000eb400911) ================================================================== (the line numbers are relative to 4.8-rc6, but the bug persists upstream)
, when I run the following program as root:
===================================== #include <string.h> #include <sys/socket.h> #include <netpacket/packet.h> #include <net/ethernet.h>
int main() { struct sockaddr addr; memset(&addr, 0xff, sizeof(addr)); addr.sa_family = AF_PACKET; int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL)); bind(fd, &addr, sizeof(addr)); return 0; } =====================================
This happens because addr.sa_data copied from the userspace is not zero-terminated, and copying it with strlcpy() in packet_bind_spkt() results in calling strlen() on the kernel copy of that non-terminated buffer.
Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.10.1, v4.10 |
|
#
2bd624b4 |
| 15-Feb-2017 |
Anoob Soman <anoob.soman@citrix.com> |
packet: Do not call fanout_release from atomic contexts
Commit 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev"), unfortunately, introduced the following issues.
1. calling
packet: Do not call fanout_release from atomic contexts
Commit 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev"), unfortunately, introduced the following issues.
1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside rcu_read-side critical section. rcu_read_lock disables preemption, most often, which prohibits calling sleeping functions.
[ ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section! [ ] [ ] rcu_scheduler_active = 1, debug_locks = 0 [ ] 4 locks held by ovs-vswitchd/1969: [ ] #0: (cb_lock){++++++}, at: [<ffffffff8158a6c9>] genl_rcv+0x19/0x40 [ ] #1: (ovs_mutex){+.+.+.}, at: [<ffffffffa04878ca>] ovs_vport_cmd_del+0x4a/0x100 [openvswitch] [ ] #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff81564157>] rtnl_lock+0x17/0x20 [ ] #3: (rcu_read_lock){......}, at: [<ffffffff81614165>] packet_notifier+0x5/0x3f0 [ ] [ ] Call Trace: [ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4 [ ] [<ffffffff810c9077>] lockdep_rcu_suspicious+0x107/0x110 [ ] [<ffffffff810a2da7>] ___might_sleep+0x57/0x210 [ ] [<ffffffff810a2fd0>] __might_sleep+0x70/0x90 [ ] [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0 [ ] [<ffffffff810de93f>] ? vprintk_default+0x1f/0x30 [ ] [<ffffffff81186e88>] ? printk+0x4d/0x4f [ ] [<ffffffff816106dd>] fanout_release+0x1d/0xe0 [ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock). "sleeping function called from invalid context"
[ ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620 [ ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd [ ] INFO: lockdep is turned off. [ ] Call Trace: [ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4 [ ] [<ffffffff810a2f52>] ___might_sleep+0x202/0x210 [ ] [<ffffffff810a2fd0>] __might_sleep+0x70/0x90 [ ] [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0 [ ] [<ffffffff816106dd>] fanout_release+0x1d/0xe0 [ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
3. calling dev_remove_pack(&fanout->prot_hook), from inside spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack() -> synchronize_net(), which might sleep.
[ ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002 [ ] INFO: lockdep is turned off. [ ] Call Trace: [ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4 [ ] [<ffffffff81186274>] __schedule_bug+0x64/0x73 [ ] [<ffffffff8162b8cb>] __schedule+0x6b/0xd10 [ ] [<ffffffff8162c5db>] schedule+0x6b/0x80 [ ] [<ffffffff81630b1d>] schedule_timeout+0x38d/0x410 [ ] [<ffffffff810ea3fd>] synchronize_sched_expedited+0x53d/0x810 [ ] [<ffffffff810ea6de>] synchronize_rcu_expedited+0xe/0x10 [ ] [<ffffffff8154eab5>] synchronize_net+0x35/0x50 [ ] [<ffffffff8154eae3>] dev_remove_pack+0x13/0x20 [ ] [<ffffffff8161077e>] fanout_release+0xbe/0xe0 [ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
4. fanout_release() races with calls from different CPU.
To fix the above problems, remove the call to fanout_release() under rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to __fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure fanout->prot_hook is removed as well.
Fixes: 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Anoob Soman <anoob.soman@citrix.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d199fab6 |
| 14-Feb-2017 |
Eric Dumazet <edumazet@google.com> |
packet: fix races in fanout_add()
Multiple threads can call fanout_add() at the same time.
We need to grab fanout_mutex earlier to avoid races that could lead to one thread freeing po->rollover tha
packet: fix races in fanout_add()
Multiple threads can call fanout_add() at the same time.
We need to grab fanout_mutex earlier to avoid races that could lead to one thread freeing po->rollover that was set by another thread.
Do the same in fanout_release(), for peace of mind, and to help us finding lockdep issues earlier.
Fixes: dc99f600698d ("packet: Add fanout support.") Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
57031eb7 |
| 07-Feb-2017 |
Willem de Bruijn <willemb@google.com> |
packet: round up linear to header len
Link layer protocols may unconditionally pull headers, as Ethernet does in eth_type_trans. Ensure that the entire link layer header always lies in the skb linea
packet: round up linear to header len
Link layer protocols may unconditionally pull headers, as Ethernet does in eth_type_trans. Ensure that the entire link layer header always lies in the skb linear segment. tpacket_snd has such a check. Extend this to packet_snd.
Variable length link layer headers complicate the computation somewhat. Here skb->len may be smaller than dev->hard_header_len.
Round up the linear length to be at least as long as the smallest of the two.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
6391a448 |
| 20-Jan-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving
Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too,
virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving
Commit 501db511397f ("virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit") in fact disables VIRTIO_HDR_F_DATA_VALID on receiving path too, fixing this by adding a hint (has_data_valid) and set it only on the receiving path.
Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
57ea884b |
| 04-Jan-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode
When TX timestamping is in use with TPACKET_V3's TX ring, then we'll hit the BUG() in __packet_set_timestamp() when ring buffer s
packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode
When TX timestamping is in use with TPACKET_V3's TX ring, then we'll hit the BUG() in __packet_set_timestamp() when ring buffer slot is returned to user space via tpacket_destruct_skb(). This is due to v3 being assumed as unreachable here, but since 7f953ab2ba46 ("af_packet: TX_RING support for TPACKET_V3") it's not anymore. Fix it by filling the timestamp back into the ring slot.
Fixes: 7f953ab2ba46 ("af_packet: TX_RING support for TPACKET_V3") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
7f953ab2 |
| 03-Jan-2017 |
Sowmini Varadhan <sowmini.varadhan@oracle.com> |
af_packet: TX_RING support for TPACKET_V3
Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3 does not currently have TX_RING support. As a result an application that wants the best pe
af_packet: TX_RING support for TPACKET_V3
Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3 does not currently have TX_RING support. As a result an application that wants the best perf for Tx and Rx (e.g. to handle request/response transacations) ends up needing 2 sockets, one with *_v2 for Tx and another with *_v3 for Rx.
This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3 so that an application can use a single descriptor to get the benefits of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by first filling up multiple frames in the Tx ring and then triggering a transmit. This patch only support fixed size Tx frames for TPACKET_V3, and requires that tp_next_offset must be zero.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
7c0f6ba6 |
| 24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PA
Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31 |
|
#
cbbd26b8 |
| 01-Nov-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
[iov_iter] new primitives - copy_from_iter_full() and friends
copy_from_iter_full(), copy_from_iter_full_nocache() and csum_and_copy_from_iter_full() - counterparts of copy_from_iter() et.al., advan
[iov_iter] new primitives - copy_from_iter_full() and friends
copy_from_iter_full(), copy_from_iter_full_nocache() and csum_and_copy_from_iter_full() - counterparts of copy_from_iter() et.al., advancing iterator only in case of successful full copy and returning whether it had been successful or not.
Convert some obvious users. *NOTE* - do not blindly assume that something is a good candidate for those unless you are sure that not advancing iov_iter in failure case is the right thing in this case. Anything that does short read/short write kind of stuff (or is in a loop, etc.) is unlikely to be a good one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
84ac7260 |
| 30-Nov-2016 |
Philip Pettersson <philip.pettersson@gmail.com> |
packet: fix race condition in packet_set_ring
When packet_set_ring creates a ring buffer it will initialize a struct timer_list if the packet version is TPACKET_V3. This value can then be raced by a
packet: fix race condition in packet_set_ring
When packet_set_ring creates a ring buffer it will initialize a struct timer_list if the packet version is TPACKET_V3. This value can then be raced by a different thread calling setsockopt to set the version to TPACKET_V1 before packet_set_ring has finished.
This leads to a use-after-free on a function pointer in the struct timer_list when the socket is closed as the previously initialized timer will not be deleted.
The bug is fixed by taking lock_sock(sk) in packet_setsockopt when changing the packet version while also taking the lock at the start of packet_set_ring.
Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
5a213881 |
| 18-Nov-2016 |
Jarno Rajahalme <jarno@ovn.org> |
af_packet: Use virtio_net_hdr_from_skb() directly.
Remove static function __packet_rcv_vnet(), which only called virtio_net_hdr_from_skb() and BUG()ged out if an error code was returned. Instead, c
af_packet: Use virtio_net_hdr_from_skb() directly.
Remove static function __packet_rcv_vnet(), which only called virtio_net_hdr_from_skb() and BUG()ged out if an error code was returned. Instead, call virtio_net_hdr_from_skb() from the former call sites of __packet_rcv_vnet() and actually use the error handling code that is already there.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
db60eb5f |
| 18-Nov-2016 |
Jarno Rajahalme <jarno@ovn.org> |
af_packet: Use virtio_net_hdr_to_skb().
Use the common virtio_net_hdr_to_skb() instead of open coding it. Other call sites were changed by commit fd2a0437dc, but this one was missed, maybe because i
af_packet: Use virtio_net_hdr_to_skb().
Use the common virtio_net_hdr_to_skb() instead of open coding it. Other call sites were changed by commit fd2a0437dc, but this one was missed, maybe because it is split in two parts of the source code.
Interim comparisons of 'vnet_hdr->gso_type' still work as both the vnet_hdr and skb notion of gso_type is zero when there is no gso.
Fixes: fd2a0437dc ("virtio_net: introduce virtio_net_hdr_{from,to}_skb") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9403cd7c |
| 18-Nov-2016 |
Jarno Rajahalme <jarno@ovn.org> |
virtio_net: Do not clear memory for struct virtio_net_hdr twice.
virtio_net_hdr_from_skb() clears the memory for the header, so there is no point for the callers to do the same.
Signed-off-by: Jarn
virtio_net: Do not clear memory for struct virtio_net_hdr twice.
virtio_net_hdr_from_skb() clears the memory for the header, so there is no point for the callers to do the same.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.30, v4.4.29, v4.4.28 |
|
#
104ba78c |
| 26-Oct-2016 |
Willem de Bruijn <willemb@google.com> |
packet: on direct_xmit, limit tso and csum to supported devices
When transmitting on a packet socket with PACKET_VNET_HDR and PACKET_QDISC_BYPASS, validate device support for features requested in v
packet: on direct_xmit, limit tso and csum to supported devices
When transmitting on a packet socket with PACKET_VNET_HDR and PACKET_QDISC_BYPASS, validate device support for features requested in vnet_hdr.
Drop TSO packets sent to devices that do not support TSO or have the feature disabled. Note that the latter currently do process those packets correctly, regardless of not advertising the feature.
Because of SKB_GSO_DODGY, it is not sufficient to test device features with netif_needs_gso. Full validate_xmit_skb is needed.
Switch to software checksum for non-TSO packets that request checksum offload if that device feature is unsupported or disabled. Note that similar to the TSO case, device drivers may perform checksum offload correctly even when not advertising it.
When switching to software checksum, packets hit skb_checksum_help, which has two BUG_ON checksum not in linear segment. Packet sockets always allocate at least up to csum_start + csum_off + 2 as linear.
Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c
ethtool -K eth0 tso off tx on psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N
ethtool -K eth0 tx off psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N
v2: - add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)
Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7 |
|
#
66644982 |
| 05-Oct-2016 |
Anoob Soman <anoob.soman@citrix.com> |
packet: call fanout_release, while UNREGISTERING a netdev
If a socket has FANOUT sockopt set, a new proto_hook is registered as part of fanout_add(). When processing a NETDEV_UNREGISTER event in af_
packet: call fanout_release, while UNREGISTERING a netdev
If a socket has FANOUT sockopt set, a new proto_hook is registered as part of fanout_add(). When processing a NETDEV_UNREGISTER event in af_packet, __fanout_unlink is called for all sockets, but prot_hook which was registered as part of fanout_add is not removed. Call fanout_release, on a NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the fanout_list.
This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
Signed-off-by: Anoob Soman <anoob.soman@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1 |
|
#
f8e7718c |
| 20-Jul-2016 |
Soheil Hassas Yeganeh <soheil@google.com> |
packet: propagate sock_cmsg_send() error
sock_cmsg_send() can return different error codes and not only -EINVAL, and we should properly propagate them.
Fixes: c14ac9451c34 ("sock: enable timestampi
packet: propagate sock_cmsg_send() error
sock_cmsg_send() can return different error codes and not only -EINVAL, and we should properly propagate them.
Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
edbe7746 |
| 19-Jul-2016 |
Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> |
packet: fix second argument of sock_tx_timestamp()
This patch fixes an issue that a syscall (e.g. sendto syscall) cannot work correctly. Since the sendto syscall doesn't have msg_control buffer, the
packet: fix second argument of sock_tx_timestamp()
This patch fixes an issue that a syscall (e.g. sendto syscall) cannot work correctly. Since the sendto syscall doesn't have msg_control buffer, the sock_tx_timestamp() in packet_snd() cannot work correctly because the socks.tsflags is set to 0. So, this patch sets the socks.tsflags to sk->sk_tsflags as default.
Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") Reported-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com> Reported-by: Keita Kobayashi <keita.kobayashi.ym@renesas.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: openbmc-20160713-1, v4.4.15, v4.6.4 |
|
#
eb70db87 |
| 01-Jul-2016 |
David S. Miller <davem@davemloft.net> |
packet: Use symmetric hash for PACKET_FANOUT_HASH.
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that they want packets going in both directions on a flow to hash to the same buck
packet: Use symmetric hash for PACKET_FANOUT_HASH.
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that they want packets going in both directions on a flow to hash to the same bucket.
The core kernel SKB hash became non-symmetric when the ipv6 flow label and other entities were incorporated into the standard flow hash order to increase entropy.
But there are no users of PACKET_FANOUT_HASH who want an assymetric hash, they all want a symmetric one.
Therefore, use the flow dissector to compute a flat symmetric hash over only the protocol, addresses and ports. This hash does not get installed into and override the normal skb hash, so this change has no effect whatsoever on the rest of the stack.
Reported-by: Eric Leblond <eric@regit.org> Tested-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
113214be |
| 30-Jun-2016 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: refactor bpf_prog_get and type check into helper
Since bpf_prog_get() and program type check is used in a couple of places, refactor this into a small helper function that we can make use of. S
bpf: refactor bpf_prog_get and type check into helper
Since bpf_prog_get() and program type check is used in a couple of places, refactor this into a small helper function that we can make use of. Since the non RO prog->aux part is not used in performance critical paths and a program destruction via RCU is rather very unlikley when doing the put, we shouldn't have an issue just doing the bpf_prog_get() + prog->type != type check, but actually not taking the ref at all (due to being in fdget() / fdput() section of the bpf fd) is even cleaner and makes the diff smaller as well, so just go for that. Callsites are changed to make use of the new helper where possible.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.6.3, v4.4.14 |
|
#
1276f24e |
| 08-Jun-2016 |
Mike Rapoport <rppt@linux.vnet.ibm.com> |
packet: use common code for virtio_net_hdr and skb GSO conversion
Replace open coded conversion between virtio_net_hdr to skb GSO info with virtio_net_hdr_from_skb
Signed-off-by: Mike Rapoport <rpp
packet: use common code for virtio_net_hdr and skb GSO conversion
Replace open coded conversion between virtio_net_hdr to skb GSO info with virtio_net_hdr_from_skb
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|