#
1a922fee |
| 02-Dec-2016 |
Sargun Dhillon <sargun@sargun.me> |
samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers
This patch modifies test_current_task_under_cgroup_user. The test has several helpers around creating a temporary environ
samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers
This patch modifies test_current_task_under_cgroup_user. The test has several helpers around creating a temporary environment for cgroup testing, and moving the current task around cgroups. This set of helpers can then be used in other tests.
Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
69a9d09b |
| 01-Dec-2016 |
Alexei Starovoitov <ast@fb.com> |
samples/bpf: silence compiler warnings
silence some of the clang compiler warnings like: include/linux/fs.h:2693:9: warning: comparison of unsigned enum expression < 0 is always false arch/x86/inclu
samples/bpf: silence compiler warnings
silence some of the clang compiler warnings like: include/linux/fs.h:2693:9: warning: comparison of unsigned enum expression < 0 is always false arch/x86/include/asm/processor.h:491:30: warning: taking address of packed member 'sp0' of class or structure 'x86_hw_tss' may result in an unaligned pointer value include/linux/cgroup-defs.h:326:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension since they add too much noise to samples/bpf/ build.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b5b5eca9 |
| 02-Dec-2016 |
David S. Miller <davem@davemloft.net> |
Merge branch 'bpf-support-for-sockets'
David Ahern says:
==================== net: Add bpf support for sockets
The recently added VRF support in Linux leverages the bind-to-device API for programs
Merge branch 'bpf-support-for-sockets'
David Ahern says:
==================== net: Add bpf support for sockets
The recently added VRF support in Linux leverages the bind-to-device API for programs to specify an L3 domain for a socket. While SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable program has support for it. Even for those programs that do support it, the API requires processes to be started as root (CAP_NET_RAW) which is not desirable from a general security perspective.
This patch set leverages Daniel Mack's work to attach bpf programs to a cgroup to provide a capability to set sk_bound_dev_if for all AF_INET{6} sockets opened by a process in a cgroup when the sockets are allocated.
For example: 1. configure vrf (e.g., using ifupdown2) auto eth0 iface eth0 inet dhcp vrf mgmt
auto mgmt iface mgmt vrf-table auto
2. configure cgroup mount -t cgroup2 none /tmp/cgroupv2 mkdir /tmp/cgroupv2/mgmt test_cgrp2_sock /tmp/cgroupv2/mgmt 15
3. set shell into cgroup (e.g., can be done at login using pam) echo $$ >> /tmp/cgroupv2/mgmt/cgroup.procs
At this point all commands run in the shell (e.g, apt) have sockets automatically bound to the VRF (see output of ss -ap 'dev == <vrf>'), including processes not running as root.
This capability enables running any program in a VRF context and is key to deploying Management VRF, a fundamental configuration for networking gear, with any Linux OS installation.
This patchset also exports the socket family, type and protocol as read-only allowing bpf filters to deny a process in a cgroup the ability to open specific types of AF_INET or AF_INET6 sockets.
v7 - comments from Alexei
v6 - add export of socket family, type and protocol ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
554ae6e7 |
| 01-Dec-2016 |
David Ahern <dsa@cumulusnetworks.com> |
samples/bpf: add userspace example for prohibiting sockets
Add examples preventing a process in a cgroup from opening a socket based family, protocol and type.
Signed-off-by: David Ahern <dsa@cumul
samples/bpf: add userspace example for prohibiting sockets
Add examples preventing a process in a cgroup from opening a socket based family, protocol and type.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
ad2805dc |
| 01-Dec-2016 |
David Ahern <dsa@cumulusnetworks.com> |
samples: bpf: add userspace example for modifying sk_bound_dev_if
Add a simple program to demonstrate the ability to attach a bpf program to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets
samples: bpf: add userspace example for modifying sk_bound_dev_if
Add a simple program to demonstrate the ability to attach a bpf program to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they are created.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f577e22c |
| 02-Dec-2016 |
David S. Miller <davem@davemloft.net> |
Merge branch 'lwt-bpf'
Thomas Graf says:
==================== bpf: BPF for lightweight tunnel encapsulation
This series implements BPF program invocation from dst entries via the lightweight tunne
Merge branch 'lwt-bpf'
Thomas Graf says:
==================== bpf: BPF for lightweight tunnel encapsulation
This series implements BPF program invocation from dst entries via the lightweight tunnels infrastructure. The BPF program can be attached to lwtunnel_input(), lwtunnel_output() or lwtunnel_xmit() and see an L3 skb as context. Programs attached to input and output are read-only. Programs attached to lwtunnel_xmit() can modify and redirect, push headers and redirect packets.
The facility can be used to: - Collect statistics and generate sampling data for a subset of traffic based on the dst utilized by the packet thus allowing to extend the existing realms. - Apply additional per route/dst filters to prohibit certain outgoing or incoming packets based on BPF filters. In particular, this allows to maintain per dst custom state across multiple packets in BPF maps and apply filters based on statistics and behaviour observed over time. - Attachment of L2 headers at transmit where resolving the L2 address is not required. - Possibly many more.
v3 -> v4: - Bumped LWT_BPF_MAX_HEADROOM from 128 to 256 (Alexei) - Renamed bpf_skb_push() helper to bpf_skb_change_head() to relate to existing bpf_skb_change_tail() helper (Alexei/Daniel) - Added check in __bpf_redirect_common() to verify that program added a link header before redirecting to a l2 device. Adding the check to lwt-bpf code was considered but dropped due to massive code required due to retrieval of net_device via per-cpu redirect buffer. A test case was added to cover the scenario when a program directs to an l2 device without adding an appropriate l2 header. (Alexei) - Prohibited access to tc_classid (Daniel) - Collapsed bpf_verifier_ops instance for lwt in/out as they are identical (Daniel) - Some cosmetic changes
v2 -> v3: - Added real world sample lwt_len_hist_kern.c which demonstrates how to collect a histogram on packet sizes for all packets flowing through a number of routes. - Restricted output to be read-only. Since the header can no longer be modified, the rerouting functionality has been removed again. - Added test case which cover destructive modification of packet data.
v1 -> v2: - Added new BPF_LWT_REROUTE return code for program to indicate that new route lookup should be performed. Suggested by Tom. - New sample to illustrate rerouting - New patch 05: Recursion limit for lwtunnel_output for the case when user creates circular dst redirection. Also resolves the issue for ILA. - Fix to ensure headroom for potential future L2 header is still guaranteed ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f74599f7 |
| 30-Nov-2016 |
Thomas Graf <tgraf@suug.ch> |
bpf: Add tests and samples for LWT-BPF
Adds a series of tests to verify the functionality of attaching BPF programs at LWT hooks.
Also adds a sample which collects a histogram of packet sizes which
bpf: Add tests and samples for LWT-BPF
Adds a series of tests to verify the functionality of attaching BPF programs at LWT hooks.
Also adds a sample which collects a histogram of packet sizes which pass through an LWT hook.
$ ./lwt_len_hist.sh Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.253.2 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 39857.69 1 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 22 | | 64 -> 127 : 98 | | 128 -> 255 : 213 | | 256 -> 511 : 1444251 |******** | 512 -> 1023 : 660610 |*** | 1024 -> 2047 : 535241 |** | 2048 -> 4095 : 19 | | 4096 -> 8191 : 180 | | 8192 -> 16383 : 5578023 |************************************* | 16384 -> 32767 : 632099 |*** | 32768 -> 65535 : 6575 | |
Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9bee294f |
| 29-Nov-2016 |
Alexei Starovoitov <ast@fb.com> |
samples/bpf: fix include path
Fix the following build error: HOSTCC samples/bpf/test_lru_dist.o ../samples/bpf/test_lru_dist.c:25:22: fatal error: bpf_util.h: No such file or directory
This is due
samples/bpf: fix include path
Fix the following build error: HOSTCC samples/bpf/test_lru_dist.o ../samples/bpf/test_lru_dist.c:25:22: fatal error: bpf_util.h: No such file or directory
This is due to objtree != srctree. Use srctree, since that's where bpf_util.h is located.
Fixes: e00c7b216f34 ("bpf: fix multiple issues in selftest suite and samples") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
0edbf9e5 |
| 28-Nov-2016 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 4.9-rc7 into usb-next
We want the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
53c4ce02 |
| 27-Nov-2016 |
David S. Miller <davem@davemloft.net> |
Merge branch 'bpf-misc-next'
Daniel Borkmann says:
==================== BPF cleanups and misc updates
This patch set adds couple of cleanups in first few patches, exposes owner_prog_type for array
Merge branch 'bpf-misc-next'
Daniel Borkmann says:
==================== BPF cleanups and misc updates
This patch set adds couple of cleanups in first few patches, exposes owner_prog_type for array maps as well as mlocked mem for maps in fdinfo, allows for mount permissions in fs and fixes various outstanding issues in selftests and samples. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
e00c7b21 |
| 25-Nov-2016 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: fix multiple issues in selftest suite and samples
1) The test_lru_map and test_lru_dist fails building on my machine since the sys/resource.h header is not included.
2) test_verifier fails
bpf: fix multiple issues in selftest suite and samples
1) The test_lru_map and test_lru_dist fails building on my machine since the sys/resource.h header is not included.
2) test_verifier fails in one test case where we try to call an invalid function, since the verifier log output changed wrt printing function names.
3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for retrieving the number of possible CPUs. This is broken at least in our scenario and really just doesn't work.
glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF. First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l, if that fails, depending on the config, it either tries to count CPUs in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead. If /proc/cpuinfo has some issue, it returns just 1 worst case. This oddity is nothing new [1], but semantics/behaviour seems to be settled. _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if that fails it looks into /proc/stat for cpuX entries, and if also that fails for some reason, /proc/cpuinfo is consulted (and returning 1 if unlikely all breaks down).
While that might match num_possible_cpus() from the kernel in some cases, it's really not guaranteed with CPU hotplugging, and can result in a buffer overflow since the array in user space could have too few number of slots, and on perpcu map lookup, the kernel will write beyond that memory of the value buffer.
William Tu reported such mismatches:
[...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu() happens when CPU hotadd is enabled. For example, in Fusion when setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64 -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf() will be 2 [2]. [...]
Documentation/cputopology.txt says /sys/devices/system/cpu/possible outputs cpu_possible_mask. That is the same as in num_possible_cpus(), so first step would be to fix the _SC_NPROCESSORS_CONF calls with our own implementation. Later, we could add support to bpf(2) for passing a mask via CPU_SET(3), for example, to just select a subset of CPUs.
BPF samples code needs this fix as well (at least so that people stop copying this). Thus, define bpf_num_possible_cpus() once in selftests and import it from there for the sample code to avoid duplicating it. The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.
After all three issues are fixed, the test suite runs fine again:
# make run_tests | grep self selftests: test_verifier [PASS] selftests: test_maps [PASS] selftests: test_lru_map [PASS] selftests: test_kmod.sh [PASS]
[1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html
Fixes: 3059303f59cf ("samples/bpf: update tracex[23] examples to use per-cpu maps") Fixes: 86af8b4191d2 ("Add sample for adding simple drop program to link") Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY") Fixes: e15596717948 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH") Fixes: ebb676daa1a3 ("bpf: Print function name in addition to function id") Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: William Tu <u9012063@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
ca89fa77 |
| 25-Nov-2016 |
David S. Miller <davem@davemloft.net> |
Merge branch 'cgroup-bpf'
Daniel Mack says:
==================== Add eBPF hooks for cgroups
This is v9 of the patch set to allow eBPF programs for network filtering and accounting to be attached t
Merge branch 'cgroup-bpf'
Daniel Mack says:
==================== Add eBPF hooks for cgroups
This is v9 of the patch set to allow eBPF programs for network filtering and accounting to be attached to cgroups, so that they apply to all sockets of all tasks placed in that cgroup. The logic also allows to be extendeded for other cgroup based eBPF logic.
Again, only minor details are updated in this version.
Changes from v8:
* Move the egress hooks into ip_finish_output() and ip6_finish_output() so they run after the netfilter hooks. For IPv4 multicast, add a new ip_mc_finish_output() callback that is invoked on success by netfilter, and call the eBPF program from there.
Changes from v7:
* Replace the static inline function cgroup_bpf_run_filter() with two specific macros for ingress and egress. This addresses David Miller's concern regarding skb->sk vs. sk in the egress path. Thanks a lot to Daniel Borkmann and Alexei Starovoitov for the suggestions.
Changes from v6:
* Rebased to 4.9-rc2
* Add EXPORT_SYMBOL(__cgroup_bpf_run_filter). The kbuild test robot now succeeds in building this version of the patch set.
* Switch from bpf_prog_run_save_cb() to bpf_prog_run_clear_cb() to not tamper with the contents of skb->cb[]. Pointed out by Daniel Borkmann.
* Use sk_to_full_sk() in the egress path, as suggested by Daniel Borkmann.
* Renamed BPF_PROG_TYPE_CGROUP_SOCKET to BPF_PROG_TYPE_CGROUP_SKB, as requested by David Ahern.
* Added Alexei's Acked-by tags.
Changes from v5:
* The eBPF programs now operate on L3 rather than on L2 of the packets, and the egress hooks were moved from __dev_queue_xmit() to ip*_output().
* For BPF_PROG_TYPE_CGROUP_SOCKET, disallow direct access to the skb through BPF_LD_[ABS|IND] instructions, but hook up the bpf_skb_load_bytes() access helper instead. Thanks to Daniel Borkmann for the help.
Changes from v4:
* Plug an skb leak when dropping packets due to eBPF verdicts in __dev_queue_xmit(). Spotted by Daniel Borkmann.
* Check for sk_fullsock(sk) in __cgroup_bpf_run_filter() so we don't operate on timewait or request sockets. Suggested by Daniel Borkmann.
* Add missing @parent parameter in kerneldoc of __cgroup_bpf_update(). Spotted by Rami Rosen.
* Include linux/jump_label.h from bpf-cgroup.h to fix a kbuild error.
Changes from v3:
* Dropped the _FILTER suffix from BPF_PROG_TYPE_CGROUP_SOCKET_FILTER, renamed BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS to BPF_CGROUP_INET_{IN,E}GRESS and alias BPF_MAX_ATTACH_TYPE to __BPF_MAX_ATTACH_TYPE, as suggested by Daniel Borkmann.
* Dropped the attach_flags member from the anonymous struct for BPF attach operations in union bpf_attr. They can be added later on via CHECK_ATTR. Requested by Daniel Borkmann and Alexei.
* Release old_prog at the end of __cgroup_bpf_update rather that at the beginning to fix a race gap between program updates and their users. Spotted by Daniel Borkmann.
* Plugged an skb leak when dropping packets on the egress path. Spotted by Daniel Borkmann.
* Add cgroups@vger.kernel.org to the loop, as suggested by Rami Rosen.
* Some minor coding style adoptions not worth mentioning in particular.
Changes from v2:
* Fixed the RCU locking details Tejun pointed out.
* Assert bpf_attr.flags == 0 in BPF_PROG_DETACH syscall handler.
Changes from v1:
* Moved all bpf specific cgroup code into its own file, and stub out related functions for !CONFIG_CGROUP_BPF as static inline nops. This way, the call sites are not cluttered with #ifdef guards while the feature remains compile-time configurable.
* Implemented the new scheme proposed by Tejun. Per cgroup, store one set of pointers that are pinned to the cgroup, and one for the programs that are effective. When a program is attached or detached, the change is propagated to all the cgroup's descendants. If a subcgroup has its own pinned program, skip the whole subbranch in order to allow delegation models.
* The hookup for egress packets is now done from __dev_queue_xmit().
* A static key is now used in both the ingress and egress fast paths to keep performance penalties close to zero if the feature is not in use.
* Overall cleanup to make the accessors use the program arrays. This should make it much easier to add new program types, which will then automatically follow the pinned vs. effective logic.
* Fixed locking issues, as pointed out by Eric Dumazet and Alexei Starovoitov. Changes to the program array are now done with xchg() and are protected by cgroup_mutex.
* eBPF programs are now expected to return 1 to let the packet pass, not >= 0. Pointed out by Alexei.
* Operation is now limited to INET sockets, so local AF_UNIX sockets are not affected. The enum members are renamed accordingly. In case other socket families should be supported, this can be extended in the future.
* The sample program learned to support both ingress and egress, and can now optionally make the eBPF program drop packets by making it return 0. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d8c5b17f |
| 23-Nov-2016 |
Daniel Mack <daniel@zonque.org> |
samples: bpf: add userspace example for attaching eBPF programs to cgroups
Add a simple userpace program to demonstrate the new API to attach eBPF programs to cgroups. This is what it does:
* Crea
samples: bpf: add userspace example for attaching eBPF programs to cgroups
Add a simple userpace program to demonstrate the new API to attach eBPF programs to cgroups. This is what it does:
* Create arraymap in kernel with 4 byte keys and 8 byte values
* Load eBPF program
The eBPF program accesses the map passed in to store two pieces of information. The number of invocations of the program, which maps to the number of packets received, is stored to key 0. Key 1 is incremented on each iteration by the number of bytes stored in the skb.
* Detach any eBPF program previously attached to the cgroup
* Attach the new program to the cgroup using BPF_PROG_ATTACH
* Once a second, read map[0] and map[1] to see how many bytes and packets were seen on any socket of tasks in the given cgroup.
The program takes a cgroup path as 1st argument, and either "ingress" or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument, which will make the generated eBPF program return 0 instead of 1, so the kernel will drop the packet.
libbpf gained two new wrappers for the new syscall commands.
Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
69e6cdd0 |
| 23-Nov-2016 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
064e6a8b |
| 23-Nov-2016 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into x86/fpu, to resolve conflicts
Conflicts: arch/x86/kernel/fpu/core.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
02cb689b |
| 22-Nov-2016 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
820b1a93 |
| 22-Nov-2016 |
Mauro Carvalho Chehab <mchehab@s-opensource.com> |
Merge tag 'v4.9-rc6' into patchwork
Linux 4.9-rc6
* tag 'v4.9-rc6': (305 commits) Linux 4.9-rc6 ext4: sanity check the block and cluster size at mount time fscrypto: don't use on-stack buffer
Merge tag 'v4.9-rc6' into patchwork
Linux 4.9-rc6
* tag 'v4.9-rc6': (305 commits) Linux 4.9-rc6 ext4: sanity check the block and cluster size at mount time fscrypto: don't use on-stack buffer for key derivation fscrypto: don't use on-stack buffer for filename encryption i2c: i2c-mux-pca954x: fix deselect enabling for device-tree kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr KVM: async_pf: avoid recursive flushing of work items kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use KVM: Disable irq while unregistering user notifier KVM: x86: do not go through vcpu in __get_kvmclock_ns MAINTAINERS: Add LED subsystem co-maintainer crypto: algif_hash - Fix NULL hash crash with shash powerpc/mm: Fix missing update of HID register on secondary CPUs KVM: arm64: Fix the issues when guest PMCCFILTR is configured arm64: KVM: pmu: Fix AArch32 cycle counter access powerpc/mm/radix: Invalidate ERAT on tlbiel for POWER9 DD1 i2c: digicolor: use clk_disable_unprepare instead of clk_unprepare ipmi/bt-bmc: change compatible node to 'aspeed, ast2400-ibt-bmc' Revert "drm/mediatek: set vblank_disable_allowed to true" ...
show more ...
|
Revision tags: openbmc-4.4-20161121-1, v4.4.33 |
|
#
0acbc7aa |
| 16-Nov-2016 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
e6ca4f16 |
| 15-Nov-2016 |
David S. Miller <davem@davemloft.net> |
Merge branch 'bpf-lru'
Martin KaFai Lau says:
==================== bpf: LRU map
This patch set adds LRU map implementation to the existing BPF map family.
The first few patches introduce the basi
Merge branch 'bpf-lru'
Martin KaFai Lau says:
==================== bpf: LRU map
This patch set adds LRU map implementation to the existing BPF map family.
The first few patches introduce the basic BPF LRU list implementation.
The later patches introduce the LRU versions of the existing BPF_MAP_TYPE_LRU_[PERCPU_]HASH maps by leveraging the BPF LRU list.
v2: - Added a percpu LRU list option which can be specified as a map attribute.
[Note: percpu LRU list has nothing to do with the map's value]
- Removed the cpu variable from the struct bpf_lru_locallist since it is not needed.
- Changed the __bpf_lru_node_move_out to __bpf_lru_node_move_to_free in patch 1 to prepare the percpu LRU list in patch 2.
- Moved the test_lru_map under selftests
- Refactored a few things in the test codes ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.32 |
|
#
5db58faf |
| 11-Nov-2016 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Add tests for the LRU bpf_htab
This patch has some unit tests and a test_lru_dist.
The test_lru_dist reads in the numeric keys from a file. The files used here are generated by a modified fio-
bpf: Add tests for the LRU bpf_htab
This patch has some unit tests and a test_lru_dist.
The test_lru_dist reads in the numeric keys from a file. The files used here are generated by a modified fio-genzipf tool originated from the fio test suit. The sample data file can be found here: https://github.com/iamkafai/bpf-lru
The zipf.* data files have 100k numeric keys and the key is also ranged from 1 to 100k.
The test_lru_dist outputs the number of unique keys (nr_unique). F.e. The following means, 61239 of them is unique out of 100k keys. nr_misses means it cannot be found in the LRU map, so nr_misses must be >= nr_unique. test_lru_dist also simulates a perfect LRU map as a comparison:
[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \ /root/zipf.100k.a1_01.out 4000 1 ... test_parallel_lru_dist (map_type:9 map_flags:0x0): task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000) task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000) .... test_parallel_lru_dist (map_type:9 map_flags:0x2): task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000) task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \ /root/zipf.100k.a0_01.out 40000 1 ... test_parallel_lru_dist (map_type:9 map_flags:0x0): task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000) task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000) ... test_parallel_lru_dist (map_type:9 map_flags:0x2): task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000) task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
LRU map has also been added to map_perf_test: /* Global LRU */ [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \ ./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done 1 cpus: 2934082 updates 4 cpus: 7391434 updates 8 cpus: 6500576 updates
/* Percpu LRU */ [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \ ./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done 1 cpus: 2896553 updates 4 cpus: 9766395 updates 8 cpus: 17460553 updates
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
bb598c1b |
| 15-Nov-2016 |
David S. Miller <davem@davemloft.net> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in 'net-next-.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e76d21c4 |
| 14-Nov-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix off by one wrt. indexing when dumping /proc/net/route entries, from Alexander Duyc
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix off by one wrt. indexing when dumping /proc/net/route entries, from Alexander Duyck.
2) Fix lockdep splats in iwlwifi, from Johannes Berg.
3) Cure panic when inserting certain netfilter rules when NFT_SET_HASH is disabled, from Liping Zhang.
4) Memory leak when nft_expr_clone() fails, also from Liping Zhang.
5) Disable UFO when path will apply IPSEC tranformations, from Jakub Sitnicki.
6) Don't bogusly double cwnd in dctcp module, from Florian Westphal.
7) skb_checksum_help() should never actually use the value "0" for the resulting checksum, that has a special meaning, use CSUM_MANGLED_0 instead. From Eric Dumazet.
8) Per-tx/rx queue statistic strings are wrong in qed driver, fix from Yuval MIntz.
9) Fix SCTP reference counting of associations and transports in sctp_diag. From Xin Long.
10) When we hit ip6tunnel_xmit() we could have come from an ipv4 path in a previous layer or similar, so explicitly clear the ipv6 control block in the skb. From Eli Cooper.
11) Fix bogus sleeping inside of inet_wait_for_connect(), from WANG Cong.
12) Correct deivce ID of T6 adapter in cxgb4 driver, from Hariprasad Shenai.
13) Fix potential access past the end of the skb page frag array in tcp_sendmsg(). From Eric Dumazet.
14) 'skb' can legitimately be NULL in inet{,6}_exact_dif_match(). Fix from David Ahern.
15) Don't return an error in tcp_sendmsg() if we wronte any bytes successfully, from Eric Dumazet.
16) Extraneous unlocks in netlink_diag_dump(), we removed the locking but forgot to purge these unlock calls. From Eric Dumazet.
17) Fix memory leak in error path of __genl_register_family(). We leak the attrbuf, from WANG Cong.
18) cgroupstats netlink policy table is mis-sized, from WANG Cong.
19) Several XDP bug fixes in mlx5, from Saeed Mahameed.
20) Fix several device refcount leaks in network drivers, from Johan Hovold.
21) icmp6_send() should use skb dst device not skb->dev to determine L3 routing domain. From David Ahern.
22) ip_vs_genl_family sets maxattr incorrectly, from WANG Cong.
23) We leak new macvlan port in some cases of maclan_common_netlink() errors. Fix from Gao Feng.
24) Similar to the icmp6_send() fix, icmp_route_lookup() should determine L3 routing domain using skb_dst(skb)->dev not skb->dev. Also from David Ahern.
25) Several fixes for route offloading and FIB notification handling in mlxsw driver, from Jiri Pirko.
26) Properly cap __skb_flow_dissect()'s return value, from Eric Dumazet.
27) Fix long standing regression in ipv4 redirect handling, wrt. validating the new neighbour's reachability. From Stephen Suryaputra Lin.
28) If sk_filter() trims the packet excessively, handle it reasonably in tcp input instead of exploding. From Eric Dumazet.
29) Fix handling of napi hash state when copying channels in sfc driver, from Bert Kenward.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (121 commits) mlxsw: spectrum_router: Flush FIB tables during fini net: stmmac: Fix lack of link transition for fixed PHYs sctp: change sk state only when it has assocs in sctp_shutdown bnx2: Wait for in-flight DMA to complete at probe stage Revert "bnx2: Reset device during driver initialization" ps3_gelic: fix spelling mistake in debug message net: ethernet: ixp4xx_eth: fix spelling mistake in debug message ibmvnic: Fix size of debugfs name buffer ibmvnic: Unmap ibmvnic_statistics structure sfc: clear napi_hash state when copying channels mlxsw: spectrum_router: Correctly dump neighbour activity mlxsw: spectrum: Fix refcount bug on span entries bnxt_en: Fix VF virtual link state. bnxt_en: Fix ring arithmetic in bnxt_setup_tc(). Revert "include/uapi/linux/atm_zatm.h: include linux/time.h" tcp: take care of truncations done by sk_filter() ipv4: use new_gw for redirect neigh lookup r8152: Fix error path in open function net: bpqether.h: remove if_ether.h guard net: __skb_flow_dissect() must cap its return value ...
show more ...
|
#
79774d6b |
| 12-Nov-2016 |
David S. Miller <davem@davemloft.net> |
Merge branch 'fix-bpf_redirect'
Martin KaFai Lau says:
==================== bpf: Fix bpf_redirect to an ipip/ip6tnl dev
This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an ipip/i
Merge branch 'fix-bpf_redirect'
Martin KaFai Lau says:
==================== bpf: Fix bpf_redirect to an ipip/ip6tnl dev
This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an ipip/ip6tnl. The current problem is IP-EthHdr-IP is sent out instead of IP-IP.
Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit() in act_mirred.c which is only available in net-next. We can consider to refactor it once this patch is pulled into net-next from net. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.4.31 |
|
#
90e02896 |
| 09-Nov-2016 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Add test for bpf_redirect to ipip/ip6tnl
The test creates two netns, ns1 and ns2. The host (the default netns) has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
ping
bpf: Add test for bpf_redirect to ipip/ip6tnl
The test creates two netns, ns1 and ns2. The host (the default netns) has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
The test is to have ns1 pinging VIPs configured at the loopback interface in ns2.
The VIPs are 10.10.1.102 and 2401:face::66 (which are configured at lo@ns2). [Note: 0x66 => 102].
At ns1, the VIPs are routed _via_ the host.
At the host, bpf programs are installed at the veth to redirect packets from a veth to the ipip/ip6tnl. The test is configured in a way so that both ingress and egress can be tested.
At ns2, the ipip/ip6tnl dev is configured with the local and remote address specified. The return path is routed to the dev ipip/ip6tnl.
During egress test, the host also locally tests pinging the VIPs to ensure that bpf_redirect at egress also works for the direct egress (i.e. not forwarding from dev ve1 to ve2).
Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
712cba5d |
| 07-Nov-2016 |
Max Filippov <jcmvbkbc@gmail.com> |
Merge tag 'v4.9-rc3' into xtensa-for-next
Linux 4.9-rc3
|