#
dd8070bf |
| 23-May-2018 |
Johannes Berg <johannes.berg@intel.com> |
Merge remote-tracking branch 'net-next/master' into mac80211-next
Bring in net-next which had pulled in net, so I have the changes from mac80211 and can apply a patch that would otherwise conflict.
Merge remote-tracking branch 'net-next/master' into mac80211-next
Bring in net-next which had pulled in net, so I have the changes from mac80211 and can apply a patch that would otherwise conflict.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
show more ...
|
#
6f6e434a |
| 21-May-2018 |
David S. Miller <davem@davemloft.net> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal.
TLS data structur
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal.
TLS data structures split the TX and RX components in 'net-next', put the new struct members from the bug fix in 'net' into the RX part.
The 'net-next' tree had some reworking of how the ERSPAN code works in the GRE tunneling code, overlapping with a one-line headroom calculation fix in 'net'.
Overlapping changes in __sock_map_ctx_update_elem(), keep the bits that read the prog members via READ_ONCE() into local variables before using them.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
c049ffb3 |
| 21-May-2018 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 4.17-rc6 into usb-next
We want the bug fixes and this resolves the merge issues with the usbip driver.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
583dbad3 |
| 20-May-2018 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:
- Unbreak the BPF compilation which got broken by the unconditio
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Thomas Gleixner:
- Unbreak the BPF compilation which got broken by the unconditional requirement of asm-goto, which is not supported by clang.
- Prevent probing on exception masking instructions in uprobes and kprobes to avoid the issues of the delayed exceptions instead of having an ugly workaround.
- Prevent a double free_page() in the error path of do_kexec_load()
- A set of objtool updates addressing various issues mostly related to switch tables and the noreturn detection for recursive sibling calls
- Header sync for tools.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Detect RIP-relative switch table references, part 2 objtool: Detect RIP-relative switch table references objtool: Support GCC 8 switch tables objtool: Support GCC 8's cold subfunctions objtool: Fix "noreturn" detection for recursive sibling calls objtool, kprobes/x86: Sync the latest <asm/insn.h> header with tools/objtool/arch/x86/include/asm/insn.h x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation uprobes/x86: Prohibit probing on MOV SS instruction kprobes/x86: Prohibit probing on exception masking instructions x86/kexec: Avoid double free_page() upon do_kexec_load() failure
show more ...
|
#
177bfd72 |
| 19-May-2018 |
Ingo Molnar <mingo@kernel.org> |
Merge branches 'x86/urgent' and 'core/urgent' into x86/boot, to pick up fixes and avoid conflicts
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
b9f672af |
| 16-May-2018 |
David S. Miller <davem@davemloft.net> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2018-05-17
The following pull-request contains BPF updates for yo
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2018-05-17
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Provide a new BPF helper for doing a FIB and neighbor lookup in the kernel tables from an XDP or tc BPF program. The helper provides a fast-path for forwarding packets. The API supports IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are implemented in this initial work, from David (Ahern).
2) Just a tiny diff but huge feature enabled for nfp driver by extending the BPF offload beyond a pure host processing offload. Offloaded XDP programs are allowed to set the RX queue index and thus opening the door for defining a fully programmable RSS/n-tuple filter replacement. Once BPF decided on a queue already, the device data-path will skip the conventional RSS processing completely, from Jakub.
3) The original sockmap implementation was array based similar to devmap. However unlike devmap where an ifindex has a 1:1 mapping into the map there are use cases with sockets that need to be referenced using longer keys. Hence, sockhash map is added reusing as much of the sockmap code as possible, from John.
4) Introduce BTF ID. The ID is allocatd through an IDR similar as with BPF maps and progs. It also makes BTF accessible to user space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data through BPF_OBJ_GET_INFO_BY_FD, from Martin.
5) Enable BPF stackmap with build_id also in NMI context. Due to the up_read() of current->mm->mmap_sem build_id cannot be parsed. This work defers the up_read() via a per-cpu irq_work so that at least limited support can be enabled, from Song.
6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND JIT conversion as well as implementation of an optimized 32/64 bit immediate load in the arm64 JIT that allows to reduce the number of emitted instructions; in case of tested real-world programs they were shrinking by three percent, from Daniel.
7) Add ifindex parameter to the libbpf loader in order to enable BPF offload support. Right now only iproute2 can load offloaded BPF and this will also enable libbpf for direct integration into other applications, from David (Beckett).
8) Convert the plain text documentation under Documentation/bpf/ into RST format since this is the appropriate standard the kernel is moving to for all documentation. Also add an overview README.rst, from Jesper.
9) Add __printf verification attribute to the bpf_verifier_vlog() helper. Though it uses va_list we can still allow gcc to check the format string, from Mathieu.
10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...' is a bash 4.0+ feature which is not guaranteed to be available when calling out to shell, therefore use a more portable variant, from Joe.
11) Fix a 64 bit division in xdp_umem_reg() by using div_u64() instead of relying on the gcc built-in, from Björn.
12) Fix a sock hashmap kmalloc warning reported by syzbot when an overly large key size is used in hashmap then causing overflows in htab->elem_size. Reject bogus attr->key_size early in the sock_hash_alloc(), from Yonghong.
13) Ensure in BPF selftests when urandom_read is being linked that --build-id is always enabled so that test_stacktrace_build_id[_nmi] won't be failing, from Alexei.
14) Add bitsperlong.h as well as errno.h uapi headers into the tools header infrastructure which point to one of the arch specific uapi headers. This was needed in order to fix a build error on some systems for the BPF selftests, from Sirio.
15) Allow for short options to be used in the xdp_monitor BPF sample code. And also a bpf.h tools uapi header sync in order to fix a selftest build failure. Both from Prashant.
16) More formally clarify the meaning of ID in the direct packet access section of the BPF documentation, from Wang. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
1d827879 |
| 15-May-2018 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'fix-samples'
Jakub Kicinski says:
==================== Following patches address build issues after recent move to libbpf. For out-of-tree builds we would see the following error:
gc
Merge branch 'fix-samples'
Jakub Kicinski says:
==================== Following patches address build issues after recent move to libbpf. For out-of-tree builds we would see the following error:
gcc: error: samples/bpf/../../tools/lib/bpf/libbpf.a: No such file or directory
libbpf build system is now always invoked explicitly rather than relying on building single objects most of the time. We need to resolve the friction between Kbuild and tools/ build system.
Mini-library called libbpf.h in samples is renamed to bpf_insn.h, using linux/filter.h seems not completely trivial since some samples get upset when order on include search path in changed. We do have to rename libbpf.h, however, because otherwise it's hard to reliably get to libbpf's header in out-of-tree builds.
v2: - fix the build error harder (patch 3); - add patch 5 (make clang less noisy). ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
768759ed |
| 15-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
samples: bpf: make the build less noisy
Building samples with clang ignores the $(Q) setting, always printing full command to the output. Make it less verbose.
Signed-off-by: Jakub Kicinski <jakub
samples: bpf: make the build less noisy
Building samples with clang ignores the $(Q) setting, always printing full command to the output. Make it less verbose.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
0cc54db1 |
| 15-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
samples: bpf: move libbpf from object dependencies to libs
Make complains that it doesn't know how to make libbpf.a:
scripts/Makefile.host:106: target 'samples/bpf/../../tools/lib/bpf/libbpf.a' doe
samples: bpf: move libbpf from object dependencies to libs
Make complains that it doesn't know how to make libbpf.a:
scripts/Makefile.host:106: target 'samples/bpf/../../tools/lib/bpf/libbpf.a' doesn't match the target pattern
Now that we have it as a dependency of the sources simply add libbpf.a to libraries not objects.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
787360f8 |
| 15-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
samples: bpf: fix build after move to compiling full libbpf.a
There are many ways users may compile samples, some of them got broken by commit 5f9380572b4b ("samples: bpf: compile and link against f
samples: bpf: fix build after move to compiling full libbpf.a
There are many ways users may compile samples, some of them got broken by commit 5f9380572b4b ("samples: bpf: compile and link against full libbpf"). Improve path resolution and make libbpf building a dependency of source files to force its build.
Samples should now again build with any of: cd samples/bpf; make make samples/bpf/ make -C samples/bpf cd samples/bpf; make O=builddir make samples/bpf/ O=builddir make -C samples/bpf O=builddir export KBUILD_OUTPUT=builddir make samples/bpf/ make -C samples/bpf
Fixes: 5f9380572b4b ("samples: bpf: compile and link against full libbpf") Reported-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
bba95255 |
| 13-May-2018 |
Zhi Wang <zhi.a.wang@intel.com> |
Merge branch 'drm-intel-next-queued' into gvt-next
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
|
#
b1ae32db |
| 13-May-2018 |
Alexei Starovoitov <ast@kernel.org> |
x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation
Workaround for the sake of BPF compilation which utilizes kernel headers, but clang does not support ASM GOTO and fails the build.
x86/cpufeature: Guard asm_volatile_goto usage for BPF compilation
Workaround for the sake of BPF compilation which utilizes kernel headers, but clang does not support ASM GOTO and fails the build.
Fixes: d0266046ad54 ("x86: Remove FAST_FEATURE_TESTS") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: daniel@iogearbox.net Cc: peterz@infradead.org Cc: netdev@vger.kernel.org Cc: bp@alien8.de Cc: yhs@fb.com Cc: kernel-team@fb.com Cc: torvalds@linux-foundation.org Cc: davem@davemloft.net Link: https://lkml.kernel.org/r/20180513193222.1997938-1-ast@kernel.org
show more ...
|
#
94cc2fde |
| 11-May-2018 |
Maarten Lankhorst <maarten.lankhorst@linux.intel.com> |
Merge remote-tracking branch 'drm/drm-next' into drm-misc-next
drm-misc-next is still based on v4.16-rc7, and was getting a bit stale.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.inte
Merge remote-tracking branch 'drm/drm-next' into drm-misc-next
drm-misc-next is still based on v4.16-rc7, and was getting a bit stale.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
show more ...
|
#
a84880ef |
| 10-May-2018 |
Daniel Borkmann <daniel@iogearbox.net> |
Merge branch 'bpf-perf-rb-libbpf'
Jakub Kicinski says:
==================== This series started out as a follow up to the bpftool perf event dumping patches.
As suggested by Daniel patch 1 makes u
Merge branch 'bpf-perf-rb-libbpf'
Jakub Kicinski says:
==================== This series started out as a follow up to the bpftool perf event dumping patches.
As suggested by Daniel patch 1 makes use of PERF_SAMPLE_TIME to simplify code and improve accuracy of timestamps.
Remaining patches are trying to move perf event loop into libbpf as suggested by Alexei. One user for this new function is bpftool which links with libbpf nicely, the other, unfortunately, is in samples/bpf. Remaining patches make samples/bpf link against full libbpf.a (not just a handful of objects). Once we have full power of libbpf at our disposal we can convert some of XDP samples to use libbpf loader instead of bpf_load.c. My understanding is that this is the desired direction, at least for networking code. ====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
be5bca44 |
| 10-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
samples: bpf: convert some XDP samples from bpf_load to libbpf
Now that we can use full powers of libbpf in BPF samples, we should perhaps make the simplest XDP programs not depend on bpf_load helpe
samples: bpf: convert some XDP samples from bpf_load to libbpf
Now that we can use full powers of libbpf in BPF samples, we should perhaps make the simplest XDP programs not depend on bpf_load helpers. This way newcomers will be exposed to the recommended library from the start.
Use of bpf_prog_load_xattr() will also make it trivial to later on request offload of the programs by simply adding ifindex to the xattr.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
d0cabbb0 |
| 10-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
tools: bpf: move the event reading loop to libbpf
There are two copies of event reading loop - in bpftool and trace_helpers "library". Consolidate them and move the code to libbpf. Return codes fr
tools: bpf: move the event reading loop to libbpf
There are two copies of event reading loop - in bpftool and trace_helpers "library". Consolidate them and move the code to libbpf. Return codes from trace_helpers are kept, but renamed to include LIBBPF prefix.
Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
5f938057 |
| 10-May-2018 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
samples: bpf: compile and link against full libbpf
samples/bpf currently cherry-picks object files from tools/lib/bpf to link against. Just compile the full library and link statically against it.
samples: bpf: compile and link against full libbpf
samples/bpf currently cherry-picks object files from tools/lib/bpf to link against. Just compile the full library and link statically against it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
ff1f56d9 |
| 10-May-2018 |
Daniel Borkmann <daniel@iogearbox.net> |
Merge branch 'bpf-fib-lookup-helper'
David Ahern says:
==================== Provide a helper for doing a FIB and neighbor lookup in the kernel tables from an XDP program. The helper provides a fast
Merge branch 'bpf-fib-lookup-helper'
David Ahern says:
==================== Provide a helper for doing a FIB and neighbor lookup in the kernel tables from an XDP program. The helper provides a fastpath for forwarding packets. If the packet is a local delivery or for any reason is not a simple lookup and forward, the packet is expected to continue up the stack for full processing.
The response from a FIB and neighbor lookup is either the egress index with the bpf_fib_lookup struct filled in with dmac and gateway or 0 meaning the packet should continue up the stack. In time we can revisit this to return the FIB lookup result errno if it is one of the special RTN_'s such as RTN_BLACKHOLE (-EINVAL) so that the XDP programs can do an early drop if desired.
Patches 1-6 do some more refactoring to IPv6 with the end goal of extracting a FIB lookup function that aligns with fib_lookup for IPv4, basically returning a fib6_info without creating a dst based entry.
Patch 7 adds lookup functions to the ipv6 stub. These are needed since bpf is built into the kernel and ipv6 may not be built or loaded.
Patch 8 adds the bpf helper and 9 adds a sample program.
v3 - remove ETH_ALEN and in6_addr from uapi header
v2 - removed pkt_access from bpf_func_proto as noticed by Daniel - added check in that IPv6 forwarding is enabled - added DaveM's ack on patches 1-7 and 9 based on v1 response and fact that no changes were made to them in v2
v1 - updated commit messages and cover letter - added comment to sample program noting lack of verification on egress device supporting XDP
RFC v2 - fixed use of foward helper from cls_act as noted by Daniel - in patch 1 rename fib6_lookup_1 as well for consistency ====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
fe616055 |
| 09-May-2018 |
David Ahern <dsahern@gmail.com> |
samples/bpf: Add example of ipv4 and ipv6 forwarding in XDP
Simple example of fast-path forwarding. It has a serious flaw in not verifying the egress device index supports XDP forwarding. If the egr
samples/bpf: Add example of ipv4 and ipv6 forwarding in XDP
Simple example of fast-path forwarding. It has a serious flaw in not verifying the egress device index supports XDP forwarding. If the egress device does not packets are dropped.
Take this only as a simple example of fast-path forwarding.
Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
01adc485 |
| 07-May-2018 |
David S. Miller <davem@davemloft.net> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Minor conflict, a CHECK was placed into an if() statement in net-next, whilst a newline was added to that CHECK call in 'net'. Thank
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Minor conflict, a CHECK was placed into an if() statement in net-next, whilst a newline was added to that CHECK call in 'net'. Thanks to Daniel for the merge resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
08dbc7a6 |
| 03-May-2018 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'AF_XDP-initial-support'
Björn Töpel says:
==================== This patch set introduces a new address family called AF_XDP that is optimized for high performance packet processing an
Merge branch 'AF_XDP-initial-support'
Björn Töpel says:
==================== This patch set introduces a new address family called AF_XDP that is optimized for high performance packet processing and, in upcoming patch sets, zero-copy semantics. In this patch set, we have removed all zero-copy related code in order to make it smaller, simpler and hopefully more review friendly. This patch set only supports copy-mode for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode for RX using the XDP_DRV path. Zero-copy support requires XDP and driver changes that Jesper Dangaard Brouer is working on. Some of his work has already been accepted. We will publish our zero-copy support for RX and TX on top of his patch sets at a later point in time.
An AF_XDP socket (XSK) is created with the normal socket() syscall. Associated with each XSK are two queues: the RX queue and the TX queue. A socket can receive packets on the RX queue and it can send packets on the TX queue. These queues are registered and sized with the setsockopts XDP_RX_RING and XDP_TX_RING, respectively. It is mandatory to have at least one of these queues for each socket. In contrast to AF_PACKET V2/V3 these descriptor queues are separated from packet buffers. An RX or TX descriptor points to a data buffer in a memory area called a UMEM. RX and TX can share the same UMEM so that a packet does not have to be copied between RX and TX. Moreover, if a packet needs to be kept for a while due to a possible retransmit, the descriptor that points to that packet can be changed to point to another and reused right away. This again avoids copying data.
This new dedicated packet buffer area is call a UMEM. It consists of a number of equally size frames and each frame has a unique frame id. A descriptor in one of the queues references a frame by referencing its frame id. The user space allocates memory for this UMEM using whatever means it feels is most appropriate (malloc, mmap, huge pages, etc). This memory area is then registered with the kernel using the new setsockopt XDP_UMEM_REG. The UMEM also has two queues: the FILL queue and the COMPLETION queue. The fill queue is used by the application to send down frame ids for the kernel to fill in with RX packet data. References to these frames will then appear in the RX queue of the XSK once they have been received. The completion queue, on the other hand, contains frame ids that the kernel has transmitted completely and can now be used again by user space, for either TX or RX. Thus, the frame ids appearing in the completion queue are ids that were previously transmitted using the TX queue. In summary, the RX and FILL queues are used for the RX path and the TX and COMPLETION queues are used for the TX path.
The socket is then finally bound with a bind() call to a device and a specific queue id on that device, and it is not until bind is completed that traffic starts to flow. Note that in this patch set, all packet data is copied out to user-space.
A new feature in this patch set is that the UMEM can be shared between processes, if desired. If a process wants to do this, it simply skips the registration of the UMEM and its corresponding two queues, sets a flag in the bind call and submits the XSK of the process it would like to share UMEM with as well as its own newly created XSK socket. The new process will then receive frame id references in its own RX queue that point to this shared UMEM. Note that since the queue structures are single-consumer / single-producer (for performance reasons), the new process has to create its own socket with associated RX and TX queues, since it cannot share this with the other process. This is also the reason that there is only one set of FILL and COMPLETION queues per UMEM. It is the responsibility of a single process to handle the UMEM. If multiple-producer / multiple-consumer queues are implemented in the future, this requirement could be relaxed.
How is then packets distributed between these two XSK? We have introduced a new BPF map called XSKMAP (or BPF_MAP_TYPE_XSKMAP in full). The user-space application can place an XSK at an arbitrary place in this map. The XDP program can then redirect a packet to a specific index in this map and at this point XDP validates that the XSK in that map was indeed bound to that device and queue number. If not, the packet is dropped. If the map is empty at that index, the packet is also dropped. This also means that it is currently mandatory to have an XDP program loaded (and one XSK in the XSKMAP) to be able to get any traffic to user space through the XSK.
AF_XDP can operate in two different modes: XDP_SKB and XDP_DRV. If the driver does not have support for XDP, or XDP_SKB is explicitly chosen when loading the XDP program, XDP_SKB mode is employed that uses SKBs together with the generic XDP support and copies out the data to user space. A fallback mode that works for any network device. On the other hand, if the driver has support for XDP, it will be used by the AF_XDP code to provide better performance, but there is still a copy of the data into user space.
There is a xdpsock benchmarking/test application included that demonstrates how to use AF_XDP sockets with both private and shared UMEMs. Say that you would like your UDP traffic from port 4242 to end up in queue 16, that we will enable AF_XDP on. Here, we use ethtool for this:
ethtool -N p3p2 rx-flow-hash udp4 fn ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \ action 16
Running the rxdrop benchmark in XDP_DRV mode can then be done using:
samples/bpf/xdpsock -i p3p2 -q 16 -r -N
For XDP_SKB mode, use the switch "-S" instead of "-N" and all options can be displayed with "-h", as usual.
We have run some benchmarks on a dual socket system with two Broadwell E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14 cores which gives a total of 28, but only two cores are used in these experiments. One for TR/RX and one for the user space application. The memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is 8192MB and with 8 of those DIMMs in the system we have 64 GB of total memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The NIC is Intel I40E 40Gbit/s using the i40e driver.
Below are the results in Mpps of the I40E NIC benchmark runs for 64 and 1500 byte packets, generated by a commercial packet generator HW outputing packets at full 40 Gbit/s line rate. The results are without retpoline so that we can compare against previous numbers. With retpoline, the AF_XDP numbers drop with between 10 - 15 percent.
AF_XDP performance 64 byte packets. Results from V2 in parenthesis. Benchmark XDP_SKB XDP_DRV rxdrop 2.9(3.0) 9.6(9.5) txpush 2.6(2.5) NA* l2fwd 1.9(1.9) 2.5(2.5) (TX using XDP_SKB in both cases)
AF_XDP performance 1500 byte packets: Benchmark XDP_SKB XDP_DRV rxdrop 2.1(2.2) 3.3(3.3) l2fwd 1.4(1.4) 1.8(1.8) (TX using XDP_SKB in both cases)
* NA since we have no support for TX using the XDP_DRV infrastructure in this patch set. This is for a future patch set since it involves changes to the XDP NDOs. Some of this has been upstreamed by Jesper Dangaard Brouer.
XDP performance on our system as a base line:
64 byte packets: XDP stats CPU pps issue-pps XDP-RX CPU 16 32.3(32.9)M 0
1500 byte packets: XDP stats CPU pps issue-pps XDP-RX CPU 16 3.3(3.3)M 0
Changes from V2:
* Fixed a race in XSKMAP map found by Will. The code has been completely rearchitected and is now simpler, faster, and hopefully also not racy. Please review and check if it holds.
If you would like to diff V2 against V3, you can find them here: https://github.com/bjoto/linux/tree/af-xdp-v2-on-bpf-next https://github.com/bjoto/linux/tree/af-xdp-v3-on-bpf-next
The structure of the patch set is as follows:
Patches 1-3: Basic socket and umem plumbing Patches 4-9: RX support together with the new XSKMAP Patches 10-13: TX support Patch 14: Statistics support with getsockopt() Patch 15: Sample application
We based this patch set on bpf-next commit a3fe1f6f2ada ("tools: bpftool: change time format for program 'loaded at:' information")
To do for this patch set:
* Syzkaller torture session being worked on
Post-series plan:
* Optimize performance
* Kernel selftest
* Kernel load module support of AF_XDP would be nice. Unclear how to achieve this though since our XDP code depends on net/core.
* Support for AF_XDP sockets without an XPD program loaded. In this case all the traffic on a queue should go up to the user space socket.
* Daniel Borkmann's suggestion for a "copy to XDP socket, and return XDP_PASS" for a tcpdump-like functionality.
* And of course getting to zero-copy support in small increments, starting with TX then adding RX.
Thanks: Björn and Magnus ====================
Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
b4b8faa1 |
| 02-May-2018 |
Magnus Karlsson <magnus.karlsson@intel.com> |
samples/bpf: sample application and documentation for AF_XDP sockets
This is a sample application for AF_XDP sockets. The application supports three different modes of operation: rxdrop, txonly and
samples/bpf: sample application and documentation for AF_XDP sockets
This is a sample application for AF_XDP sockets. The application supports three different modes of operation: rxdrop, txonly and l2fwd.
To show-case a simple round-robin load-balancing between a set of sockets in an xskmap, set the RR_LB compile time define option to 1 in "xdpsock.h".
v2: The entries variable was calculated twice in {umem,xq}_nb_avail.
Co-authored-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
53f071e1 |
| 02-May-2018 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next-queued
Need d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API") in dinq to be able to fix https://bugs.f
Merge drm/drm-next into drm-intel-next-queued
Need d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API") in dinq to be able to fix https://bugs.freedesktop.org/show_bug.cgi?id=106085.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
show more ...
|
#
552c69b3 |
| 02-May-2018 |
John Johansen <john.johansen@canonical.com> |
Merge tag 'v4.17-rc3' into apparmor-next
Linux v4.17-rc3
Merge in v4.17 for LSM updates
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
#
f60ad0a0 |
| 29-Apr-2018 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf_get_stack'
Yonghong Song says:
==================== Currently, stackmap and bpf_get_stackid helper are provided for bpf program to get the stack trace. This approach has a limitat
Merge branch 'bpf_get_stack'
Yonghong Song says:
==================== Currently, stackmap and bpf_get_stackid helper are provided for bpf program to get the stack trace. This approach has a limitation though. If two stack traces have the same hash, only one will get stored in the stackmap table regardless of whether BPF_F_REUSE_STACKID is specified or not, so some stack traces may be missing from user perspective.
This patch implements a new helper, bpf_get_stack, will send stack traces directly to bpf program. The bpf program is able to see all stack traces, and then can do in-kernel processing or send stack traces to user space through shared map or bpf_perf_event_output.
Patches #1 and #2 implemented the core kernel support. Patch #3 removes two never-hit branches in verifier. Patches #4 and #5 are two verifier improves to make bpf programming easier. Patch #6 synced the new helper to tools headers. Patch #7 moved perf_event polling code and ksym lookup code from samples/bpf to tools/testing/selftests/bpf. Patch #8 added a verifier test in tools/bpf for new verifier change. Patches #9 and #10 added tests for raw tracepoint prog and tracepoint prog respectively.
Changelogs: v8 -> v9: . make function perf_event_mmap (in trace_helpers.c) extern to decouple perf_event_mmap and perf_event_poller. . add jit enabled handling for kernel stack verification in Patch #9. Since we did not have a good way to verify jit enabled kernel stack, just return true if the kernel stack is not empty. . In path #9, using raw_syscalls/sys_enter instead of sched/sched_switch, removed calling cmd "task 1 dd if=/dev/zero of=/dev/null" which is left with dangling process after the program exited. v7 -> v8: . rebase on top of latest bpf-next . simplify BPF_ARSH dst_reg->smin_val/smax_value tracking . rewrite the description of bpf_get_stack() in uapi bpf.h based on new format. v6 -> v7: . do perf callchain buffer allocation inside the verifier. so if the prog->has_callchain_buf is set, it is guaranteed that the buffer has been allocated. . change condition "trace_nr <= skip" to "trace_nr < skip" so that for zero size buffer, return 0 instead of -EFAULT v5 -> v6: . after refining return register smax_value and umax_value for helpers bpf_get_stack and bpf_probe_read_str, bounds and var_off of the return register are further refined. . added missing commit message for tools header sync commit. . removed one unnecessary empty line. v4 -> v5: . relied on dst_reg->var_off to refine umin_val/umax_val in verifier handling BPF_ARSH value range tracking, suggested by Edward. v3 -> v4: . fixed a bug when meta ptr is set to NULL in check_func_arg. . introduced tnum_arshift and added detailed comments for the underlying implementation . avoided using VLA in tools/bpf test_progs. v2 -> v3: . used meta to track helper memory size argument . implemented range checking for ARSH in verifier . moved perf event polling and ksym related functions from samples/bpf to tools/bpf . added test to compare build id's between bpf_get_stackid and bpf_get_stack v1 -> v2: . fixed compilation error when CONFIG_PERF_EVENTS is not enabled ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|