#
8d5d8852 |
| 17-Apr-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
xdp: rhashtable with allocator ID to pointer mapping
Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xd
xdp: rhashtable with allocator ID to pointer mapping
Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which uses a radix tree, use a dynamic rhashtable, for creating ID to pointer lookup table, because this is faster.
The problem that is being solved here is that, the xdp_rxq_info pointer (stored in xdp_buff) cannot be used directly, as the guaranteed lifetime is too short. The info is needed on a (potentially) remote CPU during DMA-TX completion time . In an xdp_frame the xdp_mem_info is stored, when it got converted from an xdp_buff, which is sufficient for the simple page refcnt based recycle schemes.
For more advanced allocators there is a need to store a pointer to the registered allocator. Thus, there is a need to guard the lifetime or validity of the allocator pointer, which is done through this rhashtable ID map to pointer. The removal and validity of of the allocator and helper struct xdp_mem_allocator is guarded by RCU. The allocator will be created by the driver, and registered with xdp_rxq_info_reg_mem_model().
It is up-to debate who is responsible for freeing the allocator pointer or invoking the allocator destructor function. In any case, this must happen via RCU freeing.
Use the IDA infrastructure for getting a cyclic increasing ID number, that is used for keeping track of each registered allocator per RX-queue xdp_rxq_info.
V4: Per req of Jason Wang - Use xdp_rxq_info_reg_mem_model() in all drivers implementing XDP_REDIRECT, even-though it's not strictly necessary when allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero).
V6: Per req of Alex Duyck - Introduce rhashtable_lookup() call in later patch
V8: Address sparse should be static warnings (from kbuild test robot)
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
cac320c8 |
| 17-Apr-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: convert to use generic xdp_frame and xdp_return_frame API
The virtio_net driver assumes XDP frames are always released based on page refcnt (via put_page). Thus, is only queues the XDP
virtio_net: convert to use generic xdp_frame and xdp_return_frame API
The virtio_net driver assumes XDP frames are always released based on page refcnt (via put_page). Thus, is only queues the XDP data pointer address and uses virt_to_head_page() to retrieve struct page.
Use the XDP return API to get away from such assumptions. Instead queue an xdp_frame, which allow us to use the xdp_return_frame API, when releasing the frame.
V8: Avoid endianness issues (found by kbuild test robot) V9: Change __virtnet_xdp_xmit from bool to int return value (found by Dan Carpenter)
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9267c430 |
| 13-Apr-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: add missing virtqueue kick when flushing packets
We tends to batch submitting packets during XDP_TX. This requires to kick virtqueue after a batch, we tried to do it through xdp_do_flush
virtio-net: add missing virtqueue kick when flushing packets
We tends to batch submitting packets during XDP_TX. This requires to kick virtqueue after a batch, we tried to do it through xdp_do_flush_map() which only makes sense for devmap not XDP_TX. So explicitly kick the virtqueue in this case.
Reported-by: Kimitoshi Takahashi <ktaka@nii.ac.jp> Tested-by: Kimitoshi Takahashi <ktaka@nii.ac.jp> Cc: Daniel Borkmann <daniel@iogearbox.net> Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
bda7fab5 |
| 22-Mar-2018 |
Jay Vosburgh <jay.vosburgh@canonical.com> |
virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
The operstate update logic will leave an interface in the default UNKNOWN operstate if the interface carrier state never changes from
virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
The operstate update logic will leave an interface in the default UNKNOWN operstate if the interface carrier state never changes from the default carrier up state set at creation. This includes the case of an explicit call to netif_carrier_on, as the carrier on to on transition has no effect on operstate.
This affects virtio-net for the case that the virtio peer does not support VIRTIO_NET_F_STATUS (the feature that provides carrier state updates). Without this feature, the virtio specification states that "the link should be assumed active," so, logically, the operstate should be UP instead of UNKNOWN. This has impact on user space applications that use the operstate to make availability decisions for the interface.
Resolve this by changing the virtio probe logic slightly to call netif_carrier_off for both the "with" and "without" VIRTIO_NET_F_STATUS cases, and then the existing call to netif_carrier_on for the "without" case will cause an operstate transition.
Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
3cc81a9a |
| 02-Mar-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: re enable XDP_REDIRECT for mergeable buffer
XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() case
virtio-net: re enable XDP_REDIRECT for mergeable buffer
XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() case"). This is because we don't reserve enough tailroom for struct skb_shared_info which breaks XDP assumption. So this patch fixes this by reserving enough tailroom and using fixed size of rx buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
4e09ff53 |
| 28-Feb-2018 |
Jason Wang <jasowang@redhat.com> |
virtio-net: disable NAPI only when enabled during XDP set
We try to disable NAPI to prevent a single XDP TX queue being used by multiple cpus. But we don't check if device is up (NAPI is enabled), t
virtio-net: disable NAPI only when enabled during XDP set
We try to disable NAPI to prevent a single XDP TX queue being used by multiple cpus. But we don't check if device is up (NAPI is enabled), this could result stall because of infinite wait in napi_disable(). Fixing this by checking device state through netif_running() before.
Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
8dcc5b0a |
| 20-Feb-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP
When a driver implements the ndo_xdp_xmit() function, there is (currently) no generic way to determine whether it is safe to call.
I
virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP
When a driver implements the ndo_xdp_xmit() function, there is (currently) no generic way to determine whether it is safe to call.
It is e.g. unsafe to call the drivers ndo_xdp_xmit, if it have not allocated the needed XDP TX queues yet. This is the case for virtio_net, which first allocates the XDP TX queues once an XDP/bpf prog is attached (in virtnet_xdp_set()).
Thus, a crash will occur for virtio_net when redirecting to another virtio_net device's ndo_xdp_xmit, which have not attached a XDP prog. The sample xdp_redirect_map tries to attach a dummy XDP prog to take this into account, but it can also easily fail if the virtio_net (or actually underlying vhost driver) have not allocated enough extra queues for the device.
Allocating more queue this is currently a manual config. Hint for libvirt XML add:
<driver name='vhost' queues='16'> <host mrg_rxbuf='off'/> <guest tso4='off' tso6='off' ecn='off' ufo='off'/> </driver>
The solution in this patch is to check that the device have loaded an XDP/bpf prog before proceeding. This is similar to the check performed in driver ixgbe.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
11b7d897 |
| 20-Feb-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: fix memory leak in XDP_REDIRECT
XDP_REDIRECT calling xdp_do_redirect() can fail for multiple reasons (which can be inspected by tracepoints). The current semantics is that on failure the
virtio_net: fix memory leak in XDP_REDIRECT
XDP_REDIRECT calling xdp_do_redirect() can fail for multiple reasons (which can be inspected by tracepoints). The current semantics is that on failure the driver calling xdp_do_redirect() must handle freeing or recycling the page associated with this frame. This can be seen as an optimization, as drivers usually have an optimized XDP_DROP code path for frame recycling in place already.
The virtio_net driver didn't handle when xdp_do_redirect() failed. This caused a memory leak as the page refcnt wasn't decremented on failures.
The function __virtnet_xdp_xmit() did handle one type of failure, when the xmit queue virtqueue_add_outbuf() is full, which "hides" releasing a refcnt on the page. Instead the function __virtnet_xdp_xmit() must follow API of xdp_do_redirect(), which on errors leave it up to the caller to free the page, of the failed send operation.
Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
95dbe9e7 |
| 20-Feb-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: fix XDP code path in receive_small()
When configuring virtio_net to use the code path 'receive_small()', in-order to get correct XDP_REDIRECT support, I discovered TCP packets would get
virtio_net: fix XDP code path in receive_small()
When configuring virtio_net to use the code path 'receive_small()', in-order to get correct XDP_REDIRECT support, I discovered TCP packets would get silently dropped when loading an XDP program action XDP_PASS.
The bug seems to be that receive_small() when XDP is loaded check that hdr->hdr.flags is zero, which seems wrong as hdr.flags contains the flags VIRTIO_NET_HDR_F_* : #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */ #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */
TCP got dropped as it had the VIRTIO_NET_HDR_F_DATA_VALID flag set.
The flags that are relevant here are the VIRTIO_NET_HDR_GSO_* flags stored in hdr->hdr.gso_type. Thus, the fix is just check that none of the gso_type flags have been set.
Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
7324f539 |
| 20-Feb-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: disable XDP_REDIRECT in receive_mergeable() case
The virtio_net code have three different RX code-paths in receive_buf(). Two of these code paths can handle XDP, but one of them is broke
virtio_net: disable XDP_REDIRECT in receive_mergeable() case
The virtio_net code have three different RX code-paths in receive_buf(). Two of these code paths can handle XDP, but one of them is broken for at least XDP_REDIRECT.
Function(1): receive_big() does not support XDP. Function(2): receive_small() support XDP fully and uses build_skb(). Function(3): receive_mergeable() broken XDP_REDIRECT uses napi_alloc_skb().
The simple explanation is that receive_mergeable() is broken because it uses napi_alloc_skb(), which violates XDP given XDP assumes packet header+data in single page and enough tail room for skb_shared_info.
The longer explaination is that receive_mergeable() tries to work-around and satisfy these XDP requiresments e.g. by having a function xdp_linearize_page() that allocates and memcpy RX buffers around (in case packet is scattered across multiple rx buffers). This does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because we have not implemented bpf_xdp_adjust_tail yet).
The XDP_REDIRECT action combined with cpumap is broken, and cause hard to debug crashes. The main issue is that the RX packet does not have the needed tail-room (SKB_DATA_ALIGN(skb_shared_info)), causing skb_shared_info to overlap the next packets head-room (in which cpumap stores info).
Reproducing depend on the packet payload length and if RX-buffer size happened to have tail-room for skb_shared_info or not. But to make this even harder to troubleshoot, the RX-buffer size is runtime dynamically change based on an Exponentially Weighted Moving Average (EWMA) over the packet length, when refilling RX rings.
This patch only disable XDP_REDIRECT support in receive_mergeable() case, because it can cause a real crash.
IMHO we should consider NOT supporting XDP in receive_mergeable() at all, because the principles behind XDP are to gain speed by (1) code simplicity, (2) sacrificing memory and (3) where possible moving runtime checks to setup time. These principles are clearly being violated in receive_mergeable(), that e.g. runtime track average buffer size to save memory consumption.
In the longer run, we should consider introducing a separate receive function when attaching an XDP program, and also change the memory model to be compatible with XDP when attaching an XDP prog.
Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d7dfc5cf |
| 17-Jan-2018 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Add ethtool stats
The main purpose of this patch is adding a way of checking per-queue stats. It's useful to debug performance problems on multiqueue environment.
$ ethtool -S ens10 NIC
virtio_net: Add ethtool stats
The main purpose of this patch is adding a way of checking per-queue stats. It's useful to debug performance problems on multiqueue environment.
$ ethtool -S ens10 NIC statistics: rx_queue_0_packets: 2090408 rx_queue_0_bytes: 3164825094 rx_queue_1_packets: 2082531 rx_queue_1_bytes: 3152932314 tx_queue_0_packets: 2770841 tx_queue_0_bytes: 4194955474 tx_queue_1_packets: 3084697 tx_queue_1_bytes: 4670196372
This change converts existing per-cpu stats structure into per-queue one. This should not impact on performance since each queue counter is not updated concurrently by multiple cpus.
Performance numbers: - Guest has 2 vcpus and 2 queues - Guest runs netserver - Host runs 100-flow super_netperf
Before After Diff UDP_STREAM 18byte 86.22 87.00 +0.90% UDP_STREAM 1472byte 4055.27 4042.18 -0.32% TCP_STREAM 16956.32 16890.63 -0.39% UDP_RR 178667.11 185862.70 +4.03% TCP_RR 128473.04 124985.81 -2.71%
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
faa9b39f |
| 05-Jan-2018 |
Jason Baron <jbaron@akamai.com> |
virtio_net: propagate linkspeed/duplex settings from the hypervisor
The ability to set speed and duplex for virtio_net is useful in various scenarios as described here:
16032be virtio_net: add etht
virtio_net: propagate linkspeed/duplex settings from the hypervisor
The ability to set speed and duplex for virtio_net is useful in various scenarios as described here:
16032be virtio_net: add ethtool support for set and get of settings
However, it would be nice to be able to set this from the hypervisor, such that virtio_net doesn't require custom guest ethtool commands.
Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows the hypervisor to export a linkspeed and duplex setting. The user can subsequently overwrite it later if desired via: 'ethtool -s'.
Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention is that device feature bits are to grow down from bit 63, since the transports are starting from bit 24 and growing up.
Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtio-dev@lists.oasis-open.org Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
754b8a21 |
| 03-Jan-2018 |
Jesper Dangaard Brouer <brouer@redhat.com> |
virtio_net: setup xdp_rxq_info
The virtio_net driver doesn't dynamically change the RX-ring queue layout and backing pages, but instead reject XDP setup if all the conditions for XDP is not meet. T
virtio_net: setup xdp_rxq_info
The virtio_net driver doesn't dynamically change the RX-ring queue layout and backing pages, but instead reject XDP setup if all the conditions for XDP is not meet. Thus, the xdp_rxq_info also remains fairly static. This allow us to simply add the reg/unreg to net_device open/close functions.
Driver hook points for xdp_rxq_info: * reg : virtnet_open * unreg: virtnet_close
V3: - bugfix, also setup xdp.rxq in receive_mergeable() - Tested bpf-sample prog inside guest on a virtio_net device
Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
fdaa767a |
| 06-Dec-2017 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
virtio_net: Disable interrupts if napi_complete_done rescheduled napi
Since commit 39e6c8208d7b ("net: solve a NAPI race") napi has been able to be rescheduled within napi_complete_done() even in no
virtio_net: Disable interrupts if napi_complete_done rescheduled napi
Since commit 39e6c8208d7b ("net: solve a NAPI race") napi has been able to be rescheduled within napi_complete_done() even in non-busypoll case, but virtnet_poll() always enabled interrupts before complete, and when napi was rescheduled within napi_complete_done() it did not disable interrupts. This caused more interrupts when event idx is disabled.
According to commit cbdadbbf0c79 ("virtio_net: fix race in RX VQ processing") we cannot place virtqueue_enable_cb_prepare() after NAPI_STATE_SCHED is cleared, so disable interrupts again if napi_complete_done() returned false.
Tested with vhost-user of OVS 2.7 on host, which does not have the event idx feature.
* Before patch:
$ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472 MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec
212992 1472 60.00 32763206 0 6430.32 212992 60.00 23384299 4589.56
Interrupts on guest: 9872369 Packets/interrupt: 2.37
* After patch
$ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472 MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec
212992 1472 60.00 32794646 0 6436.49 212992 60.00 32793501 6436.27
Interrupts on guest: 4941299 Packets/interrupt: 6.64
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
03e9f8a0 |
| 04-Dec-2017 |
Yunjian Wang <wangyunjian@huawei.com> |
virtio_net: fix return value check in receive_mergeable()
The function virtqueue_get_buf_ctx() could return NULL, the return value 'buf' need to be checked with NULL, not value 'ctx'.
Signed-off-by
virtio_net: fix return value check in receive_mergeable()
The function virtqueue_get_buf_ctx() could return NULL, the return value 'buf' need to be checked with NULL, not value 'ctx'.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
show more ...
|
#
453f85d4 |
| 15-Nov-2017 |
Mel Gorman <mgorman@techsingularity.net> |
mm: remove __GFP_COLD
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advant
mm: remove __GFP_COLD
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator.
This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway.
The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path.
Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
f4e63525 |
| 03-Nov-2017 |
Jakub Kicinski <jakub.kicinski@netronome.com> |
net: bpf: rename ndo_xdp to ndo_bpf
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Re
net: bpf: rename ndo_xdp to ndo_bpf
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
de8f3a83 |
| 24-Sep-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: add meta pointer for direct access
This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and
bpf: add meta pointer for direct access
This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data. struct xdp_buff is therefore laid out that we first point to data_hard_start, then data_meta directly prepended to data followed by data_end marking the end of packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room.
xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF, thus having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be multiple of 4 byte up to 32 byte large. Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail.
The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from it's original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
dd543797 |
| 22-Sep-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: correctly set xdp_xmit for mergeable buffer
We should set xdp_xmit only when xdp_do_redirect() succeed.
Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang
virtio-net: correctly set xdp_xmit for mergeable buffer
We should set xdp_xmit only when xdp_do_redirect() succeed.
Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
186b3c99 |
| 19-Sep-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: support XDP_REDIRECT
This patch tries to add XDP_REDIRECT for virtio-net. The changes are not complex as we could use exist XDP_TX helpers for most of the work. The rest is passing the X
virtio-net: support XDP_REDIRECT
This patch tries to add XDP_REDIRECT for virtio-net. The changes are not complex as we could use exist XDP_TX helpers for most of the work. The rest is passing the XDP_TX to NAPI handler for implementing batching.
Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
31240345 |
| 19-Sep-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: add packet len average only when needed during XDP
There's no need to add packet len average in the case of XDP_PASS since it will be done soon after skb is created.
Cc: John Fastabend
virtio-net: add packet len average only when needed during XDP
There's no need to add packet len average in the case of XDP_PASS since it will be done soon after skb is created.
Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9457642a |
| 19-Sep-2017 |
Jason Wang <jasowang@redhat.com> |
virtio-net: remove unnecessary parameter of virtnet_xdp_xmit()
CC: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@dav
virtio-net: remove unnecessary parameter of virtnet_xdp_xmit()
CC: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
dadc0736 |
| 24-Aug-2017 |
Eric Dumazet <edumazet@google.com> |
virtio_net: be drop monitor friendly
This change is needed to not fool drop monitor. (perf record ... -e skb:kfree_skb )
Packets were properly sent and are consumed after TX completion.
Signed-off
virtio_net: be drop monitor friendly
This change is needed to not fool drop monitor. (perf record ... -e skb:kfree_skb )
Packets were properly sent and are consumed after TX completion.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
718ad681 |
| 18-Aug-2017 |
stephen hemminger <stephen@networkplumber.org> |
net: drop unused attribute argument from sysfs queue funcs
The show and store functions don't need/use the attribute.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David
net: drop unused attribute argument from sysfs queue funcs
The show and store functions don't need/use the attribute.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a4a76503 |
| 15-Aug-2017 |
stephen hemminger <stephen@networkplumber.org> |
virtio: put paren around sizeof
Kernel coding style is to put paren around operand of sizeof.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@dav
virtio: put paren around sizeof
Kernel coding style is to put paren around operand of sizeof.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|