# 36154be4 | 22-Feb-2017 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: Fix wrong CQE decompression
In cqe compression with striding RQ, the decompression of the CQE field wqe_counter was done with a wrong wraparound value. This caused handling cqes with a wrong pointer to wqe (rx descriptor) and creating SKBs with wrong data, pointing to wrong (and already consumed) strides/pages.
In striding RQ, the CQE field wqe_counter holds the stride index rather than the WQE index. Hence, when decompressing a CQE, wqe_counter should wrap around the number of strides in a single multi-packet WQE.
We drop this wrap-around mask entirely in CQE decompression of striding RQ. It is not needed, as in such cases the CQE compression session would break anyway due to a different value of the wqe_id field, starting a new compression session.
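A hedged sketch of the resulting logic (struct and field names are assumptions, not the actual diff):

    /* Sketch: in striding RQ the decompressed wqe_counter is a stride
     * index, so no WQE-ring wrap-around mask is applied.  A new WQE
     * carries a different wqe_id, which breaks the compression session
     * and starts a new one, so the mask is unnecessary.
     */
    cq->decmprs_wqe_counter += 1;   /* was wrongly: (cnt + 1) & wq->sz_m1 */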
Tested:
    ethtool -K ethxx lro off/on
    ethtool --set-priv-flags ethxx rx_cqe_compress on
    super_netperf 16 {ipv4,ipv6} -t TCP_STREAM -m 50 -D
Verified no csum errors and no page refcount issues.
Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 6dc4b54e | 22-Feb-2017 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Update MPWQE stride size when modifying CQE compress state
When the admin enables/disables cqe compression, updating mpwqe stride size is required:
    CQE compress ON  ==> stride size = 256B
    CQE compress OFF ==> stride size = 64B
This is already done on driver load via mlx5e_set_rq_type_params; all we need is to call it on any admin change of the cqe compression state, via priv flags or when changing the timestamping state (as it is mutually exclusive with cqe compression).
This bug causes no functional damage; it only makes cqe compression occur less often, since on ConnectX4-LX CQE compression is performed only on packets smaller than the stride size.
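A minimal sketch of the intended selection, assuming a params struct like the one mlx5e_set_rq_type_params fills (field and variable names assumed):

    /* Sketch: CQE compression is performed only on packets smaller than
     * a stride, so the stride size must follow the compression state.
     */
    params->mpwqe_log_stride_sz = cqe_compress_enabled ?
                                  ilog2(256) : ilog2(64);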
Tested:
    ethtool --set-priv-flags ethxx rx_cqe_compress on
    pktgen with 64 < pkt size < 256, and netperf TCP_STREAM (IPv4/IPv6)
Verified `ethtool -S ethxx | grep compress` counters advance more often (rapidly).
Fixes: 7219ab34f184 ("net/mlx5e: CQE compression")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: David S. Miller <davem@davemloft.net>
# 18bcf742 | 22-Feb-2017 | Mohamad Haj Yahia <mohamad@mellanox.com>
net/mlx5e: s390 system compilation fix
Add the necessary header includes for s390 arch compilation.
Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files")
Fixes: d605d6686dc7 ("net/mlx5e: Add support for ethtool self..")
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revision tags: v4.9
# b70149dd | 06-Dec-2016 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: XDP Tx, no inline copy on ConnectX-5
ConnectX-5 and later HW generations will report min inline mode == MLX5_INLINE_MODE_NONE, which means the driver is not required to copy packet headers into the inline fields of the TX WQE.
Avoid copy to inline segment in XDP TX routine when HW inline mode doesn't require it.
This will improve CPU utilization and boost XDP TX performance.
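A sketch of the change under stated assumptions (eth segment field and variable names assumed):

    /* Sketch: copy headers into the WQE inline segment only when the
     * reported min inline mode actually requires it.
     */
    if (sq->min_inline_mode != MLX5_INLINE_MODE_NONE) {
            memcpy(eseg->inline_hdr.start, xdp_data, MLX5E_XDP_MIN_INLINE);
            eseg->inline_hdr.sz = cpu_to_be16(MLX5E_XDP_MIN_INLINE);
            dma_len  -= MLX5E_XDP_MIN_INLINE;
            dma_addr += MLX5E_XDP_MIN_INLINE;
    }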
Tested with xdp2 single flow:
    CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
    HCA: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

    Before:      7.4Mpps
    After:       7.8Mpps
    Improvement: 5%
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
# 2b31f7ae | 28-Nov-2016 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5: TX WQE update
Add new TX WQE fields for ConnectX-5 vlan insertion support: type and vlan_tci. When type = MLX5_ETH_WQE_INSERT_VLAN, the HW will insert the vlan and prio fields (vlan_tci) into the packet.
These bits and the inline header fields are mutually exclusive, and are valid only when MLX5_CAP_ETH(mdev, wqe_inline_mode) == MLX5_CAP_INLINE_MODE_NOT_REQUIRED and MLX5_CAP_ETH(mdev, wqe_vlan_insert), which will be set in ConnectX-5 and later HW generations.
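A simplified sketch of the segment layout (hypothetical struct name; the exact layout is defined by the HW spec):

    struct mlx5_wqe_eth_seg_sketch {
            /* ... other eth segment fields ... */
            union {
                    struct {
                            __be16 sz;
                            u8     start[2];    /* inline packet headers */
                    } inline_hdr;
                    struct {
                            u8     type;        /* MLX5_ETH_WQE_INSERT_VLAN */
                            __be16 vlan_tci;    /* vlan + prio the HW inserts */
                    } insert;
            };
    };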
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
# a67edbf4 | 24-Jan-2017 | Daniel Borkmann <daniel@iogearbox.net>
bpf: add initial bpf tracepoints
This work adds a number of tracepoints to paths that are either considered slow-path or exception-like states, where monitoring or inspecting them would be desirable.
For the bpf(2) syscall, tracepoints have been placed for the main commands when they succeed. In the XDP case, the tracepoint is for exceptions, that is, e.g. on abnormal BPF program exit such as an unknown or XDP_ABORTED return code, or when an error occurs during the XDP_TX action and the packet could not be forwarded.
Both have been split into separate event headers, and can be further extended. Worst case, if they unexpectedly get in our way in future, they can also be removed [1]. Of course, these tracepoints (like any other) can be analyzed by eBPF itself, etc. Example output:
    # ./perf record -a -e bpf:* sleep 10
    # ./perf script
    sock_example  6197 [005]   283.980322: bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
    sock_example  6197 [005]   283.980721: bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
    sock_example  6197 [005]   283.988423: bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
    sock_example  6197 [005]   283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
    [...]
    sock_example  6197 [005]   288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
    swapper          0 [005]   289.338243: bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
[1] https://lwn.net/Articles/705270/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 5eb0249b | 10-Dec-2016 | Shaker Daibes <shakerd@mellanox.com>
net/mlx5e: CQE compression control code reuse
This patch is intended for code reuse of mlx5e_modify_rx_cqe_compression function.
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# e048fc50 | 19-Jan-2017 | Eric Dumazet <edumazet@google.com>
net/mlx5e: Do not recycle pages from emergency reserve
A driver using dev_alloc_page() must not reuse a page allocated from emergency memory reserve.
Otherwise all packets using this page will be immediately dropped, except for very specific sockets having the SOCK_MEMALLOC bit set.
This issue might be hard to debug, because only a fraction of received packets would be dropped.
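The essence of the fix, sketched with the generic mm helper (exact placement in the page-cache put path is assumed):

    /* A page drawn from the emergency reserve is marked pfmemalloc;
     * never recycle it into the RX page cache, release it instead.
     */
    if (unlikely(page_is_pfmemalloc(page)))
            return false;   /* caller frees the page normally */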
Fixes: 4415a0319f92 ("net/mlx5e: Implement RX mapped page cache for page recycle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# d8bec2b2 | 18-Jan-2017 | Martin KaFai Lau <kafai@fb.com>
net/mlx5e: Support bpf_xdp_adjust_head()
This patch adds bpf_xdp_adjust_head() support to mlx5e.
1. rx_headroom is added to struct mlx5e_rq. It uses an existing 4 byte hole in the struct.
2. The adjusted data length is checked against MLX5E_XDP_MIN_INLINE and MLX5E_SW2HW_MTU(rq->netdev->mtu) (see the sketch after the v2 notes).
3. The macro MLX5E_SW2HW_MTU is moved from en_main.c to en.h. MLX5E_HW2SW_MTU is also moved to en.h for symmetry, but it is not a must.
v2:
- Keep the xdp specific logic in mlx5e_xdp_handle()
- Update dma_len after the sanity checks in mlx5e_xmit_xdp_frame()
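A simplified sketch of the length check in item 2 (not the actual diff; the stats field name is an assumption):

    /* After the program may have called bpf_xdp_adjust_head(), the data
     * pointer can move, so the TX length is recomputed and validated
     * before the frame is queued for XDP TX.
     */
    unsigned int dma_len = xdp->data_end - xdp->data;

    if (unlikely(dma_len < MLX5E_XDP_MIN_INLINE ||
                 MLX5E_SW2HW_MTU(rq->netdev->mtu) < dma_len)) {
            rq->stats.xdp_drop++;   /* stats field assumed */
            return false;           /* frame dropped */
    }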
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# c0f1147d | 06-Dec-2016 | Mohamad Haj Yahia <mohamad@mellanox.com>
net/mlx5e: Change the SQ/RQ operational state to positive logic
When using negative logic (i.e. a FLUSH state), after the RQ/SQ reopen there is a time interval in which the RQ/SQ is not really ready, yet the state indicates it is not in the FLUSH state, because the initial SQ/RQ struct memory starts as zeros. The state now indicates whether the SQ/RQ is opened, and the READY state is set only after all the SQ/RQ resources are prepared.
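A sketch of the positive logic, assuming a state bit named like MLX5E_RQ_STATE_ENABLED:

    /* Control path: mark ready only after all resources exist */
    set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);

    /* Data path: freshly zeroed struct memory naturally reads as
     * "not ready", closing the window the negative logic left open.
     */
    if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
            return 0;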
Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close")
Fixes: f2fde18c52a7 ("net/mlx5e: Don't wait for RQ completions on close")
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# b8335d91 | 06-Dec-2016 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Don't notify HW when filling the edge of ICO SQ
We are going to do this a couple of steps ahead anyway.
Fixes: d3c9bc2743dc ("net/mlx5e: Added ICO SQs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 366cbf2f | 30-Nov-2016 | Daniel Borkmann <daniel@iogearbox.net>
bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to caller
After 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers need to hold rcu_read_lock() already to make sure BPF program doesn't get released in the background.
Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading. Still, keeping bpf_prog_run_xdp() is useful, as it allows for grepping in XDP-supporting drivers and keeps the typecheck on the context intact. For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock() out of the helper. When the driver gets atomic replace support, this will move to call-sites eventually.
mlx5 needs actual fixing as it has the same issue as described already in 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), that is, we're under RCU bh at this time, BPF programs are released via call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue reset.
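A sketch of the read-side marking expected of a caller such as the mlx5e RX path (field names assumed):

    rcu_read_lock();
    prog = READ_ONCE(rq->xdp_prog);
    if (prog)
            act = bpf_prog_run_xdp(prog, &xdp);  /* no rcu lock inside */
    rcu_read_unlock();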
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 9bcc8606 | 27-Nov-2016 | Shaker Daibes <shakerd@mellanox.com>
net/mlx5e: Add CQE compression user control
The user can now override the automatic driver decision using the rx_cqe_compress flag, which is the preference for CQE compression. The flag is initialized with the automatic driver decision.
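For example, via the priv-flags interface (interface name is a placeholder, as used elsewhere in these logs):

    ethtool --show-priv-flags ethxx
    ethtool --set-priv-flags ethxx rx_cqe_compress on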
Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revision tags: openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28, v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22
# f5f82476 | 22-Sep-2016 | Or Gerlitz <ogerlitz@mellanox.com>
net/mlx5: E-Switch, Support VLAN actions in the offloads mode
Many virtualization systems use a policy under which a vlan tag is pushed to packets sent by guests, and popped before the packet is forwarded to the VM.
The current generation of the mlx5 HW doesn't fully support that on a per flow level. As such, we are addressing the above common use case with the SRIOV e-Switch abilities to push vlan into packets sent by VFs and pop vlan from packets forwarded to VFs.
The HW can match on the correct vlan being present in packets forwarded to VFs (eSwitch steering is done before stripping the tag), so this part is offloaded as is.
A common practice for vlans is to avoid both push vlan and pop vlan for inter-host VM/VM (east-west) communication because in this case, push on egress cancels out with pop on ingress.
For supporting that, we use a global eswitch vlan pop policy, hence allowing guest A to communicate with both remote VM B and local VM C. This works since the HW pops the vlan only if it exists (e.g for C --> A packets but not for B --> A packets).
On the slow path, when a VF vport has an offloaded flow which involves pushing vlans, whereas another flow is not currently offloaded, the packets from the 2nd flow seen by the VF representor on the host carry a vlan. The VF rep driver removes such a vlan before calling into the host networking stack.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 8515c581 | 22-Sep-2016 | Or Gerlitz <ogerlitz@mellanox.com>
net/mlx5e: Refactor retrieval of skb from rx completion element (cqe)
Factor the relevant code into a static inline helper (skb_from_cqe) doing that.
Move the call to napi_gro_receive to be carried out just after mlx5e_complete_rx_cqe returns.
Both changes are to be used for the VF representor as well in the next commit.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 35b510e2 | 21-Sep-2016 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: XDP TX xmit more
Previously we rang XDP SQ doorbell on every forwarded XDP packet.
Here we introduce an xmit-more-like mechanism that queues up more than one packet into the SQ (up to the RX napi budget) without notifying the hardware.
Once the RX napi budget is consumed and we exit the napi RX loop, we flush (doorbell) all XDP looped packets, in case there are any.
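A sketch of the mechanism (field and helper names assumed):

    /* Per-packet XDP TX path: post the WQE, defer the doorbell */
    sq->db.xdp.doorbell = true;

    /* End of the RX napi loop: one doorbell flushes the whole batch */
    if (sq->db.xdp.doorbell) {
            mlx5e_xmit_xdp_doorbell(sq);
            sq->db.xdp.doorbell = false;
    }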
XDP forward packet rate, comparing XDP with and without xmit more (bulk transmit):

    RX Cores    XDP TX      XDP TX (xmit more)
    ------------------------------------------
    1           6.5Mpps     12.4Mpps
    2           13.2Mpps    24.2Mpps
    4           25.2Mpps    36.3Mpps*
    8           36.3Mpps*   36.3Mpps*

*My xmitter was limited to 36.3Mpps, so it is the bottleneck. It seems that the receive side can handle more.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# b5503b99 | 21-Sep-2016 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: XDP TX forwarding support
Add support for XDP_TX forwarding from an xdp program. With XDP, the user can now loop packets back out of the same port.
We create a dedicated TX SQ for each channel that will serve XDP programs that return XDP_TX action to loop packets back to the wire directly from the channel RQ RX path.
For that, RX pages now need to be mapped bi-directionally; on the XDP_TX action we sync the page back to the device and queue it into the SQ for transmission. The XDP xmit frame function reports back to the RX path whether the page was consumed (transmitted); if so, the RX path forgets about that page as if it were released to the stack. Later, on XDP TX completion, the page is released back to the page cache.
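A sketch of the DMA handling this implies (variable names assumed):

    /* RX pages are mapped DMA_BIDIRECTIONAL so the same page can be
     * handed back to the device; sync the frame region before xmit.
     */
    dma_sync_single_for_device(sq->pdev, di->addr + data_offset,
                               dma_len, DMA_BIDIRECTIONAL);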
For simplicity this patch will hit a doorbell on every XDP TX packet.
The next patch will introduce an xmit-more-like mechanism that queues up more than one packet into the SQ without notifying the hardware; once the RX napi loop is done, we hit the doorbell once for all XDP TX packets from the previous loop. This should drastically improve XDP TX performance.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# f10b7cc7 | 21-Sep-2016 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Have a clear separation between different SQ types
Make a clear separation between Regular SQ (TXQ) and ICO SQ creation/destruction, and union their mutual information structures.
Don't allocate redundant TXQ skb/wqe_info/dma_fifo arrays for the ICO SQ, and use a different SQ edge for the ICO SQ than for the TXQ SQ, to be more accurate.
In preparation for XDP TX support.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 86994156 | 21-Sep-2016 | Rana Shahout <ranas@mellanox.com>
net/mlx5e: XDP fast RX drop bpf programs support
Add support for the BPF_PROG_TYPE_PHYS_DEV hook in mlx5e driver.
When XDP is on we make sure to change channels RQs type to MLX5_WQ_TYPE_LINKED_LIST rather than "striding RQ" type to ensure "page per packet".
On XDP set, we fail if HW LRO is on and ask the user to turn it off. Since on ConnectX4-LX HW LRO is on by default, this is annoying, but we prefer not to enforce LRO off from the XDP set function.
Full channels reset (close/open) is required only when setting XDP on/off.
When XDP set is called just to exchange programs, we update each RQ's xdp program on the fly; to synchronize with the current RX data path activity of that RQ, we temporarily disable the RQ, ensure the RX path is not running, then quickly update and re-enable it (see the C sketch below):
- rq.state = disabled
- napi_synchronize
- xchg(rq->xdp_prog)
- rq.state = enabled
- napi_schedule // Just in case we've missed an IRQ
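The same sequence sketched in C (bit and field names assumed):

    clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
    napi_synchronize(&rq->channel->napi);   /* RX path is now quiesced */
    old_prog = xchg(&rq->xdp_prog, prog);
    if (old_prog)
            bpf_prog_put(old_prog);
    set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
    napi_schedule(&rq->channel->napi);      /* in case we've missed an IRQ */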
Packet rate performance testing was done with pktgen 64B packets on the TX side, comparing a TC drop action on the RX side to the XDP fast drop.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Comparison is done between:
1. Baseline, before this patch, with TC drop action
2. This patch with TC drop action
3. This patch with XDP RX fast drop

    RX Cores    Baseline (TC drop)    TC drop     XDP fast drop
    ------------------------------------------------------------
    1           5.3Mpps               5.3Mpps     16.5Mpps
    2           10.2Mpps              10.2Mpps    31.3Mpps
    4           20.5Mpps              19.9Mpps    36.3Mpps*

*My xmitter was limited to 36.3Mpps, so it is the bottleneck. It seems that the receive side can handle more.
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 21c59685 | 21-Sep-2016 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Union RQ RX info per RQ type
We have two types of RX RQs, which use two separate sets of info arrays and structures in the RX data path functions. These structures are mutually exclusive per RQ type, hence one kind is allocated on RQ creation according to the RQ type.
For better cache locality and to minimize the sizeof(struct mlx5e_rq), this patch defines them as a union.
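A sketch of such a union inside struct mlx5e_rq (member names assumed):

    union {
            struct mlx5e_dma_info *dma_info;    /* linked-list RQ */
            struct mlx5e_mpw_info *mpwqe_info;  /* striding RQ (MPWQE) */
    } buff;  /* exactly one kind is allocated, per the RQ type */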
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 1bfecfca | 21-Sep-2016 | Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Build RX SKB on demand
For the non-striding RQ configuration, before this patch we had a ring with pre-allocated SKBs and mapped the SKB->data buffers for the device.
For robustness and better RX data buffers management, we allocate a page per packet and build_skb around it.
This patch (which is a prerequisite for XDP) will actually reduce performance for normal stack usage, because we are now hitting a bottleneck in the page allocator. We use the page-cache to restore or even improve performance in comparison to the old RX scheme.
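A simplified sketch of the new RX completion path (helper and constant names assumed):

    /* One page per packet; wrap it with an skb only on completion */
    void *va = page_address(di->page);

    skb = build_skb(va, RQ_PAGE_SIZE(rq));   /* frag size assumed */
    if (unlikely(!skb))
            return NULL;                     /* counted as alloc error */
    skb_reserve(skb, MLX5_RX_HEADROOM);      /* headroom constant assumed */
    skb_put(skb, cqe_bcnt);                  /* frame length from the CQE */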
Packet rate performance testing was done with pktgen 64B packets on xmit side and TC ingress dropping action on RX side.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Comparison is done between:
1. Baseline, before 'net/mlx5e: Build RX SKB on demand'
2. Build SKB with RX page cache (this patch)
    RX Cores    Baseline    Build SKB + page-cache    Improvement
    -------------------------------------------------------------
    1           4.16Mpps    5.33Mpps                  28%
    2           7.16Mpps    10.24Mpps                 43%
    4           13.61Mpps   20.51Mpps                 51%
    8           25.32Mpps   32.00Mpps                 26%
All respective cores were 100% utilized.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 4415a031 | 15-Sep-2016 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: Implement RX mapped page cache for page recycle
Instead of reallocating and mapping pages for RX data-path, recycle already used pages in a per ring cache.
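A sketch of the cache's put side (struct, size, and helper names assumed): still-mapped pages are parked in a small per-ring fifo and handed out before any new allocate-and-map pair.

    static bool mlx5e_rx_cache_put_sketch(struct mlx5e_page_cache *cache,
                                          struct mlx5e_dma_info *di)
    {
            u32 tail_next = (cache->tail + 1) & (MLX5E_CACHE_SIZE - 1);

            if (tail_next == cache->head)
                    return false;                 /* cache full */
            if (unlikely(page_ref_count(di->page) != 1))
                    return false;                 /* page still referenced */

            cache->page_cache[cache->tail] = *di; /* keeps page + dma addr */
            cache->tail = tail_next;
            return true;
    }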
Performance tests: The following results were measured on a freshly booted system, giving optimal baseline performance, as high-order pages are yet to be fragmented and depleted.
We ran pktgen single-stream benchmarks, with iptables-raw-drop:
Single stride, 64 bytes:
    * 4,739,057 - baseline
    * 4,749,550 - order0 no cache
    * 4,786,899 - order0 with cache
    1% gain

Larger packets, no page cross, 1024 bytes:
    * 3,982,361 - baseline
    * 3,845,682 - order0 no cache
    * 4,127,852 - order0 with cache
    3.7% gain

Larger packets, every 3rd packet crosses a page, 1500 bytes:
    * 3,731,189 - baseline
    * 3,579,414 - order0 no cache
    * 3,931,708 - order0 with cache
    5.4% gain
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# a5a0c590 | 15-Sep-2016 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: Introduce API for RX mapped pages
Manage the allocation and deallocation of mapped RX pages only through dedicated API functions.
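A sketch of what such a dedicated API pairs up (the helper name is an assumption):

    static inline int mlx5e_page_alloc_mapped_sketch(struct mlx5e_rq *rq,
                                                     struct mlx5e_dma_info *di)
    {
            di->page = dev_alloc_page();
            if (unlikely(!di->page))
                    return -ENOMEM;

            di->addr = dma_map_page(rq->pdev, di->page, 0, PAGE_SIZE,
                                    DMA_FROM_DEVICE);
            if (unlikely(dma_mapping_error(rq->pdev, di->addr))) {
                    put_page(di->page);
                    return -ENOMEM;
            }
            return 0;
    }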
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
# 7e426671 | 15-Sep-2016 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: Single flow order-0 pages for Striding RQ
To improve the memory consumption scheme, we omit the flow that demands and splits high-order pages in Striding RQ, and stay with a single Striding RQ flow that uses order-0 pages.
Moving to fragmented memory allows the use of larger MPWQEs, which reduces the number of UMR posts and filler CQEs.
Moving to a single flow allows several optimizations that improve performance, especially in production servers where we would anyway fall back to order-0 allocations:
- inline functions that were called via function pointers
- improve the UMR post process
This patch alone is expected to give a slight performance reduction. However, the new memory scheme gives the possibility to use a page-cache of a fair size, that doesn't inflate the memory footprint, which will dramatically fix the reduction and even give a performance gain.
Performance tests: The following results were measured on a freshly booted system, giving optimal baseline performance, as high-order pages are yet to be fragmented and depleted.
We ran pktgen single-stream benchmarks, with iptables-raw-drop:
Single stride, 64 bytes:
    * 4,739,057 - baseline
    * 4,749,550 - this patch
    no reduction

Larger packets, no page cross, 1024 bytes:
    * 3,982,361 - baseline
    * 3,845,682 - this patch
    3.5% reduction

Larger packets, every 3rd packet crosses a page, 1500 bytes:
    * 3,731,189 - baseline
    * 3,579,414 - this patch
    4% reduction
Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revision tags: v4.4.21, v4.7.4
# cd17d230 | 07-Sep-2016 | Gal Pressman <galp@mellanox.com>
net/mlx5e: Fix parsing of vlan packets when updating lro header
Previously, vlan tagged packets were not parsed correctly and were assumed to be regular IPv4/IPv6 packets. We should check for 802.1Q/802.1ad tags and update the lro header accordingly. This fixes the use case where LRO is on and rxvlan is off (vlan stripping is off).
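A simplified sketch of the tag check (the LRO update itself then rewrites IP/TCP fields at the located header):

    struct ethhdr *eth   = (struct ethhdr *)skb->data;
    __be16         proto = eth->h_proto;
    void          *l3    = eth + 1;

    if (proto == htons(ETH_P_8021Q) || proto == htons(ETH_P_8021AD)) {
            struct vlan_ethhdr *veth = (struct vlan_ethhdr *)eth;

            proto = veth->h_vlan_encapsulated_proto;
            l3    = (void *)eth + sizeof(*veth);  /* skip the vlan tag */
    }
    /* update the IPv4/IPv6 lro header at l3 according to proto */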
Fixes: e586b3b0baee ('net/mlx5: Ethernet Datapath files')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>