# 93761ca1 | 30-Apr-2020 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: XDP, Avoid indirect call in TX flow
Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call when/if CONFIG_RETPOLINE=y.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
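A minimal sketch of the pattern this commit applies, assuming the mlx5e callback names above; the argument list is simplified relative to the real driver:

```c
#include <linux/indirect_call_wrapper.h>

/* Sketch only: sq->xmit_xdp_frame points at one of two known
 * implementations, so INDIRECT_CALL_2() expands to pointer compares plus
 * direct calls, skipping the retpoline thunk when CONFIG_RETPOLINE=y.
 */
static bool mlx5e_xmit_xdp_buff_sketch(struct mlx5e_xdpsq *sq,
				       struct mlx5e_xdp_xmit_data *xdptxd,
				       struct mlx5e_xdp_info *xdpi)
{
	return INDIRECT_CALL_2(sq->xmit_xdp_frame,
			       mlx5e_xmit_xdp_frame_mpwqe, /* compared first */
			       mlx5e_xmit_xdp_frame,
			       sq, xdptxd, xdpi);
}
```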
# 1b698fa5 | 28-May-2020 | Lorenzo Bianconi <lorenzo@kernel.org>
xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frame
In order to use the standard 'xdp' prefix, rename the convert_to_xdp_frame utility routine to xdp_convert_buff_to_frame and replace all of its occurrences.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
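A minimal sketch of a caller after the rename (the surrounding TX path is hypothetical):

```c
#include <net/xdp.h>

/* Sketch: turn the per-NAPI xdp_buff into an xdp_frame that can outlive
 * the RX loop, e.g. to be queued for transmission.
 */
static int xdp_tx_sketch(struct xdp_buff *xdp)
{
	struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);

	if (unlikely(!xdpf))
		return -ENOMEM;
	/* ... post xdpf to a TX queue (driver-specific, elided) ... */
	return 0;
}
```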
# 39d6443c | 20-May-2020 | Björn Töpel <bjorn.topel@intel.com>
mlx5, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL
Use the new MEM_TYPE_XSK_BUFF_POOL API in lieu of MEM_TYPE_ZERO_COPY in mlx5e. It allows dropping a lot of code from the driver (code that is now common in the AF_XDP core and was related to XSK RX frame allocation, DMA mapping, etc.) and slightly improving performance (RX +0.8 Mpps, TX +0.4 Mpps).
rfc->v1: Put back the sanity check for XSK params, use XSK API to get the total headroom size. (Maxim)
v1->v2: Fix DMA address handling, set XDP metadata to invalid. (Maxim)
v2->v3: Handle frame_sz, use xsk_buff_xdp_get_frame_dma, use xsk_buff API for DMA sync on TX, add performance numbers. (Maxim)
v3->v4: Remove unused variable num_xsk_frames. (Jakub)
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-12-bjorn.topel@gmail.com
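A minimal sketch of the new memory-model registration under the xdp_sock_drv.h API of that era (error handling and driver plumbing elided; signatures changed in later kernels):

```c
#include <net/xdp.h>
#include <net/xdp_sock_drv.h>

/* Sketch: the RQ advertises MEM_TYPE_XSK_BUFF_POOL, after which RX
 * buffers come from the AF_XDP core already DMA-mapped.
 */
static int xsk_rq_setup_sketch(struct xdp_rxq_info *rxq)
{
	return xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_XSK_BUFF_POOL, NULL);
}

static struct xdp_buff *xsk_rx_alloc_sketch(struct xdp_umem *umem)
{
	/* allocation and DMA address resolution collapse into one call */
	return xsk_buff_alloc(umem);
}
```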
# a71506a4 | 20-May-2020 | Magnus Karlsson <magnus.karlsson@intel.com>
xsk: Move driver interface to xdp_sock_drv.h
Move the AF_XDP zero-copy driver interface to its own include file called xdp_sock_drv.h. This, hopefully, will make it clearer for NIC driver implementers to know which functions to use for zero-copy support.
v4->v5: Fix -Wmissing-prototypes by include header file. (Jakub)
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-4-bjorn.topel@gmail.com
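A minimal sketch of what a driver includes after this move (the completion helper shown is one of the relocated functions; usage simplified):

```c
#include <net/xdp_sock_drv.h>

/* Sketch: all zero-copy driver-facing helpers now come from one header. */
static void xsk_tx_complete_sketch(struct xdp_umem *umem, u32 done)
{
	xsk_umem_complete_tx(umem, done); /* report finished TX descriptors */
}
```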
Revision tags: v5.4.32, v5.4.31, v5.4.30, v5.4.29
# 5ffb4d85 | 30-Mar-2020 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Calculate SQ stop room in a robust way
Currently, different formulas are used to estimate the space that may be taken by WQEs in the SQ during a single packet transmit. This space is called stop room, and it's checked at the end of a packet transmit to find out whether the next packet could overflow the SQ. If it could, the driver tells the kernel to stop sending further packets.
Many factors affect the stop room:
1. Padding with NOPs to avoid WQEs spanning over page boundaries.
2. Enabled and disabled offloads (TLS, upcoming MPWQE).
3. The maximum size of a WQE.
The padding is performed before every WQE if it doesn't fit the current page.
The current formula assumes that only one padding will be required per packet, and it doesn't take into account that the WQEs posted during the transmission of a single packet might exceed the page size in very rare circumstances. For example, to hit this condition with 4096-byte pages, TLS offload will have to interrupt an almost-full MPWQE session, be in the resync flow and try to transmit a near to maximum amount of data.
To avoid SQ overflows in such rare cases after MPWQE is added, this patch introduces a more robust formula to estimate the stop room. The new formula uses the fact that a WQE of size X will not require more than X-1 WQEBBs of padding. More exact estimations are possible, but they result in much more complex and error-prone code for little gain.
Before this patch, the TLS stop room included space for both INNOVA and ConnectX TLS offloads that couldn't run at the same time anyway, so this patch accounts only for the active one.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
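A minimal sketch of the bound this commit introduces, assuming sizes measured in WQEBBs (helper name illustrative):

```c
/* Sketch: a WQE of X WQEBBs can be preceded by at most X-1 WQEBBs of
 * NOP padding (with more room than that, it would already fit before
 * the page boundary), so reserving 2*X - 1 WQEBBs per posted WQE is
 * always safe.
 */
static inline u16 stop_room_for_wqe_sketch(u16 wqe_size)
{
	return wqe_size * 2 - 1;
}
```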
# d628ee4f | 14-May-2020 | Jesper Dangaard Brouer <brouer@redhat.com>
mlx5: Rx queue setup time determine frame_sz for XDP
The mlx5 driver has multiple memory models, which also change according to whether an XDP bpf_prog is attached.
The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.: # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
In the general case, with a 4K page_size and regular-MTU packets, the frame_sz is 2048, or 4096 when XDP is enabled, in both modes.
The info on the given frame size is stored differently depending on the RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe. In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ). In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is what the XDP case cares about.
To reduce the effect on the fast path, this patch determines the frame_sz at setup time, avoiding determining the memory model at runtime. The variable is named frame0_sz to make it clear that this is only the frame size of the first fragment.
The mlx5 driver does a DMA sync on the XDP_TX action, but growing the frame is safe, as it has done a DMA map on the entire PAGE_SIZE. The driver also already does an XDP length check against sq->hw_mtu on the possible XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().
V3+4: Change variable name first_frame_sz to frame0_sz
V2: Fix that frag_size need to be recalc before creating SKB.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Link: https://lore.kernel.org/bpf/158945348021.97035.12295039384250022883.stgit@firesoul
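A minimal sketch of the setup-time resolution described above (field names follow the commit; the striding_rq flag and surrounding init code are simplified):

```c
/* Sketch: resolve the first-fragment frame size once, when the RQ is
 * created, instead of re-deriving the memory model per packet.
 */
if (striding_rq)	/* MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ */
	rq->buff.frame0_sz = 1 << rq->mpwqe.log_stride_sz; /* 2048 or 4096 */
else			/* MLX5_WQ_TYPE_CYCLIC */
	rq->buff.frame0_sz = rq->wqe.info.arr[0].frag_stride;

/* fast path: a plain copy of the precomputed value */
xdp->frame_sz = rq->buff.frame0_sz;
```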
# ec9cdca0 | 16-Apr-2020 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Unify reserving space for WQEs
In our fast-path design, a WQE (Work Queue Element) must not cross the page boundary. To enforce that, for WQEs consisting of more than one BB (Basic Block), the driver checks the available contiguous space in the WQ in advance, and if it's not enough, it pads it with NOPs.
This patch modifies the code that calculates the position of the next WQE, considering the padding, and prepares the WQE. This code is common for all SQ types. In this patch it's reorganized in a way that makes the usage pattern unified for all SQ types and makes the implementations self-contained and almost identical, preparing the repeated code for further deduplication.
One place is left as is: the mlx5e_fill_sq_frag_edge call inside mlx5e_sq_xmit, because it is special in that it may also copy the WQE's cseg and eseg when reserving space. This will be eliminated in one of the following patches, and this place will be converted to the new approach, too.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
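A minimal sketch of the unified pattern (wrapping-queue helpers as in the driver; the NOP-filling loop is elided):

```c
/* Sketch: check the contiguous room once; if the WQE would cross a page
 * boundary, pad with NOPs and recompute the position, so the caller
 * always gets `size` contiguous WQEBBs.
 */
static u16 sq_get_next_pi_sketch(struct mlx5e_xdpsq *sq, u16 size)
{
	struct mlx5_wq_cyc *wq = &sq->wq;
	u16 pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);

	if (unlikely(mlx5_wq_cyc_get_contig_wqebbs(wq, pi) < size)) {
		/* ... post NOPs up to the page boundary (elided) ... */
		pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);
	}
	return pi;
}
```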
Revision tags: v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12
# fed0c6cf | 15-Nov-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Fetch WQE: reuse code and enforce typing
There are multiple functions mlx5{e,i}_*_fetch_wqe that duplicate the same code because they operate on different SQ struct types. mlx5e_sq_fetch_wqe also returns void *, instead of the concrete WQE type.
This commit generalizes the fetch WQE operation by putting this code into a single function. To simplify calls of the generic function in concrete use cases, macros are provided that substitute the right WQE size and cast the return type.
Before this patch, fetch_wqe used to calculate pi itself, but the value was often known to the caller. This calculation is moved outside to eliminate this unnecessary step and prepare for the fill_frag_edge refactoring in the next patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
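A minimal sketch of the generalized fetch plus a typing macro (names illustrative; the real driver provides one macro per SQ type):

```c
#include <linux/string.h>

/* Sketch: one generic routine zeroes and returns the WQE; thin macros
 * substitute the concrete WQE size and cast the result, so each SQ type
 * keeps strong typing. pi is now computed by the caller.
 */
static inline void *fetch_wqe_sketch(struct mlx5_wq_cyc *wq, u16 pi,
				     size_t wqe_size)
{
	void *wqe = mlx5_wq_cyc_get_wqe(wq, pi);

	memset(wqe, 0, wqe_size);
	return wqe;
}

#define TX_FETCH_WQE_SKETCH(sq, pi) \
	((struct mlx5e_tx_wqe *)fetch_wqe_sketch(&(sq)->wq, pi, \
						 sizeof(struct mlx5e_tx_wqe)))
```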
# e2e11dbf | 09-Feb-2020 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: XDP, Print the offending TX descriptor on error completion
Upon an error completion on an XDP SQ, print the offending WQE to ease the debug process.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# f1b95753 | 09-Feb-2020 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: TX, Generalise code and usage of error CQE dump
Error CQE was dumped only for TXQ SQs. Generalise the function, and add usage for error completions on ICO SQs and XDP SQs.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Revision tags: v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11
# beb3e4b2 | 26-Aug-2019 | Kevin Laatz <kevin.laatz@intel.com>
mlx5e: modify driver for handling offsets
With the addition of the unaligned chunks option, we need to make sure we handle the offsets accordingly based on the mode we are currently running in. This patch modifies the driver to appropriately mask the address for each case.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
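A minimal sketch of the two addressing modes this commit distinguishes (the shift constant follows the unaligned-chunks series; driver plumbing elided):

```c
#include <linux/types.h>

#define UNALIGNED_OFFSET_SHIFT_SKETCH 48 /* a la XSK_UNALIGNED_BUF_OFFSET_SHIFT */

/* Sketch: aligned mode adds the headroom offset to a chunk-aligned
 * address; unaligned mode keeps the base address intact and carries the
 * offset in the upper 16 bits of the handle.
 */
static u64 adjust_offset_sketch(u64 addr, u64 offset, bool unaligned)
{
	return unaligned ? addr + (offset << UNALIGNED_OFFSET_SHIFT_SKETCH)
			 : addr + offset;
}
```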
Revision tags: v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2
# 7cf6f811 | 14-Jul-2019 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: XDP, Slight enhancement for WQE fetch function
Instead of passing an output param, let the function return the WQE pointer. In addition, pass &pi so it gets its value in the function, saving the redundant assignment that used to come after the call.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
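A minimal sketch of the reworked helper after this change (close to the shape the commit describes):

```c
#include <linux/string.h>

/* Sketch: return the WQE pointer directly and fill *pi inside the
 * helper, removing the redundant assignment at the call site.
 */
static struct mlx5e_tx_wqe *xdpsq_fetch_wqe_sketch(struct mlx5e_xdpsq *sq,
						   u16 *pi)
{
	struct mlx5_wq_cyc *wq = &sq->wq;
	struct mlx5e_tx_wqe *wqe;

	*pi = mlx5_wq_cyc_ctr2ix(wq, sq->pc);
	wqe = mlx5_wq_cyc_get_wqe(wq, *pi);
	memset(wqe, 0, sizeof(*wqe));

	return wqe;
}
```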
Revision tags: v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4, v5.1.3, v5.1.2
# 6c085a8a | 12-May-2019 | Shay Agroskin <shayag@mellanox.com>
net/mlx5e: XDP, Close TX MPWQE session when no room for inline packet left
In MPWQE mode, when transmitting packets with XDP, a packet smaller than a certain size (set to 256 bytes) is sent inline within its WQE TX descriptor (mem-copied) when the hardware TX queue is congested beyond a pre-defined watermark.
If an MPWQE cannot contain an additional inline packet, we close this MPWQE session and send the packet inlined within the next MPWQE. To save some MPWQE session close+open operations, we don't open an MPWQE session when the contiguous room left in the send queue is smaller than a certain size (set to the HW MPWQE maximum size). If there isn't enough contiguous room in the send queue, we fill it with NOPs and wrap the send queue index around.
This way, qualified packets are always sent inline.
Perf tests: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
With 24 channels:

        | bounced packets | inlined packets | inline ratio
 before | 113.6Mpps       | 96.3Mpps        | 84%
 after  | 115Mpps         | 99.5Mpps        | 86%
With one channel:
        | bounced packets | inlined packets | inline ratio
 before | 6.7Mpps         | 0pps            | 0%
 after  | 6.8Mpps         | 0pps            | 0%
As we can see, there is improvement in both inline ratio and overall packet rate for 24 channels. Also, we see no degradation for the one-channel case.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# db05815b | 26-Jun-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Add XSK zero-copy support
This commit adds support for AF_XDP zero-copy RX and TX.
We create a dedicated XSK RQ inside the channel, which means two RQs are running simultaneously: one for non-XSK traffic and the other for XSK traffic. The regular and XSK RQs use a single ID namespace split into two halves: the lower half is regular RQs, and the upper half is XSK RQs. When any zero-copy AF_XDP socket is active, changing the number of channels is not allowed, because it would break the mapping between XSK RQ IDs and channels.
XSK requires different page allocation and release routines. Such functions as mlx5e_{alloc,free}_rx_mpwqe and mlx5e_{get,put}_rx_frag are generic enough to be used for both regular and XSK RQs, and they use the mlx5e_page_{alloc,release} wrappers around the real allocation functions. Function pointers are not used to avoid losing the performance with retpolines. Wherever it's certain that the regular (non-XSK) page release function should be used, it's called directly.
Only the stats that could be meaningful for XSK are exposed to the userspace. Those that don't take part in the XSK flow are not considered.
Note that we don't wait for WQEs on the XSK RQ (unlike the regular RQ), because the newer xdpsock sample doesn't provide any Fill Ring entries at the setup stage.
We create a dedicated XSK SQ in the channel. This separation has its advantages:
1. When the UMEM is closed, the XSK SQ can also be closed and stop receiving completions. If an existing SQ was used for XSK, it would continue receiving completions for the packets of the closed socket. If a new UMEM was opened at that point, it would start getting completions that don't belong to it.
2. Calculating statistics separately.
When the userspace kicks the TX, the driver triggers a hardware interrupt by posting a NOP to a dedicated XSK ICO (internal control operations) SQ, in order to trigger NAPI on the right CPU core. This XSK ICO SQ is protected by a spinlock, as the userspace application may kick the TX from any core.
Store the pointers to the UMEMs in the net device private context, independently from the kernel. This way the driver can distinguish between the zero-copy and non-zero-copy UMEMs. The kernel function xdp_get_umem_from_qid does not care about this difference, but the driver is only interested in zero-copy UMEMs, particularly, on the cleanup it determines whether to close the XSK RQ and SQ or not by looking at the presence of the UMEM. Use state_lock to protect the access to this area of UMEM pointers.
LRO isn't compatible with XDP, but there may be active UMEMs while XDP is off. If this is the case, don't allow LRO to ensure XDP can be reenabled at any time.
The validation of XSK parameters typically happens when XSK queues open. However, when the interface is down or the XDP program isn't set, it's still possible to have active AF_XDP sockets and even to open new, but the XSK queues will be closed. To cover these cases, perform the validation also in these flows:
1. A new UMEM is registered, but the XSK queues aren't going to be created due to missing XDP program or interface being down.
2. MTU changes while there are UMEMs registered.
Having this early check prevents mlx5e_open_channels from failing at a later stage, where recovery is impossible and the application has no chance to handle the error, because it got the successful return value for an MTU change or XSK open operation.
The performance testing was performed on a machine with the following configuration:
- 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
- Mellanox ConnectX-5 Ex with 100 Gbit/s link
The results with retpoline disabled, single stream:
txonly: 33.3 Mpps (21.5 Mpps with queue and app pinned to the same CPU)
rxdrop: 12.2 Mpps
l2fwd: 9.4 Mpps
The results with retpoline enabled, single stream:
txonly: 21.3 Mpps (14.1 Mpps with queue and app pinned to the same CPU)
rxdrop: 9.9 Mpps
l2fwd: 6.8 Mpps
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
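A minimal sketch of the split RQ ID namespace described above (helper names hypothetical):

```c
/* Sketch: with N channels, qid in [0, N) selects a regular RQ and
 * qid in [N, 2N) selects the XSK RQ of channel (qid - N), which is why
 * the channel count must stay fixed while zero-copy sockets are active.
 */
static bool qid_is_xsk_sketch(u32 qid, u32 num_channels)
{
	return qid >= num_channels;
}

static u32 qid_to_channel_sketch(u32 qid, u32 num_channels)
{
	return qid_is_xsk_sketch(qid, num_channels) ? qid - num_channels : qid;
}
```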
# a011b49f | 26-Jun-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Consider XSK in XDP MTU limit calculation
Use the existing mlx5e_get_linear_rq_headroom function to calculate the headroom for mlx5e_xdp_max_mtu. This function takes the XSK headroom into consideration, which will be used in the following patches.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 84a0a231 | 26-Jun-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: XDP_TX from UMEM support
When an XDP program returns XDP_TX, and the RQ is XSK-enabled, it requires careful handling, because convert_to_xdp_frame creates a new page and copies the data there, while our driver expects the xdp_frame to point to the same memory as the xdp_buff. Handle this case separately: map the page, and in the end unmap it and call xdp_return_frame.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
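A minimal sketch of the separate flow described above (this commit predates the rename to xdp_convert_buff_to_frame; error paths and descriptor posting are elided):

```c
/* Sketch: for an XSK RQ, the converted frame lives in a freshly
 * allocated page, so it must be DMA-mapped for TX and, on completion,
 * unmapped and released via xdp_return_frame().
 */
static bool xsk_xdp_tx_sketch(struct mlx5e_xdpsq *sq, struct xdp_buff *xdp)
{
	struct xdp_frame *xdpf = convert_to_xdp_frame(xdp); /* copies data */
	dma_addr_t dma;

	if (unlikely(!xdpf))
		return false;

	dma = dma_map_single(sq->pdev, xdpf->data, xdpf->len, DMA_TO_DEVICE);
	/* ... post the TX descriptor; later, in the completion handler: */
	dma_unmap_single(sq->pdev, dma, xdpf->len, DMA_TO_DEVICE);
	xdp_return_frame(xdpf);

	return true;
}
```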
# b9673cf5 | 26-Jun-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Share the XDP SQ for XDP_TX between RQs
Put the XDP SQ that is used for XDP_TX into the channel. It used to be a part of the RQ, but with introduction of AF_XDP there will be one more RQ that could share the same XDP SQ. This patch is a preparation for that change.
Separate XDP_TX statistics per RQ were implemented in one of the previous patches.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# d963fa15 | 26-Jun-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Refactor struct mlx5e_xdp_info
Currently, struct mlx5e_xdp_info has some issues that have to be cleaned up before the upcoming AF_XDP support makes things too complicated and messy. This structure is used both when sending the packet and on completion. Moreover, the cleanup procedure on completion depends on the origin of the packet (XDP_REDIRECT, XDP_TX). Adding AF_XDP support will add new flows that use this structure even differently. To avoid overcomplicating the code, this commit refactors the usage of this structure in the following ways:
1. struct mlx5e_xdp_info is split into two different structures. One is struct mlx5e_xdp_xmit_data, a transient structure that doesn't need to be stored and is only used while sending the packet. The other is still struct mlx5e_xdp_info that is stored in a FIFO and contains the fields needed on completion.
2. The fields of struct mlx5e_xdp_info that are used in different flows are put into a union. A special enum indicates the cleanup mode and helps choose the right union member. This approach is clear and explicit. Although it could be possible to "guess" the mode by looking at the values of the fields and at the XDP SQ type, it wouldn't be that clear and extendable and would require looking through the whole chain to understand what's going on.
For the reference, there are the fields of struct mlx5e_xdp_info that are used in different flows (including AF_XDP ones):
Packet origin          | Fields used on completion | Cleanup steps
-----------------------+---------------------------+--------------------
XDP_REDIRECT,          | xdpf, dma_addr            | DMA unmap and
XDP_TX from XSK RQ     |                           | xdp_return_frame.
-----------------------+---------------------------+--------------------
XDP_TX from regular RQ | di                        | Recycle page.
-----------------------+---------------------------+--------------------
AF_XDP TX              | (none)                    | Increment the
                       |                           | producer index in
                       |                           | Completion Ring.
On send, the same set of mlx5e_xdp_xmit_data fields is used in all flows: DMA and virtual addresses and length.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
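A minimal sketch of the split described in points 1 and 2 (field and enum names approximate the driver):

```c
/* Sketch: transient send-side data, never stored ... */
struct xdp_xmit_data_sketch {
	dma_addr_t dma_addr;
	void *data;
	u32 len;
};

/* ... versus completion data kept in the FIFO, with an explicit cleanup
 * mode selecting the right union member.
 */
struct xdp_info_sketch {
	enum { MODE_FRAME, MODE_PAGE, MODE_XSK } mode;
	union {
		struct {		/* XDP_REDIRECT, XDP_TX from XSK RQ */
			struct xdp_frame *xdpf;
			dma_addr_t dma_addr;
		} frame;
		struct {		/* XDP_TX from regular RQ */
			struct mlx5e_dma_info di;
		} page;
		/* AF_XDP TX: nothing stored; just advance the
		 * Completion Ring producer index on completion. */
	};
};
```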
# 6ed9350f | 26-Jun-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Replace deprecated PCI_DMA_TODEVICE
The PCI API for DMA is deprecated, and PCI_DMA_TODEVICE is just defined to DMA_TO_DEVICE for backward compatibility. Just use DMA_TO_DEVICE.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
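The change is a one-liner per call site; an illustrative example:

```c
/* The generic DMA API constant replaces the deprecated PCI wrapper. */
dma_addr_t dma = dma_map_single(dev, data, len, DMA_TO_DEVICE);
```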
Revision tags: v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0
# 33e10924 | 01-Mar-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Put the common XDP code into a function
The same code that returns XDP frames and releases pages is used both in mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs. Create a function that cleans up an MPWQE.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# c2273219 | 14-Mar-2019 | Shay Agroskin <shayag@mellanox.com>
net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow
Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch mitigates this problem by moving some workload to the CPU, reducing the HW data prefetch overhead for small packets (<= 256B).
When forwarding packets with XDP, a packet smaller than a certain size (set to ~256 bytes) is sent inline within its WQE TX descriptor (mem-copied) when the hardware TX queue is congested beyond a pre-defined watermark.
This is added to better utilize the HW resources (which now perform one less packet data prefetch) and allow better scalability, at the expense of CPU usage (which now 'memcpy's the packet into the WQE).
To load balance between HW and CPU and get max packet rate, we use watermarks to detect how much the HW is congested and move the work loads back and forth between HW and CPU.
Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
* Tested with hyper-threading disabled
XDP_TX:
         | before | after   |
24 rings | 51Mpps | 116Mpps | +126%
1 ring   | 12Mpps | 12Mpps  | same
XDP_REDIRECT:
** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch.
         | before  | after   |
32 rings | 64Mpps  | 92Mpps  | +43%
1 ring   | 6.4Mpps | 6.4Mpps | same
As we can see, the feature significantly improves scaling, without hurting single-ring performance.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
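A minimal sketch of the load-balancing decision (names and the exact congestion test are hypothetical):

```c
#define XDP_INLINE_THRESHOLD_SKETCH 256 /* bytes */

/* Sketch: memcpy small packets into the WQE only while the HW queue is
 * congested past the watermark; otherwise let the HCA prefetch the data.
 */
static bool should_inline_sketch(u32 pkt_len, u32 outstanding, u32 watermark)
{
	return pkt_len <= XDP_INLINE_THRESHOLD_SKETCH && outstanding > watermark;
}
```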
Revision tags: v4.19.26
# 73cab880 | 25-Feb-2019 | Shay Agroskin <shayag@mellanox.com>
net/mlx5e: XDP, Add TX MPWQE session counter
This counter tracks how many TX MPWQE sessions are started in XDP SQ in XDP TX/REDIRECT flow. It counts per-channel and global stats.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 15143bf5 | 10-Mar-2019 | Tariq Toukan <tariqt@mellanox.com>
net/mlx5e: XDP, Enhance RQ indication for XDP redirect flush
The XDP redirect flush indication belongs to the receive queue, not to its XDP send queue.
For this, use a new bit on rq->flags.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# d460c271 | 08-Apr-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Fix the max MTU check in case of XDP
MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ. This commit fixes the calculations and adds a brief explanation for the formula used.
Fixes: a26a5bdf3ee2d ("net/mlx5e: Restrict the combination of large MTU and XDP")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
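A minimal sketch of the corrected formula (macro names from the kernel; shape follows the fix):

```c
/* Sketch: the linear frame, its headroom, and the trailing
 * skb_shared_info must all fit in one page. SKB_MAX_HEAD(hr) is roughly
 * PAGE_SIZE - hr - SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 * MLX5E_HW2SW_MTU() then translates the wire MTU to a software MTU.
 */
static int xdp_max_mtu_sketch(struct mlx5e_params *params)
{
	int hr = NET_IP_ALIGN + XDP_PACKET_HEADROOM;

	return MLX5E_HW2SW_MTU(params, SKB_MAX_HEAD(hr));
}
```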
# 12fc512f | 15-Mar-2019 | Maxim Mikityanskiy <maximmi@mellanox.com>
net/mlx5e: Fix use-after-free after xdp_return_frame
xdp_return_frame releases the frame. It leads to releasing the page, so it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves the memory access to precede the return of the frame.
Fixes: 58b99ee3e3ebe ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
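A minimal sketch of the fix (completion-path context simplified; the stats update is a hypothetical later use):

```c
/* Sketch: read everything needed from the frame before
 * xdp_return_frame(), which may free the page backing xdpi.xdpf.
 */
u32 len = xdpi.xdpf->len;			/* read first ... */

dma_unmap_single(sq->pdev, xdpi.dma_addr, len, DMA_TO_DEVICE);
xdp_return_frame(xdpi.xdpf);			/* ... frame is gone here */
sq->stats->bytes += len;			/* safe: uses the cached copy */
```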