Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13 |
|
#
7aa50380 |
| 21-Feb-2023 |
Rahul Rameshbabu <rrameshbabu@nvidia.com> |
net/mlx5e: Fix SQ wake logic in ptp napi_poll context
Check in the mlx5e_ptp_poll_ts_cq context whether the ptp tx sq should be woken up. Before this change, the ptp tx sq may never wake up if the ptp tx ts skb fifo is full when mlx5e_poll_tx_cq checks if the queue should be woken up.
Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
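The essence of the fix is that the timestamp-CQ poll path must itself be able to restart the PTP send queue instead of relying only on mlx5e_poll_tx_cq. A minimal sketch of that idea follows; the helper around the wake call is an assumption for illustration, not the exact driver diff.

```c
/* Sketch only: wake the PTP txqsq from the timestamp-CQ poll context as
 * well, so a full tx-ts skb fifo at mlx5e_poll_tx_cq() time cannot leave
 * the queue stopped forever.  Field and helper names are assumed.
 */
static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int budget)
{
	struct mlx5e_ptpsq *ptpsq = container_of(cq, struct mlx5e_ptpsq, ts_cq);
	bool done = mlx5e_ptp_process_ts_cqes(ptpsq, budget); /* assumed helper */

	/* ensure the freed CQ space is visible before re-enabling the SQ */
	wmb();

	mlx5e_txqsq_wake(&ptpsq->txqsq); /* the added wake check */

	return done;
}
```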
|
#
c1783e74 |
| 17-Apr-2023 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: XDP, Add support for multi-buffer XDP redirect-in
Handle multi-buffer XDP redirect-in requests coming through mlx5e_xdp_xmit.
Extend struct mlx5e_xmit_data_frags with an additional dma_arr field, to point to the fragments dma mapping, as they cannot be retrieved via the page_pool_get_dma_addr() function.
Push a dma_addr xdpi instance per each fragment, and use them in the completion flow to dma_unmap the frags.
Finally, remove the restriction in mlx5e_open_xdpsq, and set the flag in xdp_features.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
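A rough illustration of the completion-side bookkeeping described above; the struct and dma_arr field come from the commit text, while the surrounding loop and helper name are assumptions.

```c
/* Sketch: for a redirected multi-buffer frame, each fragment's DMA address
 * was stored in dma_arr at xmit time (page_pool_get_dma_addr() cannot be
 * used here because the pages were mapped by the XDP core, not the pool).
 * On completion, unmap every fragment.
 */
static void mlx5e_xdp_unmap_redirect_frags(struct mlx5_core_dev *mdev,
					   struct mlx5e_xmit_data_frags *xdptxdf)
{
	struct skb_shared_info *sinfo = xdptxdf->sinfo;
	int i;

	for (i = 0; i < sinfo->nr_frags; i++)
		dma_unmap_single(mdev->device, xdptxdf->dma_arr[i],
				 skb_frag_size(&sinfo->frags[i]), DMA_TO_DEVICE);
}
```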
|
#
eb9b9fdc |
| 17-Apr-2023 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: Introduce extended version for mlx5e_xmit_data
Introduce struct mlx5e_xmit_data_frags to be used for non-linear xmit buffers. Let it include sinfo pointer.
Take one bit from the len field to indicate whether the descriptor has fragments and can be cast up to the extended version.
Zero-init to make sure has_frags, and potentially future fields, are zero when not explicitly assigned.
Another field will be added in a downstream patch to indicate and point to dma addresses of the different frags, for redirect-in requests.
This simplifies the mlx5e_xmit_xdp_frame/mlx5e_xmit_xdp_frame_mpwqe functions params.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
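The layout this describes is roughly the following; a sketch based on the commit text, with field widths and ordering assumed.

```c
/* Base descriptor: one bit of the length field indicates whether this is
 * actually the head of a mlx5e_xmit_data_frags (non-linear) descriptor.
 */
struct mlx5e_xmit_data {
	dma_addr_t dma_addr;
	void      *data;
	u32        len       : 31;
	u32        has_frags : 1;   /* safe to upcast when set */
};

struct mlx5e_xmit_data_frags {
	struct mlx5e_xmit_data  xd;     /* must be first, enables the upcast */
	struct skb_shared_info *sinfo;  /* fragments of the non-linear buffer */
	/* a later patch adds a per-fragment dma address array here */
};
```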
|
#
e32654f1 |
| 17-Apr-2023 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: Move struct mlx5e_xmit_data to datapath header
Move TX datapath struct from the generic en.h to the datapath txrx.h header, where it belongs.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
Revision tags: v6.2, v6.1.12 |
|
#
4c2a1323 |
| 14-Feb-2023 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: RX, Defer page release in striding rq for better recycling
Currently, for striding RQ, fragmented pages from the page pool can get released in two ways:
1) In the mlx5e driver when trimming off the unused fragments AND the associated skb fragments have been released. This path allows recycling of pages to the page pool cache (allow_direct == true).
2) On the skb release path (last fragment release), which will always release pages to the page pool ring (allow_direct == false).
Whichever path releases the last fragment decides where the page ends up: the cache or the ring. So we want to maximize releases through path 1.
This patch does that by deferring the release of page fragments right before requesting new ones from the page pool. Extra care needs to be taken for the corner cases:
* On first call, make sure that release is not called. The skip_release_bitmap is used for this purpose.
* On rq shutdown, make sure that all wqes that were not in the linked list are released.
For a single ring, single core, default MTU (1500) TCP stream test the number of pages allocated from the cache directly (rx_pp_recycle_cached) increases from 31 % to 98 %:
+--------------------------+---------+---------+
| Page Pool stats (/sec)   | Before  | After   |
+--------------------------+---------+---------+
| rx_pp_alloc_fast         | 2137754 | 2261033 |
| rx_pp_alloc_slow         | 47      | 9       |
| rx_pp_alloc_empty        | 47      | 9       |
| rx_pp_alloc_refill       | 23230   | 819     |
| rx_pp_alloc_waive        | 0       | 0       |
| rx_pp_recycle_cached     | 672182  | 2209015 |
| rx_pp_recycle_cache_full | 1789    | 0       |
| rx_pp_recycle_ring       | 1485848 | 52259   |
| rx_pp_recycle_ring_full  | 3003    | 584     |
+--------------------------+---------+---------+
With this patch, the performance in striding rq for the above test is back to baseline.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
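A condensed sketch of the deferral described above; skip_release_bitmap is named in the commit text, while the other field and helper names are assumptions.

```c
/* Sketch: pages are no longer released when the CQE is processed; they are
 * released just before the WQE is re-posted, while still in NAPI context,
 * so the driver usually drops the last fragment and can recycle directly
 * into the page_pool cache (allow_direct = true).
 */
static void mlx5e_free_rx_mpwqe_deferred(struct mlx5e_rq *rq,
					 struct mlx5e_mpw_info *wi)
{
	int i;

	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++) {
		/* skip_release_bitmap covers the first pass and trimmed slots */
		if (test_bit(i, wi->skip_release_bitmap))
			continue;
		page_pool_put_full_page(rq->page_pool, wi->frag_pages[i].page,
					true /* allow_direct */);
	}
}
```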
|
Revision tags: v6.1.11, v6.1.10, v6.1.9, v6.1.8 |
|
#
6f574284 |
| 18-Jan-2023 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: RX, Enable skb page recycling through the page_pool
Start using the page_pool skb recycling api to recycle all pages back to the page pool and stop using atomic page reference counting.
The mlx5e driver used to manage in-flight pages using page refcounting: for each fragment there were 2 atomic write operations happening (one for building the skb and one on skb release).
The page_pool api introduced a method to track page fragments more optimally:
* The page's pp_fragment_count is set to a large bias on page alloc (1 x atomic write operation).
* The driver tracks the actual page fragments in a non atomic variable.
* When the skb is recycled, pp_fragment_count is decremented (atomic write operation).
* When page is released in the driver, the unused number of fragments (relative to the bias) is deducted from pp_fragment_count (atomic write operation).
* Last page defragmentation will only be an atomic read.
So in total there are `number of fragments + 1` atomic write ops, as opposed to `2 * frags` atomic write ops previously.
Pages are wrapped in a mlx5e_frag_page structure which also contains the number of fragments. This makes it easy to count the fragments in the driver.
This change brings performance improvements for the case when the old rx page_cache had low recycling rates due to head of queue blocking. For a iperf3 TCP test with a single stream, on a single core (iperf and receive queue running on same core), the following improvements can be noticed:
* Striding rq:
  - before (net-next baseline): bitrate = 30.1 Gbits/sec
  - after: bitrate = 31.4 Gbits/sec (diff: 4.14 %)
* Legacy rq:
  - before (net-next baseline): bitrate = 30.2 Gbits/sec
  - after: bitrate = 33.0 Gbits/sec (diff: 8.48 %)
There are 2 temporary performance degradations introduced:
1) TCP streams that had a good recycling rate with the old page_cache have a degradation for both striding and linear rq. This is due to very low page pool cache recycling: the pages are released during skb recycle which will release pages to the page pool ring for safety. The following patches in this series will tackle this problem by deferring the page release in the driver to increase the chance of having pages recycled to the cache.
2) XDP performance is now lower (4-5 %) due to the higher number of atomic operations used for fragment management. But this opens the door for supporting multiple packets per page in XDP, which will bring a big gain.
Otherwise, performance is similar to baseline.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
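The bias-based counting above can be sketched with the generic page_pool fragment API. mlx5e_frag_page is named in the commit; the bias constant and helper names here are illustrative assumptions, not the driver's exact code.

```c
#include <net/page_pool.h>

/* Illustrative bias: one atomic write at alloc time "pre-charges" the page
 * with a large fragment count; real fragments are counted in a plain u16.
 */
#define MLX5E_PAGECNT_BIAS_MAX	((1 << 15) - 1)	/* assumed value */

struct mlx5e_frag_page {
	struct page *page;
	u16          frags;	/* fragments handed out from this page */
};

static int mlx5e_page_alloc_fragmented(struct page_pool *pool,
				       struct mlx5e_frag_page *frag_page)
{
	struct page *page = page_pool_dev_alloc_pages(pool);

	if (unlikely(!page))
		return -ENOMEM;

	page_pool_fragment_page(page, MLX5E_PAGECNT_BIAS_MAX); /* 1 atomic write */
	frag_page->page = page;
	frag_page->frags = 0;
	return 0;
}

static void mlx5e_page_release_fragmented(struct page_pool *pool,
					  struct mlx5e_frag_page *frag_page)
{
	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;

	/* deduct the unused part of the bias; the last defrag recycles the page */
	if (page_pool_defrag_page(frag_page->page, drain_count) == 0)
		page_pool_put_defragged_page(pool, frag_page->page, -1, true);
}
```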
|
Revision tags: v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14 |
|
#
4a5c5e25 |
| 14-Dec-2022 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: RX, Enable dma map and sync from page_pool allocator
Remove driver dma mapping and unmapping of pages. Let the page_pool api do it.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
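Concretely, this amounts to letting the pool own the mapping via its creation flags. A hedged sketch of the relevant page_pool_params setup follows; the wrapper function and the size values are illustrative, not the driver's exact ones.

```c
#include <net/page_pool.h>

/* With PP_FLAG_DMA_MAP the pool dma-maps pages it allocates, and with
 * PP_FLAG_DMA_SYNC_DEV it syncs them for the device before handing them
 * out, so the driver's own map/unmap calls can be dropped.
 */
static struct page_pool *mlx5e_rq_create_page_pool(struct device *dev,
						   u32 pool_size)
{
	struct page_pool_params pp_params = {
		.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order     = 0,
		.pool_size = pool_size,
		.nid       = NUMA_NO_NODE,
		.dev       = dev,
		.dma_dir   = DMA_FROM_DEVICE,
		.max_len   = PAGE_SIZE,
		.offset    = 0,
	};

	return page_pool_create(&pp_params);
}
```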
|
#
d39092ca |
| 30-Jan-2023 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
This change removes the usage of mlx5e_alloc_unit union for striding rq. The change is more straightforward than legacy rq as the alloc units union is already in place.
This patch only moves things around: instead of an array of unions make it a union of arrays. This has the effect that each mlx5e_mpw_info will allocate the largest possible size of the array member. For xsk this means that the array of xdp_buff pointers for the wqe will still be contiguous, but there will be some extra unused bytes at the end of the array.
A further patch in the series will add the mlx5e_frag_page struct, for which the described size constraint will no longer hold.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
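The layout change can be pictured as follows. This is a simplified sketch: the member types borrow the mlx5e_frag_page struct that the series introduces later (to show why the member sizes differ), and the array-size macro is illustrative.

```c
struct mlx5e_frag_page {
	struct page *page;
	u16          frags;
};

#define MLX5E_MAX_PAGES_PER_WQE 64	/* illustrative, not the real macro */

/* Before: an array of unions -- every slot is padded to the largest member. */
struct mlx5e_mpw_info_before {
	union {
		struct mlx5e_frag_page frag_page;
		struct xdp_buff       *xsk;
	} alloc_units[MLX5E_MAX_PAGES_PER_WQE];
};

/* After: a union of arrays -- the xdp_buff pointer array for xsk stays
 * contiguous, at the cost of some unused bytes at the end of the smaller
 * array member.
 */
struct mlx5e_mpw_info_after {
	union {
		struct mlx5e_frag_page frag_pages[MLX5E_MAX_PAGES_PER_WQE];
		struct xdp_buff       *xsk_buffs[MLX5E_MAX_PAGES_PER_WQE];
	} alloc_units;
};
```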
|
#
aa98d15e |
| 14-Mar-2023 |
Rahul Rameshbabu <rrameshbabu@nvidia.com> |
net/mlx5e: Utilize the entire fifo
The previous check compared against the fifo mask. The mask is the size of the fifo (a power of two) minus one, so a less-than-or-equal comparison should be used when checking whether the fifo has room for the SKB.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
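In other words, for a power-of-two fifo of size N with mask N-1, `used <= mask` (i.e. `used < N`) is the correct "has room" condition, while `used < mask` wastes one slot. A tiny sketch with assumed names:

```c
/* used = producer counter - consumer counter (free-running u16 counters) */
static bool skb_fifo_has_room(u16 pc, u16 cc, u16 mask)
{
	/* mask == fifo_size - 1; the fifo can hold up to fifo_size entries,
	 * so "used <= mask" (rather than "<") utilizes the entire fifo.
	 */
	return (u16)(pc - cc) <= mask;
}
```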
|
#
3a50cf1e |
| 02-Feb-2023 |
Vadim Fedorenko <vadfed@meta.com> |
mlx5: fix possible ptp queue fifo use-after-free
Fifo indexes are not checked during pop operations, which leads to a potential use-after-free when popping from an empty queue. Such a case was possible during the re-sync action. WARN_ON_ONCE covers future cases.
Out-of-order CQEs were spotted, which led to draining of the queue and use-after-free because of the missing fifo pointer checks. A special check and counter are added to avoid the resync operation if the SKB could not exist in the fifo because of an OOO CQE (skb_id must be between the consumer and producer index).
Fixes: 58a518948f60 ("net/mlx5e: Add resiliency for PTP TX port timestamp") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
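The "skb_id must be between consumer and producer index" guard can be expressed with the same free-running-counter arithmetic; a sketch with assumed names:

```c
/* Return true if skb_id could actually be sitting in the fifo, i.e. it lies
 * in the half-open window [cc, pc) of the free-running u16 counters.  An
 * out-of-order CQE carrying an id outside this window must not trigger a
 * resync/drain of the fifo.
 */
static bool ptp_ts_skb_id_in_fifo(u16 skb_id, u16 cc, u16 pc)
{
	return (u16)(skb_id - cc) < (u16)(pc - cc);
}
```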
|
#
e435941b |
| 02-Feb-2023 |
Vadim Fedorenko <vadfed@meta.com> |
mlx5: fix skb leak while fifo resync and push
During the ptp resync operation, SKBs were popped from the fifo but never freed, neither by napi_consume_skb nor by dev_kfree_skb_any. Add a call to napi_consume_skb to properly free the SKBs.
Another leak was happening because mlx5e_skb_fifo_has_room() had an error in the check. Comparing free-running counters works well unless C promotes the types to something wider than the counter. In this case the counters are u16 but the result of the subtraction is promoted to int, causing a wrong (negative) result in the check when the producer has already wrapped around but the consumer hasn't yet. An explicit cast to u16 fixes the issue.
Fixes: 58a518948f60 ("net/mlx5e: Add resiliency for PTP TX port timestamp") Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
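The integer-promotion pitfall is easy to reproduce outside the driver. This standalone snippet (illustrative values, not driver code) shows why the explicit u16 cast matters:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint16_t cc = 65500;                 /* consumer, not yet wrapped       */
	uint16_t pc = (uint16_t)(cc + 128);  /* producer wrapped around: pc=92  */
	uint16_t mask = 127;                 /* fifo holds 128 entries          */

	/* Both operands are promoted to int: 92 - 65500 = -65408, which is
	 * "less than the mask", so the buggy check reports free room even
	 * though the fifo is completely full.
	 */
	printf("promoted: %d -> has_room=%d\n", pc - cc, (pc - cc) < mask);

	/* Casting the difference back to u16 restores modular arithmetic:
	 * (uint16_t)(92 - 65500) == 128, correctly reporting a full fifo.
	 */
	printf("u16 cast: %u -> has_room=%d\n",
	       (unsigned int)(uint16_t)(pc - cc),
	       (uint16_t)(pc - cc) < mask);
	return 0;
}
```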
|
#
b5618a6b |
| 15-Feb-2023 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: Remove unused function mlx5e_sq_xmit_simple
The last usage was removed as part of commit 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support").
Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
bc8d405b |
| 19-Jan-2023 |
Toke Høiland-Jørgensen <toke@redhat.com> |
net/mlx5e: Support RX XDP metadata
Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe pointer to the mlx5e_skb_from* functions so it can be retrieved from the XDP ctx to do this.
Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20230119221536.3349901-17-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
Revision tags: v6.0.13, v6.1, v6.0.12, v6.0.11 |
|
#
022dbea0 |
| 28-Nov-2022 |
Rahul Rameshbabu <rrameshbabu@nvidia.com> |
net/mlx5e: Suppress Send WQEBB room warning for PAGE_SIZE >= 16KB
Send WQEBB size is 64 bytes and the max number of WQEBBs for an SQ is 255. For 16KB pages and greater, there is always sufficient space for all WQEBBs of an SQ. Cast mlx5e_get_max_sq_wqebbs(mdev) to u16. This prevents -Wtautological-constant-out-of-range-compare warnings from occurring when PAGE_SIZE >= 16KB.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
Revision tags: v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77 |
|
#
f9c955b4 |
| 03-Nov-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: Add missing sanity checks for max TX WQE size
The commit cited below started using the firmware capability for the maximum TX WQE size. This commit adds an important check to verify that the driver doesn't attempt to exceed this capability, and also restores another check mistakenly removed in the cited commit (a WQE must not exceed the page size).
Fixes: c27bd1718c06 ("net/mlx5e: Read max WQEBBs on the SQ from firmware") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
Revision tags: v5.15.76, v6.0.6 |
|
#
19b43a43 |
| 26-Oct-2022 |
Aya Levin <ayal@nvidia.com> |
net/mlx5e: Extend SKB room check to include PTP-SQ
When tx_port_ts is set, the driver diverts all UDP traffic over the PTP port to a dedicated PTP-SQ. The SKBs are cached until the wire-CQE arrives. When the packet size is greater than the MTU, the firmware might drop it and the packet won't be transmitted to the wire, hence the wire-CQE won't reach the driver. In this case the SKBs accumulate in the SKB fifo. Add a room check that takes the PTP-SQ SKB fifo into account: when the SKB fifo is full, the driver stops the queue, resulting in a TX timeout. The devlink TX-reporter can recover from it.
Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
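Roughly, the stop condition grows an extra term for the PTP case. A sketch of the idea; the helper that combines the two checks is assumed, not the exact driver code.

```c
/* Sketch: the SQ is only kept running if the WQ has stop_room worth of
 * space and, when this SQ feeds the PTP port, the PTP SKB fifo still has
 * room for another in-flight timestamped packet.
 */
static bool mlx5e_txqsq_has_room(struct mlx5e_txqsq *sq)
{
	if (!mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, sq->stop_room))
		return false;

	if (sq->ptpsq && !mlx5e_skb_fifo_has_room(&sq->ptpsq->skb_fifo))
		return false;	/* stop the queue; TX reporter recovers on timeout */

	return true;
}
```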
|
Revision tags: v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0 |
|
#
cf544517 |
| 30-Sep-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ
XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on striding RQ and creates an optimized flow for XSK. A side effect is an opportunity to optimize the regular RX flow by dropping branching for XSK cases.
Performance improvement is up to 6.4% in the aligned mode and up to 7.5% in the unaligned mode.
Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps
Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps
Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps
Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps
Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps
CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
Revision tags: v5.15.71 |
|
#
4c78782e |
| 27-Sep-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: kTLS, Check ICOSQ WQE size in advance
Instead of WARNing at runtime when TLS offload WQEs posted to the ICOSQ are over the hardware limit, check their size before enabling TLS RX offload, and block the offload if the check fails. This also allows dropping a u16 field from struct mlx5e_icosq.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
21a0502d |
| 27-Sep-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: Use the aligned max TX MPWQE size
TX MPWQE size is limited to the cacheline-aligned maximum. Use the same value for the stop room and the capability check.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
Revision tags: v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18 |
|
#
ddc87e7d |
| 28-Jan-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: Store DMA address inside struct page
Use page_pool_set_dma_addr() to store the DMA address of a page inside struct page, in order to avoid passing struct mlx5e_dma_info to XDP handlers. Previously, struct mlx5e_dma_info was used to pass both the DMA address and the page, and it worked well for the single-fragment case.
When XDP multi buffer is in use and a fragmented xdp_frame has to be transmitted, the driver needs to know the DMA addresses of the fragments; however, the array of fragments in struct skb_shared_info doesn't contain them. In order to pass the DMA addresses, the driver puts them into struct page itself, which is accessible from the array of fragments in struct skb_shared_info. The existing XDP handlers are modified to remove the dependency on struct mlx5e_dma_info.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
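A hedged sketch of how the DMA address travels via struct page instead of a side struct. The page_pool helpers are the generic API; the driver-side wrapper functions are assumptions for illustration.

```c
#include <net/page_pool.h>

/* RX alloc path: map the page and stash the dma address in struct page
 * via the page_pool helper, so any later user (including XDP_TX and
 * multi-buffer completion handlers) can recover it from the page alone.
 */
static int mlx5e_page_map(struct device *dev, struct page *page)
{
	dma_addr_t addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);

	if (dma_mapping_error(dev, addr))
		return -ENOMEM;

	page_pool_set_dma_addr(page, addr);
	return 0;
}

/* XDP handlers no longer need struct mlx5e_dma_info: */
static dma_addr_t mlx5e_page_dma_addr(struct page *page)
{
	return page_pool_get_dma_addr(page);
}
```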
|
Revision tags: v5.15.17 |
|
#
6b23f6ab |
| 25-Jan-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: Move mlx5e_select_queue to en/selq.c
This commit moves mlx5e_select_queue and everything related to ndo_select_queue to en/selq.c, putting all the selq-handling code into a separate file.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
Revision tags: v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36 |
|
#
76c31e5f |
| 10-May-2021 |
Aya Levin <ayal@nvidia.com> |
net/mlx5e: Use FW limitation for max MPW WQEBBs
Calculate the maximal count of MPW WQEBBs on SQ creation and store it there. Remove MLX5E_TX_MPW_MAX_NUM_DS and MLX5E_TX_MPW_MAX_WQEBBS. Update mlx5e_tx_mpwqe_is_full() and mlx5e_xdp_mpwqe_is_full().
Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c27bd171 |
| 17-Jan-2022 |
Aya Levin <ayal@nvidia.com> |
net/mlx5e: Read max WQEBBs on the SQ from firmware
Prior to this patch the maximal value for max WQEBBs (WQE Basic Blocks, where WQE is a Work Queue Element) on the TX side was assumed to be 16 (a fixed value). All firmware versions to date comply with this. In order to be more flexible and resilient, read the corresponding capability from FW: max_wqe_sz_sq. This value describes the maximum WQE size in bytes, thus max WQEBBs is given by dividing it by the WQEBB byte size. The driver uses the minimum of 16 and the division result. This ensures synchronization between driver and firmware and avoids unexpected behavior. Store this value on the different SQs (Send Queues) for easy access.
Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
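The computation this describes is essentially a capped division. A sketch follows; the macro names are the mlx5 ones as understood here, but treat the exact helper as an assumption rather than the verbatim patch.

```c
/* MLX5_SEND_WQE_BB is the WQE Basic Block size (64 bytes); the driver's
 * own architectural limit is 16 WQEBBs.  The FW capability max_wqe_sz_sq
 * is in bytes, so the usable maximum is the smaller of the two.
 */
static u8 mlx5e_get_max_sq_wqebbs(struct mlx5_core_dev *mdev)
{
	return min_t(u16, MLX5_SEND_WQE_MAX_WQEBBS,
		     MLX5_CAP_GEN(mdev, max_wqe_sz_sq) / MLX5_SEND_WQE_BB);
}
```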
|
#
b8d91145 |
| 26-Jan-2022 |
Khalid Manaa <khalidm@nvidia.com> |
net/mlx5e: Fix wrong calculation of header index in HW_GRO
The HW doesn't wrap the CQE.shampo.header_index field according to the headers buffer size; instead, it keeps increasing it until it overflows the u16 size.
Thus the mlx5e_handle_rx_cqe_mpwrq_shampo handler should mask the CQE header_index field to find the actual header index in the headers buffer.
Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support") Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
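So the fix is a simple power-of-two mask on the CQE field before indexing the headers buffer. A sketch; the struct field names for the headers-buffer size are assumed.

```c
/* The headers buffer holds a power-of-two number of entries, but the HW
 * lets cqe->shampo.header_index run freely over the whole u16 range, so
 * it must be wrapped before use.
 */
static u16 mlx5e_shampo_get_cqe_header_index(struct mlx5e_rq *rq,
					     struct mlx5_cqe64 *cqe)
{
	return be16_to_cpu(cqe->shampo.header_index) &
	       (rq->mpwqe.shampo->hd_per_wq - 1);
}
```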
|
Revision tags: v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10 |
|
#
64509b05 |
| 14-Sep-2020 |
Ben Ben-Ishay <benishay@nvidia.com> |
net/mlx5e: Add data path for SHAMPO feature
The header buffer is used to store the headers of the rx packets. The header buffer size is deduced from the WorkQueue size and the restriction of max packets per WorkQueueElement. This commit adds the functionality for posting/updating memory for the header buffer during the posting/updating of WQEs.
Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|