Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6
#
ef9369e9 | 29-Sep-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Fix page_pool allocation failure recovery for legacy rq
When a page allocation fails during refill in mlx5e_refill_rx_wqes, the page will be released again on the next refill call. This triggers the page_pool negative page fragment count warning below:
[ 338.326070] WARNING: CPU: 4 PID: 0 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
...
[ 338.328993] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 338.329094] Call Trace:
[ 338.329097] <IRQ>
[ 338.329100] ? __warn+0x7d/0x120
[ 338.329105] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 338.329173] ? report_bug+0x155/0x180
[ 338.329179] ? handle_bug+0x3c/0x60
[ 338.329183] ? exc_invalid_op+0x13/0x60
[ 338.329187] ? asm_exc_invalid_op+0x16/0x20
[ 338.329192] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 338.329259] mlx5e_post_rx_wqes+0x210/0x5a0 [mlx5_core]
[ 338.329327] ? mlx5e_poll_rx_cq+0x88/0x6f0 [mlx5_core]
[ 338.329394] mlx5e_napi_poll+0x127/0x6b0 [mlx5_core]
[ 338.329461] __napi_poll+0x25/0x1a0
[ 338.329465] net_rx_action+0x28a/0x300
[ 338.329468] __do_softirq+0xcd/0x279
[ 338.329473] irq_exit_rcu+0x6a/0x90
[ 338.329477] common_interrupt+0x82/0xa0
[ 338.329482] </IRQ>
This patch fixes the legacy rq case by releasing all allocated fragments and then setting the skip flag on all released fragments. It is important to note that the number of released fragments will be higher than the number of allocated fragments when an allocation error occurs.
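The shape of the fix, as a hedged sketch in C; the struct, flag, and release helper below are simplified stand-ins, not the actual mlx5e code:

#include <linux/mm_types.h>

struct frag_sketch {
        struct page *page;
        unsigned int flags;
};

#define FRAG_SKIP_RELEASE 0x1

/* stand-in for mlx5e_page_release_fragmented() */
void release_fragmented_page(struct page *page);

static void refill_failure_recover(struct frag_sketch *frags, int nr)
{
        int i;

        /* release every fragment that is not already marked... */
        for (i = 0; i < nr; i++)
                if (!(frags[i].flags & FRAG_SKIP_RELEASE))
                        release_fragmented_page(frags[i].page);

        /* ...then mark all of them so the next refill call does not
         * release them a second time, which would drive the page_pool
         * fragment count negative and trigger the warning above */
        for (i = 0; i < nr; i++)
                frags[i].flags |= FRAG_SKIP_RELEASE;
}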
Fixes: 3f93f82988bc ("net/mlx5e: RX, Defer page release in legacy rq for better recycling")
Tested-by: Chris Mason <clm@fb.com>
Reported-by: Chris Mason <clm@fb.com>
Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
be43b748 | 02-Oct-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Fix page_pool allocation failure recovery for striding rq
When a page allocation fails during refill in mlx5e_post_rx_mpwqes, the page will be released again on the next refill call. This triggers the page_pool negative page fragment count warning below:
[ 2436.447717] WARNING: CPU: 1 PID: 2419 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
...
[ 2436.447895] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.447991] Call Trace:
[ 2436.447975] mlx5e_post_rx_mpwqes+0x1d5/0xcf0 [mlx5_core]
[ 2436.447994] <IRQ>
[ 2436.447996] ? __warn+0x7d/0x120
[ 2436.448009] ? mlx5e_handle_rx_cqe_mpwrq+0x109/0x1d0 [mlx5_core]
[ 2436.448002] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.448044] ? mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
[ 2436.448061] ? report_bug+0x155/0x180
[ 2436.448065] ? handle_bug+0x36/0x70
[ 2436.448067] ? exc_invalid_op+0x13/0x60
[ 2436.448070] ? asm_exc_invalid_op+0x16/0x20
[ 2436.448079] mlx5e_napi_poll+0x122/0x6b0 [mlx5_core]
[ 2436.448077] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
[ 2436.448113] ? generic_exec_single+0x35/0x100
[ 2436.448117] __napi_poll+0x25/0x1a0
[ 2436.448120] net_rx_action+0x28a/0x300
[ 2436.448122] __do_softirq+0xcd/0x279
[ 2436.448126] irq_exit_rcu+0x6a/0x90
[ 2436.448128] sysvec_apic_timer_interrupt+0x6e/0x90
[ 2436.448130] </IRQ>
This patch fixes the striding rq case by setting the skip flag on all the wqe pages that were expected to have new pages allocated.
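Sketched below under the assumption that the flag lives in the per-wqe skip_release_bitmap introduced by commit 4c2a1323 (further down this log); the wrapper and its parameters are illustrative:

#include <linux/bitmap.h>

/* Illustrative: on allocation failure, flag every stride of the wqe
 * that was expected to get a fresh page, so the next refill attempt
 * skips releasing those strides instead of double-releasing them. */
static void mpwqe_alloc_failure_recover(unsigned long *skip_release_bitmap,
                                        unsigned int pages_per_wqe)
{
        bitmap_set(skip_release_bitmap, 0, pages_per_wqe);
}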
Fixes: 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling")
Tested-by: Chris Mason <clm@fb.com>
Reported-by: Chris Mason <clm@fb.com>
Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Revision tags: v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44
#
a9ca9f9c | 04-Aug-2023 | Yunsheng Lin <linyunsheng@huawei.com>
page_pool: split types and declarations from page_pool.h
Split types and pure function declarations from page_pool.h and add them to page_pool/types.h, so that C sources can include page_pool.h while headers should generally only include page_pool/types.h, as suggested by Jakub. Rename page_pool.h to page_pool/helpers.h to have both in one place.
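The resulting include discipline, illustrated:

/* in headers: types and pure declarations only */
#include <net/page_pool/types.h>

/* in C sources that also call the inline helpers
 * (the renamed net/page_pool.h) */
#include <net/page_pool/helpers.h>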
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com
[Jakub: change microsoft/mana, fix kdoc paths in Documentation]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Revision tags: v6.1.43
#
33b18a0f | 31-Jul-2023 | Jianbo Liu <jianbol@nvidia.com>
net/mlx5e: Change the parameter of IPsec RX skb handle function
Refactor the function to pass in reg B value only.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/3b3c53f64660d464893eaecc41298b1ce49c6baa.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Revision tags: v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32
#
7abd955a | 31-May-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Fix page_pool page fragment tracking for XDP
Currently mlx5e releases pages directly to the page_pool for XDP_TX and does page fragment counting for XDP_REDIRECT. RX pages from the page_pool are leaking on XDP_REDIRECT because the xdp core will release only one fragment out of MLX5E_PAGECNT_BIAS_MAX and subsequently the page is marked as "skip release" which avoids the driver release.
A fix would be to take an extra fragment for XDP_REDIRECT and not set the "skip release" bit so that the release on the driver side can handle the remaining bias fragments. But this would be a shortsighted solution. Instead, this patch converges the two XDP paths (XDP_TX and XDP_REDIRECT) to always do fragment tracking. The "skip release" bit is no longer necessary for XDP.
Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Revision tags: v6.1.31, v6.1.30
#
2e2d1965 | 22-May-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq
Regular (non-XSK) RQs get flushed on XSK setup and re-activated on XSK close. If the same regular RQ is closed (a config change for example) soon after the XSK close, a double release occurs because the missing wqes get released a second time.
Fixes: 3f93f82988bc ("net/mlx5e: RX, Defer page release in legacy rq for better recycling")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Revision tags: v6.1.29, v6.1.28
#
b51f4113 | 10-May-2023 | Yunsheng Lin <linyunsheng@huawei.com>
net: introduce and use skb_frag_fill_page_desc()
Most users use __skb_frag_set_page()/skb_frag_off_set()/skb_frag_size_set() to fill the page desc for a skb frag.
Introduce skb_frag_fill_page_desc() to do that.
net/bpf/test_run.c does not call skb_frag_off_set() to set the offset; "copy_from_user(page_address(page), ...)" and 'shinfo' being part of the 'data' kzalloc'ed in bpf_test_init() suggest that the offset is assumed to be zero, so call skb_frag_fill_page_desc() with a zero offset for this case.
Also, skb_frag_set_page() is not used anymore, so remove it.
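Illustratively, a conversion looks like this (a generic sketch, not a call site taken from the patch):

#include <linux/skbuff.h>

static void fill_frag_old(skb_frag_t *frag, struct page *page,
                          unsigned int off, unsigned int size)
{
        /* before: three separate setters */
        __skb_frag_set_page(frag, page);
        skb_frag_off_set(frag, off);
        skb_frag_size_set(frag, size);
}

static void fill_frag_new(skb_frag_t *frag, struct page *page,
                          unsigned int off, unsigned int size)
{
        /* after: one helper fills the whole page descriptor */
        skb_frag_fill_page_desc(frag, page, off, size);
}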
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revision tags: v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23
#
40afb3b1 | 03-Apr-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case
When the XDP handler marks the data for transmission (XDP_TX), it is incorrect to release the page fragment. Instead, the fragments should be marked as MLX5E_WQE_FRAG_SKIP_RELEASE because XDP will release the page directly to the page_pool (page_pool_put_defragged_page) after TX completion.
The linear case already does this. This patch fixes the nonlinear part as well.
Also, the looping over the fragments was incorrect: When handling pages after XDP_TX in the legacy rq nonlinear case, the loop was skipping the first wqe fragment.
Fixes: 3f93f82988bc ("net/mlx5e: RX, Defer page release in legacy rq for better recycling")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
c8e90902 | 30-Mar-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ
mlx5e_free_rx_descs is responsible for calling the dealloc_wqe op which returns pages to the page_pool. This can happen during flush or close. For XSK, the regular RQ is flushed (when replaced by the XSK RQ) and also closed later. This is normally not a problem as the wqe list is empty on a second call to mlx5e_free_rx_descs. However, for striding RQ, the previously released wqes from the list will appear as missing and will be released a second time by mlx5e_free_rx_missing_descs.
This patch sets the no release bits on the striding RQ wqes in the dealloc_wqe op to prevent releasing the pages a second time.
Please note that the bits are set only in the control path during close and not in the data path.
Fixes: 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
f52ac702 | 17-Apr-2023 | Tariq Toukan <tariqt@nvidia.com>
net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ
Here we add support for multi-buffer XDP handling in Striding RQ, which is our default out-of-the-box RQ type. Before this series, loading such an XDP program would fail unless you switched to the legacy RQ (by unsetting the rx_striding_rq priv-flag).
To overcome the lack of headroom and tailroom between the strides, we allocate a side page to be used for the descriptor (xdp_buff / skb) and the linear part. When an XDP program is attached, we structure the xdp_buff so that it contains no data in the linear part, and the whole packet resides in the fragments.
In case of XDP_PASS, where an SKB still needs to be created, we copy up to 256 bytes to its linear part, to match the current behavior, and satisfy functions that assume finding the packet headers in the SKB linear part (like eth_type_trans).
Performance testing:
Packet rate test, 64 bytes, 32 channels, MTU 9000 bytes. CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz. NIC: ConnectX-6 Dx, at 100 Gbps.
+----------+-------------+-------------+---------+
| Test     | Legacy RQ   | Striding RQ | Speedup |
+----------+-------------+-------------+---------+
| XDP_DROP | 101,615,544 | 117,191,020 | +15%    |
+----------+-------------+-------------+---------+
| XDP_TX   |  95,608,169 | 117,043,422 | +22%    |
+----------+-------------+-------------+---------+
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
#
2cb0e27d | 17-Apr-2023 | Tariq Toukan <tariqt@nvidia.com>
net/mlx5e: RX, Prepare non-linear striding RQ for XDP multi-buffer support
In preparation for supporting XDP multi-buffer in striding RQ, use the xdp_buff struct to describe the packet. Make its skb_shared_info coincide with that of the allocated SKB, then add the fragments using the xdp_buff API.
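A hedged sketch of that preparation step using the generic xdp_buff helpers; the function and its inputs (va, headroom, len, frame_sz) are assumptions, not the driver's exact code:

#include <net/xdp.h>

static void describe_packet(struct xdp_buff *xdp, struct xdp_rxq_info *rxq,
                            unsigned char *va, int headroom, int len,
                            u32 frame_sz)
{
        struct skb_shared_info *sinfo;

        xdp_init_buff(xdp, frame_sz, rxq);
        xdp_prepare_buff(xdp, va, headroom, len, true);

        /* the shared info sits where the SKB's shinfo will be; fragments
         * beyond the linear part are accumulated through it */
        sinfo = xdp_get_shared_info_from_buff(xdp);
        sinfo->nr_frags = 0;
        sinfo->xdp_frags_size = 0;
        xdp_buff_set_frags_flag(xdp);
}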
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
#
221c8c7a | 17-Apr-2023 | Tariq Toukan <tariqt@nvidia.com>
net/mlx5e: RX, Generalize mlx5e_fill_mxbuf()
Make the function more generic. Let it get an additional frame_sz parameter instead of deriving it from the RQ struct.
No functional change here, just a preparation for a downstream patch.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
#
27602319 | 17-Apr-2023 | Tariq Toukan <tariqt@nvidia.com>
net/mlx5e: RX, Take shared info fragment addition into a function
Introduce mlx5e_add_skb_shared_info_frag(), a function dedicated for adding a fragment into a struct skb_shared_info object.
Use it in the Legacy RQ flow. Similar usage will be added in a downstream patch by the corresponding Striding RQ flow.
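A hedged guess at the helper's shape, reconstructed from the description rather than copied from the driver (DMA sync and page fragment accounting are omitted):

#include <linux/skbuff.h>

static void add_skb_shared_info_frag(struct skb_shared_info *sinfo,
                                     struct page *page, u32 off, u32 len)
{
        skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags++];

        /* fill the fragment descriptor and grow the frags total that
         * the xdp_buff/skb conversion relies on */
        __skb_frag_set_page(frag, page);
        skb_frag_off_set(frag, off);
        skb_frag_size_set(frag, len);
        sinfo->xdp_frags_size += len;
}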
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revision tags: v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13
#
3905f8d6 | 22-Feb-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Remove unnecessary recycle parameter and page_cache stats
The recycle parameter used during page release is no longer necessary: the page pool can detect when the page cannot be recycled to the cache or ring without any outside hint.
The page pool will also take care of cleaning up after itself once all the inflight pages have been released. So no need to explicitly release pages to the system.
Remove the internal page_cache stats as the mlx5e_page_cache struct no longer exists.
Delete the documentation entries along with the stats.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
cd640b05 | 21-Feb-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Break the wqe bulk refill in smaller chunks
To avoid overflowing the page pool's cache, don't release the whole bulk which is usually larger than the cache refill size. Group release+alloc instead into cache refill units that allow releasing to the cache and then allocating from the cache.
A refill_unit variable is added as an iteration unit over the wqe_bulk when doing release+alloc.
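A hedged sketch of the iteration; the rq type and the free/alloc helpers are hypothetical stand-ins for the mlx5e functions:

#include <linux/minmax.h>
#include <linux/types.h>

struct rq_sketch; /* stand-in for struct mlx5e_rq */

/* hypothetical helpers: release a batch to the page_pool cache, then
 * allocate the same batch, ideally straight from that cache */
void free_rx_wqes(struct rq_sketch *rq, u16 ix, u16 n);
void alloc_rx_wqes(struct rq_sketch *rq, u16 ix, u16 n);

static void refill_in_units(struct rq_sketch *rq, u16 ix, u16 wqe_bulk,
                            u16 refill_unit)
{
        u16 done = 0;

        while (done < wqe_bulk) {
                u16 batch = min_t(u16, refill_unit, wqe_bulk - done);

                free_rx_wqes(rq, ix + done, batch);
                alloc_rx_wqes(rq, ix + done, batch);
                done += batch;
        }
}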
For a single ring, single core, default MTU (1500) TCP stream test the number of pages allocated from the cache directly (rx_pp_recycle_cached) increases from 0% to 52%:
+-------------------------+---------+---------+
| Page Pool stats (/sec)  | Before  | After   |
+-------------------------+---------+---------+
| rx_pp_alloc_fast        | 2145422 | 2193802 |
| rx_pp_alloc_slow        |       2 |       0 |
| rx_pp_alloc_empty       |       2 |       0 |
| rx_pp_alloc_refill      |   34059 |   16634 |
| rx_pp_alloc_waive       |       0 |       0 |
| rx_pp_recycle_cached    |       0 | 1145818 |
| rx_pp_recycle_cache_full|       0 |       0 |
| rx_pp_recycle_ring      | 2179361 | 1064616 |
| rx_pp_recycle_ring_full |     121 |       0 |
+-------------------------+---------+---------+
With this patch, the performance for legacy rq for the above test is back to baseline.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
76238d0f | 21-Feb-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Split off release path for xsk buffers for legacy rq
Don't mix xsk buffer releases with page releases anymore. This is needed for handling of deferred page release.
Add a new bulk free function for xsk buffers from wqe frags.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
3f93f829 | 21-Feb-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Defer page release in legacy rq for better recycling
Currently, fragmented pages from the page pool can be released in two ways:
1) In the mlx5e driver when trimming off the unused fragments AND the associated skb fragments have been released. This path allows recycling of pages to the page pool cache (allow_direct == true).
2) On the skb release path (last fragment release), which will always release pages to the page pool ring (allow_direct == false).
Whichever path releases the last fragment decides where the page ends up: the cache or the ring. So we obviously want to maximize releases from path 1).
This patch does that by deferring the release of page fragments right before requesting new ones from the page pool. A flag is added to make sure that there's no release before first alloc and that XDP_TX fragments are not released prematurely.
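As a hedged sketch, with hypothetical struct and flag names; page_pool_put_full_page() and page_pool_dev_alloc_pages() are the real generic helpers:

#include <linux/errno.h>
#include <net/page_pool.h>

struct frag_sketch {
        struct page *page;
        unsigned int flags;
};

#define FRAG_SKIP_RELEASE 0x1 /* set before first alloc and for XDP_TX */

/* release the old fragment only on the refill path, right before asking
 * for a new page, so recycling happens from NAPI context and may go
 * directly to the page_pool cache (allow_direct == true) */
static int refill_frag(struct page_pool *pp, struct frag_sketch *frag)
{
        if (!(frag->flags & FRAG_SKIP_RELEASE))
                page_pool_put_full_page(pp, frag->page, true);

        frag->flags &= ~FRAG_SKIP_RELEASE;
        frag->page = page_pool_dev_alloc_pages(pp);
        return frag->page ? 0 : -ENOMEM;
}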
This is a preparation patch that doesn't unlock the performance improvements yet. A followup patch will do that.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
625dff29 | 21-Feb-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Change wqe last_in_page field from bool to bit flags
Change the bool flag to a bitfield as we'll use it in a downstream patch in the series to add signaling about skipping a fragment release.
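The shape of the change, with illustrative names (the skip-release flag itself arrives in a later patch in this series):

/* before: a single-purpose bool */
struct wqe_frag_info_before {
        bool last_in_page;
};

/* after: bit flags with room for more signals */
enum wqe_frag_flag {
        WQE_FRAG_LAST_IN_PAGE,  /* replaces the bool */
        WQE_FRAG_SKIP_RELEASE,  /* added by a downstream patch */
};

struct wqe_frag_info_after {
        unsigned char flags;    /* tested via BIT(WQE_FRAG_...) */
};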
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Revision tags: v6.2, v6.1.12
#
4c2a1323 | 14-Feb-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Defer page release in striding rq for better recycling
Currently, for striding RQ, fragmented pages from the page pool can get released in two ways:
1) In the mlx5e driver when trimming off the unused fragments AND the associated skb fragments have been released. This path allows recycling of pages to the page pool cache (allow_direct == true).
2) On the skb release path (last fragment release), which will always release pages to the page pool ring (allow_direct == false).
Whichever path releases the last fragment decides where the page ends up: the cache or the ring. So we obviously want to maximize releases from path 1).
This patch does that by deferring the release of page fragments right before requesting new ones from the page pool. Extra care needs to be taken for the corner cases:
* On first call, make sure that release is not called. The skip_release_bitmap is used for this purpose.
* On rq shutdown, make sure that all wqes that were not in the linked list are released.
For a single ring, single core, default MTU (1500) TCP stream test the number of pages allocated from the cache directly (rx_pp_recycle_cached) increases from 31 % to 98 %:
+-------------------------+---------+---------+
| Page Pool stats (/sec)  | Before  | After   |
+-------------------------+---------+---------+
| rx_pp_alloc_fast        | 2137754 | 2261033 |
| rx_pp_alloc_slow        |      47 |       9 |
| rx_pp_alloc_empty       |      47 |       9 |
| rx_pp_alloc_refill      |   23230 |     819 |
| rx_pp_alloc_waive       |       0 |       0 |
| rx_pp_recycle_cached    |  672182 | 2209015 |
| rx_pp_recycle_cache_full|    1789 |       0 |
| rx_pp_recycle_ring      | 1485848 |   52259 |
| rx_pp_recycle_ring_full |    3003 |     584 |
+-------------------------+---------+---------+
With this patch, the performance in striding rq for the above test is back to baseline.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
38a36efc | 17-Feb-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Rename xdp_xmit_bitmap to a more generic name
The xdp_xmit_bitmap currently serves only one purpose: to avoid releasing pages that are still in use due to XDP TX.
A following patch will use this bitmap in a slightly different context but for the same purpose. So rename the bitmap to a more generic name that reflects the purpose not the context.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Revision tags: v6.1.11, v6.1.10, v6.1.9, v6.1.8
#
6f574284 | 18-Jan-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Enable skb page recycling through the page_pool
Start using the page_pool skb recycling api to recycle all pages back to the page pool and stop using atomic page reference counting.
The mlx5e driver used to manage in-flight pages using page refcounting: for each fragment there were 2 atomic write operations happening (one for building the skb and one on skb release).
The page_pool api introduced a method to track page fragments more optimally:
* The page's pp_fragment_count is set to a large bias on page alloc (1 x atomic write operation).
* The driver tracks the actual page fragments in a non atomic variable.
* When the skb is recycled, pp_fragment_count is decremented (atomic write operation).
* When page is released in the driver, the unused number of fragments (relative to the bias) is deducted from pp_fragment_count (atomic write operation).
* Last page defragmentation will only be an atomic read.
So in total there are `number of fragments + 1` atomic write ops, as opposed to the previous `2 * frags` atomic write ops.
Pages are wrapped in a mlx5e_frag_page structure which also contains the number of fragments. This makes it easy to count the fragments in the driver.
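As a hedged sketch: the bias constant, struct, and helpers below are illustrative, while page_pool_fragment_page() and page_pool_defrag_page() are the real page_pool calls of that era:

#include <net/page_pool.h>

#define PAGECNT_BIAS_MAX ((1UL << 31) - 1) /* illustrative value */

struct frag_page_sketch {
        struct page *page;
        unsigned int frags; /* non-atomic driver-side counter */
};

static void frag_page_alloc_init(struct frag_page_sketch *fp,
                                 struct page *page)
{
        /* one atomic write sets the large bias */
        page_pool_fragment_page(page, PAGECNT_BIAS_MAX);
        fp->page = page;
        fp->frags = 0;
}

static void frag_page_take(struct frag_page_sketch *fp)
{
        fp->frags++; /* per-fragment accounting, no atomics */
}

static void frag_page_release(struct frag_page_sketch *fp)
{
        /* one atomic write deducts the unused bias; the skb recycle
         * path drops the per-fragment references */
        page_pool_defrag_page(fp->page, PAGECNT_BIAS_MAX - fp->frags);
}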
This change brings performance improvements for the case when the old rx page_cache had low recycling rates due to head of queue blocking. For a iperf3 TCP test with a single stream, on a single core (iperf and receive queue running on same core), the following improvements can be noticed:
* Striding rq:
  - before (net-next baseline): bitrate = 30.1 Gbits/sec
  - after: bitrate = 31.4 Gbits/sec (diff: 4.14 %)
* Legacy rq:
  - before (net-next baseline): bitrate = 30.2 Gbits/sec
  - after: bitrate = 33.0 Gbits/sec (diff: 8.48 %)
There are 2 temporary performance degradations introduced:
1) TCP streams that had a good recycling rate with the old page_cache have a degradation for both striding and linear rq. This is due to very low page pool cache recycling: the pages are released during skb recycle which will release pages to the page pool ring for safety. The following patches in this series will tackle this problem by deferring the page release in the driver to increase the chance of having pages recycled to the cache.
2) XDP performance is now lower (4-5 %) due to the higher number of atomic operations used for fragment management. But this opens the door for supporting multiple packets per page in XDP, which will bring a big gain.
Otherwise, performance is similar to baseline.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Revision tags: v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14
#
4a5c5e25 | 14-Dec-2022 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Enable dma map and sync from page_pool allocator
Remove driver dma mapping and unmapping of pages. Let the page_pool api do it.
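A minimal sketch of such a setup, assuming illustrative sizing values; PP_FLAG_DMA_MAP and PP_FLAG_DMA_SYNC_DEV are the real knobs that delegate mapping and device sync to the pool:

#include <linux/dma-direction.h>
#include <linux/numa.h>
#include <net/page_pool.h>

static struct page_pool *create_rx_pool(struct device *dev)
{
        struct page_pool_params pp_params = {
                .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                .order     = 0,
                .pool_size = 1024,          /* illustrative */
                .nid       = NUMA_NO_NODE,
                .dev       = dev,           /* the NIC's struct device */
                .dma_dir   = DMA_FROM_DEVICE,
                .max_len   = PAGE_SIZE,     /* RX sync length */
                .offset    = 0,
        };

        return page_pool_create(&pp_params);
}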
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Revision tags: v6.0.13
#
08c9b61b | 13-Dec-2022 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Remove internal page_cache
This patch removes the internal rx page_cache and uses the generic page_pool api only. It used to be that the page_pool couldn't handle all the mlx5 driver usecases, but with the introduction of skb recycling and page fragmentation in the page_pool, the full switch can now be made. Some benefits of this transition:
* Better page recycling in the cases when the page_cache was suffering from head of queue blocking. The page_pool doesn't have this issue.
* DMA mapping/unmapping can be managed by the page_pool.
* mlx5e_rq size reduced by more than 50% due to the page_cache array being deleted.
This patch only removes the page_cache. Downstream patches will enable the required page_pool features and will add further fine-tuning.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
ca6ef9f0 | 14-Mar-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Store SHAMPO header pages in array
Save allocated SHAMPO header pages to an array to which the mlx5e_dma_info page will point.
This change is a preparation for introducing mlx5e_frag_page structure in a downstream patch. There's no new functionality introduced.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
#
d39092ca | 30-Jan-2023 | Dragos Tatulea <dtatulea@nvidia.com>
net/mlx5e: RX, Remove alloc unit layout constraint for striding rq
This change removes the usage of mlx5e_alloc_unit union for striding rq. The change is more straightforward than legacy rq as the alloc units union is already in place.
This patch only moves things around: instead of an array of unions make it a union of arrays. This has the effect that each mlx5e_mpw_info will allocate the largest possible size of the array member. For xsk this means that the array of xdp_buff pointers for the wqe will still be contiguous, but there will be some extra unused bytes at the end of the array.
A further patch in the series will add the mlx5e_frag_page struct, for which the described size constraint will no longer hold.
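The layout change, illustrated with made-up names and bounds:

#include <net/xdp.h>

#define MAX_PAGES_PER_WQE 64 /* illustrative bound */

struct frag_page_sketch { /* stand-in for mlx5e_frag_page */
        struct page *page;
        unsigned int frags;
};

/* before: array of unions; every slot is padded to the largest member,
 * so the per-type elements are not contiguous */
struct mpw_info_before {
        union {
                struct frag_page_sketch frag_page;
                struct xdp_buff *xsk_buff;
        } alloc_units[MAX_PAGES_PER_WQE];
};

/* after: union of arrays; each member array is contiguous, and the
 * union is sized for the largest array, leaving unused bytes at the
 * end of the smaller (xsk pointer) array */
struct mpw_info_after {
        union {
                struct frag_page_sketch frag_pages[MAX_PAGES_PER_WQE];
                struct xdp_buff *xsk_buffs[MAX_PAGES_PER_WQE];
        } alloc_units;
};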
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>