"4c2a1323" (history) in projects: openbmc

Searched hist:"4c2a1323" (Results 1 – 4 of 4) sorted by relevance

/openbmc/linux/drivers/net/ethernet/mellanox/mlx5/core/en/
reporter_rx.c	4c2a1323 Tue Feb 14 04:01:40 CST 2023 Dragos Tatulea <dtatulea@nvidia.com>

net/mlx5e: RX, Defer page release in striding rq for better recycling

Currently, for striding RQ, fragmented pages from the page pool can be released in two ways:

1) In the mlx5e driver, when trimming off the unused fragments AND the associated skb fragments have already been released. This path allows pages to be recycled into the page pool cache (allow_direct == true).

2) On the skb release path (last fragment release), which always releases pages to the page pool ring (allow_direct == false).

Whichever path releases the last fragment decides where the page goes: the cache or the ring. So we want as many releases as possible to happen on path 1.

This patch does that by deferring the release of page fragments until right before new ones are requested from the page pool.

Extra care needs to be taken for the corner cases:

* On the first call, make sure that release is not called. The skip_release_bitmap is used for this purpose.

* On rq shutdown, make sure that all wqes that were not in the linked list are released.

For a single-ring, single-core, default MTU (1500) TCP stream test, the share of pages recycled directly into the cache (rx_pp_recycle_cached) increases from 31 % to 98 %:

+----------------------------------------------+
| Page Pool stats (/sec)  |  Before |   After  |
+-------------------------+---------+----------+
|rx_pp_alloc_fast         | 2137754 |  2261033 |
|rx_pp_alloc_slow         |      47 |        9 |
|rx_pp_alloc_empty        |      47 |        9 |
|rx_pp_alloc_refill       |   23230 |      819 |
|rx_pp_alloc_waive        |       0 |        0 |
|rx_pp_recycle_cached     |  672182 |  2209015 |
|rx_pp_recycle_cache_full |    1789 |        0 |
|rx_pp_recycle_ring       | 1485848 |    52259 |
|rx_pp_recycle_ring_full  |    3003 |      584 |
+----------------------------------------------+

With this patch, the performance of striding rq in the above test is back to baseline.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
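Why deferring the driver's release moves pages from the ring to the cache can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the mlx5e code: every name in it (`sim_page`, `page_put`, `rq_refill`, `skip_release`, `simulate`) is hypothetical, each page is modeled with only two references (one held by the driver for the trimmed fragments, one held by the skb), and the stack is assumed to free every skb before its WQE is refilled. Whichever `page_put()` drops the last reference decides whether the page is counted as recycled into the cache or into the ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WQE 64 /* hypothetical ring size limit for the model */

/* Hypothetical counters standing in for rx_pp_recycle_cached / _ring. */
struct pp_stats {
	unsigned long recycle_cached; /* last ref dropped with allow_direct */
	unsigned long recycle_ring;   /* last ref dropped on the skb path */
};

/* A page-pool page; refs counts outstanding holders of its fragments. */
struct sim_page {
	int refs;
};

struct sim_rq {
	struct sim_page pages[MAX_WQE]; /* one page per WQE */
	bool skip_release[MAX_WQE];     /* plays the role of skip_release_bitmap */
	size_t nwqe;
	struct pp_stats st;
};

/* Drop one reference; the caller that drops the last one picks the destination. */
static void page_put(struct sim_page *p, bool allow_direct, struct pp_stats *st)
{
	assert(p->refs > 0); /* catch a double release in the model */
	if (--p->refs == 0) {
		if (allow_direct)
			st->recycle_cached++;
		else
			st->recycle_ring++;
	}
}

static void rq_init(struct sim_rq *rq, size_t nwqe)
{
	rq->nwqe = nwqe < MAX_WQE ? nwqe : MAX_WQE;
	rq->st = (struct pp_stats){ 0, 0 };
	for (size_t i = 0; i < rq->nwqe; i++) {
		rq->pages[i].refs = 0;
		rq->skip_release[i] = true; /* first refill has nothing to release */
	}
}

/* Refill WQE i with a fresh page; in deferred mode, release the old one first. */
static void rq_refill(struct sim_rq *rq, size_t i, bool defer)
{
	if (defer && !rq->skip_release[i])
		page_put(&rq->pages[i], true, &rq->st); /* driver path, direct */
	rq->skip_release[i] = false;
	rq->pages[i].refs = 2; /* one ref for the driver, one for the skb */
}

/* Completion on WQE i: the eager model trims right away, while the skb is alive. */
static void rq_complete(struct sim_rq *rq, size_t i, bool defer)
{
	if (!defer)
		page_put(&rq->pages[i], true, &rq->st);
}

/* The stack frees the skb; this path never recycles directly. */
static void skb_free(struct sim_rq *rq, size_t i)
{
	page_put(&rq->pages[i], false, &rq->st);
}

/* Run `rounds` passes over the ring and return the recycling counters. */
static struct pp_stats simulate(size_t nwqe, int rounds, bool defer)
{
	struct sim_rq rq;

	rq_init(&rq, nwqe);
	for (int r = 0; r < rounds; r++) {
		for (size_t i = 0; i < rq.nwqe; i++) {
			rq_refill(&rq, i, defer);
			rq_complete(&rq, i, defer);
			skb_free(&rq, i); /* assumption: skb freed before the next refill */
		}
	}
	/* Shutdown: release the pages whose driver reference is still deferred. */
	if (defer)
		for (size_t i = 0; i < rq.nwqe; i++)
			if (!rq.skip_release[i])
				page_put(&rq.pages[i], true, &rq.st);
	return rq.st;
}
```

Under these assumptions, `simulate(8, 10, false)` sends all 80 pages to the ring and `simulate(8, 10, true)` sends all 80 to the cache. Real traffic lands in between (the 98 % reported above) because some skbs outlive the refill of their WQE. The `skip_release` array mirrors the first corner case in the commit message: it keeps the first refill of each WQE from releasing a page that was never allocated, and the shutdown loop mirrors the second.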
txrx.h	4c2a1323 Tue Feb 14 04:01:40 CST 2023 Dragos Tatulea <dtatulea@nvidia.com> net/mlx5e: RX, Defer page release in striding rq for better recycling
/openbmc/linux/drivers/net/ethernet/mellanox/mlx5/core/
en_rx.c	4c2a1323 Tue Feb 14 04:01:40 CST 2023 Dragos Tatulea <dtatulea@nvidia.com> net/mlx5e: RX, Defer page release in striding rq for better recycling
en_main.c	4c2a1323 Tue Feb 14 04:01:40 CST 2023 Dragos Tatulea <dtatulea@nvidia.com> net/mlx5e: RX, Defer page release in striding rq for better recycling
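The 31 % and 98 % figures quoted in commit 4c2a1323 can be reproduced from its Page Pool stats table. The helper below is a hypothetical sketch; the choice of denominator (rx_pp_recycle_cached + rx_pp_recycle_ring + rx_pp_recycle_ring_full) is an assumption, based on the understanding that a page counted under rx_pp_recycle_cache_full is then also counted under one of the ring counters.

```c
/* Rounded percentage of returned pages that went straight into the cache.
 * Assumption: total = cached + ring + ring_full (cache_full is not added,
 * since those pages are assumed to be counted again under ring/ring_full). */
static unsigned long long cached_share_pct(unsigned long long cached,
					   unsigned long long ring,
					   unsigned long long ring_full)
{
	unsigned long long total = cached + ring + ring_full;

	if (total == 0)
		return 0;
	return (cached * 100 + total / 2) / total; /* round to nearest */
}
```

With the "Before" column, `cached_share_pct(672182, 1485848, 3003)` returns 31; with the "After" column, `cached_share_pct(2209015, 52259, 584)` returns 98. Adding rx_pp_recycle_cache_full to the denominator would give the same rounded values for this data.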
