Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78 |
|
#
57295e06 |
| 08-Nov-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Add memory statistics for domain object
Add counters for number of buddies that are currently in use per domain per buddy type (STE, MODIFY-HEADER, MODIFY-PATTERN).
Signed-off-by: Ere
net/mlx5: DR, Add memory statistics for domain object
Add counters for number of buddies that are currently in use per domain per buddy type (STE, MODIFY-HEADER, MODIFY-PATTERN).
Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
72b2cff6 |
| 08-Nov-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Calculate sync threshold of each pool according to its type
When certain ICM chunk is no longer needed, it needs to be freed. Fully freeing ICM memory involves issuing FW SYNC_STEERING
net/mlx5: DR, Calculate sync threshold of each pool according to its type
When certain ICM chunk is no longer needed, it needs to be freed. Fully freeing ICM memory involves issuing FW SYNC_STEERING command. This is very time consuming, and it is impractical to do it for every freed chunk. Instead, we manage these 'freed' chunks in hot list (list of chunks that are not required by SW any more, but HW might still access them). When size of the hot list reaches certain threshold, we purge it and issue SYNC_STEERING FW command. There is one threshold for all the different ICM types, which is not optimal, as different ICM types require different approach: STEs pool is very large, and it is very 'dynamic' in its nature, so letting hot list to become too large will result in a significant perf hiccup when purging the hot list. Modify action is much smaller and less dynamic, so we can let the hot list to grow to almost the size of the whole pool.
This patch fixes this problem: instead of having same hot memory threshold for all the pools, sync operation will be triggered in accordance with the ICM type.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
Revision tags: v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64 |
|
#
108ff821 |
| 29-Aug-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Add modify-header-pattern ICM pool
There is a new ICM area for that memory, so we need to handle it as we did for the others ICM types. The patch added that specific pool with its requ
net/mlx5: DR, Add modify-header-pattern ICM pool
There is a new ICM area for that memory, so we need to handle it as we did for the others ICM types. The patch added that specific pool with its requirements and management.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
Revision tags: v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52 |
|
#
edaea001 |
| 30-Jun-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Remove the buddy used_list
No need to have the used_list - we don't need to keep track of the used chunks, we only need to know the amount of used memory.
Signed-off-by: Yevgeny Klite
net/mlx5: DR, Remove the buddy used_list
No need to have the used_list - we don't need to keep track of the used chunks, we only need to know the amount of used memory.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
Revision tags: v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44 |
|
#
4519fc45 |
| 25-May-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Keep track of hot ICM chunks in an array instead of list
When ICM chunk is freed, it might still be accessed by HW until we do sync with HW. This sync is expensive operation, so we don
net/mlx5: DR, Keep track of hot ICM chunks in an array instead of list
When ICM chunk is freed, it might still be accessed by HW until we do sync with HW. This sync is expensive operation, so we don't do it often. Instead, when the chunk is freed, it is moved to the buddy's "hot memory" list. Once sync is done, we traverse the hot list and finally free all the chunks.
It appears that traversing a long list takes unusually long time due to cache misses on many entries, which causes a big "hiccup" during rule insertion.
This patch deals with this issue the following way: - Move hot chunks list from buddy to pool, so that the pool will keep track of all its hot memory. - Replace the list with pre-allocated array on the memory pool struct, and store only the information that is needed to later free this chunk in its buddy allocator. This cost additional memory for the array that is dynamically allocated, but it allows not to save long list of hot chunks, so at peak times it actually saves memory due to the fact that each array entry is much smaller than the chunk struct.
This way an overhead of traversing the long list is virtually removed: the loop of freeing hot chunks takes ~27 msec instead of ~70 msec, where most of it are the actual freeing activities.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
133ea373 |
| 26-May-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Lower sync threshold for ICM hot memory
Instead of hiding the math in the code, define a value that sets the fraction of allowed hot memory of ICM pool. Set the threshold for sync of I
net/mlx5: DR, Lower sync threshold for ICM hot memory
Instead of hiding the math in the code, define a value that sets the fraction of allowed hot memory of ICM pool. Set the threshold for sync of ICM hot chunks to 1/4 of the pool instead of 1/2 of the pool. Although we will have more syncs, each sync will be shorter and will help with insertion rate stability.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
fb628b71 |
| 25-May-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Allocate htbl from its own slab allocator
SW steering allocates/frees lots of htbl structs. Create a separate kmem_cache and allocate htbls from this allocator.
Signed-off-by: Yevgeny
net/mlx5: DR, Allocate htbl from its own slab allocator
SW steering allocates/frees lots of htbl structs. Create a separate kmem_cache and allocate htbls from this allocator.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
fd785e52 |
| 25-May-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Allocate icm_chunks from their own slab allocator
SW steering allocates/frees lots of icm_chunk structs. To make this more efficiently, create a separate kmem_cache and allocate these
net/mlx5: DR, Allocate icm_chunks from their own slab allocator
SW steering allocates/frees lots of icm_chunk structs. To make this more efficiently, create a separate kmem_cache and allocate these chunks from this allocator. By doing this we observe that the alloc/free "hiccups" frequency has become much lower, which allows for a more steady rule insersion rate.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
06ab4a40 |
| 25-May-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Initialize chunk's ste_arrays at chunk creation
Rather than cleaning the corresponding chunk's section of ste_arrays on chunk deletion, initialize these areas upon chunk creation.
Chu
net/mlx5: DR, Initialize chunk's ste_arrays at chunk creation
Rather than cleaning the corresponding chunk's section of ste_arrays on chunk deletion, initialize these areas upon chunk creation.
Chunk destruction tend to come in large batches (during pool syncing). To reduce the "hiccup" in such cases, moving ste_arrays init from chunk destruction to initialization.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
d277b55f |
| 26-May-2022 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Remove unneeded argument from dr_icm_chunk_destroy
Remove an argument that can be extracted in the function.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Ve
net/mlx5: DR, Remove unneeded argument from dr_icm_chunk_destroy
Remove an argument that can be extracted in the function.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
Revision tags: v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18 |
|
#
f51bb517 |
| 27-Jan-2022 |
Rongwei Liu <rongweil@nvidia.com> |
net/mlx5: DR, Remove num_of_entries byte_size from struct mlx5_dr_icm_chunk
Target to reduce the memory consumption in large scale of flow rules.
They can be calculated quickly from buddy memory po
net/mlx5: DR, Remove num_of_entries byte_size from struct mlx5_dr_icm_chunk
Target to reduce the memory consumption in large scale of flow rules.
They can be calculated quickly from buddy memory pool. 1. num_of_entries calls dr_icm_pool_get_chunk_num_of_entries(). 2. byte_size calls dr_icm_pool_get_chunk_byte_size().
Use chunk size in dr_icm_chunk to speed up and the one in dr_ste_htbl will be removed in the upcoming commit.
This commit reduce 8 bytes from struct mlx5_dr_icm_chunk and its current size is 56 bytes.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
5c4f9b6e |
| 27-Jan-2022 |
Rongwei Liu <rongweil@nvidia.com> |
net/mlx5: DR, Remove icm_addr from mlx5dr_icm_chunk to reduce memory
It can be calculated quickly from buddy memory pool by function mlx5dr_icm_pool_get_chunk_icm_addr(). This function is very light
net/mlx5: DR, Remove icm_addr from mlx5dr_icm_chunk to reduce memory
It can be calculated quickly from buddy memory pool by function mlx5dr_icm_pool_get_chunk_icm_addr(). This function is very lightweight and straightforward.
Reduce 8 bytes and current size of struct mlx5_dr_icm_chunk is 64 bytes.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
003f4f9a |
| 27-Jan-2022 |
Rongwei Liu <rongweil@nvidia.com> |
net/mlx5: DR, Remove mr_addr rkey from struct mlx5dr_icm_chunk
Reduce memory footprint by removing mr_addr and rkey from mlx5_dr_icm_chunk. 1. mr_addr is calculated by mlx5dr_icm_pool_get_chunk_mr_a
net/mlx5: DR, Remove mr_addr rkey from struct mlx5dr_icm_chunk
Reduce memory footprint by removing mr_addr and rkey from mlx5_dr_icm_chunk. 1. mr_addr is calculated by mlx5dr_icm_pool_get_chunk_mr_addr() 2. rkey is calculated by mlx5dr_icm_pool_get_chunk_rkey() The two new functions are very lightweight and straightforward.
Reduce 8 bytes from struct mlx5_dr_icm_chunk, its current size is 72 bytes.
Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
Revision tags: v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16 |
|
#
ecd9c5cd |
| 29-Dec-2021 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
When deciding whether to start syncing and actually free all the "hot" ICM chunks, we need to consider the type of the ICM ch
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
When deciding whether to start syncing and actually free all the "hot" ICM chunks, we need to consider the type of the ICM chunks that we're dealing with. For instance, the amount of available ICM for MODIFY_ACTION is significantly lower than the usual STE ICM, so the threshold should account for that - otherwise we can deplete MODIFY_ACTION memory just by creating and deleting the same modify header action in a continuous loop.
This patch replaces the hard-coded threshold with a dynamic value.
Fixes: 1c58651412bb ("net/mlx5: DR, ICM memory pools sync optimization") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
e5b2bc30 |
| 23-Dec-2021 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Cache STE shadow memory
During rule insertion on each ICM memory chunk we also allocate shadow memory used for management. This includes the hw_ste, dr_ste and miss list per entry. Sin
net/mlx5: DR, Cache STE shadow memory
During rule insertion on each ICM memory chunk we also allocate shadow memory used for management. This includes the hw_ste, dr_ste and miss list per entry. Since the scale of these allocations is large we noticed a performance hiccup that happens once malloc and free are stressed. In extreme usecases when ~1M chunks are freed at once, it might take up to 40 seconds to complete this, up to the point the kernel sees this as self-detected stall on CPU:
rcu: INFO: rcu_sched self-detected stall on CPU
To resolve this we will increase the reuse of shadow memory. Doing this we see that a time in the aforementioned usecase dropped from ~40 seconds to ~8-10 seconds.
Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator") Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
Revision tags: v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12 |
|
#
83fec3f1 |
| 12-Oct-2021 |
Aharon Landau <aharonl@nvidia.com> |
RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key
In mlx5_core and vdpa there is no use of mlx5_core_mkey members except for the key itself.
As preparation for moving mlx5_core_mkey to mlx5_ib, t
RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key
In mlx5_core and vdpa there is no use of mlx5_core_mkey members except for the key itself.
As preparation for moving mlx5_core_mkey to mlx5_ib, the occurrences of struct mlx5_core_mkey in all modules except for mlx5_ib are replaced by a u32 key.
Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
show more ...
|
#
d4d18848 |
| 29-Dec-2021 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
commit ecd9c5cd46e013659e2fad433057bad1ba66888e upstream.
When deciding whether to start syncing and actually free all the "
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
commit ecd9c5cd46e013659e2fad433057bad1ba66888e upstream.
When deciding whether to start syncing and actually free all the "hot" ICM chunks, we need to consider the type of the ICM chunks that we're dealing with. For instance, the amount of available ICM for MODIFY_ACTION is significantly lower than the usual STE ICM, so the threshold should account for that - otherwise we can deplete MODIFY_ACTION memory just by creating and deleting the same modify header action in a continuous loop.
This patch replaces the hard-coded threshold with a dynamic value.
Fixes: 1c58651412bb ("net/mlx5: DR, ICM memory pools sync optimization") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
117a5a7f |
| 23-Dec-2021 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Cache STE shadow memory
commit e5b2bc30c21139ae10f0e56989389d0bc7b7b1d6 upstream.
During rule insertion on each ICM memory chunk we also allocate shadow memory used for management. Th
net/mlx5: DR, Cache STE shadow memory
commit e5b2bc30c21139ae10f0e56989389d0bc7b7b1d6 upstream.
During rule insertion on each ICM memory chunk we also allocate shadow memory used for management. This includes the hw_ste, dr_ste and miss list per entry. Since the scale of these allocations is large we noticed a performance hiccup that happens once malloc and free are stressed. In extreme usecases when ~1M chunks are freed at once, it might take up to 40 seconds to complete this, up to the point the kernel sees this as self-detected stall on CPU:
rcu: INFO: rcu_sched self-detected stall on CPU
To resolve this we will increase the reuse of shadow memory. Doing this we see that a time in the aforementioned usecase dropped from ~40 seconds to ~8-10 seconds.
Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator") Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10 |
|
#
284836d9 |
| 14-Sep-2020 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Free unused buddy ICM memory
Track buddy's used ICM memory, and free it if all of the buddy's memory bacame unused. Do this only for STEs. MODIFY_ACTION buddies are much smaller, so in
net/mlx5: DR, Free unused buddy ICM memory
Track buddy's used ICM memory, and free it if all of the buddy's memory bacame unused. Do this only for STEs. MODIFY_ACTION buddies are much smaller, so in case there is a large amount of modify_header actions, which result in large amount of MODIFY_ACTION buddies, doing this cleanup during sync will result in performance hit while not freeing significant amount of memory.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
1c586514 |
| 14-Sep-2020 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, ICM memory pools sync optimization
Track the pool's hot ICM memory when freeing/allocating chunk, so that when checking if the sync is required, just check if the pool hot memory has r
net/mlx5: DR, ICM memory pools sync optimization
Track the pool's hot ICM memory when freeing/allocating chunk, so that when checking if the sync is required, just check if the pool hot memory has reached the sync threshold.
Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
3eb1006a |
| 14-Sep-2020 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Sync chunks only during free
When freeing chunks, we want to sync the steering so that all the "hot" memory will be written to ICM and all the chunks that are in the hot_list will be a
net/mlx5: DR, Sync chunks only during free
When freeing chunks, we want to sync the steering so that all the "hot" memory will be written to ICM and all the chunks that are in the hot_list will be actually destroyed. When allocating from the pool, we don't have a need to sync the steering, as we're not freeing anything, and sync might just hurt the performance in terms of flow-per-second offloaded.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
#
a00cd878 |
| 14-Sep-2020 |
Yevgeny Kliteynik <kliteyn@nvidia.com> |
net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets
Till now in order to manage the ICM memory we used bucket mechanism, which kept a bucket per specified size (sizes were betwee
net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets
Till now in order to manage the ICM memory we used bucket mechanism, which kept a bucket per specified size (sizes were between 1 block to 2^21 blocks).
Now changing that with buddy-system mechanism, which gives us much more flexible way to manage the ICM memory. Its biggest advantage over the bucket is by using the same ICM memory area for all the sizes of blocks, which reduces the memory consumption.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
show more ...
|
Revision tags: v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36 |
|
#
dff8e2d1 |
| 24-Apr-2020 |
Erez Shitrit <erezsh@mellanox.com> |
net/mlx5: Use aligned variable while allocating ICM memory
The alignment value is part of the input structure, so use it and spare extra memory allocation when is not needed. Now, using the new abil
net/mlx5: Use aligned variable while allocating ICM memory
The alignment value is part of the input structure, so use it and spare extra memory allocation when is not needed. Now, using the new ability when allocating icm for Direct-Rule insertion. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
show more ...
|
Revision tags: v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11 |
|
#
b7d0db55 |
| 12-Jan-2020 |
Erez Shitrit <erezsh@mellanox.com> |
net/mlx5: DR, Improve log messages
Few print messages are in debug level where they should be in error, and few messages are missing.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by
net/mlx5: DR, Improve log messages
Few print messages are in debug level where they should be in error, and few messages are missing.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
show more ...
|
Revision tags: v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3 |
|
#
8b6b82ad |
| 02-Oct-2019 |
Michal Kubecek <mkubecek@suse.cz> |
mlx5: avoid 64-bit division in dr_icm_pool_mr_create()
Recently added code introduces 64-bit division in dr_icm_pool_mr_create() so that build on 32-bit architectures fails with
ERROR: "__umoddi3
mlx5: avoid 64-bit division in dr_icm_pool_mr_create()
Recently added code introduces 64-bit division in dr_icm_pool_mr_create() so that build on 32-bit architectures fails with
ERROR: "__umoddi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!
As the divisor is always a power of 2, we can use bitwise operation instead.
Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|