Revision tags: v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65 |
|
#
29322791 |
| 01-Sep-2022 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: xsk: change batched Tx descriptor cleaning
AF_XDP Tx descriptor cleaning in ice driver currently works in a "lazy" way - descriptors are not cleaned immediately after send. We rather hold on wi
ice: xsk: change batched Tx descriptor cleaning
AF_XDP Tx descriptor cleaning in ice driver currently works in a "lazy" way - descriptors are not cleaned immediately after send. We rather hold on with cleaning until we see that free space in ring drops below particular threshold. This was supposed to reduce the amount of unnecessary work related to cleaning and instead of keeping the ring empty, ring was rather saturated.
In AF_XDP realm cleaning Tx descriptors implies producing them to CQ. This is a way of letting know user space that particular descriptor has been sent, as John points out in [0].
We tried to implement serial descriptor cleaning which would be used in conjunction with batched cleaning but it made code base more convoluted and probably harder to maintain in future. Therefore we step away from batched cleaning in a current form in favor of an approach where we set RS bit on every last descriptor from a batch and clean always at the beginning of ice_xmit_zc().
This means that we give up a bit of Tx performance, but this doesn't hurt l2fwd scenario which is way more meaningful than txonly as this can be treaten as AF_XDP based packet generator. l2fwd is not hurt due to the fact that Tx side is much faster than Rx and Rx is the one that has to catch Tx up.
FWIW Tx descriptors are still produced in a batched way.
[0]: https://lore.kernel.org/bpf/62b0a20232920_3573208ab@john.notmuch/
Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
114f398d |
| 19-Sep-2022 |
Larysa Zaremba <larysa.zaremba@intel.com> |
ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
The original patch added the static branch to handle the situation, when assigning an XDP TX queue to every CPU is not possible, so
ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
The original patch added the static branch to handle the situation, when assigning an XDP TX queue to every CPU is not possible, so they have to be shared.
However, in the XDP transmit handler ice_xdp_xmit(), an error was returned in such cases even before static condition was checked, thus making queue sharing still impossible.
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20220919134346.25030-1-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58 |
|
#
f020481b |
| 27-Jul-2022 |
Jacob Keller <jacob.e.keller@intel.com> |
ice: track Tx timestamp stats similar to other Intel drivers
Several Intel networking drivers which support PTP track when Tx timestamps are skipped or when they timeout without a timestamp from har
ice: track Tx timestamp stats similar to other Intel drivers
Several Intel networking drivers which support PTP track when Tx timestamps are skipped or when they timeout without a timestamp from hardware. The conditions which could cause these events are rare, but it can be useful to know when and how often they occur.
Implement similar statistics for the ice driver, tx_hwtstamp_skipped, tx_hwtstamp_timeouts, and tx_hwtstamp_flushed.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
Revision tags: v5.15.57, v5.15.56 |
|
#
01658aee |
| 18-Jul-2022 |
Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> |
ice: Fix tunnel checksum offload with fragmented traffic
Fix checksum offload on VXLAN tunnels. In case, when mpls protocol is not used, set l4 header to transport header of skb. This fixes case, wh
ice: Fix tunnel checksum offload with fragmented traffic
Fix checksum offload on VXLAN tunnels. In case, when mpls protocol is not used, set l4 header to transport header of skb. This fixes case, when user tries to offload checksums of VXLAN tunneled traffic.
Steps for reproduction (requires link partner with tunnels): ip l s enp130s0f0 up ip a f enp130s0f0 ip a a 10.10.110.2/24 dev enp130s0f0 ip l s enp130s0f0 mtu 1600 ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789 ip l s vxlan12_sut up ip a a 20.10.110.2/24 dev vxlan12_sut iperf3 -c 20.10.110.1 #should connect
Offload params: td_offset, cd_tunnel_params were corrupted, due to l4 header pointing wrong address. NIC would then drop those packets internally, due to incorrect TX descriptor data, which increased GLV_TEPC register.
Fixes: 69e66c04c672 ("ice: Add mpls+tso support") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
Revision tags: v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30 |
|
#
69e66c04 |
| 17-Mar-2022 |
Joe Damato <jdamato@fastly.com> |
ice: Add mpls+tso support
Attempt to add mpls+tso support.
I don't have ice hardware available to test myself, but I just implemented this feature in i40e and thought it might be useful to implemen
ice: Add mpls+tso support
Attempt to add mpls+tso support.
I don't have ice hardware available to test myself, but I just implemented this feature in i40e and thought it might be useful to implement for ice while this is fresh in my brain.
Hoping some one at intel will be able to test this on my behalf.
Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
Revision tags: v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20 |
|
#
457a02f0 |
| 03-Feb-2022 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: avoid XDP checks in ice_clean_tx_irq()
Commit 9610bd988df9 ("ice: optimize XDP_TX workloads") introduced Tx IRQ cleaning routine dedicated for XDP rings. Currently it is impossible to call ice_
ice: avoid XDP checks in ice_clean_tx_irq()
Commit 9610bd988df9 ("ice: optimize XDP_TX workloads") introduced Tx IRQ cleaning routine dedicated for XDP rings. Currently it is impossible to call ice_clean_tx_irq() against XDP ring, so it is safe to drop ice_ring_is_xdp() calls in there.
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
b03d519d |
| 16-Feb-2022 |
Jacob Keller <jacob.e.keller@intel.com> |
ice: store VF pointer instead of VF ID
The VSI structure contains a vf_id field used to associate a VSI with a VF. This is used mainly for ICE_VSI_VF as well as partially for ICE_VSI_CTRL associated
ice: store VF pointer instead of VF ID
The VSI structure contains a vf_id field used to associate a VSI with a VF. This is used mainly for ICE_VSI_VF as well as partially for ICE_VSI_CTRL associated with the VFs.
This API was designed with the idea that VFs are stored in a simple array that was expected to be static throughout most of the driver's life.
We plan on refactoring VF storage in a few key ways:
1) converting from a simple static array to a hash table 2) using krefs to track VF references obtained from the hash table 3) use RCU to delay release of VF memory until after all references are dropped
This is motivated by the goal to ensure that the lifetime of VF structures is accounted for, and prevent various use-after-free bugs.
With the existing vsi->vf_id, the reference tracking for VFs would become somewhat convoluted, because each VSI maintains a vf_id field which will then require performing a look up. This means all these flows will require reference tracking and proper usage of rcu_read_lock, etc.
We know that the VF VSI will always be backed by a valid VF structure, because the VSI is created during VF initialization and removed before the VF is destroyed. Rely on this and store a reference to the VF in the VSI structure instead of storing a VF ID. This will simplify the usage and avoid the need to perform lookups on the hash table in the future.
For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a vsi->vf pointer is valid when dealing with VF VSIs. This will aid in debugging code which violates this assumption and avoid more disastrous panics.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
Revision tags: v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7 |
|
#
0d54d8f7 |
| 02-Dec-2021 |
Brett Creeley <brett.creeley@intel.com> |
ice: Add hot path support for 802.1Q and 802.1ad VLAN offloads
Currently the driver only supports 802.1Q VLAN insertion and stripping. However, once Double VLAN Mode (DVM) is fully supported, then b
ice: Add hot path support for 802.1Q and 802.1ad VLAN offloads
Currently the driver only supports 802.1Q VLAN insertion and stripping. However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q and 802.1ad VLAN insertion and stripping will be supported. Unfortunately the VSI context parameters only allow for one VLAN ethertype at a time for VLAN offloads so only one or the other VLAN ethertype offload can be supported at once.
To support this, multiple changes are needed.
Rx path changes:
[1] In DVM, the Rx queue context l2tagsel field needs to be cleared so the outermost tag shows up in the l2tag2_2nd field of the Rx flex descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain 1 to support SVM configurations.
[2] Modify the ice_test_staterr() function to take a __le16 instead of the ice_32b_rx_flex_desc union pointer so this function can be used for both rx_desc->wb.status_error0 and rx_desc->wb.status_error1.
[3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that checks if there is a VLAN tag in l2tag1 or l2tag2_2nd.
[4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX is enabled in netdev->features. If it is, then this is the VLAN ethertype that needs to be added to the stripping VLAN tag. Since ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad.
Tx path changes:
[1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was added to the list of tx_flags to handle this case.
[2] When the stack requests the VLAN tag to be offloaded on Tx, the driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1 respectively. To determine which location to use, set a bit in the Tx ring flags field during ring allocation that can be used to determine which field to use in the Tx descriptor. In DVM, always use l2tag2, and in SVM, always use l2tag1.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
ee803dca |
| 08-Dec-2021 |
Alexander Lobakin <alexandr.lobakin@intel.com> |
ice: respect metadata in legacy-rx/ice_construct_skb()
In "legacy-rx" mode represented by ice_construct_skb(), we can still use XDP (and XDP metadata), but after XDP_PASS the metadata will be lost a
ice: respect metadata in legacy-rx/ice_construct_skb()
In "legacy-rx" mode represented by ice_construct_skb(), we can still use XDP (and XDP metadata), but after XDP_PASS the metadata will be lost as it doesn't get copied to the skb. Copy it along with the frame headers. Account its size on skb allocation, and when copying just treat it as a part of the frame and do a pull after to "move" it to the "reserved" zone. Point net_prefetch() to xdp->data_meta instead of data. This won't change anything when the meta is not here, but will save some cache misses otherwise.
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
126cdfe1 |
| 25-Jan-2022 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: xsk: Improve AF_XDP ZC Tx and use batching API
Apply the logic that was done for regular XDP from commit 9610bd988df9 ("ice: optimize XDP_TX workloads") to the ZC side of the driver. On top of
ice: xsk: Improve AF_XDP ZC Tx and use batching API
Apply the logic that was done for regular XDP from commit 9610bd988df9 ("ice: optimize XDP_TX workloads") to the ZC side of the driver. On top of that, introduce batching to Tx that is inspired by i40e's implementation with adjustments to the cleaning logic - take into the account NAPI budget in ice_clean_xdp_irq_zc().
Separating the stats structs onto separate cache lines seemed to improve the performance.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20220125160446.78976-8-maciej.fijalkowski@intel.com
show more ...
|
#
86e3f78c |
| 25-Jan-2022 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: xsk: Avoid potential dead AF_XDP Tx processing
Commit 9610bd988df9 ("ice: optimize XDP_TX workloads") introduced @next_dd and @next_rs to ice_tx_ring struct. Currently, their state is not resto
ice: xsk: Avoid potential dead AF_XDP Tx processing
Commit 9610bd988df9 ("ice: optimize XDP_TX workloads") introduced @next_dd and @next_rs to ice_tx_ring struct. Currently, their state is not restored in ice_clean_tx_ring(), which was not causing any troubles as the XDP rings are gone after we're done with XDP prog on interface.
For upcoming usage of mentioned fields in AF_XDP, this might expose us to a potential dead Tx side. Scenario would look like following (based on xdpsock):
- two xdpsock instances are spawned in Tx mode - one of them is killed - XDP prog is kept on interface due to the other xdpsock still running * this means that XDP rings stayed in place - xdpsock is launched again on same queue id that was terminated on - @next_dd and @next_rs setting is bogus, therefore transmit side is broken
To protect us from the above, restore the initial @next_rs and @next_dd values when cleaning the Tx ring.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220125160446.78976-7-maciej.fijalkowski@intel.com
show more ...
|
#
a4e18669 |
| 25-Jan-2022 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: Remove likely for napi_complete_done
Remove the likely before napi_complete_done as this is the unlikely case when busy-poll is used. Removing this has a positive performance impact for busy-po
ice: Remove likely for napi_complete_done
Remove the likely before napi_complete_done as this is the unlikely case when busy-poll is used. Removing this has a positive performance impact for busy-poll and no negative impact to the regular case.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220125160446.78976-2-maciej.fijalkowski@intel.com
show more ...
|
Revision tags: v5.15.6, v5.15.5 |
|
#
5ce66631 |
| 23-Nov-2021 |
Alexander Lobakin <alexandr.lobakin@intel.com> |
ice: switch to napi_build_skb()
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. ice driver run
ice: switch to napi_build_skb()
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. ice driver runs Tx completion polling cycle right before the Rx one and uses napi_consume_skb() to feed the cache with skbuff_heads of completed entries, so it's never empty and always warm at that moment. Switch to the napi_build_skb() to relax mm pressure on heavy Rx.
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
617f3e1b |
| 13-Dec-2021 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: xsk: allocate separate memory for XDP SW ring
Currently, the zero-copy data path is reusing the memory region that was initially allocated for an array of struct ice_rx_buf for its own purposes
ice: xsk: allocate separate memory for XDP SW ring
Currently, the zero-copy data path is reusing the memory region that was initially allocated for an array of struct ice_rx_buf for its own purposes. This is error prone as it is based on the ice_rx_buf struct always being the same size or bigger than what the zero-copy path needs. There can also be old values present in that array giving rise to errors when the zero-copy path uses it.
Fix this by freeing the ice_rx_buf region and allocating a new array for the zero-copy path that has the right length and is initialized to zero.
Fixes: 57f7f8b6bc0b ("ice: Use xdp_buf instead of rx_buf for xsk zero-copy") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
Revision tags: v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15 |
|
#
9c99d099 |
| 25-Oct-2021 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
ice: use modern kernel API for kick
The kernel gained a new interface for drivers to use to combine tail bump (doorbell) and BQL updates, attempt to use those new interfaces.
Signed-off-by: Jesse B
ice: use modern kernel API for kick
The kernel gained a new interface for drivers to use to combine tail bump (doorbell) and BQL updates, attempt to use those new interfaces.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
cc14db11 |
| 27-Oct-2021 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
ice: use prefetch methods
The kernel provides some prefetch mechanisms to speed up commonly cold cache line accesses during receive processing. Since these are software structures it helps to have t
ice: use prefetch methods
The kernel provides some prefetch mechanisms to speed up commonly cold cache line accesses during receive processing. Since these are software structures it helps to have these strategically placed prefetches.
Be careful to call BQL prefetch complete only for non XDP queues.
Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
1c96c168 |
| 25-Oct-2021 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
ice: update to newer kernel API
Use the netif_tx_* API from netdevice.h which has simpler parameters.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <guruchara
ice: update to newer kernel API
Use the netif_tx_* API from netdevice.h which has simpler parameters.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
c8064e5b |
| 30-Nov-2021 |
Paolo Abeni <pabeni@redhat.com> |
bpf: Let bpf_warn_invalid_xdp_action() report more info
In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the ge
bpf: Let bpf_warn_invalid_xdp_action() report more info
In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the generated stack-trace pointed out at least the involved device driver.
Let's additionally include the program name and id, and the relevant device name.
If the user needs additional infos, he can fetch them via a kernel probe, leveraging the arguments added here.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
show more ...
|
Revision tags: v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10 |
|
#
6f332353 |
| 06-Oct-2021 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
ice: use devm_kcalloc() instead of devm_kzalloc()
Use 2-factor multiplication argument form devm_kcalloc() instead of devm_kzalloc().
Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: G
ice: use devm_kcalloc() instead of devm_kzalloc()
Use 2-factor multiplication argument form devm_kcalloc() instead of devm_kzalloc().
Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
Revision tags: v5.14.9, v5.14.8, v5.14.7 |
|
#
23be7075 |
| 20-Sep-2021 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
ice: fix software generating extra interrupts
The driver tried to work around missing completion events that occurred while interrupts are disabled, by triggering a software interrupt whenever we ex
ice: fix software generating extra interrupts
The driver tried to work around missing completion events that occurred while interrupts are disabled, by triggering a software interrupt whenever we exit polling (but we had to have polled at least once).
This was causing a *lot* of extra interrupts for some workloads like NVMe over TCP, which resulted in regressions in performance. It was also visible when polling didn't prevent interrupts when busy_poll was enabled.
Fix the extra interrupts by utilizing our previously unused 3rd ITR (interrupt throttle) index and set it to 20K interrupts per second, and then trigger a software interrupt within that rate limit.
While here, slightly refactor the code to avoid an overwrite of a local variable in the case of wb_en = true.
Fixes: b7306b42beaf ("ice: manage interrupts during poll exit") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
d8eb7ad5 |
| 20-Sep-2021 |
Jesse Brandeburg <jesse.brandeburg@intel.com> |
ice: update dim usage and moderation
The driver was having trouble with unreliable latency when doing single threaded ping-pong tests. This was root caused to the DIM algorithm landing on a too slow
ice: update dim usage and moderation
The driver was having trouble with unreliable latency when doing single threaded ping-pong tests. This was root caused to the DIM algorithm landing on a too slow interrupt value, which caused high latency, and it was especially present when queues were being switched frequently by the scheduler as happens on default setups today.
In attempting to improve this, we allow the upper rate limit for interrupts to move to rate limit of 4 microseconds as a max, which means that no vector can generate more than 250,000 interrupts per second. The old config was up to 100,000. The driver previously tried to program the rate limit too frequently and if the receive and transmit side were both active on the same vector, the INTRL would be set incorrectly, and this change fixes that issue as a side effect of the redesign.
This driver will operate from now on with a slightly changed DIM table with more emphasis towards latency sensitivity by having more table entries with lower latency than with high latency (high being >= 64 microseconds).
The driver also resets the DIM algorithm state with a new stats set when there is no work done and the data becomes stale (older than 1 second), for the respective receive or transmit portion of the interrupt.
Add a new helper for setting rate limit, which will be used more in a followup patch.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
Revision tags: v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61 |
|
#
22bf877e |
| 19-Aug-2021 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: introduce XDP_TX fallback path
Under rare circumstances there might be a situation where a requirement of having XDP Tx queue per CPU could not be fulfilled and some of the Tx resources have to
ice: introduce XDP_TX fallback path
Under rare circumstances there might be a situation where a requirement of having XDP Tx queue per CPU could not be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by spinlock. These accesses happen to be in the hot path, so let's introduce the static branch that will be triggered from the control plane when driver could not provide Tx queue dedicated for XDP on each CPU.
Currently, the design that has been picked is to allow any number of XDP Tx queues that is at least half of a count of CPUs that platform has. For lower number driver will bail out with a response to user that there were not enough Tx resources that would allow configuring XDP. The sharing of rings is signalled via static branch enablement which in turn indicates that lock for xdp_ring accesses needs to be taken in hot path.
Approach based on static branch has no impact on performance of a non-fallback path. One thing that is needed to be mentioned is a fact that the static branch will act as a global driver switch, meaning that if one PF got out of Tx resources, then other PFs that ice driver is servicing will suffer. However, given the fact that HW that ice driver is handling has 1024 Tx queues per each PF, this is currently an unlikely scenario.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
9610bd98 |
| 19-Aug-2021 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: optimize XDP_TX workloads
Optimize Tx descriptor cleaning for XDP. Current approach doesn't really scale and chokes when multiple flows are handled.
Introduce two ring fields, @next_dd and @ne
ice: optimize XDP_TX workloads
Optimize Tx descriptor cleaning for XDP. Current approach doesn't really scale and chokes when multiple flows are handled.
Introduce two ring fields, @next_dd and @next_rs that will keep track of descriptor that should be looked at when the need for cleaning arise and the descriptor that should have the RS bit set, respectively.
Note that at this point the threshold is a constant (32), but it is something that we could make configurable.
First thing is to get away from setting RS bit on each descriptor. Let's do this only once NTU is higher than the currently @next_rs value. In such case, grab the tx_desc[next_rs], set the RS bit in descriptor and advance the @next_rs by a 32.
Second thing is to clean the Tx ring only when there are less than 32 free entries. For that case, look up the tx_desc[next_dd] for a DD bit. This bit is written back by HW to let the driver know that xmit was successful. It will happen only for those descriptors that had RS bit set. Clean only 32 descriptors and advance the DD bit.
Actual cleaning routine is moved from ice_napi_poll() down to the ice_xmit_xdp_ring(). It is safe to do so as XDP ring will not get any SKBs in there that would rely on interrupts for the cleaning. Nice side effect is that for rare case of Tx fallback path (that next patch is going to introduce) we don't have to trigger the SW irq to clean the ring.
With those two concepts, ring is kept at being almost full, but it is guaranteed that driver will be able to produce Tx descriptors.
This approach seems to work out well even though the Tx descriptors are produced in one-by-one manner. Test was conducted with the ice HW bombarded with packets from HW generator, configured to generate 30 flows.
Xdp2 sample yields the following results: <snip> proto 17: 79973066 pkt/s proto 17: 80018911 pkt/s proto 17: 80004654 pkt/s proto 17: 79992395 pkt/s proto 17: 79975162 pkt/s proto 17: 79955054 pkt/s proto 17: 79869168 pkt/s proto 17: 79823947 pkt/s proto 17: 79636971 pkt/s </snip>
As that sample reports the Rx'ed frames, let's look at sar output. It says that what we Rx'ed we do actually Tx, no noticeable drops. Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00 0.00 0.00 38.32
with tx_busy staying calm.
When compared to a state before: Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00 0.00 0.00 43.64
it can be observed that the amount of txpck/s is almost doubled, meaning that the performance is improved by around 90%. All of this due to the drops in the driver, previously the tx_busy stat was bumped at a 7mpps rate.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
eb087cd8 |
| 19-Aug-2021 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: propagate xdp_ring onto rx_ring
With rings being split, it is now convenient to introduce a pointer to XDP ring within the Rx ring. For XDP_TX workloads this means that xdp_rings array access w
ice: propagate xdp_ring onto rx_ring
With rings being split, it is now convenient to introduce a pointer to XDP ring within the Rx ring. For XDP_TX workloads this means that xdp_rings array access will be skipped, which was executed per each processed frame.
Also, read the XDP prog once per NAPI and if prog is present, set up the local xdp_ring pointer. Reading prog a single time was discussed in [1] with some concern raised by Toke around dispatcher handling and having the need for going through the RCU grace period in the ndo_bpf driver callback, but ice currently is torning down NAPI instances regardless of the prog presence on VSI.
Although the pointer to XDP ring introduced to Rx ring makes things a lot slimmer/simpler, I still feel that single prog read per NAPI lifetime is beneficial.
Further patch that will introduce the fallback path will also get a profit from that as xdp_ring pointer will be set during the XDP rings setup.
[1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
#
a55e16fa |
| 19-Aug-2021 |
Maciej Fijalkowski <maciej.fijalkowski@intel.com> |
ice: do not create xdp_frame on XDP_TX
xdp_frame is not needed for XDP_TX data path in ice driver case. For this data path cleaning of sent descriptor will not happen anywhere outside of the driver,
ice: do not create xdp_frame on XDP_TX
xdp_frame is not needed for XDP_TX data path in ice driver case. For this data path cleaning of sent descriptor will not happen anywhere outside of the driver, which means that carrying the information about the underlying memory model via xdp_frame will not be used. Therefore, this conversion can be simply dropped, which would relieve CPU a bit.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|