History log of /openbmc/linux/drivers/net/ethernet/intel/ice/ice_xsk.c (Results 1 – 25 of 530)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.67, v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52
# ca2478a7 12-Sep-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.51' into for/openbmc/dev-6.6

This is the 6.6.51 stable release


Revision tags: v6.6.51, v6.6.50, v6.6.49, v6.6.48
# 2f057db2 23-Aug-2024 Larysa Zaremba <larysa.zaremba@intel.com>

ice: protect XDP configuration with a mutex

[ Upstream commit 2504b8405768a57a71e660dbfd5abd59f679a03f ]

The main threat to data consistency in ice_xdp() is a possible asynchronous
PF reset. It can

ice: protect XDP configuration with a mutex

[ Upstream commit 2504b8405768a57a71e660dbfd5abd59f679a03f ]

The main threat to data consistency in ice_xdp() is a possible asynchronous
PF reset. It can be triggered by a user or by TX timeout handler.

XDP setup and PF reset code access the same resources in the following
sections:
* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
* ice_vsi_rebuild() for the PF VSI - not protected
* ice_vsi_open() - already rtnl-locked

With an unfortunate timing, such accesses can result in a crash such as the
one below:

[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
[ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
[ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
[ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
[ +0.394718] ice 0000:b1:00.0: PTP reset successful
[ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ +0.000045] #PF: supervisor read access in kernel mode
[ +0.000023] #PF: error_code(0x0000) - not-present page
[ +0.000023] PGD 0 P4D 0
[ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
[ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ +0.000036] Workqueue: ice ice_service_task [ice]
[ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
[...]
[ +0.000013] Call Trace:
[ +0.000016] <TASK>
[ +0.000014] ? __die+0x1f/0x70
[ +0.000029] ? page_fault_oops+0x171/0x4f0
[ +0.000029] ? schedule+0x3b/0xd0
[ +0.000027] ? exc_page_fault+0x7b/0x180
[ +0.000022] ? asm_exc_page_fault+0x22/0x30
[ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
[ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
[ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
[ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
[ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[ +0.000145] ice_rebuild+0x18c/0x840 [ice]
[ +0.000145] ? delay_tsc+0x4a/0xc0
[ +0.000022] ? delay_tsc+0x92/0xc0
[ +0.000020] ice_do_reset+0x140/0x180 [ice]
[ +0.000886] ice_service_task+0x404/0x1030 [ice]
[ +0.000824] process_one_work+0x171/0x340
[ +0.000685] worker_thread+0x277/0x3a0
[ +0.000675] ? preempt_count_add+0x6a/0xa0
[ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
[ +0.000679] ? __pfx_worker_thread+0x10/0x10
[ +0.000653] kthread+0xf0/0x120
[ +0.000635] ? __pfx_kthread+0x10/0x10
[ +0.000616] ret_from_fork+0x2d/0x50
[ +0.000612] ? __pfx_kthread+0x10/0x10
[ +0.000604] ret_from_fork_asm+0x1b/0x30
[ +0.000604] </TASK>

The previous way of handling this through returning -EBUSY is not viable,
particularly when destroying AF_XDP socket, because the kernel proceeds
with removal anyway.

There is plenty of code between those calls and there is no need to create
a large critical section that covers all of them, same as there is no need
to protect ice_vsi_rebuild() with rtnl_lock().

Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().

Leaving unprotected sections in between would result in two states that
have to be considered:
1. when the VSI is closed, but not yet rebuild
2. when VSI is already rebuild, but not yet open

The latter case is actually already handled through !netif_running() case,
we just need to adjust flag checking a little. The former one is not as
trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
hardware interaction happens, this can make adding/deleting rings exit
with an error. Luckily, VSI rebuild is pending and can apply new
configuration for us in a managed fashion.

Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
indicate that ice_xdp() can just hot-swap the program.

Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
consistent by deconfiguring VSI when coalesce allocation fails.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.47, v6.6.46
# 0db00e5d 11-Aug-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.45' into for/openbmc/dev-6.6

This is the 6.6.45 stable release


Revision tags: v6.6.45, v6.6.44, v6.6.43
# ac2a3c75 26-Jul-2024 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: replace synchronize_rcu with synchronize_net

[ Upstream commit 405d9999aa0b4ae467ef391d1d9c7e0d30ad0841 ]

Given that ice_qp_dis() is called under rtnl_lock, synchronize_net() can
be called ins

ice: replace synchronize_rcu with synchronize_net

[ Upstream commit 405d9999aa0b4ae467ef391d1d9c7e0d30ad0841 ]

Given that ice_qp_dis() is called under rtnl_lock, synchronize_net() can
be called instead of synchronize_rcu() so that XDP rings can finish its
job in a faster way. Also let us do this as earlier in XSK queue disable
flow.

Additionally, turn off regular Tx queue before disabling irqs and NAPI.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# 9016d17f 26-Jul-2024 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: don't busy wait for Rx queue disable in ice_qp_dis()

[ Upstream commit 1ff72a2f67791cd4ddad19ed830445f57b30e992 ]

When ice driver is spammed with multiple xdpsock instances and flow
control is

ice: don't busy wait for Rx queue disable in ice_qp_dis()

[ Upstream commit 1ff72a2f67791cd4ddad19ed830445f57b30e992 ]

When ice driver is spammed with multiple xdpsock instances and flow
control is enabled, there are cases when Rx queue gets stuck and unable
to reflect the disable state in QRX_CTRL register. Similar issue has
previously been addressed in commit 13a6233b033f ("ice: Add support to
enable/disable all Rx queues before waiting").

To workaround this, let us simply not wait for a disabled state as later
patch will make sure that regardless of the encountered error in the
process of disabling a queue pair, the Rx queue will be enabled.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# 77292f93 26-Jul-2024 Michal Kubiak <michal.kubiak@intel.com>

ice: respect netif readiness in AF_XDP ZC related ndo's

[ Upstream commit ec145a18687fec8dd97eeb4f30057fa4debef577 ]

Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx
ring when l

ice: respect netif readiness in AF_XDP ZC related ndo's

[ Upstream commit ec145a18687fec8dd97eeb4f30057fa4debef577 ]

Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx
ring when link is either not yet fully initialized or process of
stopping the netdev has already started. To avoid this, add checks
against carrier readiness in ice_xsk_wakeup() and in ice_xmit_zc().
One could argue that bailing out early in ice_xsk_wakeup() would be
sufficient but given the fact that we produce Tx descriptors on behalf
of NAPI that is triggered for Rx traffic, the latter is also needed.

Bringing link up is an asynchronous event executed within
ice_service_task so even though interface has been brought up there is
still a time frame where link is not yet ok.

Without this patch, when AF_XDP ZC Tx is used simultaneously with stack
Tx, Tx timeouts occur after going through link flap (admin brings
interface down then up again). HW seem to be unable to transmit
descriptor to the wire after HW tail register bump which in turn causes
bit __QUEUE_STATE_STACK_XOFF to be set forever as
netdev_tx_completed_queue() sees no cleaned bytes on the input.

Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36
# 6c71a057 23-Jun-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.35' into dev-6.6

This is the 6.6.35 stable release


Revision tags: v6.6.35, v6.6.34, v6.6.33
# eab834ac 03-Jun-2024 Larysa Zaremba <larysa.zaremba@intel.com>

ice: remove af_xdp_zc_qps bitmap

[ Upstream commit adbf5a42341f6ea038d3626cd4437d9f0ad0b2dd ]

Referenced commit has introduced a bitmap to distinguish between ZC and
copy-mode AF_XDP queues, becaus

ice: remove af_xdp_zc_qps bitmap

[ Upstream commit adbf5a42341f6ea038d3626cd4437d9f0ad0b2dd ]

Referenced commit has introduced a bitmap to distinguish between ZC and
copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this
for us.

The bitmap would be especially useful when restoring previous state after
rebuild, if only it was not reallocated in the process. This leads to e.g.
xdpsock dying after changing number of queues.

Instead of preserving the bitmap during the rebuild, remove it completely
and distinguish between ZC and copy-mode queues based on the presence of
a device associated with the pool.

Fixes: e102db780e1c ("ice: track AF_XDP ZC enabled queues in bitmap")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23
# f8b04488 17-Mar-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.22' into dev-6.6

Linux 6.6.22


# 4c0b028e 20-Feb-2024 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: reorder disabling IRQ and NAPI in ice_qp_dis

[ Upstream commit 99099c6bc75a30b76bb5d6774a0509ab6f06af05 ]

ice_qp_dis() currently does things in very mixed way. Tx is stopped
before disabling I

ice: reorder disabling IRQ and NAPI in ice_qp_dis

[ Upstream commit 99099c6bc75a30b76bb5d6774a0509ab6f06af05 ]

ice_qp_dis() currently does things in very mixed way. Tx is stopped
before disabling IRQ on related queue vector, then it takes care of
disabling Rx and finally NAPI is disabled.

Let us start with disabling IRQs in the first place followed by turning
off NAPI. Then it is safe to handle queues.

One subtle change on top of that is that even though ice_qp_ena() looks
more sane, clear ICE_CFG_BUSY as the last thing there.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# 7d7ae873 10-Feb-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.15' into dev-6.6

This is the 6.6.15 stable release


Revision tags: v6.6.16, v6.6.15, v6.6.14
# e1ae4a6b 24-Jan-2024 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers

[ Upstream commit 290779905d09d5fdf6caa4f58ddefc3f4db0c0a9 ]

Ice and i40e ZC drivers currently set offset of a frag within
skb_shared_info

intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers

[ Upstream commit 290779905d09d5fdf6caa4f58ddefc3f4db0c0a9 ]

Ice and i40e ZC drivers currently set offset of a frag within
skb_shared_info to 0, which is incorrect. xdp_buffs that come from
xsk_buff_pool always have 256 bytes of a headroom, so they need to be
taken into account to retrieve xdp_buff::data via skb_frag_address().
Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from
xdp_buff::data_hard_start which would result in overwriting existing
payload.

Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-8-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# b6e1a1b3 24-Jan-2024 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags

[ Upstream commit f7f6aa8e24383fbb11ac55942e66da9660110f80 ]

XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is
us

xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags

[ Upstream commit f7f6aa8e24383fbb11ac55942e66da9660110f80 ]

XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is
used by drivers to notify data path whether xdp_buff contains fragments
or not. Data path looks up mentioned flag on first buffer that occupies
the linear part of xdp_buff, so drivers only modify it there. This is
sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on
stack or it resides within struct representing driver's queue and
fragments are carried via skb_frag_t structs. IOW, we are dealing with
only one xdp_buff.

ZC mode though relies on list of xdp_buff structs that is carried via
xsk_buff_pool::xskb_list, so ZC data path has to make sure that
fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise,
xsk_buff_free() could misbehave if it would be executed against xdp_buff
that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can
take place when within supplied XDP program bpf_xdp_adjust_tail() is
used with negative offset that would in turn release the tail fragment
from multi-buffer frame.

Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would
result in releasing all the nodes from xskb_list that were produced by
driver before XDP program execution, which is not what is intended -
only tail fragment should be deleted from xskb_list and then it should
be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never
make it up to user space, so from AF_XDP application POV there would be
no traffic running, however due to free_list getting constantly new
nodes, driver will be able to feed HW Rx queue with recycled buffers.
Bottom line is that instead of traffic being redirected to user space,
it would be continuously dropped.

To fix this, let us clear the mentioned flag on xsk_buff_pool side
during xdp_buff initialization, which is what should have been done
right from the start of XSK multi-buffer support.

Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3
# c900529f 12-Sep-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Forwarding to v6.6-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.5.2, v6.1.51, v6.5.1
# 1ac731c5 30-Aug-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.6 merge window.


Revision tags: v6.1.50
# bd6c11bc 29-Aug-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Increase size limits for to-be-sent skb frag allocat

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations

- Store netdevs in an xarray, to simplify iterating over netdevs

- Refactor nexthop selection for multipath routes

- Improve sched class lifetime handling

- Add backup nexthop ID support for bridge

- Implement drop reasons support in openvswitch

- Several data races annotations and fixes

- Constify the sk parameter of routing functions

- Prepend kernel version to netconsole message

Protocols:

- Implement support for TCP probing the peer being under memory
pressure

- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct

- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor

- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes

- In-kernel support for the TLS alert protocol

- Better support for UDP reuseport with connected sockets

- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size

- Get rid of additional ancillary per MPTCP connection struct socket

- Implement support for BPF-based MPTCP packet schedulers

- Format MPTCP subtests selftests results in TAP

- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation

BPF:

- Multi-buffer support in AF_XDP

- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds

- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it

- Add SO_REUSEPORT support for TC bpf_sk_assign

- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64

- Support defragmenting IPv(4|6) packets in BPF

- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling

- Introduce bpf map element count and enable it for all program types

- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy

- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress

- Add uprobe support for the bpf_get_func_ip helper

- Check skb ownership against full socket

- Support for up to 12 arguments in BPF trampoline

- Extend link_info for kprobe_multi and perf_event links

Netfilter:

- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending

- Allow NLA_POLICY_MASK to be used with BE16/BE32 types

Driver API:

- Page pool optimizations, to improve data locality and cache usage

- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers

- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info

- Extend and use the yaml devlink specs to [re]generate the split ops

- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes

- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool

- Remove phylink legacy mode support

- Support offload LED blinking to phy

- Add devlink port function attributes for IPsec

New hardware / drivers:

- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy

- WiFi:
- MediaTek mt7981 support

- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips

- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925

Drivers:

- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution

- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support

- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers

- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support

- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs

- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support

- Connector:
- support for event filtering"

* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...

show more ...


Revision tags: v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44
# 2612e3bb 07-Aug-2023 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Catching-up with drm-next and drm-intel-gt-next.
It will unblock a code refactor around the platform
definitions (names vs acronyms).

Signed-off-by: Rodrigo V

Merge drm/drm-next into drm-intel-next

Catching-up with drm-next and drm-intel-gt-next.
It will unblock a code refactor around the platform
definitions (names vs acronyms).

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

show more ...


# 9f771739 07-Aug-2023 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Merge drm/drm-next into drm-intel-gt-next

Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as
a dependency for https://patchwork.freedesktop.org/series/1

Merge drm/drm-next into drm-intel-gt-next

Need to pull in b3e4aae612ec ("drm/i915/hdcp: Modify hdcp_gsc_message msg sending mechanism") as
a dependency for https://patchwork.freedesktop.org/series/121735/

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

show more ...


Revision tags: v6.1.43, v6.1.42, v6.1.41
# 61b73694 24-Jul-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to get v6.5-rc2.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.1.40
# e93165d5 19-Jul-2023 Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-07-19

We've added 45 non-merge comm

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-07-19

We've added 45 non-merge commits during the last 3 day(s) which contain
a total of 71 files changed, 7808 insertions(+), 592 deletions(-).

The main changes are:

1) multi-buffer support in AF_XDP, from Maciej Fijalkowski,
Magnus Karlsson, Tirthendu Sarkar.

2) BPF link support for tc BPF programs, from Daniel Borkmann.

3) Enable bpf_map_sum_elem_count kfunc for all program types,
from Anton Protopopov.

4) Add 'owner' field to bpf_rb_node to fix races in shared ownership,
Dave Marchevsky.

5) Prevent potential skb_header_pointer() misuse, from Alexei Starovoitov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
bpf, net: Introduce skb_pointer_if_linear().
bpf: sync tools/ uapi header with
selftests/bpf: Add mprog API tests for BPF tcx links
selftests/bpf: Add mprog API tests for BPF tcx opts
bpftool: Extend net dump with tcx progs
libbpf: Add helper macro to clear opts structs
libbpf: Add link-based API for tcx
libbpf: Add opts-based attach/detach/query API for tcx
bpf: Add fd-based tcx multi-prog infra with link support
bpf: Add generic attach/detach/query API for multi-progs
selftests/xsk: reset NIC settings to default after running test suite
selftests/xsk: add test for too many frags
selftests/xsk: add metadata copy test for multi-buff
selftests/xsk: add invalid descriptor test for multi-buffer
selftests/xsk: add unaligned mode test for multi-buffer
selftests/xsk: add basic multi-buffer test
selftests/xsk: transmit and receive multi-buffer packets
xsk: add multi-buffer documentation
i40e: xsk: add TX multi-buffer support
ice: xsk: Tx multi-buffer support
...
====================

Link: https://lore.kernel.org/r/20230719175424.75717-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 3226e313 19-Jul-2023 Alexei Starovoitov <ast@kernel.org>

Merge branch 'xsk-multi-buffer-support'

Maciej Fijalkowski says:

====================
xsk: multi-buffer support

v6->v7:
- rebase...[Alexei]

v5->v6:
- update bpf_xdp_query_opts__last_field in patc

Merge branch 'xsk-multi-buffer-support'

Maciej Fijalkowski says:

====================
xsk: multi-buffer support

v6->v7:
- rebase...[Alexei]

v5->v6:
- update bpf_xdp_query_opts__last_field in patch 10 [Alexei]

v4->v5:
- align options argument size to match options from xdp_desc [Benjamin]
- cleanup skb from xdp_sock on socket termination [Toke]
- introduce new netlink attribute for letting user space know about Tx
frag limit; this substitutes xdp_features flag previously dedicated
for setting ZC multi-buffer support [Toke, Jakub]
- include i40e ZC multi-buffer support
- enable TOO_MANY_FRAGS for ZC on xskxceiver; this is now possible due
to netlink attribute mentioned two bullets above

v3->v4:
-rely on ynl for adding new xdp_features flag [Jakub]
- move xskb_list to xsk_buff_pool

v2->v3:
- Fix issue with the next valid packet getting dropped after an invalid
packet with MAX_SKB_FRAGS + 1 frags [Magnus]
- query NETDEV_XDP_ACT_ZC_SG flag within xskxceiver and act on it
- remove redundant include in xsk.c [kernel test robot]
- s/NETDEV_XDP_ACT_NDO_ZC_SG/NETDEV_XDP_ACT_ZC_SG + kernel doc [Magnus,
Simon]

v1->v2:
- fix spelling issues in commit messages [Simon]
- remove XSK_DESC_MAX_FRAGS, use MAX_SKB_FRAGS instead [Stan, Alexei]
- add documentation patch
- fix build error from kernel test robot on patch 10

This series of patches add multi-buffer support for AF_XDP. XDP and
various NIC drivers already have support for multi-buffer packets. With
this patch set, programs using AF_XDP sockets can now also receive and
transmit multi-buffer packets both in copy as well as zero-copy mode.
ZC multi-buffer implementation is based on ice driver.

Some definitions to put us all on the same page:

* A packet consists of one or more frames

* A descriptor in one of the AF_XDP rings always refers to a single
frame. In the case the packet consists of a single frame, the
descriptor refers to the whole packet.

To represent a packet consisting of multiple frames, we introduce a
new flag called XDP_PKT_CONTD in the options field of the Rx and Tx
descriptors. If it is true (1) the packet continues with the next
descriptor and if it is false (0) it means this is the last descriptor
of the packet. Why the reverse logic of end-of-packet (eop) flag found
in many NICs? Just to preserve compatibility with non-multi-buffer
applications that have this bit set to false for all packets on Rx, and
the apps set the options field to zero for Tx, as anything else will
be treated as an invalid descriptor.

These are the semantics for producing packets onto XSK Tx ring
consisting of multiple frames:

* When an invalid descriptor is found, all the other
descriptors/frames of this packet are marked as invalid and not
completed. The next descriptor is treated as the start of a new
packet, even if this was not the intent (because we cannot guess
the intent). As before, if your program is producing invalid
descriptors you have a bug that must be fixed.

* Zero length descriptors are treated as invalid descriptors.

* For copy mode, the maximum supported number of frames in a packet is
equal to CONFIG_MAX_SKB_FRAGS + 1. If it is exceeded, all
descriptors accumulated so far are dropped and treated as
invalid. To produce an application that will work on any system
regardless of this config setting, limit the number of frags to 18,
as the minimum value of the config is 17.

* For zero-copy mode, the limit is up to what the NIC HW
supports. User space can discover this via newly introduced
NETDEV_A_DEV_XDP_ZC_MAX_SEGS netlink attribute.

Here is an example Tx path pseudo-code (using libxdp interfaces for
simplicity) ignoring that the umem is finite in size, and that we
eventually will run out of packets to send. Also assumes pkts.addr
points to a valid location in the umem.

void tx_packets(struct xsk_socket_info *xsk, struct pkt *pkts,
int batch_size)
{
u32 idx, i, pkt_nb = 0;

xsk_ring_prod__reserve(&xsk->tx, batch_size, &idx);

for (i = 0; i < batch_size;) {
u64 addr = pkts[pkt_nb].addr;
u32 len = pkts[pkt_nb].size;

do {
struct xdp_desc *tx_desc;

tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i++);
tx_desc->addr = addr;

if (len > xsk_frame_size) {
tx_desc->len = xsk_frame_size;
tx_desc->options |= XDP_PKT_CONTD;
} else {
tx_desc->len = len;
tx_desc->options = 0;
pkt_nb++;
}
len -= tx_desc->len;
addr += xsk_frame_size;

if (i == batch_size) {
/* Remember len, addr, pkt_nb for next
* iteration. Skipped for simplicity.
*/
break;
}
} while (len);
}

xsk_ring_prod__submit(&xsk->tx, i);
}

On the Rx path in copy mode, the xsk core copies the XDP data into
multiple descriptors, if needed, and sets the XDP_PKT_CONTD flag as
detailed before. Zero-copy mode in order to avoid the copies has to
maintain a chain of xdp_buff_xsk structs that represent whole packet.
This is because what actually is redirected is the xdp_buff and we
currently have no equivalent mechanism that is used for copy mode
(embedded skb_shared_info in xdp_buff) to carry the frags. This means
xdp_buff_xsk grows in size but these members are at the end and should
not be touched when data path is not dealing with fragmented packets.
This solution kept us within assumed performance impact, hence we
decided to proceed with it.

When the application gets a descriptor with the
XDP_PKT_CONTD flag set to one, it means that the packet consists of
multiple buffers and it continues with the next buffer in the following
descriptor. When a descriptor with XDP_PKT_CONTD == 0 is received, it
means that this is the last buffer of the packet. AF_XDP guarantees that
only a complete packet (all frames in the packet) is sent to the
application.

If application reads a batch of descriptors, using for example the libxdp
interfaces, it is not guaranteed that the batch will end with a full
packet. It might end in the middle of a packet and the rest of the
buffers of that packet will arrive at the beginning of the next batch,
since the libxdp interface does not read the whole ring (unless you
have an enormous batch size or a very small ring size).

Here is a simple Rx path pseudo-code example (using libxdp interfaces for
simplicity). Error paths have been excluded for simplicity:

void rx_packets(struct xsk_socket_info *xsk)
{
static bool new_packet = true;
u32 idx_rx = 0, idx_fq = 0;
static char *pkt;

int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);

xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);

for (int i = 0; i < rcvd; i++) {
struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++);
char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr);
bool eop = !(desc->options & XDP_PKT_CONTD);

if (new_packet)
pkt = frag;
else
add_frag_to_pkt(pkt, frag);

if (eop)
process_pkt(pkt);

new_packet = eop;

*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr;
}

xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
xsk_ring_cons__release(&xsk->rx, rcvd);
}

We had to introduce a new bind flag (XDP_USE_SG) on the AF_XDP level to
enable multi-buffer support. The reason we need to differentiate between
non multi-buffer and multi-buffer is the behaviour when the kernel gets
a packet that is larger than the frame size. Without multi-buffer, this
packet is dropped and marked in the stats. With multi-buffer on, we want
to split it up into multiple frames instead.

At the start, we thought that riding on the .frags section name of
the XDP program was a good idea. You do not have to introduce yet
another flag and all AF_XDP users must load an XDP program anyway
to get any traffic up to the socket, so why not just say that the XDP
program decides if the AF_XDP socket should get multi-buffer packets
or not? The problem is that we can create an AF_XDP socket that is Tx
only and that works without having to load an XDP program at
all. Another problem is that the XDP program might change during the
execution, so we would have to check this for every single packet.

Here is the observed throughput when compared to a codebase without any
multi-buffer changes and measured with xdpsock for 64B packets.
Apparently ZC Tx takes a hit from explicit zero length descriptors
validation. Overall, in terms of ZC performance, there is a room for
improvement, but for now we think this work is in a good shape in terms
of correctness and functionality. We were targetting for up to 5%
overhead though. Note that ZC performance drops come from core + driver
support being combined, whereas copy mode had already driver support in
place.

Mode rxdrop l2fwd txonly
ice-zc -4% -7% -6%
i40e-zc -7% -6% -7%
drv -1.2% 0% +2%
skb -0.6% -1% +2%

Thank you,
Tirthendu, Magnus and Maciej
====================

Link: https://lore.kernel.org/r/20230719132421.584801-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


Revision tags: v6.1.39
# eeb2b538 19-Jul-2023 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: xsk: Tx multi-buffer support

Most of this patch is about actually supporting XDP_TX action. Pure Tx
ZC support is only about looking at XDP_PKT_CONTD presence at options
field and based on that

ice: xsk: Tx multi-buffer support

Most of this patch is about actually supporting XDP_TX action. Pure Tx
ZC support is only about looking at XDP_PKT_CONTD presence at options
field and based on that generating EOP bit on Tx HW descriptor. This is
that simple due to the implementation on
xsk_tx_peek_release_desc_batch() where we are making sure that last
produced descriptor is an EOP one.

Overwrite xdp_zc_max_segs with a value that defines max scatter-gatter
count on Tx side that HW can handle.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-16-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 1bbc04de 19-Jul-2023 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: xsk: add RX multi-buffer support

This support is strongly inspired by work that introduced multi-buffer
support to regular Rx data path in ice. There are some differences,
though. When adding a

ice: xsk: add RX multi-buffer support

This support is strongly inspired by work that introduced multi-buffer
support to regular Rx data path in ice. There are some differences,
though. When adding a frag, besides adding it to skb_shared_info, use
also fresh xsk_buff_add_frag() helper. Reason for doing both things is
that we can not rule out the fact that AF_XDP pipeline could use XDP
program that needs to access frame fragments. Without them being in
skb_shared_info it will not be possible. Another difference is that
XDP_PASS has to allocate a new pages for each frags and copy contents
from memory backed by xsk_buff_pool.

chain_len that is used for programming HW Rx descriptors no longer has
to be limited to 1 when xsk_pool is present - remove this restriction.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230719132421.584801-13-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 50501936 17-Jul-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.4' into next

Sync up with mainline to bring in updates to shared infrastructure.


# 0791faeb 17-Jul-2023 Mark Brown <broonie@kernel.org>

ASoC: Merge v6.5-rc2

Get a similar baseline to my other branches, and fixes for people using
the branch.


12345678910>>...22