History log of /openbmc/linux/include/net/ip6_fib.h (Results 1 – 25 of 1225)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.6.67
# 278002ed 15-Dec-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.66' into for/openbmc/dev-6.6

This is the 6.6.66 stable release


Revision tags: v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29
# 797a4c1f 26-Apr-2024 Eric Dumazet <edumazet@google.com>

ipv6: introduce dst_rt6_info() helper

[ Upstream commit e8dfd42c17faf183415323db1ef0c977be0d6489 ]

Instead of (struct rt6_info *)dst casts, we can use :

#define dst_rt6_info(_ptr) \
cont

ipv6: introduce dst_rt6_info() helper

[ Upstream commit e8dfd42c17faf183415323db1ef0c977be0d6489 ]

Instead of (struct rt6_info *)dst casts, we can use :

#define dst_rt6_info(_ptr) \
container_of_const(_ptr, struct rt6_info, dst)

Some places needed missing const qualifiers :

ip6_confirm_neigh(), ipv6_anycast_destination(),
ipv6_unicast_destination(), has_gateway()

v2: added missing parts (David Ahern)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 3301ab7d5aeb ("net/ipv6: release expired exception dst cached in socket")
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23
# 3e7759b9 10-Feb-2024 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.9' into dev-6.6

This is the 6.6.9 stable release


Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8
# b577b9aa 18-Dec-2023 David Ahern <dsahern@kernel.org>

net/ipv6: Revert remove expired routes with a separated list of routes

[ Upstream commit dade3f6a1e4e35a5ae916d5e78b3229ec34c78ec ]

This reverts commit 3dec89b14d37ee635e772636dad3f09f78f1ab87.

Th

net/ipv6: Revert remove expired routes with a separated list of routes

[ Upstream commit dade3f6a1e4e35a5ae916d5e78b3229ec34c78ec ]

This reverts commit 3dec89b14d37ee635e772636dad3f09f78f1ab87.

The commit has some race conditions given how expires is managed on a
fib6_info in relation to gc start, adding the entry to the gc list and
setting the timer value leading to UAF. Revert the commit and try again
in a later release.

Fixes: 3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes")
Cc: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231219030243.25687-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3
# c900529f 12-Sep-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Forwarding to v6.6-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 73be7fb1 07-Sep-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking updates from Jakub Kicinski:
"Including fixes from netfilter and bpf.

Current release - regres

Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking updates from Jakub Kicinski:
"Including fixes from netfilter and bpf.

Current release - regressions:

- eth: stmmac: fix failure to probe without MAC interface specified

Current release - new code bugs:

- docs: netlink: fix missing classic_netlink doc reference

Previous releases - regressions:

- deal with integer overflows in kmalloc_reserve()

- use sk_forward_alloc_get() in sk_get_meminfo()

- bpf_sk_storage: fix the missing uncharge in sk_omem_alloc

- fib: avoid warn splat in flow dissector after packet mangling

- skb_segment: call zero copy functions before using skbuff frags

- eth: sfc: check for zero length in EF10 RX prefix

Previous releases - always broken:

- af_unix: fix msg_controllen test in scm_pidfd_recv() for
MSG_CMSG_COMPAT

- xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()

- netfilter:
- nft_exthdr: fix non-linear header modification
- xt_u32, xt_sctp: validate user space input
- nftables: exthdr: fix 4-byte stack OOB write
- nfnetlink_osf: avoid OOB read
- one more fix for the garbage collection work from last release

- igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU

- bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t

- handshake: fix null-deref in handshake_nl_done_doit()

- ip: ignore dst hint for multipath routes to ensure packets are
hashed across the nexthops

- phy: micrel:
- correct bit assignments for cable test errata
- disable EEE according to the KSZ9477 errata

Misc:

- docs/bpf: document compile-once-run-everywhere (CO-RE) relocations

- Revert "net: macsec: preserve ingress frame ordering", it appears
to have been developed against an older kernel, problem doesn't
exist upstream"

* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
Revert "net: team: do not use dynamic lockdep key"
net: hns3: remove GSO partial feature bit
net: hns3: fix the port information display when sfp is absent
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
net: hns3: fix debugfs concurrency issue between kfree buffer and read
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
net: hns3: Support query tx timeout threshold by debugfs
net: hns3: fix tx timeout issue
net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
...

show more ...


Revision tags: v6.5.2, v6.1.51, v6.5.1, v6.1.50
# 8aae7625 30-Aug-2023 Florian Westphal <fw@strlen.de>

net: fib: avoid warn splat in flow dissector

New skbs allocated via nf_send_reset() have skb->dev == NULL.

fib*_rules_early_flow_dissect helpers already have a 'struct net'
argument but its not pas

net: fib: avoid warn splat in flow dissector

New skbs allocated via nf_send_reset() have skb->dev == NULL.

fib*_rules_early_flow_dissect helpers already have a 'struct net'
argument but its not passed down to the flow dissector core, which
will then WARN as it can't derive a net namespace to use:

WARNING: CPU: 0 PID: 0 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0xa91/0x1cd0
[..]
ip_route_me_harder+0x143/0x330
nf_send_reset+0x17c/0x2d0 [nf_reject_ipv4]
nft_reject_inet_eval+0xa9/0xf2 [nft_reject_inet]
nft_do_chain+0x198/0x5d0 [nf_tables]
nft_do_chain_inet+0xa4/0x110 [nf_tables]
nf_hook_slow+0x41/0xc0
ip_local_deliver+0xce/0x110
..

Cc: Stanislav Fomichev <sdf@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Ido Schimmel <idosch@nvidia.com>
Fixes: 812fa71f0d96 ("netfilter: Dissect flow after packet mangling")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217826
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230830110043.30497-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

show more ...


# 1ac731c5 30-Aug-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.6 merge window.


# bd6c11bc 29-Aug-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Increase size limits for to-be-sent skb frag allocat

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations

- Store netdevs in an xarray, to simplify iterating over netdevs

- Refactor nexthop selection for multipath routes

- Improve sched class lifetime handling

- Add backup nexthop ID support for bridge

- Implement drop reasons support in openvswitch

- Several data races annotations and fixes

- Constify the sk parameter of routing functions

- Prepend kernel version to netconsole message

Protocols:

- Implement support for TCP probing the peer being under memory
pressure

- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct

- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor

- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes

- In-kernel support for the TLS alert protocol

- Better support for UDP reuseport with connected sockets

- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size

- Get rid of additional ancillary per MPTCP connection struct socket

- Implement support for BPF-based MPTCP packet schedulers

- Format MPTCP subtests selftests results in TAP

- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation

BPF:

- Multi-buffer support in AF_XDP

- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds

- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it

- Add SO_REUSEPORT support for TC bpf_sk_assign

- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64

- Support defragmenting IPv(4|6) packets in BPF

- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling

- Introduce bpf map element count and enable it for all program types

- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy

- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress

- Add uprobe support for the bpf_get_func_ip helper

- Check skb ownership against full socket

- Support for up to 12 arguments in BPF trampoline

- Extend link_info for kprobe_multi and perf_event links

Netfilter:

- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending

- Allow NLA_POLICY_MASK to be used with BE16/BE32 types

Driver API:

- Page pool optimizations, to improve data locality and cache usage

- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers

- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info

- Extend and use the yaml devlink specs to [re]generate the split ops

- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes

- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool

- Remove phylink legacy mode support

- Support offload LED blinking to phy

- Add devlink port function attributes for IPsec

New hardware / drivers:

- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy

- WiFi:
- MediaTek mt7981 support

- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips

- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925

Drivers:

- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution

- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support

- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers

- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support

- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs

- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support

- Connector:
- support for event filtering"

* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...

show more ...


Revision tags: v6.5, v6.1.49, v6.1.48, v6.1.46
# 950fe358 16-Aug-2023 David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-expired-routes'

Kui-Feng Lee says:

====================
Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walki

Merge branch 'ipv6-expired-routes'

Kui-Feng Lee says:

====================
Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.

Background
==========

The size of a Linux IPv6 routing table can become a big problem if not
managed appropriately. Now, Linux has a garbage collector to remove
expired routes periodically. However, this may lead to a situation in
which the routing path is blocked for a long period due to an
excessive number of routes.

For example, years ago, there is a commit c7bb4b89033b ("ipv6: tcp:
drop silly ICMPv6 packet too big messages"). The root cause is that
malicious ICMPv6 packets were sent back for every small packet sent to
them. These packets add routes with an expiration time that prompts
the GC to periodically check all routes in the tables, including
permanent ones.

Why Route Expires
=================

Users can add IPv6 routes with an expiration time manually. However,
the Neighbor Discovery protocol may also generate routes that can
expire. For example, Router Advertisement (RA) messages may create a
default route with an expiration time. [RFC 4861] For IPv4, it is not
possible to set an expiration time for a route, and there is no RA, so
there is no need to worry about such issues.

Create Routes with Expires
==========================

You can create routes with expires with the command.

For example,

ip -6 route add 2001:b000:591::3 via fe80::5054:ff:fe12:3457 \
dev enp0s3 expires 30

The route that has been generated will be deleted automatically in 30
seconds.

GC of FIB6
==========

The function called fib6_run_gc() is responsible for performing
garbage collection (GC) for the Linux IPv6 stack. It checks for the
expiration of every route by traversing the trees of routing
tables. The time taken to traverse a routing table increases with its
size. Holding the routing table lock during traversal is particularly
undesirable. Therefore, it is preferable to keep the lock for the
shortest possible duration.

Solution
========

The cause of the issue is keeping the routing table locked during the
traversal of large trees. To solve this problem, we can create a separate
list of routes that have expiration. This will prevent GC from checking
permanent routes.

Result
======

We conducted a test to measure the execution times of fib6_gc_timer_cb()
and observed that it enhances the GC of FIB6. During the test, we added
permanent routes with the following numbers: 1000, 3000, 6000, and
9000. Additionally, we added a route with an expiration time.

Here are the average execution times for the kernel without the patch.
- 120020 ns with 1000 permanent routes
- 308920 ns with 3000 ...
- 581470 ns with 6000 ...
- 855310 ns with 9000 ...

The kernel with the patch consistently takes around 14000 ns to execute,
regardless of the number of permanent routes that are installed.

Major changes from v7:

- Fix warings raised by the patchwork.

Major changes from v6:

- Remove unnecessary check of tb6 in fib6_clean_expires_locked().

- Use ib6_clean_expires_locked() instead in fib6_purge_rt().

Major changes from v5:

- Change the order of adding new routes to the GC list and starting
GC timer.

- Remove time measurements from the test case.

- Stop forcing GC flush.

Major changes from v4:

- Detect existence of 'strace' in the test case.

Major changes from v3:

- Fix the type of arg according to feedback.

- Add 1k temporary routes and 5K permanent routes in the test case.
Measure time spending on GC with strace.

Major changes from v2:

- Remove unnecessary and incorrect sysctl restoring in the test case.

Major changes from v1:

- Moved gc_link to avoid creating a hole in fib6_info.

- Moved fib6_set_expires*() and fib6_clean_expires*() to the header
file and inlined. And removed duplicated lines.

- Added a test case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 3dec89b1 15-Aug-2023 Kui-Feng Lee <thinker.li@gmail.com>

net/ipv6: Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is bi

net/ipv6: Remove expired routes with a separated list of routes.

FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree
can be expensive if the number of routes in a table is big, even if most of
them are permanent. Checking routes in a separated list of routes having
expiration will avoid this potential issue.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


Revision tags: v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39
# 50501936 17-Jul-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.4' into next

Sync up with mainline to bring in updates to shared infrastructure.


Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35
# db6da59c 15-Jun-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next-fixes

Backmerging to sync drm-misc-next-fixes with drm-misc-next.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.1.34
# 03c60192 12-Jun-2023 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base

Merge the drm-next tree to pick up the DRM DSC helpers (merged via
drm-intel-next tree). MSM DSC v1.2 patche

Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base

Merge the drm-next tree to pick up the DRM DSC helpers (merged via
drm-intel-next tree). MSM DSC v1.2 patches depend on these helpers.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

show more ...


Revision tags: v6.1.33
# 5c680050 06-Jun-2023 Miquel Raynal <miquel.raynal@bootlin.com>

Merge tag 'v6.4-rc4' into wpan-next/staging

Linux 6.4-rc4


# 9ff17e6b 05-Jun-2023 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Merge drm/drm-next into drm-intel-gt-next

For conflict avoidance we need the following commit:

c9a9f18d3ad8 drm/i915/huc: use const struct bus_type pointers

Signed-off-by: Tvrtko Ursulin <tvrtko

Merge drm/drm-next into drm-intel-gt-next

For conflict avoidance we need the following commit:

c9a9f18d3ad8 drm/i915/huc: use const struct bus_type pointers

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

show more ...


Revision tags: v6.1.32, v6.1.31, v6.1.30
# 9c3a985f 17-May-2023 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Backmerge to get some hwmon dependencies.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


Revision tags: v6.1.29
# 50282fd5 12-May-2023 Maxime Ripard <maxime@cerno.tech>

Merge drm/drm-fixes into drm-misc-fixes

Let's bring 6.4-rc1 in drm-misc-fixes to start the new fix cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>


Revision tags: v6.1.28
# ff32fcca 09-May-2023 Maxime Ripard <maxime@cerno.tech>

Merge drm/drm-next into drm-misc-next

Start the 6.5 release cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>


Revision tags: v6.1.27
# 6e98b09d 26-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Introduce a config option to tweak MAX_SKB_FRAGS. In

Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances

- Reduce compound page head access for zero-copy data transfers

- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible

- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance

- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking

- Add lockless accesses annotation to sk_err[_soft]

- Optimize again the skb struct layout

- Extends the skb drop reasons to make it usable by multiple
subsystems

- Better const qualifier awareness for socket casts

BPF:

- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses

- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward

- Add more precise memory usage reporting for all BPF map types

- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params

- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton

- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities

- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc

- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps

- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps

- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree

- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them

- Add ARM32 USDT support to libbpf

- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations

Protocols:

- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address

- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition

- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf

- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures

- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers

- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction

- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore

- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support

Netfilter:

- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged

- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support

- The iptables 32bit compat interface isn't compiled in by default
anymore

- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used

- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device

Driver API:

- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time

- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them

- Allow the page_pool to directly recycle the pages from safely
localized NAPI

- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization

- Add YNL support for user headers and struct attrs

- Add partial YNL specification for devlink

- Add partial YNL specification for ethtool

- Add tc-mqprio and tc-taprio support for preemptible traffic classes

- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device

- Add basic LED support for switch/phy

- Add NAPI documentation, stop relaying on external links

- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space

- Add transceiver support and improve the error messages for CAN-FD
controllers

New hardware / drivers:

- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY

- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset

- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997

- Can:
- STMicroelectronics bxcan stm32f429

Drivers:

- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP

- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame

- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates

- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E

- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware

- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets

- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)

- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility

- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"

* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...

show more ...


Revision tags: v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22
# 9380d891 29-Mar-2023 David S. Miller <davem@davemloft.net>

Merge branch 'in6addr_any-cleanups'

Kuniyuki Iwashima says:

====================
ipv6: Random cleanup for in6addr_any.

The first patch removes in6addr_any alternatives and the second
removes redun

Merge branch 'in6addr_any-cleanups'

Kuniyuki Iwashima says:

====================
ipv6: Random cleanup for in6addr_any.

The first patch removes in6addr_any alternatives and the second
removes redundant initialisation of a local variable.

Changes:
v2: Use ipv6_addr_any() in patch 1. (David Ahern)

v1: https://lore.kernel.org/netdev/20230322012204.33157-1-kuniyu@amazon.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 8cdc3223 27-Mar-2023 Kuniyuki Iwashima <kuniyu@amazon.com>

ipv6: Remove in6addr_any alternatives.

Some code defines the IPv6 wildcard address as a local variable and
use it with memcmp() or ipv6_addr_equal().

Let's use in6addr_any and ipv6_addr_any() inste

ipv6: Remove in6addr_any alternatives.

Some code defines the IPv6 wildcard address as a local variable and
use it with memcmp() or ipv6_addr_equal().

Let's use in6addr_any and ipv6_addr_any() instead.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

show more ...


# 2600badf 28-Mar-2023 Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-refcount-address-dst_entry-reference-count-scalability-issues'

Thomas Gleixner says:

====================
net, refcount: Address dst_entry reference count scalability issues

This

Merge branch 'net-refcount-address-dst_entry-reference-count-scalability-issues'

Thomas Gleixner says:

====================
net, refcount: Address dst_entry reference count scalability issues

This is version 3 of this series. Version 2 can be found here:

https://lore.kernel.org/lkml/20230307125358.772287565@linutronix.de

Wangyang and Arjan reported a bottleneck in the networking code related to
struct dst_entry::__refcnt. Performance tanks massively when concurrency on
a dst_entry increases.

This happens when there are a large amount of connections to or from the
same IP address. The memtier benchmark when run on the same host as
memcached amplifies this massively. But even over real network connections
this issue can be observed at an obviously smaller scale (due to the
network bandwith limitations in my setup, i.e. 1Gb). How to reproduce:

Run memcached with -t $N and memtier_benchmark with -t $M and --ratio=1:100
on the same machine. localhost connections amplify the problem.

Start with the defaults for $N and $M and increase them. Depending on
your machine this will tank at some point. But even in reasonably small
$N, $M scenarios the refcount operations and the resulting false sharing
fallout becomes visible in perf top. At some point it becomes the
dominating issue.

There are two factors which make this reference count a scalability issue:

1) False sharing

dst_entry:__refcnt is located at offset 64 of dst_entry, which puts
it into a seperate cacheline vs. the read mostly members located at
the beginning of the struct.

That prevents false sharing vs. the struct members in the first 64
bytes of the structure, but there is also

dst_entry::lwtstate

which is located after the reference count and in the same cache
line. This member is read after a reference count has been acquired.

The other problem is struct rtable, which embeds a struct dst_entry
at offset 0. struct dst_entry has a size of 112 bytes, which means
that the struct members of rtable which follow the dst member share
the same cache line as dst_entry::__refcnt. Especially

rtable::rt_genid

is also read by the contexts which have a reference count acquired
already.

When dst_entry:__refcnt is incremented or decremented via an atomic
operation these read accesses stall and contribute to the performance
problem.

2) atomic_inc_not_zero()

A reference on dst_entry:__refcnt is acquired via
atomic_inc_not_zero() and released via atomic_dec_return().

atomic_inc_not_zero() is implemted via a atomic_try_cmpxchg() loop,
which exposes O(N^2) behaviour under contention with N concurrent
operations. Contention scalability is degrading with even a small
amount of contenders and gets worse from there.

Lightweight instrumentation exposed an average of 8!! retry loops per
atomic_inc_not_zero() invocation in a inc()/dec() loop running
concurrently on 112 CPUs.

There is nothing which can be done to make atomic_inc_not_zero() more
scalable.

The following series addresses these issues:

1) Reorder and pad struct dst_entry to prevent the false sharing.

2) Implement and use a reference count implementation which avoids the
atomic_inc_not_zero() problem.

It is slightly less performant in the case of the final 0 -> -1
transition, but the deconstruction of these objects is a low
frequency event. get()/put() pairs are in the hotpath and that's
what this implementation optimizes for.

The algorithm of this reference count is only suitable for RCU
managed objects. Therefore it cannot replace the refcount_t
algorithm, which is also based on atomic_inc_not_zero(), due to a
subtle race condition related to the 0 -> -1 transition and the final
verdict to mark the reference count dead. See details in patch 2/3.

It might be just my lack of imagination which declares this to be
impossible and I'd be happy to be proven wrong.

As a bonus the new rcuref implementation provides underflow/overflow
detection and mitigation while being performance wise on par with
open coded atomic_inc_not_zero() / atomic_dec_return() pairs even in
the non-contended case.

The combination of these two changes results in performance gains in micro
benchmarks and also localhost and networked memtier benchmarks talking to
memcached. It's hard to quantify the benchmark results as they depend
heavily on the micro-architecture and the number of concurrent operations.

The overall gain of both changes for localhost memtier ranges from 1.2X to
3.2X and from +2% to %5% range for networked operations on a 1Gb connection.

A micro benchmark which enforces maximized concurrency shows a gain between
1.2X and 4.7X!!!

Obviously this is focussed on a particular problem and therefore needs to
be discussed in detail. It also requires wider testing outside of the cases
which this is focussed on.

Though the false sharing issue is obvious and should be addressed
independent of the more focussed reference count changes.

The series is also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rcuref

Changes vs. V2:

- Rename __refcnt to __rcuref (Linus)

- Fix comments and changelogs (Mark, Qiuxu)

- Fixup kernel doc of generated atomic_add_negative() variants

I want to say thanks to Wangyang who analyzed the issue and provided the
initial fix for the false sharing problem. Further thanks go to Arjan
Peter, Marc, Will and Borislav for valuable input and providing test
results on machines which I do not have access to, and to Linus and
Eric, Qiuxu and Mark for helpful feedback.
====================

Link: https://lore.kernel.org/r/20230323102649.764958589@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# d288a162 23-Mar-2023 Wangyang Guo <wangyang.guo@intel.com>

net: dst: Prevent false sharing vs. dst_entry:: __refcnt

dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_

net: dst: Prevent false sharing vs. dst_entry:: __refcnt

dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_t, so the
reference count operations have to take the cache-line exclusive.

Aside of the unavoidable reference count contention there is another
significant problem which is caused by that: False sharing.

perf top identified two affected read accesses. dst_entry::lwtstate and
rtable::rt_genid.

dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into
a seperate cacheline vs. the read mostly members located at the beginning
of the struct.

That prevents false sharing vs. the struct members in the first 64
bytes of the structure, but there is also

dst_entry::lwtstate

which is located after the reference count and in the same cache line. This
member is read after a reference count has been acquired.

struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a
size of 112 bytes, which means that the struct members of rtable which
follow the dst member share the same cache line as dst_entry::__refcnt.
Especially

rtable::rt_genid

is also read by the contexts which have a reference count acquired
already.

When dst_entry:__refcnt is incremented or decremented via an atomic
operation these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremly.

Move the rt[6i]_uncached[_list] members out of struct rtable and struct
rt6_info into struct dst_entry to provide padding and move the lwtstate
member after that so it ends up in the same cache line.

The resulting improvement depends on the micro-architecture and the number
of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached
benchmark.

[ tglx: Rearrange struct ]

Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230323102800.042297517@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13
# 4f2c0a4a 13-Dec-2022 Nick Terrell <terrelln@fb.com>

Merge branch 'main' into zstd-linus


12345678910>>...49