History log of /openbmc/linux/kernel/bpf/bpf_cgrp_storage.c (Results 1 – 25 of 38)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# d37cf9b6 27-Feb-2025 Andrew Jeffery <andrew@codeconstruct.com.au>

Merge tag 'v6.6.80' into for/openbmc/dev-6.6

This is the 6.6.80 stable release

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmfAV+kACgkQONu9yGCS
# aT6Zmg/8CttkpI/

Merge tag 'v6.6.80' into for/openbmc/dev-6.6

This is the 6.6.80 stable release

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmfAV+kACgkQONu9yGCS
# aT6Zmg/8CttkpI/jr/fq7mVbxHBmz06uZndOgjCt4Qocoy06hIYrV/zKCLXqEHSo
# yPzPu6p57fQsJXKZbC6K3H4ZCjkArxT9xXy08jxxYmV091OhZi7fo+otAWgNu2ht
# A8cjRkO4MjQ7WfWlPYL8FBuRgv2aPplhCxAtnQy+Lzos3Mk9PyIcgKKGeDQJ/XcE
# ecIUxR8ftzAb6X0KDYcM0PhXtKTKZwAo1MiHa18TaXz3cIHmvYfTEOk7om7rtPC/
# n7WxJSM9i0NC0AcIpddlX9e6HqOgjk3sKOlj+aHPin2sl9cWgEKixIsb/PJmWVvi
# zNPgfX2MFjX0287K0+TmTPy2rM07bnpv9e0iJDkbc411fEYb2XPxjvp4wPjf5Io3
# 6qOj9OrgRPfTama54j6Tis4cxAtk2gppOA8rxC16okMG39CwEEAbaiZvUezqK4ym
# m8fiKccYMCgYxPXVgn0vLSnbzb+zXv0SB9rKuImZstOiiKbVz+xxBTh+0GFezhto
# TFzmjEBdh2LYbtQRi/TWn+e6nEjE+LJZgmiq886NRSr9wNAGheYLKSRC1mzwyRKJ
# yiA+ue7d1wB+clE+cAcbdVF5YSosV5TggH09Wq64O/pEemC9F2bNWL+oJXODf3NG
# JYkjr7FzD1Ka/oCwDJeloDQqwUc4q5cyS+Fb3fK/zrAJZJ2YGTM=
# =2PCM
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 27 Feb 2025 22:47:45 ACDT
# gpg: using RSA key 647F28654894E3BD457199BE38DBBDC86092693E
# gpg: Good signature from "Greg Kroah-Hartman <gregkh@kernel.org>" [marginal]
# gpg: gregkh@kernel.org: Verified 9 signatures in the past 5 weeks. Encrypted
# 0 messages.
# gpg: Warning: you have yet to encrypt a message to this key!
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 647F 2865 4894 E3BD 4571 99BE 38DB BDC8 6092 693E

show more ...


Revision tags: v6.6.80, v6.6.79, v6.6.78, v6.6.77, v6.6.76, v6.6.75, v6.6.74, v6.6.73, v6.6.72, v6.6.71, v6.12.9, v6.6.70, v6.12.8, v6.6.69, v6.12.7, v6.6.68
# 6ecb9fa1 21-Dec-2024 Abel Wu <wuyun.abel@bytedance.com>

bpf: Fix deadlock when freeing cgroup storage

[ Upstream commit c78f4afbd962f43a3989f45f3ca04300252b19b5 ]

The following commit
bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_

bpf: Fix deadlock when freeing cgroup storage

[ Upstream commit c78f4afbd962f43a3989f45f3ca04300252b19b5 ]

The following commit
bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
first introduced deadlock prevention for fentry/fexit programs attaching
on bpf_task_storage helpers. That commit also employed the logic in map
free path in its v6 version.

Later bpf_cgrp_storage was first introduced in
c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
which faces the same issue as bpf_task_storage, instead of its busy
counter, NULL was passed to bpf_local_storage_map_free() which opened
a window to cause deadlock:

<TASK>
(acquiring local_storage->lock)
_raw_spin_lock_irqsave+0x3d/0x50
bpf_local_storage_update+0xd1/0x460
bpf_cgrp_storage_get+0x109/0x130
bpf_prog_a4d4a370ba857314_cgrp_ptr+0x139/0x170
? __bpf_prog_enter_recur+0x16/0x80
bpf_trampoline_6442485186+0x43/0xa4
cgroup_storage_ptr+0x9/0x20
(holding local_storage->lock)
bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160
bpf_selem_unlink_storage+0x6f/0x110
bpf_local_storage_map_free+0xa2/0x110
bpf_map_free_deferred+0x5b/0x90
process_one_work+0x17c/0x390
worker_thread+0x251/0x360
kthread+0xd2/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>

Progs:
- A: SEC("fentry/cgroup_storage_ptr")
- cgid (BPF_MAP_TYPE_HASH)
Record the id of the cgroup the current task belonging
to in this hash map, using the address of the cgroup
as the map key.
- cgrpa (BPF_MAP_TYPE_CGRP_STORAGE)
If current task is a kworker, lookup the above hash
map using function parameter @owner as the key to get
its corresponding cgroup id which is then used to get
a trusted pointer to the cgroup through
bpf_cgroup_from_id(). This trusted pointer can then
be passed to bpf_cgrp_storage_get() to finally trigger
the deadlock issue.
- B: SEC("tp_btf/sys_enter")
- cgrpb (BPF_MAP_TYPE_CGRP_STORAGE)
The only purpose of this prog is to fill Prog A's
hash map by calling bpf_cgrp_storage_get() for as
many userspace tasks as possible.

Steps to reproduce:
- Run A;
- while (true) { Run B; Destroy B; }

Fix this issue by passing its busy counter to the free procedure so
it can be properly incremented before storage/smap locking.

Fixes: c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241221061018.37717-1-wuyun.abel@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


Revision tags: v6.12.6, v6.6.67, v6.12.5, v6.6.66, v6.6.65, v6.12.4, v6.6.64, v6.12.3, v6.12.2, v6.6.63, v6.12.1, v6.12, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1
# 1ac731c5 30-Aug-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.6 merge window.


Revision tags: v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39
# 50501936 17-Jul-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.4' into next

Sync up with mainline to bring in updates to shared infrastructure.


Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35
# db6da59c 15-Jun-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next-fixes

Backmerging to sync drm-misc-next-fixes with drm-misc-next.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.1.34
# 03c60192 12-Jun-2023 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base

Merge the drm-next tree to pick up the DRM DSC helpers (merged via
drm-intel-next tree). MSM DSC v1.2 patche

Merge branch 'drm-next' of git://anongit.freedesktop.org/drm/drm into msm-next-lumag-base

Merge the drm-next tree to pick up the DRM DSC helpers (merged via
drm-intel-next tree). MSM DSC v1.2 patches depend on these helpers.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

show more ...


Revision tags: v6.1.33
# 5c680050 06-Jun-2023 Miquel Raynal <miquel.raynal@bootlin.com>

Merge tag 'v6.4-rc4' into wpan-next/staging

Linux 6.4-rc4


# 9ff17e6b 05-Jun-2023 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Merge drm/drm-next into drm-intel-gt-next

For conflict avoidance we need the following commit:

c9a9f18d3ad8 drm/i915/huc: use const struct bus_type pointers

Signed-off-by: Tvrtko Ursulin <tvrtko

Merge drm/drm-next into drm-intel-gt-next

For conflict avoidance we need the following commit:

c9a9f18d3ad8 drm/i915/huc: use const struct bus_type pointers

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

show more ...


Revision tags: v6.1.32, v6.1.31, v6.1.30
# 9c3a985f 17-May-2023 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Backmerge to get some hwmon dependencies.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


Revision tags: v6.1.29
# 50282fd5 12-May-2023 Maxime Ripard <maxime@cerno.tech>

Merge drm/drm-fixes into drm-misc-fixes

Let's bring 6.4-rc1 in drm-misc-fixes to start the new fix cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>


Revision tags: v6.1.28
# ff32fcca 09-May-2023 Maxime Ripard <maxime@cerno.tech>

Merge drm/drm-next into drm-misc-next

Start the 6.5 release cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>


Revision tags: v6.1.27
# 6e98b09d 26-Apr-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Introduce a config option to tweak MAX_SKB_FRAGS. In

Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core:

- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances

- Reduce compound page head access for zero-copy data transfers

- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible

- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance

- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking

- Add lockless accesses annotation to sk_err[_soft]

- Optimize again the skb struct layout

- Extends the skb drop reasons to make it usable by multiple
subsystems

- Better const qualifier awareness for socket casts

BPF:

- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses

- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward

- Add more precise memory usage reporting for all BPF map types

- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params

- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton

- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities

- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc

- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps

- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps

- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree

- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them

- Add ARM32 USDT support to libbpf

- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations

Protocols:

- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address

- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition

- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf

- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures

- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers

- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction

- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore

- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support

Netfilter:

- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged

- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support

- The iptables 32bit compat interface isn't compiled in by default
anymore

- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used

- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device

Driver API:

- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time

- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them

- Allow the page_pool to directly recycle the pages from safely
localized NAPI

- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization

- Add YNL support for user headers and struct attrs

- Add partial YNL specification for devlink

- Add partial YNL specification for ethtool

- Add tc-mqprio and tc-taprio support for preemptible traffic classes

- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device

- Add basic LED support for switch/phy

- Add NAPI documentation, stop relaying on external links

- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space

- Add transceiver support and improve the error messages for CAN-FD
controllers

New hardware / drivers:

- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY

- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset

- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997

- Can:
- STMicroelectronics bxcan stm32f429

Drivers:

- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP

- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame

- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates

- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E

- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware

- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets

- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)

- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility

- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"

* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...

show more ...


Revision tags: v6.1.26, v6.3, v6.1.25
# c2865b11 13-Apr-2023 Jakub Kicinski <kuba@kernel.org>

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-04-13

We've added 260 non-merge commits during the last 36 day(s) which contain
a total of 356 files changed, 21786 insertions

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-04-13

We've added 260 non-merge commits during the last 36 day(s) which contain
a total of 356 files changed, 21786 insertions(+), 11275 deletions(-).

The main changes are:

1) Rework BPF verifier log behavior and implement it as a rotating log
by default with the option to retain old-style fixed log behavior,
from Andrii Nakryiko.

2) Adds support for using {FOU,GUE} encap with an ipip device operating
in collect_md mode and add a set of BPF kfuncs for controlling encap
params, from Christian Ehrig.

3) Allow BPF programs to detect at load time whether a particular kfunc
exists or not, and also add support for this in light skeleton,
from Alexei Starovoitov.

4) Optimize hashmap lookups when key size is multiple of 4,
from Anton Protopopov.

5) Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps, from David Vernet.

6) Add support for stashing local BPF kptr into a map value via
bpf_kptr_xchg(). This is useful e.g. for rbtree node creation
for new cgroups, from Dave Marchevsky.

7) Fix BTF handling of is_int_ptr to skip modifiers to work around
tracing issues where a program cannot be attached, from Feng Zhou.

8) Migrate a big portion of test_verifier unit tests over to
test_progs -a verifier_* via inline asm to ease {read,debug}ability,
from Eduard Zingerman.

9) Several updates to the instruction-set.rst documentation
which is subject to future IETF standardization
(https://lwn.net/Articles/926882/), from Dave Thaler.

10) Fix BPF verifier in the __reg_bound_offset's 64->32 tnum sub-register
known bits information propagation, from Daniel Borkmann.

11) Add skb bitfield compaction work related to BPF with the overall goal
to make more of the sk_buff bits optional, from Jakub Kicinski.

12) BPF selftest cleanups for build id extraction which stand on its own
from the upcoming integration work of build id into struct file object,
from Jiri Olsa.

13) Add fixes and optimizations for xsk descriptor validation and several
selftest improvements for xsk sockets, from Kal Conley.

14) Add BPF links for struct_ops and enable switching implementations
of BPF TCP cong-ctls under a given name by replacing backing
struct_ops map, from Kui-Feng Lee.

15) Remove a misleading BPF verifier env->bypass_spec_v1 check on variable
offset stack read as earlier Spectre checks cover this,
from Luis Gerhorst.

16) Fix issues in copy_from_user_nofault() for BPF and other tracers
to resemble copy_from_user_nmi() from safety PoV, from Florian Lehner
and Alexei Starovoitov.

17) Add --json-summary option to test_progs in order for CI tooling to
ease parsing of test results, from Manu Bretelle.

18) Batch of improvements and refactoring to prep for upcoming
bpf_local_storage conversion to bpf_mem_cache_{alloc,free} allocator,
from Martin KaFai Lau.

19) Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations,
from Quentin Monnet.

20) Fix attaching fentry/fexit/fmod_ret/lsm to modules by extracting
the module name from BTF of the target and searching kallsyms of
the correct module, from Viktor Malik.

21) Improve BPF verifier handling of '<const> <cond> <non_const>'
to better detect whether in particular jmp32 branches are taken,
from Yonghong Song.

22) Allow BPF TCP cong-ctls to write app_limited of struct tcp_sock.
A built-in cc or one from a kernel module is already able to write
to app_limited, from Yixin Shen.

Conflicts:

Documentation/bpf/bpf_devel_QA.rst
b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
0f10f647f455 ("bpf, docs: Use internal linking for link to netdev subsystem doc")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/

include/net/ip_tunnels.h
bc9d003dc48c3 ("ip_tunnel: Preserve pointer const in ip_tunnel_info_opts")
ac931d4cdec3d ("ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices")
https://lore.kernel.org/all/20230413161235.4093777-1-broonie@kernel.org/

net/bpf/test_run.c
e5995bc7e2ba ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
294635a8165a ("bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES")
https://lore.kernel.org/all/20230320102619.05b80a98@canb.auug.org.au/
====================

Link: https://lore.kernel.org/r/20230413191525.7295-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


Revision tags: v6.1.24, v6.1.23
# e8f59d84 04-Apr-2023 Andrii Nakryiko <andrii@kernel.org>

Merge branch 'bpf: Follow up to RCU enforcement in the verifier.'

Alexei Starovoitov says:

====================

From: Alexei Starovoitov <ast@kernel.org>

The patch set is addressing a fallout fro

Merge branch 'bpf: Follow up to RCU enforcement in the verifier.'

Alexei Starovoitov says:

====================

From: Alexei Starovoitov <ast@kernel.org>

The patch set is addressing a fallout from
commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.")
It was too aggressive with PTR_UNTRUSTED marks.
Patches 1-6 are cleanup and adding verifier smartness to address real
use cases in bpf programs that broke with too aggressive PTR_UNTRUSTED.
The partial revert is done in patch 7 anyway.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

show more ...


# 91571a51 03-Apr-2023 Alexei Starovoitov <ast@kernel.org>

bpf: Teach verifier that certain helpers accept NULL pointer.

bpf_[sk|inode|task|cgrp]_storage_[get|delete]() and bpf_get_socket_cookie() helpers
perform run-time check that sk|inode|task|cgrp point

bpf: Teach verifier that certain helpers accept NULL pointer.

bpf_[sk|inode|task|cgrp]_storage_[get|delete]() and bpf_get_socket_cookie() helpers
perform run-time check that sk|inode|task|cgrp pointer != NULL.
Teach verifier about this fact and allow bpf programs to pass
PTR_TO_BTF_ID | PTR_MAYBE_NULL into such helpers.
It will be used in the subsequent patch that will do
bpf_sk_storage_get(.., skb->sk, ...);
Even when 'skb' pointer is trusted the 'sk' pointer may be NULL.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230404045029.82870-5-alexei.starovoitov@gmail.com

show more ...


Revision tags: v6.1.22
# 8d275960 25-Mar-2023 Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage'

Martin KaFai Lau says:

====================

From: Martin KaFai Lau <martin.lau@kernel.org>

This set is a continuation of the

Merge branch 'bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage'

Martin KaFai Lau says:

====================

From: Martin KaFai Lau <martin.lau@kernel.org>

This set is a continuation of the effort in using
bpf_mem_cache_alloc/free in bpf_local_storage [1]

Major change is only using bpf_mem_alloc for task and cgrp storage
while sk and inode stay with kzalloc/kfree. The details is
in patch 2.

[1]: https://lore.kernel.org/bpf/20230308065936.1550103-1-martin.lau@linux.dev/

v3:
- Only use bpf_mem_alloc for task and cgrp storage.
- sk and inode storage stay with kzalloc/kfree.
- Check NULL and add comments in bpf_mem_cache_raw_free() in patch 1.
- Added test and benchmark for task storage.

v2:
- Added bpf_mem_cache_alloc_flags() and bpf_mem_cache_raw_free()
to hide the internal data structure of the bpf allocator.
- Fixed a typo bug in bpf_selem_free()
- Simplified the test_local_storage test by directly using
err returned from libbpf
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 08a7ce38 22-Mar-2023 Martin KaFai Lau <martin.lau@kernel.org>

bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem

This patch uses bpf_mem_alloc for the task and cgroup local storage that
the bpf prog can easily get a hold of the storage owner's PTR_TO_

bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem

This patch uses bpf_mem_alloc for the task and cgroup local storage that
the bpf prog can easily get a hold of the storage owner's PTR_TO_BTF_ID.
eg. bpf_get_current_task_btf() can be used in some of the kmalloc code
path which will cause deadlock/recursion. bpf_mem_cache_alloc is
deadlock free and will solve a legit use case in [1].

For sk storage, its batch creation benchmark shows a few percent
regression when the sk create/destroy batch size is larger than 32.
The sk creation/destruction happens much more often and
depends on external traffic. Considering it is hypothetical
to be able to cause deadlock with sk storage, it can cross
the bridge to use bpf_mem_alloc till a legit (ie. useful)
use case comes up.

For inode storage, bpf_local_storage_destroy() is called before
waiting for a rcu gp and its memory cannot be reused immediately.
inode stays with kmalloc/kfree after the rcu [or tasks_trace] gp.

A 'bool bpf_ma' argument is added to bpf_local_storage_map_alloc().
Only task and cgroup storage have 'bpf_ma == true' which
means to use bpf_mem_cache_alloc/free(). This patch only changes
selem to use bpf_mem_alloc for task and cgroup. The next patch
will change the local_storage to use bpf_mem_alloc also for
task and cgroup.

Here is some more details on the changes:

* memory allocation:
After bpf_mem_cache_alloc(), the SDATA(selem)->data is zero-ed because
bpf_mem_cache_alloc() could return a reused selem. It is to keep
the existing bpf_map_kzalloc() behavior. Only SDATA(selem)->data
is zero-ed. SDATA(selem)->data is the visible part to the bpf prog.
No need to use zero_map_value() to do the zeroing because
bpf_selem_free(..., reuse_now = true) ensures no bpf prog is using
the selem before returning the selem through bpf_mem_cache_free().
For the internal fields of selem, they will be initialized when
linking to the new smap and the new local_storage.

When 'bpf_ma == false', nothing changes in this patch. It will
stay with the bpf_map_kzalloc().

* memory free:
The bpf_selem_free() and bpf_selem_free_rcu() are modified to handle
the bpf_ma == true case.

For the common selem free path where its owner is also being destroyed,
the mem is freed in bpf_local_storage_destroy(), the owner (task
and cgroup) has gone through a rcu gp. The memory can be reused
immediately, so bpf_local_storage_destroy() will call
bpf_selem_free(..., reuse_now = true) which will do
bpf_mem_cache_free() for immediate reuse consideration.

An exception is the delete elem code path. The delete elem code path
is called from the helper bpf_*_storage_delete() and the syscall
bpf_map_delete_elem(). This path is an unusual case for local
storage because the common use case is to have the local storage
staying with its owner life time so that the bpf prog and the user
space does not have to monitor the owner's destruction. For the delete
elem path, the selem cannot be reused immediately because there could
be bpf prog using it. It will call bpf_selem_free(..., reuse_now = false)
and it will wait for a rcu tasks trace gp before freeing the elem. The
rcu callback is changed to do bpf_mem_cache_raw_free() instead of kfree().

When 'bpf_ma == false', it should be the same as before.
__bpf_selem_free() is added to do the kfree_rcu and call_tasks_trace_rcu().
A few words on the 'reuse_now == true'. When 'reuse_now == true',
it is still racing with bpf_local_storage_map_free which is under rcu
protection, so it still needs to wait for a rcu gp instead of kfree().
Otherwise, the selem may be reused by slab for a totally different struct
while the bpf_local_storage_map_free() is still using it (as a
rcu reader). For the inode case, there may be other rcu readers also.
In short, when bpf_ma == false and reuse_now == true => vanilla rcu.

[1]: https://lore.kernel.org/bpf/20221118190109.1512674-1-namhyung@kernel.org/

Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 02adf9e9 22-Mar-2023 Alexei Starovoitov <ast@kernel.org>

Merge branch 'error checking where helpers call bpf_map_ops'

JP Kobryn says:

====================

Within bpf programs, the bpf helper functions can make inline calls to
kernel functions. In this s

Merge branch 'error checking where helpers call bpf_map_ops'

JP Kobryn says:

====================

Within bpf programs, the bpf helper functions can make inline calls to
kernel functions. In this scenario there can be a disconnect between the
register the kernel function writes a return value to and the register the
bpf program uses to evaluate that return value.

As an example, this bpf code:

long err = bpf_map_update_elem(...);
if (err && err != -EEXIST)
// got some error other than -EEXIST

...can result in the bpf assembly:

; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
37: movabs $0xffff976a10730400,%rdi
41: mov $0x1,%ecx
46: call 0xffffffffe103291c ; htab_map_update_elem
; if (err && err != -EEXIST) {
4b: cmp $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax
4f: je 0x000000000000008e
51: test %rax,%rax
54: je 0x000000000000008e

The compare operation here evaluates %rax, while in the preceding call to
htab_map_update_elem the corresponding assembly returns -EEXIST via %eax
(the lower 32 bits of %rax):

movl $0xffffffef, %r9d
...
movl %r9d, %eax

...since it's returning int (32-bit). So the resulting comparison becomes:

cmp $0xffffffffffffffef, $0x00000000ffffffef

...making it not possible to check for negative errors or specific errors,
since the sign value is left at the 32nd bit. It means in the original
example, the conditional branch will be entered even when the error is
-EEXIST, which was not intended.

The selftests added cover these cases for the different bpf_map_ops
functions. When the second patch is applied, changing the return type of
those functions to long, the comparison works as intended and the tests
pass.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# d7ba4cc9 22-Mar-2023 JP Kobryn <inwardvessel@gmail.com>

bpf: return long from bpf_map_ops funcs

This patch changes the return types of bpf_map_ops functions to long, where
previously int was returned. Using long allows for bpf programs to maintain
the si

bpf: return long from bpf_map_ops funcs

This patch changes the return types of bpf_map_ops functions to long, where
previously int was returned. Using long allows for bpf programs to maintain
the sign bit in the absence of sign extension during situations where
inlined bpf helper funcs make calls to the bpf_map_ops funcs and a negative
error is returned.

The definitions of the helper funcs are generated from comments in the bpf
uapi header at `include/uapi/linux/bpf.h`. The return type of these
helpers was previously changed from int to long in commit bdb7b79b4ce8. For
any case where one of the map helpers call the bpf_map_ops funcs that are
still returning 32-bit int, a compiler might not include sign extension
instructions to properly convert the 32-bit negative value a 64-bit
negative value.

For example:
bpf assembly excerpt of an inlined helper calling a kernel function and
checking for a specific error:

; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
...
46: call 0xffffffffe103291c ; htab_map_update_elem
; if (err && err != -EEXIST) {
4b: cmp $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax

kernel function assembly excerpt of return value from
`htab_map_update_elem` returning 32-bit int:

movl $0xffffffef, %r9d
...
movl %r9d, %eax

...results in the comparison:
cmp $0xffffffffffffffef, $0x00000000ffffffef

Fixes: bdb7b79b4ce8 ("bpf: Switch most helper return values from 32-bit int to 64-bit long")
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Link: https://lore.kernel.org/r/20230322194754.185781-3-inwardvessel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


Revision tags: v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16
# a47eabf2 08-Mar-2023 Martin KaFai Lau <martin.lau@kernel.org>

bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage

This patch re-purpose the use_trace_rcu to mean
if the freed memory can be reused immediately or not.
The use_trace_rcu is renamed to r

bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage

This patch re-purpose the use_trace_rcu to mean
if the freed memory can be reused immediately or not.
The use_trace_rcu is renamed to reuse_now. Other than
the boolean test is reversed, it should be a no-op.

The following explains the reason for the rename and how it will
be used in a later patch.

In a later patch, bpf_mem_cache_alloc/free will be used
in the bpf_local_storage. The bpf mem allocator will reuse
the freed memory immediately. Some of the free paths in
bpf_local_storage does not support memory to be reused immediately.
These paths are the "delete" elem cases from the bpf_*_storage_delete()
helper and the map_delete_elem() syscall. Note that "delete" elem
before the owner's (sk/task/cgrp/inode) lifetime ended is not
the common usage for the local storage.

The common free path, bpf_local_storage_destroy(), can reuse the
memory immediately. This common path means the storage stays with
its owner until the owner is destroyed.

The above mentioned "delete" elem paths that cannot
reuse immediately always has the 'use_trace_rcu == true'.
The cases that is safe for immediate reuse always have
'use_trace_rcu == false'. Instead of adding another arg
in a later patch, this patch re-purpose this arg
to reuse_now and have the test logic reversed.

In a later patch, 'reuse_now == true' will free to the
bpf_mem_cache_free() where the memory can be reused
immediately. 'reuse_now == false' will go through the
call_rcu_tasks_trace().

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 2ffcb6fc 08-Mar-2023 Martin KaFai Lau <martin.lau@kernel.org>

bpf: Refactor codes into bpf_local_storage_destroy

This patch first renames bpf_local_storage_unlink_nolock to
bpf_local_storage_destroy(). It better reflects that it is only
used when the storage's

bpf: Refactor codes into bpf_local_storage_destroy

This patch first renames bpf_local_storage_unlink_nolock to
bpf_local_storage_destroy(). It better reflects that it is only
used when the storage's owner (sk/task/cgrp/inode) is being kfree().

All bpf_local_storage_destroy's caller is taking the spin lock and
then free the storage. This patch also moves these two steps into
the bpf_local_storage_destroy.

This is a preparation work for a later patch that uses
bpf_mem_cache_alloc/free in the bpf_local_storage.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# ed69e066 08-Mar-2023 Jakub Kicinski <kuba@kernel.org>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Andrii Nakryiko says:

====================
pull-request: bpf-next 2023-03-08

We've added 23 non-merge commits during the last 2 d

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Andrii Nakryiko says:

====================
pull-request: bpf-next 2023-03-08

We've added 23 non-merge commits during the last 2 day(s) which contain
a total of 28 files changed, 414 insertions(+), 104 deletions(-).

The main changes are:

1) Add more precise memory usage reporting for all BPF map types,
from Yafang Shao.

2) Add ARM32 USDT support to libbpf, from Puranjay Mohan.

3) Fix BTF_ID_LIST size causing problems in !CONFIG_DEBUG_INFO_BTF,
from Nathan Chancellor.

4) IMA selftests fix, from Roberto Sassu.

5) libbpf fix in APK support code, from Daniel Müller.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
selftests/bpf: Fix IMA test
libbpf: USDT arm arg parsing support
libbpf: Refactor parse_usdt_arg() to re-use code
libbpf: Fix theoretical u32 underflow in find_cd() function
bpf: enforce all maps having memory usage callback
bpf: offload map memory usage
bpf, net: xskmap memory usage
bpf, net: sock_map memory usage
bpf, net: bpf_local_storage memory usage
bpf: local_storage memory usage
bpf: bpf_struct_ops memory usage
bpf: queue_stack_maps memory usage
bpf: devmap memory usage
bpf: cpumap memory usage
bpf: bloom_filter memory usage
bpf: ringbuf memory usage
bpf: reuseport_array memory usage
bpf: stackmap memory usage
bpf: arraymap memory usage
bpf: hashtab memory usage
...
====================

Link: https://lore.kernel.org/r/20230308193533.1671597-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# a73dc912 07-Mar-2023 Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf: bpf memory usage'

Yafang Shao says:

====================

Currently we can't get bpf memory usage reliably either from memcg or
from bpftool.

In memcg, there's not a 'bpf' item

Merge branch 'bpf: bpf memory usage'

Yafang Shao says:

====================

Currently we can't get bpf memory usage reliably either from memcg or
from bpftool.

In memcg, there's not a 'bpf' item in memory.stat, but only 'kernel',
'sock', 'vmalloc' and 'percpu' which may related to bpf memory. With
these items we still can't get the bpf memory usage, because bpf memory
usage may far less than the kmem in a memcg, for example, the dentry may
consume lots of kmem.

bpftool now shows the bpf memory footprint, which is difference with bpf
memory usage. The difference can be quite great in some cases, for example,

- non-preallocated bpf map
The non-preallocated bpf map memory usage is dynamically changed. The
allocated elements count can be from 0 to the max entries. But the
memory footprint in bpftool only shows a fixed number.

- bpf metadata consumes more memory than bpf element
In some corner cases, the bpf metadata can consumes a lot more memory
than bpf element consumes. For example, it can happen when the element
size is quite small.

- some maps don't have key, value or max_entries
For example the key_size and value_size of ringbuf is 0, so its
memlock is always 0.

We need a way to show the bpf memory usage especially there will be more
and more bpf programs running on the production environment and thus the
bpf memory usage is not trivial.

This patchset introduces a new map ops ->map_mem_usage to calculate the
memory usage. Note that we don't intend to make the memory usage 100%
accurate, while our goal is to make sure there is only a small difference
between what bpftool reports and the real memory. That small difference
can be ignored compared to the total usage. That is enough to monitor
the bpf memory usage. For example, the user can rely on this value to
monitor the trend of bpf memory usage, compare the difference in bpf
memory usage between different bpf program versions, figure out which
maps consume large memory, and etc.

This patchset implements the bpf memory usage for all maps, and yet there's
still work to do. We don't want to introduce runtime overhead in the
element update and delete path, but we have to do it for some
non-preallocated maps,
- devmap, xskmap
When we update or delete an element, it will allocate or free memory.
In order to track this dynamic memory, we have to track the count in
element update and delete path.

- cpumap
The element size of each cpumap element is not determinated. If we
want to track the usage, we have to count the size of all elements in
the element update and delete path. So I just put it aside currently.

- local_storage, bpf_local_storage
When we attach or detach a cgroup, it will allocate or free memory. If
we want to track the dynamic memory, we also need to do something in
the update and delete path. So I just put it aside currently.

- offload map
The element update and delete of offload map is via the netdev dev_ops,
in which it may dynamically allocate or free memory, but this dynamic
memory isn't counted in offload map memory usage currently.

The result of each map can be found in the individual patch.

We may also need to track per-container bpf memory usage, that will be
addressed by a different patchset.

Changes:
v3->v4: code improvement on ringbuf (Andrii)
use READ_ONCE() to read lpm_trie (Tao)
explain why we can't get bpf memory usage from memcg.
v2->v3: check callback at map creation time and avoid warning (Alexei)
fix build error under CONFIG_BPF=n (lkp@intel.com)
v1->v2: calculate the memory usage within bpf (Alexei)
- [v1] bpf, mm: bpf memory usage
https://lwn.net/Articles/921991/
- [RFC PATCH v2] mm, bpf: Add BPF into /proc/meminfo
https://lwn.net/Articles/919848/
- [RFC PATCH v1] mm, bpf: Add BPF into /proc/meminfo
https://lwn.net/Articles/917647/
- [RFC PATCH] bpf, mm: Add a new item bpf into memory.stat
https://lore.kernel.org/bpf/20220921170002.29557-1-laoar.shao@gmail].com/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


# 7490b7f1 05-Mar-2023 Yafang Shao <laoar.shao@gmail.com>

bpf, net: bpf_local_storage memory usage

A new helper is introduced into bpf_local_storage map to calculate the
memory usage. This helper is also used by other maps like
bpf_cgrp_storage, bpf_inode_

bpf, net: bpf_local_storage memory usage

A new helper is introduced into bpf_local_storage map to calculate the
memory usage. This helper is also used by other maps like
bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage and etc.

Note that currently the dynamically allocated storage elements are not
counted in the usage, since it will take extra runtime overhead in the
elements update or delete path. So let's put it aside now, and implement
it in the future when someone really need it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

show more ...


Revision tags: v6.1.15, v6.1.14, v6.1.13
# 7ae9fb1b 21-Feb-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.3 merge window.


12