Revision tags: v6.6.67, v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14 |
|
#
1a931707 |
| 16-Dec-2022 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'torvalds/master' into perf/core
To resolve a trivial merge conflict with c302378bc157f6a7 ("libbpf: Hashmap interface update to allow both long and void* keys/values"), where a function present upstream was removed in the perf tools development tree.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
Revision tags: v6.0.13 |
|
#
4f2c0a4a |
| 13-Dec-2022 |
Nick Terrell <terrelln@fb.com> |
Merge branch 'main' into zstd-linus
|
#
7e68dd7d |
| 13-Dec-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, considerably improving the performance of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing data races
- De-duplicate common code between OVS and conntrack offloading infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate own objects, build own object hierarchies and use the building blocks to build own data structures flexibly, for example, linked lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF programs
- Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting, and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error reporting
- RXRPC: increase SACK table size and move processing into a per-local endpoint kernel thread, reducing considerably the required locking
- IEEE 802154: synchronous send frame and extended filtering support, initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and migratable
- Extend devlink-rate to support strict priority and weighted fair queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet: - Marvel Octeon CNF95N and CN10KB Ethernet Switches - Marvel Prestera AC5X Ethernet Switch - WangXun 10 Gigabit NIC - Motorcomm yt8521 Gigabit Ethernet - Microchip ksz9563 Gigabit Ethernet Switch - Microsoft Azure Network Adapter - Linux Automation 10Base-T1L adapter
- PHY: - Aquantia AQR112 and AQR412 - Motorcomm YT8531S
- PTP: - Orolia ART-CARD
- WiFi: - MediaTek Wi-Fi 7 (802.11be) devices - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB devices
- Bluetooth: - Broadcom BCM4377/4378/4387 Bluetooth chipsets - Realtek RTL8852BE and RTL8723DS - Cypress CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN: - gs_usb: bus error reporting support - kvaser_usb: listen only and bus error reporting support
- Ethernet NICs: - Intel (100G): - extend action skbedit to RX queue mapping - implement devlink-rate support - support direct read from memory - nVidia/Mellanox (mlx5): - SW steering improvements, increasing rules update rate - Support for enhanced events compression - extend H/W offload packet manipulation capabilities - implement IPSec packet offload mode - nVidia/Mellanox (mlx4): - better big TCP support - Netronome Ethernet NICs (nfp): - IPsec offload support - add support for multicast filter - Broadcom: - RSS and PTP support improvements - AMD/SolarFlare: - netlink extended ack improvements - add basic flower matches to offload, and related stats - Virtual NICs: - ibmvnic: introduce affinity hint support - small / embedded: - FreeScale fec: add initial XDP support - Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood - TI am65-cpsw: add suspend/resume support - Mediatek MT7986: add RX wireless ethernet dispatch support - Realtek 8169: enable GRO software interrupt coalescing per default
- Ethernet high-speed switches: - Microchip (sparx5): - add support for Sparx5 TC/flower H/W offload via VCAP - Mellanox mlxsw: - add 802.1X and MAC Authentication Bypass offload support - add ip6gre support
- Embedded Ethernet switches: - Mediatek (mtk_eth_soc): - improve PCS implementation, add DSA untag support - enable flow offload support - Renesas: - add rswitch R-Car Gen4 gPTP support - Microchip (lan966x): - add full XDP support - add TC H/W offload via VCAP - enable PTP on bridge interfaces - Microchip (ksz8): - add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k): - support configuring channel dwell time during scan
- MediaTek WiFi (mt76): - enable Wireless Ethernet Dispatch (WED) offload support - add ack signal support - enable coredump support - remain_on_channel support
- Intel WiFi (iwlwifi): - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities - 320 MHz channels support
- RealTek WiFi (rtw89): - new dynamic header firmware format support - wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits) ipvs: fix type warning in do_div() on 32 bit net: lan966x: Remove a useless test in lan966x_ptp_add_trap() net: ipa: add IPA v4.7 support dt-bindings: net: qcom,ipa: Add SM6350 compatible bnxt: Use generic HBH removal helper in tx path IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver selftests: forwarding: Add bridge MDB test selftests: forwarding: Rename bridge_mdb test bridge: mcast: Support replacement of MDB port group entries bridge: mcast: Allow user space to specify MDB entry routing protocol bridge: mcast: Allow user space to add (*, G) with a source list and filter mode bridge: mcast: Add support for (*, G) with a source list and filter mode bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source bridge: mcast: Add a flag for user installed source entries bridge: mcast: Expose __br_multicast_del_group_src() bridge: mcast: Expose br_multicast_new_group_src() bridge: mcast: Add a centralized error path bridge: mcast: Place netlink policy before validation functions bridge: mcast: Split (*, G) and (S, G) addition into different functions bridge: mcast: Do not derive entry type from its filter mode ...
|
#
26f708a2 |
| 12-Dec-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
==================== pull-request: bpf-next 2022-12-11
We've added 74 non-merge commits during the last 11 day(s) which contain a total of 88 files changed, 3362 insertions(+), 789 deletions(-).
The main changes are:
1) Decouple prune and jump points handling in the verifier, from Andrii.
2) Do not rely on ALLOW_ERROR_INJECTION for fmod_ret, from Benjamin. Merged from hid tree.
3) Do not zero-extend kfunc return values. Necessary fix for 32-bit archs, from Björn.
4) Don't use rcu_users to refcount in task kfuncs, from David.
5) Three reg_state->id fixes in the verifier, from Eduard.
6) Optimize bpf_mem_alloc by reusing elements from free_by_rcu, from Hou.
7) Refactor dynptr handling in the verifier, from Kumar.
8) Remove the "/sys" mount and umount dance in {open,close}_netns in bpf selftests, from Martin.
9) Enable sleepable support for cgrp local storage, from Yonghong.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (74 commits) selftests/bpf: test case for relaxed prunning of active_lock.id selftests/bpf: Add pruning test case for bpf_spin_lock bpf: use check_ids() for active_lock comparison selftests/bpf: verify states_equal() maintains idmap across all frames bpf: states_equal() must build idmap for all function frames selftests/bpf: test cases for regsafe() bug skipping check_id() bpf: regsafe() must not skip check_ids() docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE selftests/bpf: Add test for dynptr reinit in user_ringbuf callback bpf: Use memmove for bpf_dynptr_{read,write} bpf: Move PTR_TO_STACK alignment check to process_dynptr_func bpf: Rework check_func_arg_reg_off bpf: Rework process_dynptr_func bpf: Propagate errors from process_* checks in check_func_arg bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true bpf: Reuse freed element in free_by_rcu during allocation selftests/bpf: Bring test_offload.py back to life bpf: Fix comment error in fixup_kfunc_call function bpf: Do not zero-extend kfunc return values ... ====================
Link: https://lore.kernel.org/r/20221212024701.73809-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e291c116 |
| 12-Dec-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.2 merge window.
|
Revision tags: v6.1 |
|
#
26d6506a |
| 08-Dec-2022 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'Dynptr refactorings'
Kumar Kartikeya Dwivedi says:
====================
This is part 1 of https://lore.kernel.org/bpf/20221018135920.726360-1-memxor@gmail.com. This thread also gives some background on why the refactor is being done: https://lore.kernel.org/bpf/CAEf4Bzb4beTHgVo+G+jehSj8oCeAjRbRcm6MRe=Gr+cajRBwEw@mail.gmail.com
As requested in patch 6 by Alexei, it only includes patches which refactors the code, on top of which further fixes will be made in part 2. The refactor itself fixes another issue as a side effect. No functional change is intended (except a few modified log messages).
Changelog: ---------- v1 -> v2 v1: https://lore.kernel.org/bpf/20221115000130.1967465-1-memxor@gmail.com
* Address feedback from Joanne and David, add acks
Fixes v1 -> v1 Fixes v1: https://lore.kernel.org/bpf/20221018135920.726360-1-memxor@gmail.com
* Collect acks from Joanne and David * Fix misc nits pointed out by Joanne, David * Split move of reg->off alignment check for dynptr into separate change (Alexei) ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.0.12 |
|
#
76d16077 |
| 07-Dec-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Use memmove for bpf_dynptr_{read,write}
It may happen that destination buffer memory overlaps with memory dynptr points to. Hence, we must use memmove to correctly copy from dynptr to destination buffer, or source buffer to dynptr.
This actually isn't a problem right now, as memcpy implementation falls back to memmove on detecting overlap and warns about it, but we shouldn't be relying on that.
Acked-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: David Vernet <void@manifault.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221207204141.308952-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
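A minimal user-space sketch of the overlap issue described above; memmove() handles overlapping source and destination regions where memcpy() does not. This is only an illustration of the C semantics, not the kernel change itself.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            char buf[] = "abcdefgh";

            /* Shift six bytes right by two within the same buffer: source and
             * destination overlap, so memmove() is required; with memcpy() the
             * behaviour would be undefined. */
            memmove(buf + 2, buf, 6);
            printf("%s\n", buf);    /* prints "ababcdef" */
            return 0;
    }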
|
#
27060531 |
| 07-Dec-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Rework process_dynptr_func
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type for use in callback state, because in case of user ringbuf helpers, there is no dynptr on the stack that is passed into the callback. To reflect such a state, a special register type was created.
However, some checks were incorrectly bypassed when this feature was added. First, helpers whose arg_type carries the MEM_UNINIT flag (i.e. they initialize a dynptr) must reject such a register type. Secondly, in the future there are plans to add dynptr helpers that operate on the dynptr itself and may change its offset and other properties.
In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed to such helpers, however the current code simply returns 0.
The rejection for helpers that release the dynptr is already handled.
For fixing this, we take a step back and rework existing code in a way that will allow fitting in all classes of helpers and have a coherent model for dealing with the variety of use cases in which dynptr is used.
First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together with a DYNPTR_TYPE_* constant that denotes the only type it accepts.
Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this fact. To make the distinction clear, use MEM_RDONLY flag to indicate that the helper only operates on the memory pointed to by the dynptr, not the dynptr itself. In C parlance, it would be equivalent to taking the dynptr as a pointer-to-const argument.
When either of these flags are not present, the helper is allowed to mutate both the dynptr itself and also the memory it points to. Currently, the read only status of the memory is not tracked in the dynptr, but it would be trivial to add this support inside dynptr state of the register.
With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to better reflect its usage, it can no longer be passed to helpers that initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.
A note to reviewers is that in code that does mark_stack_slots_dynptr, and unmark_stack_slots_dynptr, we implicitly rely on the fact that PTR_TO_STACK reg is the only case that can reach that code path, as one cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In both cases such helpers won't be setting that flag.
The next patch will add a couple of selftest cases to make sure this doesn't break.
Fixes: 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper") Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221207204141.308952-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
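To make the helper-facing side of this concrete: in a bpf_user_ringbuf_drain() callback the dynptr argument is the CONST_PTR_TO_DYNPTR case described above. A hedged sketch follows; the helper declarations come from the standard BPF helper headers, while the callback shape and return-value handling are illustrative assumptions, not quotes from this commit.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* Callback for bpf_user_ringbuf_drain(): 'dynptr' arrives as
     * CONST_PTR_TO_DYNPTR, so it may be passed to helpers that only operate on
     * the memory it points to (the MEM_RDONLY convention, e.g. bpf_dynptr_read()),
     * but not to helpers that initialize a dynptr (the MEM_UNINIT convention,
     * e.g. bpf_dynptr_from_mem() or bpf_ringbuf_reserve_dynptr()).  A real
     * program would pass &drain_cb to bpf_user_ringbuf_drain(). */
    static long drain_cb(struct bpf_dynptr *dynptr, void *ctx)
    {
            __u8 buf[8];

            if (bpf_dynptr_read(buf, sizeof(buf), dynptr, 0, 0))
                    return 1;       /* stop draining on a short/invalid sample */
            return 0;               /* keep draining */
    }

    char LICENSE[] SEC("license") = "GPL";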
|
#
2d141236 |
| 07-Dec-2022 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'Document some recent core kfunc additions'
David Vernet says:
====================
A series of recent patch sets introduced kfuncs that allowed struct task_struct and struct cgroup objects to be used as kptrs. These were introduced in [0], [1], and [2].
[0]: https://lore.kernel.org/lkml/20221120051004.3605026-1-void@manifault.com/ [1]: https://lore.kernel.org/lkml/20221122145300.251210-2-void@manifault.com/T/ [2]: https://lore.kernel.org/lkml/20221122055458.173143-1-void@manifault.com/
These are "core" kfuncs, in that they may be used by a wide variety of possible BPF tracepoint or struct_ops programs, and are defined in kernel/bpf/helpers.c. Even though as kfuncs they have no ABI stability guarantees, they should still be properly documented. This patch set adds that documentation.
Some other kfuncs were added recently as well, such as bpf_rcu_read_lock() and bpf_rcu_read_unlock(). Those could and should be added to this "Core kfuncs" section as well in subsequent patch sets.
Note that this patch set does not contain documentation for bpf_task_acquire_not_zero(), or bpf_task_kptr_get(). As discussed in [3], those kfuncs currently always return NULL pending resolution on how to properly protect their arguments using RCU.
[3]: https://lore.kernel.org/all/20221206210538.597606-1-void@manifault.com/ --- Changelog: v2 -> v3: - Don't document bpf_task_kptr_get(), and instead provide a more substantive example for bpf_cgroup_kptr_get(). - Further clarify expected behavior of bpf_task_from_pid() in comments (Alexei)
v1 -> v2: - Expand comment to specify that a map holds a reference to a task kptr if we don't end up releasing it (Alexei) - Just read task->pid instead of using a probed read (Alexei) ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
36aa10ff |
| 07-Dec-2022 |
David Vernet <void@manifault.com> |
bpf/docs: Document struct cgroup * kfuncs
bpf_cgroup_acquire(), bpf_cgroup_release(), bpf_cgroup_kptr_get(), and bpf_cgroup_ancestor(), are kfuncs that were recently added to kernel/bpf/helpers.c. These are "core" kfuncs in that they're available for use in any tracepoint or struct_ops BPF program. Though they have no ABI stability guarantees, we should still document them. This patch adds a struct cgroup * subsection to the Core kfuncs section which describes each of these kfuncs.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221207204911.873646-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
25c5e92d |
| 07-Dec-2022 |
David Vernet <void@manifault.com> |
bpf/docs: Document struct task_struct * kfuncs
bpf_task_acquire(), bpf_task_release(), and bpf_task_from_pid() are kfuncs that were recently added to kernel/bpf/helpers.c. These are "core" kfuncs in that they're available for use for any tracepoint or struct_ops BPF program. Though they have no ABI stability guarantees, we should still document them. This patch adds a new Core kfuncs section to the BPF kfuncs doc, and adds entries for all of these task kfuncs.
Note that bpf_task_kptr_get() is not documented, as it still returns NULL while we're working to resolve how it can use RCU to ensure struct task_struct * lifetime.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221207204911.873646-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
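A hedged usage sketch of the task kfuncs being documented, in the style of the selftests of that period; the hand-written __ksym declarations and the choice of tracepoint are illustrative assumptions, not taken from this commit.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
    void bpf_task_release(struct task_struct *p) __ksym;

    char LICENSE[] SEC("license") = "GPL";

    SEC("tp_btf/task_newtask")
    int BPF_PROG(on_task_newtask, struct task_struct *task, u64 clone_flags)
    {
            struct task_struct *acquired;

            /* take a reference on the (trusted) tracepoint argument */
            acquired = bpf_task_acquire(task);
            if (!acquired)
                    return 0;

            /* use it, then pair the acquire with a release */
            bpf_printk("new task pid %d", acquired->pid);
            bpf_task_release(acquired);
            return 0;
    }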
|
#
156ed20d |
| 06-Dec-2022 |
David Vernet <void@manifault.com> |
bpf: Don't use rcu_users to refcount in task kfuncs
A series of prior patches added some kfuncs that allow struct task_struct * objects to be used as kptrs. These kfuncs leveraged the 'refcount_t rcu_users' field of the task for performing refcounting. This field was used instead of 'refcount_t usage', as we wanted to leverage the safety provided by RCU for ensuring a task's lifetime.
A struct task_struct is refcounted by two different refcount_t fields:
1. p->usage: The "true" refcount field which tracks task lifetime. The task is freed as soon as this refcount drops to 0.
2. p->rcu_users: An "RCU users" refcount field which is statically initialized to 2, and is co-located in a union with a struct rcu_head field (p->rcu). p->rcu_users essentially encapsulates a single p->usage refcount, and when p->rcu_users goes to 0, an RCU callback is scheduled on the struct rcu_head which decrements the p->usage refcount.
Our logic was that by using p->rcu_users, we would be able to use RCU to safely issue refcount_inc_not_zero() on a task's rcu_users field to determine if a task could still be acquired, or was exiting. Unfortunately, this does not work due to p->rcu_users and p->rcu sharing a union. When p->rcu_users goes to 0, an RCU callback is scheduled to drop a single p->usage refcount, and because the fields share a union, the refcount immediately becomes nonzero again after the callback is scheduled.
If we were to split the fields out of the union, this wouldn't be a problem. Doing so should also be rather non-controversial, as there are a number of places in struct task_struct that have padding which we could use to avoid growing the structure by splitting up the fields.
For now, so as to fix the kfuncs to be correct, this patch instead updates bpf_task_acquire() and bpf_task_release() to use the p->usage field for refcounting via the get_task_struct() and put_task_struct() functions. Because we can no longer rely on RCU, the change also guts the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions pending a resolution on the above problem.
In addition, the patch fixes the kfunc and rcu_read_lock selftests to expect this new behavior.
Fixes: 90660309b0c7 ("bpf: Add kfuncs for storing struct task_struct * as a kptr") Fixes: fca1aa75518c ("bpf: Handle MEM_RCU type properly") Reported-by: Matus Jokay <matus.jokay@stuba.sk> Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
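A hedged sketch (not the literal kernel diff) of the refcounting scheme the message settles on: acquire and release go through p->usage via get_task_struct()/put_task_struct() rather than p->rcu_users.

    /* kernel/bpf/helpers.c-style fragment, simplified for illustration */
    struct task_struct *bpf_task_acquire(struct task_struct *p)
    {
            get_task_struct(p);     /* refcount_inc(&p->usage) */
            return p;
    }

    void bpf_task_release(struct task_struct *p)
    {
            put_task_struct(p);     /* drop p->usage; the task is freed at zero */
    }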
|
#
1910676c |
| 04-Dec-2022 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf: Handle MEM_RCU type properly'
Yonghong Song says:
====================
Patch set [1] added rcu support for bpf programs. In [1], a rcu pointer is considered to be trusted and not null. This is actually not true in some cases. The rcu pointer could be null, and for non-null rcu pointer, it may have reference count of 0. This small patch set fixed this problem. Patch 1 is the kernel fix. Patch 2 adjusted selftests properly. Patch 3 added documentation for newly-introduced KF_RCU flag.
[1] https://lore.kernel.org/all/20221124053201.2372298-1-yhs@fb.com/
Changelogs: v1 -> v2: - rcu ptr could be NULL. - non_null_rcu_ptr->rcu_field can be marked as MEM_RCU as well. - Adjust the code to avoid existing error message change. ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fca1aa75 |
| 03-Dec-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Handle MEM_RCU type properly
Commit 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()") introduced MEM_RCU and bpf_rcu_read_lock/unlock() support. In that commit, a rcu pointer is tagged with both MEM_RCU and PTR_TRUSTED so that it can be passed into kfuncs or helpers as an argument.
Martin raised a good question in [1] such that the rcu pointer, although being able to accessing the object, might have reference count of 0. This might cause a problem if the rcu pointer is passed to a kfunc which expects trusted arguments where ref count should be greater than 0.
This patch makes the following changes related to MEM_RCU pointer: - MEM_RCU pointer might be NULL (PTR_MAYBE_NULL). - Introduce KF_RCU so MEM_RCU ptr can be acquired with a KF_RCU tagged kfunc which assumes ref count of rcu ptr could be zero. - For mem access 'b = ptr->a', say 'ptr' is a MEM_RCU ptr, and 'a' is tagged with __rcu as well. Let us mark 'b' as MEM_RCU | PTR_MAYBE_NULL.
[1] https://lore.kernel.org/bpf/ac70f574-4023-664e-b711-e0d3b18117fd@linux.dev/
Fixes: 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()") Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221203184602.477272-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.0.11 |
|
#
d6dc62fc |
| 28-Nov-2022 |
Jakub Kicinski <kuba@kernel.org> |
Daniel Borkmann says:
==================== bpf-next 2022-11-25
We've added 101 non-merge commits during the last 11 day(s) which contain a total of 109 files changed, 8827 insertions(+), 1129 deletions(-).
The main changes are:
1) Support for user defined BPF objects: the use case is to allocate own objects, build own object hierarchies and use the building blocks to build own data structures flexibly, for example, linked lists in BPF, from Kumar Kartikeya Dwivedi.
2) Add bpf_rcu_read_{,un}lock() support for sleepable programs, from Yonghong Song.
3) Add support storing struct task_struct objects as kptrs in maps, from David Vernet.
4) Batch of BPF map documentation improvements, from Maryam Tahhan and Donald Hunter.
5) Improve BPF verifier to propagate nullness information for branches of register to register comparisons, from Eduard Zingerman.
6) Fix cgroup BPF iter infra to hold reference on the start cgroup, from Hou Tao.
7) Fix BPF verifier to not mark fentry/fexit program arguments as trusted given it is not the case for them, from Alexei Starovoitov.
8) Improve BPF verifier's realloc handling to better play along with dynamic runtime analysis tools like KASAN and friends, from Kees Cook.
9) Remove legacy libbpf mode support from bpftool, from Sahid Orentino Ferdjaoui.
10) Rework zero-len skb redirection checks to avoid potentially breaking existing BPF test infra users, from Stanislav Fomichev.
11) Two small refactorings which are independent and have been split out of the XDP queueing RFC series, from Toke Høiland-Jørgensen.
12) Fix a memory leak in LSM cgroup BPF selftest, from Wang Yufen.
13) Documentation on how to run BPF CI without patch submission, from Daniel Müller.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> ====================
Link: https://lore.kernel.org/r/20221125012450.441-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
Revision tags: v6.0.10, v5.15.80 |
|
#
6099754a |
| 24-Nov-2022 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf: Add bpf_rcu_read_lock() support'
Yonghong Song says:
====================
Currently, without rcu attribute info in BTF, the verifier treats an rcu-tagged pointer as a normal pointer. This might be a problem for sleepable programs where rcu_read_lock()/unlock() is not available. For example, for a sleepable fentry program, if rcu protected memory access is interleaved with a sleepable helper/kfunc, it is possible the memory access after the sleepable helper/kfunc might be invalid since the object might have been freed by then. Even without a sleepable helper/kfunc, without rcu_read_lock() protection, it is possible that the rcu protected object might be released in the middle of bpf program execution, which may cause incorrect results.
To prevent above cases, enable btf_type_tag("rcu") attributes, introduce new bpf_rcu_read_lock/unlock() kfuncs and add verifier support.
In the rest of the patch set, Patch 1 enabled btf_type_tag for the __rcu attribute. Patch 2 added might_sleep in bpf_func_proto. Patch 3 added new bpf_rcu_read_lock/unlock() kfuncs and verifier support. Patch 4 added some tests for these two new kfuncs.
Changelogs: v9 -> v10: . if no rcu tag support in vmlinux btf, using bpf_rcu_read_lock/unlock() will cause verification error. . at bpf_rcu_read_unlock(), invalidate rcu ptr to PTR_UNTRUSTED instead of SCALAR_VALUE. . a few other comment changes and other minor changes. v8 -> v9: . remove sleepable prog check for ld_abs/ind checking in rcu read lock region. . fix a test failure with gcc-compiled kernel. . a couple of other minor fixes. v7 -> v8: . add might_sleep in bpf_func_proto so we can easily identify whether a helper is sleepable or not. . do not enforce rcu rules for sleepable, e.g., rcu dereference must be in a bpf_rcu_read_lock region. This is to keep old code working fine. . Mark 'b' in 'b = a->b' (b is tagged with __rcu) as MEM_RCU only if 'b = a->b' in rcu read region and 'a' is trusted. This adds safety guarantee for 'b' inside the rcu read region. v6 -> v7: . rebase on top of bpf-next. . remove the patch which enables sleepable program using cgrp_local_storage map. This is orthogonal to this patch set and will be addressed separately. . mark the rcu pointer dereference result as UNTRUSTED if inside a bpf_rcu_read_lock() region. v5 -> v6: . fix selftest prog miss_unlock which tested nested locking. . add comments in selftest prog cgrp_succ to explain how to handle nested memory access after rcu memory load. v4 -> v5: . add new test to aarch64 deny list. v3 -> v4: . fix selftest failures when built with gcc. gcc doesn't support btf_type_tag yet and some tests relies on that. skip these tests if vmlinux BTF does not have btf_type_tag("rcu"). v2 -> v3: . went back to MEM_RCU approach with invalidate rcu ptr registers at bpf_rcu_read_unlock() place. . remove KF_RCU_LOCK/UNLOCK flag and compare btf_id at verification time instead. v1 -> v2: . use kfunc instead of helper for bpf_rcu_read_lock/unlock. . not use MEM_RCU bpf_type_flag, instead use active_rcu_lock in reg state to identify rcu ptr's. . Add more self tests. . add new test to s390x deny list. ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
9bb00b28 |
| 23-Nov-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Add kfunc bpf_rcu_read_lock/unlock()
Add two kfuncs, bpf_rcu_read_lock() and bpf_rcu_read_unlock(). These two kfuncs can be used for all program types. The following is an example of how rcu pointers are used w.r.t. bpf_rcu_read_lock()/bpf_rcu_read_unlock().
struct task_struct {
    ...
    struct task_struct *last_wakee;
    struct task_struct __rcu *real_parent;
    ...
};
Let us say prog does 'task = bpf_get_current_task_btf()' to get a 'task' pointer. The basic rules are: - 'real_parent = task->real_parent' should be inside bpf_rcu_read_lock region. This is to simulate rcu_dereference() operation. The 'real_parent' is marked as MEM_RCU only if (1). task->real_parent is inside bpf_rcu_read_lock region, and (2). task is a trusted ptr. So MEM_RCU marked ptr can be 'trusted' inside the bpf_rcu_read_lock region. - 'last_wakee = real_parent->last_wakee' should be inside bpf_rcu_read_lock region since it tries to access rcu protected memory. - the ptr 'last_wakee' will be marked as PTR_UNTRUSTED since in general it is not clear whether the object pointed by 'last_wakee' is valid or not even inside bpf_rcu_read_lock region.
The verifier will reset all rcu pointer register states to untrusted at bpf_rcu_read_unlock() kfunc call site, so any such rcu pointer won't be trusted any more outside the bpf_rcu_read_lock() region.
The current implementation does not support nested rcu read lock region in the prog.
Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221124053217.2373910-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
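A hedged sketch of the access pattern the message walks through, written as a BPF program; the __ksym declarations and the tracepoint chosen are illustrative assumptions.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    void bpf_rcu_read_lock(void) __ksym;
    void bpf_rcu_read_unlock(void) __ksym;

    char LICENSE[] SEC("license") = "GPL";

    SEC("tp_btf/task_newtask")
    int BPF_PROG(read_real_parent, struct task_struct *task, u64 clone_flags)
    {
            struct task_struct *curr = bpf_get_current_task_btf();
            struct task_struct *parent;
            int ppid = 0;

            bpf_rcu_read_lock();                    /* enter the RCU read-side region */
            parent = curr->real_parent;             /* __rcu field: marked MEM_RCU here */
            if (parent)
                    ppid = parent->pid;             /* OK while inside the region */
            bpf_rcu_read_unlock();                  /* rcu pointers are untrusted after this */

            bpf_printk("current pid %d, parent pid %d", curr->pid, ppid);
            return 0;
    }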
|
#
01685c5b |
| 23-Nov-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Introduce might_sleep field in bpf_func_proto
Introduce bpf_func_proto->might_sleep to indicate a particular helper might sleep. This will make it easier to later check whether a helper might be sleepable or not.
Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221124053211.2373553-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
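A hedged sketch of what advertising the new field looks like in a helper proto; bpf_copy_from_user() is a natural sleepable candidate, but the surrounding field values here are assumptions for illustration rather than a quote of the kernel source.

    const struct bpf_func_proto bpf_copy_from_user_proto = {
            .func           = bpf_copy_from_user,
            .gpl_only       = false,
            .might_sleep    = true,                 /* new field: helper may sleep */
            .ret_type       = RET_INTEGER,
            .arg1_type      = ARG_PTR_TO_UNINIT_MEM,
            .arg2_type      = ARG_CONST_SIZE_OR_ZERO,
            .arg3_type      = ARG_ANYTHING,
    };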
|
#
3f0e6f2b |
| 22-Nov-2022 |
David Vernet <void@manifault.com> |
bpf: Add bpf_task_from_pid() kfunc
Callers can currently store tasks as kptrs using bpf_task_acquire(), bpf_task_kptr_get(), and bpf_task_release(). These are useful if a caller already has a struct task_struct *, but there may be some callers who only have a pid, and want to look up the associated struct task_struct * from that to e.g. find task->comm.
This patch therefore adds a new bpf_task_from_pid() kfunc which allows BPF programs to get a struct task_struct * kptr from a pid.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221122145300.251210-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
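A hedged usage sketch; the s32 pid signature, the paired bpf_task_release() and the tracepoint follow the selftests of that period and are assumptions here rather than quotes from this commit.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    struct task_struct *bpf_task_from_pid(s32 pid) __ksym;
    void bpf_task_release(struct task_struct *p) __ksym;

    char LICENSE[] SEC("license") = "GPL";

    SEC("tp_btf/task_newtask")
    int BPF_PROG(lookup_init_task, struct task_struct *task, u64 clone_flags)
    {
            struct task_struct *t;

            t = bpf_task_from_pid(1);       /* may return NULL, so check it */
            if (!t)
                    return 0;
            bpf_printk("pid 1 found, tgid %d", t->tgid);
            bpf_task_release(t);            /* acquired kptr must be released */
            return 0;
    }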
|
#
2fcc6081 |
| 23-Nov-2022 |
David Vernet <void@manifault.com> |
bpf: Don't use idx variable when registering kfunc dtors
In commit fda01efc6160 ("bpf: Enable cgroups to be used as kptrs"), I added an 'int idx' variable to kfunc_init() which was meant to dynamically set the index of the btf id entries of the 'generic_dtor_ids' array. This was done to make the code slightly less brittle as the struct cgroup * kptr kfuncs such as bpf_cgroup_acquire() are compiled out if CONFIG_CGROUPS is not defined. This, however, causes an lkp build warning:
>> kernel/bpf/helpers.c:2005:40: warning: multiple unsequenced modifications to 'idx' [-Wunsequenced]
                .btf_id = generic_dtor_ids[idx++],
Fix the warning by just hard-coding the indices.
Fixes: fda01efc6160 ("bpf: Enable cgroups to be used as kptrs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Vernet <void@manifault.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221123135253.637525-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
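A standalone C illustration (not kernel code) of the warning class quoted above and of the shape of the fix: clang cannot order the two 'idx++' side effects within one initializer list, so the indices are spelled out instead.

    struct dtor_pair { int btf_id; int kfunc_btf_id; };

    void register_dtors(const int ids[4])
    {
            int idx = 0;

            /* clang: "multiple unsequenced modifications to 'idx'" */
            struct dtor_pair warns = { .btf_id = ids[idx++], .kfunc_btf_id = ids[idx++] };

            /* the fix: hard-code the indices */
            struct dtor_pair fixed = { .btf_id = ids[0], .kfunc_btf_id = ids[1] };

            (void)warns;
            (void)fixed;
    }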
|
#
8a2162a9 |
| 22-Nov-2022 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'Support storing struct cgroup * objects as kptrs'
David Vernet says:
====================
In [0], we added support for storing struct task_struct * objects as kptrs. This patch set extends this effort to also include storing struct cgroup * object as kptrs.
As with tasks, there are many possible use cases for storing cgroups in maps. During tracing, for example, it could be useful to query cgroup statistics such as PSI averages, or tracking which tasks are migrating to and from the cgroup.
[0]: https://lore.kernel.org/all/20221120051004.3605026-1-void@manifault.com/ ====================
Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
5ca78670 |
| 21-Nov-2022 |
David Vernet <void@manifault.com> |
bpf: Add bpf_cgroup_ancestor() kfunc
struct cgroup * objects have a variably sized struct cgroup *ancestors[] field which stores pointers to their ancestor cgroups. If using a cgroup as a kptr, it can be useful to access these ancestors, but doing so requires variable offset accesses for PTR_TO_BTF_ID, which is currently unsupported.
This is a very useful field to access for cgroup kptrs, as programs may wish to walk their ancestor cgroups when determining e.g. their proportional cpu.weight. So as to enable this functionality with cgroup kptrs before var_off is supported for PTR_TO_BTF_ID, this patch adds a bpf_cgroup_ancestor() kfunc which accesses the cgroup node on behalf of the caller, and acquires a reference on it. Once var_off is supported for PTR_TO_BTF_ID, and fields inside a struct can be marked as trusted so they retain the PTR_TRUSTED modifier when walked, this can be removed.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221122055458.173143-4-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
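A hedged usage sketch; the (cgrp, level) signature, the __ksym declarations and the tracepoint are assumptions modelled on the selftests of that period rather than quotes from this commit.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level) __ksym;
    void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

    char LICENSE[] SEC("license") = "GPL";

    SEC("tp_btf/cgroup_mkdir")
    int BPF_PROG(on_cgroup_mkdir, struct cgroup *cgrp, const char *path)
    {
            struct cgroup *parent;

            /* walk one level up; the kfunc acquires a reference on the result */
            parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
            if (!parent)
                    return 0;
            bpf_printk("new cgroup at level %d", cgrp->level);
            bpf_cgroup_release(parent);
            return 0;
    }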
|
#
fda01efc |
| 21-Nov-2022 |
David Vernet <void@manifault.com> |
bpf: Enable cgroups to be used as kptrs
Now that tasks can be used as kptrs, and the PTR_TRUSTED flag is available for us to easily add basic acquire / get / release kfuncs, we can do the same for cgroups. This patch set adds the following kfuncs which enable using cgroups as kptrs:
struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp);
struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp);
void bpf_cgroup_release(struct cgroup *cgrp);
A follow-on patch will add a selftest suite which validates these kfuncs.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221122055458.173143-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
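A hedged sketch of the map-storage pattern that "using cgroups as kptrs" enables, modelled on the selftests of that period; the __kptr_ref annotation, map layout and tracepoint are assumptions, and only the three kfunc signatures above come from the message.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) __ksym;
    void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

    struct map_value {
            struct cgroup __kptr_ref *cgrp;         /* referenced kptr field */
    };

    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY);
            __uint(max_entries, 1);
            __type(key, int);
            __type(value, struct map_value);
    } cgrp_map SEC(".maps");

    char LICENSE[] SEC("license") = "GPL";

    SEC("tp_btf/cgroup_mkdir")
    int BPF_PROG(stash_cgroup, struct cgroup *cgrp, const char *path)
    {
            struct cgroup *acquired, *old;
            struct map_value *v;
            int key = 0;

            v = bpf_map_lookup_elem(&cgrp_map, &key);
            if (!v)
                    return 0;

            acquired = bpf_cgroup_acquire(cgrp);            /* take a reference */
            old = bpf_kptr_xchg(&v->cgrp, acquired);        /* stash it in the map */
            if (old)
                    bpf_cgroup_release(old);                /* drop any previous kptr */
            return 0;
    }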
|
#
29583dfc |
| 21-Nov-2022 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next-fixes
Backmerging to update drm-misc-next-fixes for the final phase of the release cycle.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
99429b22 |
| 20-Nov-2022 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'bpf: Implement two type cast kfuncs'
Yonghong Song says:
====================
Currently, a non-tracing bpf program typically has a single 'context' argument with a predefined uapi struct type. Following this uapi struct, the user is able to access other fields defined in the uapi header. Inside the kernel, the user-seen 'context' argument is replaced with a 'kernel context' (or 'kctx' in short) which can access more information than what the uapi header provides. To access other info not in the uapi header, people typically do two things: (1) extend the uapi to access more fields rooted from 'context', or (2) use the bpf_probe_read_kernel() helper to read a particular field based on kctx. Using (1) needs a uapi change and using (2) makes code more complex since direct memory access is not allowed.
There are already a few instances trying to access more information from kctx: . trying to access some fields from perf_event kctx ([1]). . trying to access some fields from xdp kctx ([2]).
This patch set tried to allow direct memory access for kctx fields by introducing bpf_cast_to_kern_ctx() kfunc.
Martin mentioned a use case like the type cast below:
    #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))
basically an 'unsigned char *' cast to 'struct skb_shared_info *'. This patch set tries to support such a use case as well with bpf_rdonly_cast().
For the patch series, Patch 1 added support for a kfunc available to all prog types. Patch 2 added bpf_cast_to_kern_ctx() kfunc. Patch 3 added bpf_rdonly_cast() kfunc. Patch 4 added a few positive and negative tests.
[1] https://lore.kernel.org/bpf/ad15b398-9069-4a0e-48cb-4bb651ec3088@meta.com/ [2] https://lore.kernel.org/bpf/20221109215242.1279993-1-john.fastabend@gmail.com/
Changelog: v3 -> v4: - remove unnecessary bpf_ctx_convert.t error checking - add and use meta.ret_btf_id instead of meta.arg_constant.value for bpf_cast_to_kern_ctx(). - add PTR_TRUSTED to the return PTR_TO_BTF_ID type for bpf_cast_to_kern_ctx(). v2 -> v3: - rebase on top of bpf-next (for merging conflicts) - add the selftest to s390x deny list rfcv1 -> v2: - break original one kfunc into two. - add missing error checks and error logs. - adapt to the new conventions in https://lore.kernel.org/all/20221118015614.2013203-1-memxor@gmail.com/ for example, with __ign and __k suffix. - added support in fixup_kfunc_call() to replace kfunc calls with a single mov. ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
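A hedged usage sketch of the two kfuncs, modelled on the skb_shinfo() use case quoted above; the __ksym declarations, section name and field accesses are assumptions for illustration.

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    void *bpf_cast_to_kern_ctx(void *obj) __ksym;
    void *bpf_rdonly_cast(void *obj, __u32 btf_id) __ksym;

    char LICENSE[] SEC("license") = "GPL";

    SEC("tc")
    int cast_example(struct __sk_buff *ctx)
    {
            /* user-visible ctx -> kernel struct sk_buff, with direct field access */
            struct sk_buff *kskb = bpf_cast_to_kern_ctx(ctx);
            struct skb_shared_info *shinfo;

            /* unsigned char * -> read-only struct skb_shared_info *, as skb_shinfo() does */
            shinfo = bpf_rdonly_cast(kskb->head + kskb->end,
                                     bpf_core_type_id_kernel(struct skb_shared_info));
            bpf_printk("nr_frags %u", shinfo->nr_frags);
            return 0;
    }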
|