.. SPDX-License-Identifier: GPL-2.0

.. _kfuncs-header-label:

=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that a BPF program can only call such a function in a
valid context. To enforce this, visibility of a kfunc can be per program type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

        /* Disables missing prototype warnings */
        __diag_push();
        __diag_ignore_all("-Wmissing-prototypes",
                          "Global kfuncs as their definitions will be in BTF");

        __bpf_kfunc struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
        {
                return find_get_task_by_vpid(nr);
        }

        __diag_pop();

A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

Similar to BPF helpers, the verifier sometimes needs additional context to make
the usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the argument of the kfunc with a __tag, where
tag may be one of the supported annotations.

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

        __bpf_kfunc void bpf_memzero(void *mem, int mem__sz)
        {
        ...
        }

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without the __sz annotation, the size of the
type of the pointer is used. Without the __sz annotation, a kfunc cannot accept
a void pointer.

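As a rough illustration of what the elided body could look like, the sketch
below fills in ``bpf_memzero()`` so that it builds as an ordinary user-space C
file. The empty ``__bpf_kfunc`` definition and the defensive argument check are
assumptions made for this sketch only; in the kernel the macro is provided by
the kernel headers, and the verifier has already checked that ``mem`` points to
at least ``mem__sz`` bytes before the kfunc runs.

.. code-block:: c

	#include <string.h>

	/* Stand-in so the sketch builds outside the kernel (assumption). */
	#define __bpf_kfunc

	/*
	 * The "__sz" suffix on the second parameter pairs it with "mem":
	 * the verifier treats "mem__sz" as the size of the memory at "mem".
	 */
	__bpf_kfunc void bpf_memzero(void *mem, int mem__sz)
	{
		/* Defensive check for this user-space sketch only. */
		if (!mem || mem__sz <= 0)
			return;

		memset(mem, 0, (size_t)mem__sz);
	}

Note that only the parameter name carries the annotation; the C type of the
size argument is unchanged.
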
2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments, where it indicates that
the verifier must check the scalar argument to be a known constant, which does
not indicate a size parameter, and the value of the constant is relevant to the
safety of the program.

An example is given below::

        __bpf_kfunc void *bpf_obj_new(u32 local_type_id__k, ...)
        {
        ...
        }

Here, bpf_obj_new uses the local_type_id argument to find out the size of that
type ID in the program's BTF and return a sized pointer to it. Each type ID will
have a distinct size, hence it is crucial to treat each such call as distinct
when values don't match during verifier state pruning checks.

Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, the
__k suffix should be used.

2.2.3 __uninit Annotation
-------------------------

This annotation is used to indicate that the argument will be treated as
uninitialized.

An example is given below::

        __bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit)
        {
        ...
        }

Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.

2.2.4 __opt Annotation
----------------------

This annotation is used to indicate that the buffer associated with an __sz or
__szk argument may be null. If the function is passed a NULL pointer in place of
the buffer, the verifier will not check that the length is appropriate for the
buffer. The kfunc is responsible for checking if this buffer is null before
using it.

An example is given below::

        __bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
        {
        ...
        }

Here, the buffer may be null. If the buffer is not null, it is at least of size
buffer__szk. Either way, the returned buffer is either NULL, or of size
buffer__szk. Without this annotation, the verifier will reject the program if a
null pointer is passed in with a nonzero size.


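The obligation this places on the kfunc can be sketched in user-space C. The
``demo_slice()`` function and ``struct demo_src`` below are hypothetical names
invented for this sketch and are not the implementation of
``bpf_dynptr_slice()``; they only model a function that can sometimes return a
direct pointer and otherwise needs the optional buffer.

.. code-block:: c

	#include <stddef.h>
	#include <stdint.h>
	#include <string.h>

	/* Stand-in so the sketch builds outside the kernel (assumption). */
	#define __bpf_kfunc

	/* Hypothetical data source: one contiguous region, or fragmented. */
	struct demo_src {
		uint8_t *data;
		uint32_t len;
		int contiguous;
	};

	/*
	 * Returns a pointer to "buffer__szk" bytes at "offset", or NULL.
	 * "buffer__opt" may be NULL, so it is checked before it is used.
	 */
	__bpf_kfunc void *demo_slice(const struct demo_src *src, uint32_t offset,
				     void *buffer__opt, uint32_t buffer__szk)
	{
		/* Reject out-of-range requests; 64-bit sum avoids wraparound. */
		if ((uint64_t)offset + buffer__szk > src->len)
			return NULL;

		/* Fast path: direct pointer, the buffer is not needed. */
		if (src->contiguous)
			return src->data + offset;

		/* Slow path needs the caller's buffer, which is optional. */
		if (!buffer__opt)
			return NULL;

		memcpy(buffer__opt, src->data + offset, buffer__szk);
		return buffer__opt;
	}

The NULL check sits in the function itself because, with ``__opt``, the
verifier no longer guarantees that the buffer is present.
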
.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.

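Conceptually, such a set is a table of (BTF ID, flags) pairs that can be
consulted when a program calls a kfunc. The user-space model below is only a
sketch of that idea: the ``demo_`` type and function names, the numeric IDs and
the flag bit values are all invented for illustration and are not the kernel's
definitions.

.. code-block:: c

	#include <stddef.h>
	#include <stdint.h>

	/* Illustrative flag bits; the real values live in kernel headers. */
	#define DEMO_KF_ACQUIRE  (1u << 0)
	#define DEMO_KF_RELEASE  (1u << 1)
	#define DEMO_KF_RET_NULL (1u << 2)

	/* One entry per registered kfunc: its ID and the OR of its flags. */
	struct demo_id_flags {
		uint32_t id;
		uint32_t flags;
	};

	/* Hypothetical IDs standing in for bpf_get_task_pid, bpf_put_pid. */
	static const struct demo_id_flags demo_task_set[] = {
		{ 100, DEMO_KF_ACQUIRE | DEMO_KF_RET_NULL },
		{ 101, DEMO_KF_RELEASE },
	};

	/* Returns the flags recorded for "id", or NULL if not in the set. */
	static const uint32_t *demo_set_lookup(const struct demo_id_flags *set,
					       size_t cnt, uint32_t id)
	{
		for (size_t i = 0; i < cnt; i++) {
			if (set[i].id == id)
				return &set[i].flags;
		}
		return NULL;
	}

A NULL result in this model corresponds to a function that was never
registered as a kfunc, as opposed to one registered with no flags.
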
kfunc definitions should also always be annotated with the ``__bpf_kfunc``
macro. This prevents issues such as the compiler inlining the kfunc if it's a
static kernel function, or the function being elided in an LTO build as it's
not used in the rest of the kernel. Developers should not manually add
annotations to their kfunc to prevent these issues. If an annotation is
required to prevent such an issue with your kfunc, it is a bug and should be
added to the definition of the macro so that other kfuncs are similarly
protected. An example is given below::

        __bpf_kfunc struct task_struct *bpf_get_task_pid(s32 pid)
        {
        ...
        }

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier rejects the
BPF program; loading succeeds only once no lingering references remain in any
explored state of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the
two are orthogonal to each other.

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in. All copies of
the pointer being released are invalidated as a result of invoking a kfunc with
this flag. KF_RELEASE kfuncs automatically receive the protection afforded by
the KF_TRUSTED_ARGS flag described below.

2.4.4 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer, with one
exception described below).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

As mentioned above, a nested pointer obtained from walking a trusted pointer is
no longer trusted, with one exception. If a struct type has a field that is
guaranteed to be valid (trusted or rcu, as in KF_RCU description below) as long
as its parent pointer is valid, the following macros can be used to express
that to the verifier:

* ``BTF_TYPE_SAFE_TRUSTED``
* ``BTF_TYPE_SAFE_RCU``
* ``BTF_TYPE_SAFE_RCU_OR_NULL``

For example,

.. code-block:: c

	BTF_TYPE_SAFE_TRUSTED(struct socket) {
		struct sock *sk;
	};

or

.. code-block:: c

	BTF_TYPE_SAFE_RCU(struct task_struct) {
		const cpumask_t *cpus_ptr;
		struct css_set __rcu *cgroups;
		struct task_struct __rcu *real_parent;
		struct task_struct *group_leader;
	};

In other words, you must:

1. Wrap the valid pointer type in a ``BTF_TYPE_SAFE_*`` macro.

2. Specify the type and name of the valid nested field. This field must match
   the field in the original type definition exactly.

A new type declared by a ``BTF_TYPE_SAFE_*`` macro also needs to be emitted so
that it appears in BTF. For example, ``BTF_TYPE_SAFE_TRUSTED(struct socket)``
is emitted in the ``type_is_trusted()`` function as follows:

.. code-block:: c

	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct socket));


2.4.5 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

2.4.6 KF_DESTRUCTIVE flag
-------------------------

The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Because of this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more can
be added later.

2.4.7 KF_RCU flag
-----------------

The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
that the objects are valid and there is no use-after-free. The pointers are not
NULL, but the object's refcount could have reached zero. The kfuncs need to
consider doing a refcnt != 0 check, especially when returning a KF_ACQUIRE
pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
also be KF_RET_NULL.

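The refcount check mentioned above is commonly written as an "increment unless
zero" operation. The sketch below models it in user space with C11 atomics;
``struct demo_obj`` and both functions are hypothetical names used only here.
Kernel code would normally rely on the kernel's own refcount helpers, such as
``refcount_inc_not_zero()``, rather than open-coding the loop.

.. code-block:: c

	#include <stdatomic.h>
	#include <stddef.h>

	/* Hypothetical refcounted object; not a kernel type. */
	struct demo_obj {
		atomic_uint refcnt;
	};

	/*
	 * Acquire-style function: take a reference only if the count is
	 * still non-zero. Returns NULL when the object is already being
	 * torn down, which is why such a kfunc would also be KF_RET_NULL.
	 */
	static struct demo_obj *demo_obj_acquire(struct demo_obj *obj)
	{
		unsigned int old = atomic_load(&obj->refcnt);

		while (old != 0) {
			/* On failure "old" is reloaded and re-checked. */
			if (atomic_compare_exchange_weak(&obj->refcnt, &old,
							 old + 1))
				return obj;
		}
		return NULL;
	}

	/* Release-style counterpart: drop one reference. */
	static void demo_obj_release(struct demo_obj *obj)
	{
		atomic_fetch_sub(&obj->refcnt, 1);
	}

A plain unconditional increment would be wrong here, because it could revive
an object whose last reference has already been dropped.
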
.. _KF_deprecated_flag:

2.4.8 KF_DEPRECATED flag
------------------------

The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
changed or removed in a subsequent kernel release. A kfunc that is
marked with KF_DEPRECATED should also have any relevant information
captured in its kernel doc. Such information typically includes the
kfunc's expected remaining lifespan, a recommendation for new
functionality that can replace it if any is available, and possibly a
rationale for why it is being removed.

Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
supported and have its KF_DEPRECATED flag removed, it is likely to be far more
difficult to remove a KF_DEPRECATED flag after it's been added than it is to
prevent it from being added in the first place. As described in
:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
encouraged to make their use-cases known as early as possible, and participate
in upstream discussions regarding whether to keep, change, deprecate, or remove
those kfuncs if and when such discussions occur.

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

        static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
                .owner = THIS_MODULE,
                .set   = &bpf_task_set,
        };

        static int init_subsystem(void)
        {
                return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
        }
        late_initcall(init_subsystem);

2.6 Specifying no-cast aliases with ___init
-------------------------------------------

The verifier will always enforce that the BTF type of a pointer passed to a
kfunc by a BPF program matches the type of pointer specified in the kfunc
definition. The verifier does, however, allow types that are equivalent
according to the C standard to be passed to the same kfunc arg, even if their
BTF_IDs differ.

For example, for the following type definition:

.. code-block:: c

	struct bpf_cpumask {
		cpumask_t cpumask;
		refcount_t usage;
	};

The verifier would allow a ``struct bpf_cpumask *`` to be passed to a kfunc
taking a ``cpumask_t *`` (which is a typedef of ``struct cpumask *``). For
instance, both ``struct cpumask *`` and ``struct bpf_cpumask *`` can be passed
to bpf_cpumask_test_cpu().

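The C rule behind this is that a pointer to a structure, suitably converted,
points to its first member. A small user-space sketch of that rule follows; the
``demo_`` types and functions are invented stand-ins for ``struct cpumask``,
``struct bpf_cpumask`` and a kfunc taking the inner type, not kernel code.

.. code-block:: c

	#include <limits.h>
	#include <stdbool.h>

	/* Hypothetical stand-in for struct cpumask. */
	struct demo_cpumask {
		unsigned long bits;
	};

	/* Wrapper whose first member is the wrapped type. */
	struct demo_bpf_cpumask {
		struct demo_cpumask cpumask;
		int usage;
	};

	/* A function written against the inner type only. */
	static bool demo_test_cpu(unsigned int cpu,
				  const struct demo_cpumask *mask)
	{
		/* Out-of-range CPUs are simply reported as not set. */
		if (cpu >= sizeof(mask->bits) * CHAR_BIT)
			return false;

		return (mask->bits >> cpu) & 1UL;
	}

	/* The wrapper can be passed because it starts with the inner type. */
	static bool demo_test_cpu_wrapped(unsigned int cpu,
					  const struct demo_bpf_cpumask *wrapped)
	{
		return demo_test_cpu(cpu, (const struct demo_cpumask *)wrapped);
	}

In a BPF program no explicit cast helper is needed for this case; the verifier
accepts the wrapper type for the kfunc argument directly.
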
In some cases, this type-aliasing behavior is not desired. ``struct
nf_conn___init`` is one such example:

.. code-block:: c

	struct nf_conn___init {
		struct nf_conn ct;
	};

The C standard would consider these types to be equivalent, but it would not
always be safe to pass either type to a trusted kfunc. ``struct
nf_conn___init`` represents an allocated ``struct nf_conn`` object that has
*not yet been initialized*, so it would therefore be unsafe to pass a ``struct
nf_conn___init *`` to a kfunc that's expecting a fully initialized ``struct
nf_conn *`` (e.g. ``bpf_ct_change_timeout()``).

In order to accommodate such requirements, the verifier will enforce strict
PTR_TO_BTF_ID type matching if two types have the exact same name, with one
being suffixed with ``___init``.

.. _BPF_kfunc_lifecycle_expectations:

3. kfunc lifecycle expectations
===============================

kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
strict stability restrictions associated with kernel <-> user UAPIs. This means
they can be thought of as similar to EXPORT_SYMBOL_GPL, and can therefore be
modified or removed by a maintainer of the subsystem they're defined in when
it's deemed necessary.

Like any other change to the kernel, maintainers will not change or remove a
kfunc without having a reasonable justification.  Whether or not they'll choose
to change a kfunc will ultimately depend on a variety of factors, such as how
widely used the kfunc is, how long the kfunc has been in the kernel, whether an
alternative kfunc exists, what the norm is in terms of stability for the
subsystem in question, and of course what the technical cost is of continuing
to support the kfunc.

There are several implications of this:

a) kfuncs that are widely used or have been in the kernel for a long time will
   be more difficult to justify being changed or removed by a maintainer. In
   other words, kfuncs that are known to have a lot of users and provide
   significant value provide stronger incentives for maintainers to invest the
   time and complexity in supporting them. It is therefore important for
   developers that are using kfuncs in their BPF programs to communicate and
   explain how and why those kfuncs are being used, and to participate in
   discussions regarding those kfuncs when they occur upstream.

b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF programs
   that call kfuncs are generally not part of the kernel tree. This means that
   refactoring cannot typically change callers in-place when a kfunc changes,
   as is done for e.g. an upstreamed driver being updated in place when a
   kernel symbol is changed.

   Unlike with regular kernel symbols, this is expected behavior for BPF
   symbols, and out-of-tree BPF programs that use kfuncs should be considered
   relevant to discussions and decisions around modifying and removing those
   kfuncs. The BPF community will take an active role in participating in
   upstream discussions when necessary to ensure that the perspectives of such
   users are taken into account.

c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
   will not ever hard-block a change in the kernel purely for stability
   reasons. That being said, kfuncs are features that are meant to solve
   problems and provide value to users. The decision of whether to change or
   remove a kfunc is a multivariate technical decision that is made on a
   case-by-case basis, and which is informed by data points such as those
   mentioned above. It is expected that a kfunc being removed or changed with
   no warning will not be a common occurrence or take place without sound
   justification, but it is a possibility that must be accepted if one is to
   use kfuncs.

44125c5e92dSDavid Vernet3.1 kfunc deprecation
44225c5e92dSDavid Vernet---------------------
44325c5e92dSDavid Vernet
44425c5e92dSDavid VernetAs described above, while sometimes a maintainer may find that a kfunc must be
44525c5e92dSDavid Vernetchanged or removed immediately to accommodate some changes in their subsystem,
44625c5e92dSDavid Vernetusually kfuncs will be able to accommodate a longer and more measured
44725c5e92dSDavid Vernetdeprecation process. For example, if a new kfunc comes along which provides
44825c5e92dSDavid Vernetsuperior functionality to an existing kfunc, the existing kfunc may be
44925c5e92dSDavid Vernetdeprecated for some period of time to allow users to migrate their BPF programs
45025c5e92dSDavid Vernetto use the new one. Or, if a kfunc has no known users, a decision may be made
45125c5e92dSDavid Vernetto remove the kfunc (without providing an alternative API) after some
45225c5e92dSDavid Vernetdeprecation period so as to provide users with a window to notify the kfunc
45325c5e92dSDavid Vernetmaintainer if it turns out that the kfunc is actually being used.
45425c5e92dSDavid Vernet
45525c5e92dSDavid VernetIt's expected that the common case will be that kfuncs will go through a
45625c5e92dSDavid Vernetdeprecation period rather than being changed or removed without warning. As
45725c5e92dSDavid Vernetdescribed in :ref:`KF_deprecated_flag`, the kfunc framework provides the
45825c5e92dSDavid VernetKF_DEPRECATED flag to kfunc developers to signal to users that a kfunc has been
45925c5e92dSDavid Vernetdeprecated. Once a kfunc has been marked with KF_DEPRECATED, the following
46025c5e92dSDavid Vernetprocedure is followed for removal:
46125c5e92dSDavid Vernet
46225c5e92dSDavid Vernet1. Any relevant information for deprecated kfuncs is documented in the kfunc's
46325c5e92dSDavid Vernet   kernel docs. This documentation will typically include the kfunc's expected
46425c5e92dSDavid Vernet   remaining lifespan, a recommendation for new functionality that can replace
465db9d479aSDavid Vernet   the usage of the deprecated function (or an explanation as to why no such
46625c5e92dSDavid Vernet   replacement exists), etc.
46725c5e92dSDavid Vernet
46825c5e92dSDavid Vernet2. The deprecated kfunc is kept in the kernel for some period of time after it
46925c5e92dSDavid Vernet   was first marked as deprecated. This time period will be chosen on a
47025c5e92dSDavid Vernet   case-by-case basis, and will typically depend on how widespread the use of
47125c5e92dSDavid Vernet   the kfunc is, how long it has been in the kernel, and how hard it is to move
47225c5e92dSDavid Vernet   to alternatives. This deprecation time period is "best effort", and as
47325c5e92dSDavid Vernet   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
47425c5e92dSDavid Vernet   sometimes dictate that the kfunc be removed before the full intended
475db9d479aSDavid Vernet   deprecation period has elapsed.
476db9d479aSDavid Vernet
477db9d479aSDavid Vernet3. After the deprecation period the kfunc will be removed. At this point, BPF
478db9d479aSDavid Vernet   programs calling the kfunc will be rejected by the verifier.
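
On the kernel side, marking a kfunc as deprecated amounts to adding the flag
where the kfunc is registered. The following is a minimal sketch rather than
code taken from the kernel: ``bpf_example_old``, ``bpf_example_new`` and
``bpf_example_set`` are hypothetical names, and the BTF ID set macros are
assumed to be the same ones used when registering any other kfunc:

.. code-block:: c

	BTF_SET8_START(bpf_example_set)
	/* Hypothetical kfunc kept only for existing users; slated for removal. */
	BTF_ID_FLAGS(func, bpf_example_old, KF_DEPRECATED)
	/* Hypothetical replacement that new BPF programs should call instead. */
	BTF_ID_FLAGS(func, bpf_example_new)
	BTF_SET8_END(bpf_example_set)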

4. Core kfuncs
==============

The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.

4.1 struct task_struct * kfuncs
-------------------------------

There are a number of kfuncs that allow ``struct task_struct *`` objects to be
used as kptrs:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_acquire bpf_task_release

These kfuncs are useful when you want to acquire or release a reference to a
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
struct_ops callback arg. For example:

.. code-block:: c

	/**
	 * A trivial example tracepoint program that shows how to
	 * acquire and release a struct task_struct * pointer.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *acquired;

		acquired = bpf_task_acquire(task);
		if (acquired)
			/*
			 * In a typical program you'd do something like store
			 * the task in a map, and the map will automatically
			 * release it later. Here, we release it manually.
			 */
			bpf_task_release(acquired);
		return 0;
	}

References acquired on ``struct task_struct *`` objects are RCU protected.
Therefore, when in an RCU read region, you can obtain a pointer to a task
embedded in a map value without having to acquire a reference:

.. code-block:: c

	#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
	private(TASK) static struct task_struct *global;

	/**
	 * A trivial example showing how to access a task stored
	 * in a map using RCU.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_rcu_read_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *local_copy;

		bpf_rcu_read_lock();
		local_copy = global;
		if (local_copy)
			/*
			 * We could also pass local_copy to kfuncs or helper functions here,
			 * as we're guaranteed that local_copy will be valid until we exit
			 * the RCU read region below.
			 */
			bpf_printk("Global task %s is valid", local_copy->comm);
		else
			bpf_printk("No global task found");
		bpf_rcu_read_unlock();

		/* At this point we can no longer reference local_copy. */

		return 0;
	}

----

A BPF program can also look up a task from a pid. This can be useful if the
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
it can acquire a reference on with bpf_task_acquire().

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_from_pid

Here is an example of it being used:

.. code-block:: c

	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *lookup;

		lookup = bpf_task_from_pid(task->pid);
		if (!lookup)
			/* A task should always be found, as %task is a tracepoint arg. */
			return -ENOENT;

		if (lookup->pid != task->pid) {
			/* bpf_task_from_pid() looks up the task via its
			 * globally-unique pid from the init_pid_ns. Thus,
			 * the pid of the lookup task should always be the
			 * same as the input task.
			 */
			bpf_task_release(lookup);
			return -EINVAL;
		}

		/* bpf_task_from_pid() returns an acquired reference,
		 * so it must be dropped before returning from the
		 * tracepoint handler.
		 */
		bpf_task_release(lookup);
		return 0;
	}

4.2 struct cgroup * kfuncs
--------------------------

``struct cgroup *`` objects also have acquire and release functions:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_acquire bpf_cgroup_release

These kfuncs are used in exactly the same manner as bpf_task_acquire() and
bpf_task_release() respectively, so we won't provide examples for them.

----

Other kfuncs available for interacting with ``struct cgroup *`` objects are
bpf_cgroup_ancestor() and bpf_cgroup_from_id(), allowing callers to access
the ancestor of a cgroup and find a cgroup by its ID, respectively. Both
return a cgroup kptr.

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_ancestor

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_from_id

Eventually, BPF should be updated to allow a cgroup's ancestor to be accessed
with a normal memory load in the program itself. This is currently not possible
without more work in the verifier. bpf_cgroup_ancestor() can be used as
follows:

.. code-block:: c

	/**
	 * Simple tracepoint example that illustrates how a cgroup's
	 * ancestor can be accessed using bpf_cgroup_ancestor().
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
	{
		struct cgroup *parent;

		/* The parent cgroup resides at the level before the current cgroup's level. */
		parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
		if (!parent)
			return -ENOENT;

		bpf_printk("Parent id is %d", parent->self.id);

		/* Return the parent cgroup that was acquired above. */
		bpf_cgroup_release(parent);
		return 0;
	}
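
bpf_cgroup_from_id() can be used in a similar way. The sketch below is
illustrative only and rests on two assumptions that are not stated above: that
a cgroup's ID can be read from ``kn->id``, and that the ID of the cgroup being
created may not yet be visible to lookups when the tracepoint fires, which is
why the parent is looked up instead. The program name is hypothetical:

.. code-block:: c

	/**
	 * Hypothetical tracepoint example that looks up a cgroup by its ID
	 * using bpf_cgroup_from_id().
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgrp_from_id_example, struct cgroup *cgrp, const char *path)
	{
		struct cgroup *parent, *lookup;
		u64 parent_cgid;

		/* Assumption: the new cgroup's own ID may not be visible yet. */
		parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
		if (!parent)
			return -ENOENT;

		/* Assumption: the cgroup ID is the ID of its kernfs node. */
		parent_cgid = parent->kn->id;
		bpf_cgroup_release(parent);

		lookup = bpf_cgroup_from_id(parent_cgid);
		if (!lookup)
			return -ENOENT;

		bpf_printk("Found cgroup at level %d", lookup->level);

		/* bpf_cgroup_from_id() returns an acquired reference, so drop it. */
		bpf_cgroup_release(lookup);
		return 0;
	}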

4.3 struct cpumask * kfuncs
---------------------------

BPF provides a set of kfuncs that can be used to query, allocate, mutate, and
destroy ``struct cpumask *`` objects. Please refer to
:ref:`cpumasks-header-label` for more details.
