.. SPDX-License-Identifier: GPL-2.0

.. _kfuncs-header-label:

=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that BPF programs can only call such a function in a
valid context. To enforce this, visibility of a kfunc can be per program type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

        /* Disables missing prototype warnings */
        __diag_push();
        __diag_ignore_all("-Wmissing-prototypes",
                          "Global kfuncs as their definitions will be in BTF");

        __bpf_kfunc struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
        {
                return find_get_task_by_vpid(nr);
        }

        __diag_pop();

A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

As with BPF helpers, the verifier sometimes needs additional context to make
the usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the kfunc's argument with a __tag, where tag
may be one of the supported annotations.
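
The suffix convention can be illustrated with a small userspace sketch. This is
not the verifier's code: ``arg_name_has_suffix()`` is a hypothetical helper
that only shows how a trailing tag such as ``__sz`` can be recognized in an
argument name.

.. code-block:: c

	#include <stdbool.h>
	#include <string.h>

	/*
	 * Hypothetical helper: report whether an argument name ends with the
	 * given annotation suffix, e.g. "mem__sz" ends with "__sz".
	 */
	bool arg_name_has_suffix(const char *name, const char *suffix)
	{
		size_t name_len = strlen(name);
		size_t suffix_len = strlen(suffix);

		/* A name shorter than the suffix cannot carry it. */
		if (name_len < suffix_len)
			return false;

		return strcmp(name + name_len - suffix_len, suffix) == 0;
	}

With this helper, ``arg_name_has_suffix("mem__sz", "__sz")`` is true, while a
plain ``"mem"`` carries no annotation.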

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

        __bpf_kfunc void bpf_memzero(void *mem, int mem__sz)
        {
        ...
        }

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the
second argument as its size. By default, without the __sz annotation, the size
of the type of the pointer is used. Without the __sz annotation, a kfunc cannot
accept a void pointer.
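
As a rough illustration, one plausible body for the ``bpf_memzero()`` example
is sketched below in a form that builds in userspace. The empty ``__bpf_kfunc``
definition is only a stand-in for the kernel macro, and the body is an
assumption, since the example above elides it.

.. code-block:: c

	#include <string.h>

	/* Stand-in so the sketch builds outside the kernel (assumption). */
	#ifndef __bpf_kfunc
	#define __bpf_kfunc
	#endif

	/*
	 * Zero mem__sz bytes starting at mem. The __sz suffix on the second
	 * parameter name is what pairs it with the pointer that precedes it.
	 */
	__bpf_kfunc void bpf_memzero(void *mem, int mem__sz)
	{
		/* Defensive checks; a real kfunc may rely on the verifier instead. */
		if (!mem || mem__sz <= 0)
			return;

		memset(mem, 0, (size_t)mem__sz);
	}

Only the first ``mem__sz`` bytes are touched, which is the region the verifier
is told about through the annotation.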

2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments, where it indicates that
the verifier must check the scalar argument to be a known constant, which does
not indicate a size parameter, and the value of the constant is relevant to the
safety of the program.

An example is given below::

        __bpf_kfunc void *bpf_obj_new(u32 local_type_id__k, ...)
        {
        ...
        }

Here, bpf_obj_new uses the local_type_id argument to find out the size of that
type ID in the program's BTF and return a sized pointer to it. Each type ID
will have a distinct size, hence it is crucial to treat each such call as
distinct when values don't match during verifier state pruning checks.

Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, the
__k suffix should be used.

2.2.3 __uninit Annotation
-------------------------

This annotation is used to indicate that the argument will be treated as
uninitialized.

An example is given below::

        __bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit)
        {
        ...
        }

Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.

2.2.4 __opt Annotation
----------------------

This annotation is used to indicate that the buffer associated with an __sz or
__szk argument may be NULL. If the function is passed a NULL pointer in place
of the buffer, the verifier will not check that the length is appropriate for
the buffer. The kfunc is responsible for checking whether this buffer is NULL
before using it.

An example is given below::

        __bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
        {
        ...
        }

Here, the buffer may be NULL. If the buffer is not NULL, it is at least of size
buffer__szk. Either way, the returned buffer is either NULL, or of size
buffer__szk. Without this annotation, the verifier will reject the program if a
NULL pointer is passed in with a nonzero size.
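
The obligation this places on the kfunc can be sketched in userspace C. The
function ``ex_copy_out()`` and its parameters are hypothetical and unrelated to
``bpf_dynptr_slice()``; the sketch only shows the NULL check that an ``__opt``
buffer requires before it is used.

.. code-block:: c

	#include <stddef.h>
	#include <string.h>

	/*
	 * Hypothetical function: copy at most buffer__szk bytes of src into
	 * buffer__opt. Returns the number of bytes copied, or 0 when the
	 * caller chose not to supply a buffer.
	 */
	size_t ex_copy_out(const void *src, size_t src_len,
			   void *buffer__opt, size_t buffer__szk)
	{
		size_t n;

		/* The buffer is optional, so it must be checked before use. */
		if (!buffer__opt)
			return 0;

		n = src_len < buffer__szk ? src_len : buffer__szk;
		memcpy(buffer__opt, src, n);
		return n;
	}
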


.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.

kfunc definitions should also always be annotated with the ``__bpf_kfunc``
macro. This prevents issues such as the compiler inlining the kfunc if it's a
static kernel function, or the function being elided in an LTO build as it's
not used in the rest of the kernel. Developers should not manually add
annotations to their kfunc to prevent these issues. If an annotation is
required to prevent such an issue with your kfunc, it is a bug and should be
added to the definition of the macro so that other kfuncs are similarly
protected. An example is given below::

        __bpf_kfunc struct task_struct *bpf_get_task_pid(s32 pid)
        {
        ...
        }

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). Otherwise, the verifier refuses to
load the BPF program until no lingering references remain in any explored state
of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often used together with the KF_ACQUIRE flag, but
the two are orthogonal to each other.

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in. All copies of
the pointer being released are invalidated as a result of invoking a kfunc with
this flag. KF_RELEASE kfuncs automatically receive the protection afforded by
the KF_TRUSTED_ARGS flag described below.
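
The way these flags combine can be modelled with an ordinary bitmask. The
sketch below is a userspace illustration only: the ``EX_KF_*`` names and their
bit values are made up for the example and are not the kernel's definitions,
and the two predicates merely restate the rules described in this section.

.. code-block:: c

	#include <stdbool.h>
	#include <stdint.h>

	/* Illustrative bit values only; not the kernel's definitions. */
	enum {
		EX_KF_ACQUIRE      = 1u << 0,
		EX_KF_RELEASE      = 1u << 1,
		EX_KF_RET_NULL     = 1u << 2,
		EX_KF_TRUSTED_ARGS = 1u << 3,
	};

	/* The caller must NULL-check the result of a KF_RET_NULL kfunc. */
	bool ex_needs_null_check(uint32_t flags)
	{
		return (flags & EX_KF_RET_NULL) != 0;
	}

	/* KF_RELEASE implies the protection afforded by KF_TRUSTED_ARGS. */
	bool ex_args_must_be_trusted(uint32_t flags)
	{
		return (flags & (EX_KF_TRUSTED_ARGS | EX_KF_RELEASE)) != 0;
	}

A set entry such as ``KF_ACQUIRE | KF_RET_NULL`` therefore asks for a NULL
check without requiring trusted arguments, while ``KF_RELEASE`` alone already
requires them.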

2.4.4 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer, with one
exception described below).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

As mentioned above, a nested pointer obtained from walking a trusted pointer is
no longer trusted, with one exception. If a struct type has a field that is
guaranteed to be valid as long as its parent pointer is trusted, the
``BTF_TYPE_SAFE_NESTED`` macro can be used to express that to the verifier as
follows:

.. code-block:: c

	BTF_TYPE_SAFE_NESTED(struct task_struct) {
		const cpumask_t *cpus_ptr;
	};

In other words, you must:

1. Wrap the trusted pointer type in the ``BTF_TYPE_SAFE_NESTED`` macro.

2. Specify the type and name of the trusted nested field. This field must match
   the field in the original type definition exactly.

2.4.5 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

2.4.6 KF_DESTRUCTIVE flag
-------------------------

The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Because of this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more
can be added later.

2.4.7 KF_RCU flag
-----------------

The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
that the objects are valid and there is no use-after-free. The pointers are not
NULL, but the object's refcount could have reached zero. The kfuncs need to
consider doing a refcnt != 0 check, especially when returning a KF_ACQUIRE
pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
also be KF_RET_NULL.
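
The refcount check mentioned above can be sketched with C11 atomics in
userspace. The ``struct ex_obj`` type and both functions are hypothetical
stand-ins rather than kernel APIs; the point is only that a reference is taken
when the count is still non-zero, and that the acquire path returns NULL
otherwise, which is why such a kfunc would also want KF_RET_NULL.

.. code-block:: c

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical refcounted object standing in for a kernel object. */
	struct ex_obj {
		atomic_int refcnt;
	};

	/* Take a reference only if the count has not already dropped to zero. */
	bool ex_obj_get_unless_zero(struct ex_obj *o)
	{
		int old = atomic_load(&o->refcnt);

		while (old != 0) {
			/* On failure, old is refreshed with the current count. */
			if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
				return true;
		}
		return false;
	}

	/* Shape of an acquire path that may fail: NULL when the object is dying. */
	struct ex_obj *ex_obj_acquire(struct ex_obj *o)
	{
		return ex_obj_get_unless_zero(o) ? o : NULL;
	}
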

.. _KF_deprecated_flag:

2.4.8 KF_DEPRECATED flag
------------------------

The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
changed or removed in a subsequent kernel release. A kfunc that is
marked with KF_DEPRECATED should also have any relevant information
captured in its kernel doc. Such information typically includes the
kfunc's expected remaining lifespan, a recommendation for new
functionality that can replace it if any is available, and possibly a
rationale for why it is being removed.

Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
supported and have its KF_DEPRECATED flag removed, it is likely to be far more
difficult to remove a KF_DEPRECATED flag after it's been added than it is to
prevent it from being added in the first place. As described in
:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
encouraged to make their use-cases known as early as possible, and participate
in upstream discussions regarding whether to keep, change, deprecate, or remove
those kfuncs if and when such discussions occur.

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

        static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
                .owner = THIS_MODULE,
                .set   = &bpf_task_set,
        };

        static int init_subsystem(void)
        {
                return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
        }
        late_initcall(init_subsystem);

2.6 Specifying no-cast aliases with ___init
-------------------------------------------

The verifier will always enforce that the BTF type of a pointer passed to a
kfunc by a BPF program matches the type of pointer specified in the kfunc
definition. The verifier does, however, allow types that are equivalent
according to the C standard to be passed to the same kfunc arg, even if their
BTF_IDs differ.

For example, for the following type definition:

.. code-block:: c

	struct bpf_cpumask {
		cpumask_t cpumask;
		refcount_t usage;
	};

The verifier would allow a ``struct bpf_cpumask *`` to be passed to a kfunc
taking a ``cpumask_t *`` (which is a typedef of ``struct cpumask *``). For
instance, both ``struct cpumask *`` and ``struct bpf_cpumask *`` can be passed
to bpf_cpumask_test_cpu().

In some cases, this type-aliasing behavior is not desired. ``struct
nf_conn___init`` is one such example:

.. code-block:: c

	struct nf_conn___init {
		struct nf_conn ct;
	};

The C standard would consider these types to be equivalent, but it would not
always be safe to pass either type to a trusted kfunc. ``struct
nf_conn___init`` represents an allocated ``struct nf_conn`` object that has
*not yet been initialized*, so it would therefore be unsafe to pass a ``struct
nf_conn___init *`` to a kfunc that's expecting a fully initialized ``struct
nf_conn *`` (e.g. ``bpf_ct_change_timeout()``).

In order to accommodate such requirements, the verifier will enforce strict
PTR_TO_BTF_ID type matching if two types have the exact same name, with one
being suffixed with ``___init``.
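
The naming rule can be illustrated with a userspace sketch. The helpers below
are hypothetical and are not the verifier's implementation; they only test
whether two type names are identical apart from exactly one of them carrying
the ``___init`` suffix.

.. code-block:: c

	#include <stdbool.h>
	#include <string.h>

	#define EX_NOCAST_SUFFIX "___init"

	/* Length of name with a trailing "___init" removed, if present. */
	size_t ex_base_name_len(const char *name)
	{
		size_t len = strlen(name);
		size_t suffix_len = strlen(EX_NOCAST_SUFFIX);

		if (len > suffix_len &&
		    strcmp(name + len - suffix_len, EX_NOCAST_SUFFIX) == 0)
			return len - suffix_len;
		return len;
	}

	/* True when a and b differ only by one of them ending in "___init". */
	bool ex_is_nocast_alias_pair(const char *a, const char *b)
	{
		size_t a_base = ex_base_name_len(a);
		size_t b_base = ex_base_name_len(b);
		bool a_init = a_base != strlen(a);
		bool b_init = b_base != strlen(b);

		/* Exactly one of the two names must carry the suffix. */
		if (a_init == b_init)
			return false;

		return a_base == b_base && strncmp(a, b, a_base) == 0;
	}

Under this model, ``nf_conn`` and ``nf_conn___init`` form such a pair and would
be matched strictly, whereas ``cpumask`` and ``bpf_cpumask`` do not.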

.. _BPF_kfunc_lifecycle_expectations:

3. kfunc lifecycle expectations
===============================

kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
strict stability restrictions associated with kernel <-> user UAPIs. This means
they can be thought of as similar to EXPORT_SYMBOL_GPL, and can therefore be
modified or removed by a maintainer of the subsystem they're defined in when
it's deemed necessary.

Like any other change to the kernel, maintainers will not change or remove a
kfunc without having a reasonable justification.  Whether or not they'll choose
to change a kfunc will ultimately depend on a variety of factors, such as how
widely used the kfunc is, how long the kfunc has been in the kernel, whether an
alternative kfunc exists, what the norm is in terms of stability for the
subsystem in question, and of course what the technical cost is of continuing
to support the kfunc.

There are several implications of this:

a) kfuncs that are widely used or have been in the kernel for a long time will
   be more difficult to justify being changed or removed by a maintainer. In
   other words, kfuncs that are known to have a lot of users and provide
   significant value provide stronger incentives for maintainers to invest the
   time and complexity in supporting them. It is therefore important for
   developers that are using kfuncs in their BPF programs to communicate and
   explain how and why those kfuncs are being used, and to participate in
   discussions regarding those kfuncs when they occur upstream.

b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF programs
   that call kfuncs are generally not part of the kernel tree. This means that
   refactoring cannot typically change callers in-place when a kfunc changes,
   as is done for e.g. an upstreamed driver being updated in place when a
   kernel symbol is changed.

   Unlike with regular kernel symbols, this is expected behavior for BPF
   symbols, and out-of-tree BPF programs that use kfuncs should be considered
   relevant to discussions and decisions around modifying and removing those
   kfuncs. The BPF community will take an active role in participating in
   upstream discussions when necessary to ensure that the perspectives of such
   users are taken into account.

c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
   will not ever hard-block a change in the kernel purely for stability
   reasons. That being said, kfuncs are features that are meant to solve
   problems and provide value to users. The decision of whether to change or
   remove a kfunc is a multivariate technical decision that is made on a
   case-by-case basis, and which is informed by data points such as those
   mentioned above. It is expected that a kfunc being removed or changed with
   no warning will not be a common occurrence or take place without sound
   justification, but it is a possibility that must be accepted if one is to
   use kfuncs.

3.1 kfunc deprecation
---------------------

As described above, while sometimes a maintainer may find that a kfunc must be
changed or removed immediately to accommodate some changes in their subsystem,
usually kfuncs will be able to accommodate a longer and more measured
deprecation process. For example, if a new kfunc comes along which provides
superior functionality to an existing kfunc, the existing kfunc may be
deprecated for some period of time to allow users to migrate their BPF programs
to use the new one. Or, if a kfunc has no known users, a decision may be made
to remove the kfunc (without providing an alternative API) after some
deprecation period so as to provide users with a window to notify the kfunc
maintainer if it turns out that the kfunc is actually being used.

It's expected that the common case will be that kfuncs will go through a
deprecation period rather than being changed or removed without warning. As
described in :ref:`KF_deprecated_flag`, the kfunc framework provides the
KF_DEPRECATED flag to kfunc developers to signal to users that a kfunc has been
deprecated. Once a kfunc has been marked with KF_DEPRECATED, the following
procedure is followed for removal:

1. Any relevant information for deprecated kfuncs is documented in the kfunc's
   kernel docs. This documentation will typically include the kfunc's expected
   remaining lifespan, a recommendation for new functionality that can replace
   the usage of the deprecated function (or an explanation as to why no such
   replacement exists), etc.

2. The deprecated kfunc is kept in the kernel for some period of time after it
   was first marked as deprecated. This time period will be chosen on a
   case-by-case basis, and will typically depend on how widespread the use of
   the kfunc is, how long it has been in the kernel, and how hard it is to move
   to alternatives. This deprecation time period is "best effort", and as
   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
   sometimes dictate that the kfunc be removed before the full intended
   deprecation period has elapsed.

3. After the deprecation period the kfunc will be removed. At this point, BPF
   programs calling the kfunc will be rejected by the verifier.

4. Core kfuncs
==============

The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.

4.1 struct task_struct * kfuncs
-------------------------------

There are a number of kfuncs that allow ``struct task_struct *`` objects to be
used as kptrs:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_acquire bpf_task_release

These kfuncs are useful when you want to acquire or release a reference to a
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
struct_ops callback arg. For example:

.. code-block:: c

	/**
	 * A trivial example tracepoint program that shows how to
	 * acquire and release a struct task_struct * pointer.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *acquired;

		acquired = bpf_task_acquire(task);
		if (acquired)
			/*
			 * In a typical program you'd do something like store
			 * the task in a map, and the map will automatically
			 * release it later. Here, we release it manually.
			 */
			bpf_task_release(acquired);
		return 0;
	}


References acquired on ``struct task_struct *`` objects are RCU protected.
Therefore, when in an RCU read region, you can obtain a pointer to a task
embedded in a map value without having to acquire a reference:

.. code-block:: c

	#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
	private(TASK) static struct task_struct *global;

	/**
	 * A trivial example showing how to access a task stored
	 * in a map using RCU.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_rcu_read_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *local_copy;

		bpf_rcu_read_lock();
		local_copy = global;
		if (local_copy)
			/*
			 * We could also pass local_copy to kfuncs or helper functions here,
			 * as we're guaranteed that local_copy will be valid until we exit
			 * the RCU read region below.
			 */
			bpf_printk("Global task %s is valid", local_copy->comm);
		else
			bpf_printk("No global task found");
		bpf_rcu_read_unlock();

		/* At this point we can no longer reference local_copy. */

		return 0;
	}

----

A BPF program can also look up a task from a pid. This can be useful if the
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
it can acquire a reference on with bpf_task_acquire().

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_from_pid

Here is an example of it being used:

.. code-block:: c

	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *lookup;

		lookup = bpf_task_from_pid(task->pid);
		if (!lookup)
			/* A task should always be found, as %task is a tracepoint arg. */
			return -ENOENT;

		if (lookup->pid != task->pid) {
			/* bpf_task_from_pid() looks up the task via its
			 * globally-unique pid from the init_pid_ns. Thus,
			 * the pid of the lookup task should always be the
			 * same as the input task.
			 */
			bpf_task_release(lookup);
			return -EINVAL;
		}

		/* bpf_task_from_pid() returns an acquired reference,
		 * so it must be dropped before returning from the
		 * tracepoint handler.
		 */
		bpf_task_release(lookup);
		return 0;
	}

4.2 struct cgroup * kfuncs
--------------------------

``struct cgroup *`` objects also have acquire and release functions:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_acquire bpf_cgroup_release

These kfuncs are used in exactly the same manner as bpf_task_acquire() and
bpf_task_release() respectively, so we won't provide examples for them.

----

Other kfuncs available for interacting with ``struct cgroup *`` objects are
bpf_cgroup_ancestor() and bpf_cgroup_from_id(), allowing callers to access
the ancestor of a cgroup and find a cgroup by its ID, respectively. Both
return a cgroup kptr.

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_ancestor

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_from_id

Eventually, BPF should be updated to allow this to happen with a normal memory
load in the program itself. This is currently not possible without more work in
the verifier. bpf_cgroup_ancestor() can be used as follows:

.. code-block:: c

	/**
	 * Simple tracepoint example that illustrates how a cgroup's
	 * ancestor can be accessed using bpf_cgroup_ancestor().
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
	{
		struct cgroup *parent;

		/* The parent cgroup resides at the level before the current cgroup's level. */
		parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
		if (!parent)
			return -ENOENT;

		bpf_printk("Parent id is %d", parent->self.id);

		/* Return the parent cgroup that was acquired above. */
		bpf_cgroup_release(parent);
		return 0;
	}
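
The NULL check in the example matters because a root cgroup has no parent
level. The userspace model below illustrates a level-based ancestor lookup. It
rests on assumptions: ``struct ex_cgroup`` and ``ex_cgroup_ancestor()`` are
hypothetical, the out-of-range behaviour is inferred from the example rather
than taken from the kfunc's source, and reference counting is left out.

.. code-block:: c

	#include <stddef.h>

	/* Hypothetical stand-in for struct cgroup: only what the walk needs. */
	struct ex_cgroup {
		int level;                /* 0 for the root */
		struct ex_cgroup *parent; /* NULL for the root */
	};

	/*
	 * Return the ancestor of cgrp at the requested level, or NULL when the
	 * level is negative or deeper than cgrp itself.
	 */
	struct ex_cgroup *ex_cgroup_ancestor(struct ex_cgroup *cgrp, int level)
	{
		if (level < 0 || level > cgrp->level)
			return NULL;

		/* Walk up one parent at a time until the level matches. */
		while (cgrp->level > level)
			cgrp = cgrp->parent;

		return cgrp;
	}
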

4.3 struct cpumask * kfuncs
---------------------------

BPF provides a set of kfuncs that can be used to query, allocate, mutate, and
destroy ``struct cpumask *`` objects. Please refer to
:ref:`cpumasks-header-label` for more details.
631