=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that BPF programs can only call such a function in a
valid context. To enforce this, the visibility of a kfunc can be restricted per
program type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

        /* Disables missing prototype warnings */
        __diag_push();
        __diag_ignore_all("-Wmissing-prototypes",
                          "Global kfuncs as their definitions will be in BTF");

        struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
        {
                return find_get_task_by_vpid(nr);
        }

        __diag_pop();

A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

Similar to BPF helpers, the verifier sometimes needs additional context to make
the usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the argument of the kfunc with a __tag,
where tag may be one of the supported annotations.

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

        void bpf_memzero(void *mem, int mem__sz)
        {
        ...
        }

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without the __sz annotation, the size of the
type of the pointer is used. Without the __sz annotation, a kfunc cannot accept
a void pointer.

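As a rough userspace analogue of the memory and size pairing, the sketch below
gives a hypothetical demo_memzero a body. The function name, the argument
checks, and the use of memset are assumptions made for illustration only. They
are not the kernel's implementation, and the verifier's checking of the pair is
not modeled here.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a kfunc taking a memory/size pair.
 * In the kernel, the __sz suffix on the second parameter name is what tells
 * the verifier that mem__sz bounds the region pointed to by mem.
 */
void demo_memzero(void *mem, int mem__sz)
{
        /* Ignore arguments that cannot describe a valid region. */
        if (!mem || mem__sz <= 0)
                return;

        memset(mem, 0, (size_t)mem__sz);
}

/* Usage: zero only the first four bytes of an eight-byte buffer. */
void demo_memzero_example(void)
{
        char buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

        demo_memzero(buf, 4);
        assert(buf[3] == 0 && buf[4] == 5);
}
```

Because the pointer is a void pointer, the size argument is the only thing that
describes the extent of the region.
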
2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments. It indicates that the
verifier must check that the scalar argument is a known constant, that the
argument is not a size parameter, and that the value of the constant is
relevant to the safety of the program.

An example is given below::

        void *bpf_obj_new(u32 local_type_id__k, ...)
        {
        ...
        }

Here, bpf_obj_new uses the local_type_id argument to find out the size of that
type ID in the program's BTF and returns a sized pointer to it. Each type ID
will have a distinct size, hence it is crucial to treat each such call as
distinct when values don't match during verifier state pruning checks.

Hence, whenever a kfunc accepts a constant scalar argument which is not a size
parameter, and the value of the constant matters for program safety, the __k
suffix should be used.

.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.

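To make the "ID plus flags" pairing concrete, here is a minimal userspace
sketch that models each set entry as a name and a flag bitmask. The struct, the
lookup function, the demo_plain_kfunc entry, and the bit values are all
assumptions for illustration; the real set stores BTF IDs, and the kernel
headers hold the authoritative flag definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative bit values; the kernel's own definitions are authoritative. */
#define KF_ACQUIRE  (1u << 0)
#define KF_RELEASE  (1u << 1)
#define KF_RET_NULL (1u << 2)

/* Hypothetical model of one set entry: a name instead of a BTF ID. */
struct demo_kfunc_entry {
        const char *name;
        uint32_t flags;
};

/* Mirrors the bpf_task_set example, plus one entry with no flags. */
static const struct demo_kfunc_entry demo_task_entries[] = {
        { "bpf_get_task_pid", KF_ACQUIRE | KF_RET_NULL },
        { "bpf_put_pid",      KF_RELEASE },
        { "demo_plain_kfunc", 0 },
};

/* Look up a kfunc by name; on success store its flags and return true. */
bool demo_lookup_flags(const char *name, uint32_t *flags)
{
        size_t i;
        size_t n = sizeof(demo_task_entries) / sizeof(demo_task_entries[0]);

        for (i = 0; i < n; i++) {
                if (strcmp(demo_task_entries[i].name, name) == 0) {
                        *flags = demo_task_entries[i].flags;
                        return true;
                }
        }
        return false;
}

/* Usage: flags are independent bits, so each one is tested separately. */
void demo_flags_example(void)
{
        uint32_t flags = 0;
        bool found = demo_lookup_flags("bpf_get_task_pid", &flags);

        assert(found);
        assert((flags & KF_ACQUIRE) && (flags & KF_RET_NULL));
        assert(!(flags & KF_RELEASE));
}
```
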
2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). Otherwise, the verifier rejects
the BPF program; loading succeeds only once no lingering references remain in
any explored state of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the
two are orthogonal to each other.

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed to it. Only one referenced pointer can be passed in. All copies of the
pointer being released are invalidated as a result of invoking a kfunc with
this flag.

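The acquire and release pairing described above can be pictured with a toy
userspace model. This is only a sketch of the accounting idea, not the
verifier's actual data structures: every name below is hypothetical, reference
IDs are plain integers, and transferring a reference into a map is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_REFS 8

/* Toy state: IDs of references acquired but not yet released. */
struct demo_ref_state {
        int ids[DEMO_MAX_REFS];
        int count;
        int next_id;
};

/* Model a KF_ACQUIRE call: record a new reference and return its ID,
 * or -1 if the toy table is full. */
int demo_acquire(struct demo_ref_state *st)
{
        if (st->count == DEMO_MAX_REFS)
                return -1;
        st->ids[st->count++] = ++st->next_id;
        return st->next_id;
}

/* Model a KF_RELEASE call: drop the reference with this ID. Returns false
 * if the ID is not currently held, e.g. on a second release. */
bool demo_release(struct demo_ref_state *st, int id)
{
        int i;

        for (i = 0; i < st->count; i++) {
                if (st->ids[i] == id) {
                        /* Fill the hole with the last tracked ID. */
                        st->ids[i] = st->ids[--st->count];
                        return true;
                }
        }
        return false;
}

/* Model the exit check: accept only if no reference is outstanding. */
bool demo_exit_ok(const struct demo_ref_state *st)
{
        return st->count == 0;
}

/* Usage: a leaked reference is rejected, a balanced pair is accepted. */
void demo_ref_example(void)
{
        struct demo_ref_state st = { 0 };
        int ref = demo_acquire(&st);
        bool released;

        assert(ref > 0);
        assert(!demo_exit_ok(&st));
        released = demo_release(&st, ref);
        assert(released);
        assert(demo_exit_ok(&st));
}
```

In this model a second release of the same ID fails, which loosely corresponds
to all copies of a released pointer becoming invalid.
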
2.4.4 KF_KPTR_GET flag
----------------------

The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
as a pointer to kptr, safely increments the refcount of the object it points to,
and returns a reference to the user. The rest of the arguments may be normal
arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
the KF_ACQUIRE and KF_RET_NULL flags.

2.4.5 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

2.4.6 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).

2.4.7 KF_DESTRUCTIVE flag
-------------------------

The KF_DESTRUCTIVE flag is used for kfuncs whose invocation is destructive to
the system. For example, such a call can result in the system rebooting or
panicking. Because of this, additional restrictions apply to these calls. At
the moment they only require the CAP_SYS_BOOT capability, but more can be
added later.

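The two restrictions above amount to a gate on the calling program. The sketch
below expresses that gate as a small userspace predicate. The struct, the
function name, and the bit values are assumptions for illustration; the
verifier's real checks are more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the kernel's own definitions are authoritative. */
#define KF_SLEEPABLE   (1u << 5)
#define KF_DESTRUCTIVE (1u << 6)

/* Hypothetical summary of the calling program's properties. */
struct demo_prog {
        bool sleepable;        /* program was loaded as sleepable */
        bool has_cap_sys_boot; /* loader holds CAP_SYS_BOOT */
};

/* Return true if a program with these properties may call a kfunc
 * carrying kfunc_flags, according to the two rules described above. */
bool demo_may_call(const struct demo_prog *prog, uint32_t kfunc_flags)
{
        if ((kfunc_flags & KF_SLEEPABLE) && !prog->sleepable)
                return false;
        if ((kfunc_flags & KF_DESTRUCTIVE) && !prog->has_cap_sys_boot)
                return false;
        return true;
}

/* Usage: a non-sleepable program cannot call a sleepable kfunc. */
void demo_may_call_example(void)
{
        struct demo_prog plain = { false, false };
        struct demo_prog sleepable = { true, false };

        assert(!demo_may_call(&plain, KF_SLEEPABLE));
        assert(demo_may_call(&sleepable, KF_SLEEPABLE));
        assert(!demo_may_call(&sleepable, KF_DESTRUCTIVE));
}
```
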
2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

        BTF_SET8_START(bpf_task_set)
        BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
        BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
        BTF_SET8_END(bpf_task_set)

        static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
                .owner = THIS_MODULE,
                .set   = &bpf_task_set,
        };

        static int init_subsystem(void)
        {
                return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
        }
        late_initcall(init_subsystem);

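The effect of per-program-type registration can be illustrated with a toy
userspace registry. Everything in this sketch is hypothetical: the enum, the
set type, and both functions are stand-ins, names replace BTF IDs, and the
limit of one set per program type is a simplification of this toy, not a
statement about register_btf_kfunc_id_set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for BPF program types. */
enum demo_prog_type {
        DEMO_PROG_TYPE_TRACING,
        DEMO_PROG_TYPE_XDP,
        DEMO_PROG_TYPE_MAX,
};

/* Toy counterpart of a kfunc ID set: names instead of BTF IDs. */
struct demo_kfunc_set {
        const char *const *names;
        size_t cnt;
};

/* One slot per program type; NULL means nothing is registered. */
static const struct demo_kfunc_set *demo_registry[DEMO_PROG_TYPE_MAX];

/* Register a set for one program type. Toy restriction: one set per type. */
int demo_register_kfunc_set(enum demo_prog_type type,
                            const struct demo_kfunc_set *set)
{
        if ((int)type < 0 || (int)type >= DEMO_PROG_TYPE_MAX || !set)
                return -1;
        if (demo_registry[type])
                return -1;
        demo_registry[type] = set;
        return 0;
}

/* A kfunc is visible only to program types it was registered for. */
bool demo_kfunc_visible(enum demo_prog_type type, const char *name)
{
        const struct demo_kfunc_set *set;
        size_t i;

        if ((int)type < 0 || (int)type >= DEMO_PROG_TYPE_MAX)
                return false;
        set = demo_registry[type];
        for (i = 0; set && i < set->cnt; i++) {
                if (strcmp(set->names[i], name) == 0)
                        return true;
        }
        return false;
}

static const char *const demo_task_names[] = {
        "bpf_get_task_pid",
        "bpf_put_pid",
};
static const struct demo_kfunc_set demo_task_set = { demo_task_names, 2 };

/* Usage: register for tracing only, then check visibility per type. */
void demo_registration_example(void)
{
        int err = demo_register_kfunc_set(DEMO_PROG_TYPE_TRACING,
                                          &demo_task_set);

        (void)err;
        assert(err == 0);
        assert(demo_kfunc_visible(DEMO_PROG_TYPE_TRACING, "bpf_put_pid"));
        assert(!demo_kfunc_visible(DEMO_PROG_TYPE_XDP, "bpf_put_pid"));
}
```
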