.. SPDX-License-Identifier: GPL-2.0

.. _kfuncs-header-label:

=============================
BPF Kernel Functions (kfuncs)
=============================

1. Introduction
===============

BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux
kernel which are exposed for use by BPF programs. Unlike normal BPF helpers,
kfuncs do not have a stable interface and can change from one kernel release to
another. Hence, BPF programs need to be updated in response to changes in the
kernel. See :ref:`BPF_kfunc_lifecycle_expectations` for more information.

2. Defining a kfunc
===================

There are two ways to expose a kernel function to BPF programs: either make an
existing function in the kernel visible, or add a new wrapper for BPF. In both
cases, care must be taken that a BPF program can only call such a function in a
valid context. To enforce this, visibility of a kfunc can be per program type.

If you are not creating a BPF wrapper for an existing kernel function, skip
ahead to :ref:`BPF_kfunc_nodef`.

2.1 Creating a wrapper kfunc
----------------------------

When defining a wrapper kfunc, the wrapper function should have extern linkage.
This prevents the compiler from optimizing away dead code, as this wrapper kfunc
is not invoked anywhere in the kernel itself. It is not necessary to provide a
prototype in a header for the wrapper kfunc.

An example is given below::

	/* Disables missing prototype warnings */
	__diag_push();
	__diag_ignore_all("-Wmissing-prototypes",
			  "Global kfuncs as their definitions will be in BTF");

	__bpf_kfunc struct task_struct *bpf_find_get_task_by_vpid(pid_t nr)
	{
		return find_get_task_by_vpid(nr);
	}

	__diag_pop();

A wrapper kfunc is often needed when we need to annotate parameters of the
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
registering it with the BPF subsystem.
See :ref:`BPF_kfunc_nodef`.

2.2 Annotating kfunc parameters
-------------------------------

Similar to BPF helpers, the verifier sometimes needs additional context to make
the usage of kernel functions safer and more useful. Hence, we can annotate a
parameter by suffixing the name of the argument of the kfunc with a __tag, where
tag may be one of the supported annotations.

2.2.1 __sz Annotation
---------------------

This annotation is used to indicate a memory and size pair in the argument list.
An example is given below::

	__bpf_kfunc void bpf_memzero(void *mem, int mem__sz)
	{
	...
	}

Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second
argument as its size. By default, without the __sz annotation, the size of the
type of the pointer is used. Without the __sz annotation, a kfunc cannot accept
a void pointer.

2.2.2 __k Annotation
--------------------

This annotation is only understood for scalar arguments, where it indicates that
the verifier must check that the scalar argument is a known constant, which does
not indicate a size parameter, and the value of the constant is relevant to the
safety of the program.

An example is given below::

	__bpf_kfunc void *bpf_obj_new(u32 local_type_id__k, ...)
	{
	...
	}

Here, bpf_obj_new uses the local_type_id argument to find out the size of that
type ID in the program's BTF and return a sized pointer to it. Each type ID will
have a distinct size, hence it is crucial to treat each such call as distinct
when values don't match during verifier state pruning checks.

Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, the
__k suffix should be used.
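
As a rough illustration of the suffix convention described above, the following
self-contained user-space sketch checks whether a parameter name ends in a given
``__tag``. The helper name ``arg_has_tag()`` is hypothetical and this is not the
verifier's actual implementation; it only demonstrates the naming rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): returns true if param_name
 * ends with the given annotation suffix, e.g. "__sz" or "__k".
 */
static bool arg_has_tag(const char *param_name, const char *tag)
{
	size_t name_len, tag_len;

	assert(param_name && tag);

	name_len = strlen(param_name);
	tag_len = strlen(tag);

	/* A name shorter than the tag cannot end with it. */
	if (name_len < tag_len)
		return false;

	/* Compare the trailing tag_len characters of the name with the tag. */
	return strcmp(param_name + name_len - tag_len, tag) == 0;
}
```

Under this rule, ``mem__sz`` carries the ``__sz`` tag and ``local_type_id__k``
carries the ``__k`` tag, while an unsuffixed name such as ``mem`` carries none.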

2.2.3 __uninit Annotation
-------------------------

This annotation is used to indicate that the argument will be treated as
uninitialized.

An example is given below::

	__bpf_kfunc int bpf_dynptr_from_skb(..., struct bpf_dynptr_kern *ptr__uninit)
	{
	...
	}

Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.

.. _BPF_kfunc_nodef:

2.3 Using an existing kernel function
-------------------------------------

When an existing function in the kernel is fit for consumption by BPF programs,
it can be directly registered with the BPF subsystem. However, care must still
be taken to review the context in which it will be invoked by the BPF program
and whether it is safe to do so.

2.4 Annotating kfuncs
---------------------

In addition to kfuncs' arguments, the verifier may need more information about
the type of kfunc(s) being registered with the BPF subsystem. To do so, we
define flags on a set of kfuncs as follows::

	BTF_SET8_START(bpf_task_set)
	BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
	BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
	BTF_SET8_END(bpf_task_set)

This set encodes the BTF ID of each kfunc listed above, and encodes the flags
along with it. Of course, it is also allowed to specify no flags.

kfunc definitions should also always be annotated with the ``__bpf_kfunc``
macro. This prevents issues such as the compiler inlining the kfunc if it's a
static kernel function, or the function being elided in an LTO build as it's
not used in the rest of the kernel. Developers should not manually add
annotations to their kfunc to prevent these issues.
If an annotation is required to prevent such an issue with your kfunc, it is a
bug, and the annotation should be added to the definition of the macro so that
other kfuncs are similarly protected. An example is given below::

	__bpf_kfunc struct task_struct *bpf_get_task_pid(s32 pid)
	{
	...
	}

2.4.1 KF_ACQUIRE flag
---------------------

The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
refcounted object. The verifier will then ensure that the pointer to the object
is eventually released using a release kfunc, or transferred to a map using a
referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier rejects the
BPF program; loading succeeds only once no lingering references remain in any
explored state of the program.

2.4.2 KF_RET_NULL flag
----------------------

The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
may be NULL. Hence, it forces the user to do a NULL check on the pointer
returned from the kfunc before making use of it (dereferencing or passing to
another helper). This flag is often paired with the KF_ACQUIRE flag, but the two
are orthogonal to each other.

2.4.3 KF_RELEASE flag
---------------------

The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
passed in to it. Only one referenced pointer can be passed in. All copies of the
pointer being released are invalidated as a result of invoking a kfunc with this
flag. KF_RELEASE kfuncs automatically receive the protection afforded by the
KF_TRUSTED_ARGS flag described below.

2.4.4 KF_KPTR_GET flag
----------------------

The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
as a pointer to kptr, safely increments the refcount of the object it points to,
and returns a reference to the user. The rest of the arguments may be normal
arguments of a kfunc.
The KF_KPTR_GET flag should be used in conjunction with the
KF_ACQUIRE and KF_RET_NULL flags.

2.4.5 KF_TRUSTED_ARGS flag
--------------------------

The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that all pointer arguments are valid, and that all pointers to
BTF objects have been passed in their unmodified form (that is, at a zero
offset, and without having been obtained from walking another pointer, with one
exception described below).

There are two types of pointers to kernel objects which are considered "valid":

1. Pointers which are passed as tracepoint or struct_ops callback arguments.
2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.

Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.

The definition of "valid" pointers is subject to change at any time, and has
absolutely no ABI stability guarantees.

As mentioned above, a nested pointer obtained from walking a trusted pointer is
no longer trusted, with one exception. If a struct type has a field that is
guaranteed to be valid as long as its parent pointer is trusted, the
``BTF_TYPE_SAFE_NESTED`` macro can be used to express that to the verifier as
follows:

.. code-block:: c

	BTF_TYPE_SAFE_NESTED(struct task_struct) {
		const cpumask_t *cpus_ptr;
	};

In other words, you must:

1. Wrap the trusted pointer type in the ``BTF_TYPE_SAFE_NESTED`` macro.

2. Specify the type and name of the trusted nested field. This field must match
   the field in the original type definition exactly.

2.4.6 KF_SLEEPABLE flag
-----------------------

The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).
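
Three of the flag rules stated in sections 2.4.3, 2.4.4 and 2.4.6 can be
summarized in a small user-space model. This is only a sketch: the ``EX_``
constants and ``ex_`` helpers are hypothetical names, and the bit values are
assumptions chosen for illustration rather than the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this sketch only; not the kernel's definitions. */
enum {
	EX_KF_ACQUIRE      = 1u << 0,
	EX_KF_RELEASE      = 1u << 1,
	EX_KF_RET_NULL     = 1u << 2,
	EX_KF_KPTR_GET     = 1u << 3,
	EX_KF_TRUSTED_ARGS = 1u << 4,
	EX_KF_SLEEPABLE    = 1u << 5,
};

/* A release kfunc gets trusted-args protection even without the explicit flag. */
static bool ex_needs_trusted_args(uint32_t flags)
{
	return (flags & (EX_KF_TRUSTED_ARGS | EX_KF_RELEASE)) != 0;
}

/* A kptr-get kfunc is expected to also be marked acquire and ret-null. */
static bool ex_kptr_get_flags_ok(uint32_t flags)
{
	const uint32_t required = EX_KF_ACQUIRE | EX_KF_RET_NULL;

	if (!(flags & EX_KF_KPTR_GET))
		return true;
	return (flags & required) == required;
}

/* A sleepable kfunc may only be called from a sleepable program. */
static bool ex_callable_from(uint32_t flags, bool prog_sleepable)
{
	return !(flags & EX_KF_SLEEPABLE) || prog_sleepable;
}
```

For example, ``ex_kptr_get_flags_ok(EX_KF_KPTR_GET | EX_KF_ACQUIRE)`` is false
in this model because the ret-null bit is missing, mirroring the guidance that
KF_KPTR_GET should be combined with both KF_ACQUIRE and KF_RET_NULL.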

2.4.7 KF_DESTRUCTIVE flag
-------------------------

The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is
destructive to the system. For example, such a call can result in the system
rebooting or panicking. Because of this, additional restrictions apply to these
calls. At the moment they only require the CAP_SYS_BOOT capability, but more can
be added later.

2.4.8 KF_RCU flag
-----------------

The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
that the objects are valid and there is no use-after-free. The pointers are not
NULL, but the object's refcount could have reached zero. Such kfuncs need to
consider performing a refcnt != 0 check, especially when returning a KF_ACQUIRE
pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
also be KF_RET_NULL.

.. _KF_deprecated_flag:

2.4.9 KF_DEPRECATED flag
------------------------

The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
changed or removed in a subsequent kernel release. A kfunc that is
marked with KF_DEPRECATED should also have any relevant information
captured in its kernel doc. Such information typically includes the
kfunc's expected remaining lifespan, a recommendation for new
functionality that can replace it if any is available, and possibly a
rationale for why it is being removed.

Note that while on some occasions, a KF_DEPRECATED kfunc may continue to be
supported and have its KF_DEPRECATED flag removed, it is likely to be far more
difficult to remove a KF_DEPRECATED flag after it's been added than it is to
prevent it from being added in the first place.
As described in
:ref:`BPF_kfunc_lifecycle_expectations`, users that rely on specific kfuncs are
encouraged to make their use-cases known as early as possible, and participate
in upstream discussions regarding whether to keep, change, deprecate, or remove
those kfuncs if and when such discussions occur.

2.5 Registering the kfuncs
--------------------------

Once the kfunc is prepared for use, the final step to making it visible is
registering it with the BPF subsystem. Registration is done per BPF program
type. An example is shown below::

	BTF_SET8_START(bpf_task_set)
	BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL)
	BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE)
	BTF_SET8_END(bpf_task_set)

	static const struct btf_kfunc_id_set bpf_task_kfunc_set = {
		.owner = THIS_MODULE,
		.set   = &bpf_task_set,
	};

	static int init_subsystem(void)
	{
		return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set);
	}
	late_initcall(init_subsystem);

2.6 Specifying no-cast aliases with ___init
--------------------------------------------

The verifier will always enforce that the BTF type of a pointer passed to a
kfunc by a BPF program matches the type of pointer specified in the kfunc
definition. The verifier does, however, allow types that are equivalent
according to the C standard to be passed to the same kfunc arg, even if their
BTF_IDs differ.

For example, for the following type definition:

.. code-block:: c

	struct bpf_cpumask {
		cpumask_t cpumask;
		refcount_t usage;
	};

The verifier would allow a ``struct bpf_cpumask *`` to be passed to a kfunc
taking a ``cpumask_t *`` (which is a typedef of ``struct cpumask *``). For
instance, both ``struct cpumask *`` and ``struct bpf_cpumask *`` can be passed
to bpf_cpumask_test_cpu().

In some cases, this type-aliasing behavior is not desired.
``struct nf_conn___init`` is one such example:

.. code-block:: c

	struct nf_conn___init {
		struct nf_conn ct;
	};

The C standard would consider these types to be equivalent, but it would not
always be safe to pass either type to a trusted kfunc. ``struct
nf_conn___init`` represents an allocated ``struct nf_conn`` object that has
*not yet been initialized*, so it would therefore be unsafe to pass a ``struct
nf_conn___init *`` to a kfunc that's expecting a fully initialized ``struct
nf_conn *`` (e.g. ``bpf_ct_change_timeout()``).

In order to accommodate such requirements, the verifier will enforce strict
PTR_TO_BTF_ID type matching if two types have the exact same name, with one
being suffixed with ``___init``.

.. _BPF_kfunc_lifecycle_expectations:

3. kfunc lifecycle expectations
===============================

kfuncs provide a kernel <-> kernel API, and thus are not bound by any of the
strict stability restrictions associated with kernel <-> user UAPIs. This means
they can be thought of as similar to EXPORT_SYMBOL_GPL, and can therefore be
modified or removed by a maintainer of the subsystem they're defined in when
it's deemed necessary.

Like any other change to the kernel, maintainers will not change or remove a
kfunc without having a reasonable justification. Whether or not they'll choose
to change a kfunc will ultimately depend on a variety of factors, such as how
widely used the kfunc is, how long the kfunc has been in the kernel, whether an
alternative kfunc exists, what the norm is in terms of stability for the
subsystem in question, and of course what the technical cost is of continuing
to support the kfunc.

There are several implications of this:

a) kfuncs that are widely used or have been in the kernel for a long time will
   be more difficult to justify being changed or removed by a maintainer. In
   other words, kfuncs that are known to have a lot of users and provide
   significant value provide stronger incentives for maintainers to invest the
   time and complexity in supporting them. It is therefore important for
   developers that are using kfuncs in their BPF programs to communicate and
   explain how and why those kfuncs are being used, and to participate in
   discussions regarding those kfuncs when they occur upstream.

b) Unlike regular kernel symbols marked with EXPORT_SYMBOL_GPL, BPF programs
   that call kfuncs are generally not part of the kernel tree. This means that
   refactoring cannot typically change callers in-place when a kfunc changes,
   as is done for e.g. an upstreamed driver being updated in place when a
   kernel symbol is changed.

   Unlike with regular kernel symbols, this is expected behavior for BPF
   symbols, and out-of-tree BPF programs that use kfuncs should be considered
   relevant to discussions and decisions around modifying and removing those
   kfuncs. The BPF community will take an active role in participating in
   upstream discussions when necessary to ensure that the perspectives of such
   users are taken into account.

c) A kfunc will never have any hard stability guarantees. BPF APIs cannot and
   will not ever hard-block a change in the kernel purely for stability
   reasons. That being said, kfuncs are features that are meant to solve
   problems and provide value to users. The decision of whether to change or
   remove a kfunc is a multivariate technical decision that is made on a
   case-by-case basis, and which is informed by data points such as those
   mentioned above. It is expected that a kfunc being removed or changed with
   no warning will not be a common occurrence or take place without sound
   justification, but it is a possibility that must be accepted if one is to
   use kfuncs.

3.1 kfunc deprecation
---------------------

As described above, while sometimes a maintainer may find that a kfunc must be
changed or removed immediately to accommodate some changes in their subsystem,
usually kfuncs will be able to accommodate a longer and more measured
deprecation process. For example, if a new kfunc comes along which provides
superior functionality to an existing kfunc, the existing kfunc may be
deprecated for some period of time to allow users to migrate their BPF programs
to use the new one. Or, if a kfunc has no known users, a decision may be made
to remove the kfunc (without providing an alternative API) after some
deprecation period so as to provide users with a window to notify the kfunc
maintainer if it turns out that the kfunc is actually being used.

It's expected that the common case will be that kfuncs will go through a
deprecation period rather than being changed or removed without warning. As
described in :ref:`KF_deprecated_flag`, the kfunc framework provides the
KF_DEPRECATED flag to kfunc developers to signal to users that a kfunc has been
deprecated. Once a kfunc has been marked with KF_DEPRECATED, the following
procedure is followed for removal:

1. Any relevant information for deprecated kfuncs is documented in the kfunc's
   kernel docs. This documentation will typically include the kfunc's expected
   remaining lifespan, a recommendation for new functionality that can replace
   the usage of the deprecated function (or an explanation as to why no such
   replacement exists), etc.

2. The deprecated kfunc is kept in the kernel for some period of time after it
   was first marked as deprecated. This time period will be chosen on a
   case-by-case basis, and will typically depend on how widespread the use of
   the kfunc is, how long it has been in the kernel, and how hard it is to move
   to alternatives. This deprecation time period is "best effort", and as
   described :ref:`above<BPF_kfunc_lifecycle_expectations>`, circumstances may
   sometimes dictate that the kfunc be removed before the full intended
   deprecation period has elapsed.

3. After the deprecation period the kfunc will be removed. At this point, BPF
   programs calling the kfunc will be rejected by the verifier.

4. Core kfuncs
==============

The BPF subsystem provides a number of "core" kfuncs that are potentially
applicable to a wide variety of different possible use cases and programs.
Those kfuncs are documented here.

4.1 struct task_struct * kfuncs
-------------------------------

There are a number of kfuncs that allow ``struct task_struct *`` objects to be
used as kptrs:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_acquire bpf_task_release

These kfuncs are useful when you want to acquire or release a reference to a
``struct task_struct *`` that was passed as e.g. a tracepoint arg, or a
struct_ops callback arg. For example:

.. code-block:: c

	/**
	 * A trivial example tracepoint program that shows how to
	 * acquire and release a struct task_struct * pointer.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_acquire_release_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *acquired;

		acquired = bpf_task_acquire(task);
		if (acquired)
			/*
			 * In a typical program you'd do something like store
			 * the task in a map, and the map will automatically
			 * release it later. Here, we release it manually.
			 */
			bpf_task_release(acquired);
		return 0;
	}


References acquired on ``struct task_struct *`` objects are RCU protected.
Therefore, when in an RCU read region, you can obtain a pointer to a task
embedded in a map value without having to acquire a reference:

.. code-block:: c

	#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
	private(TASK) static struct task_struct *global;

	/**
	 * A trivial example showing how to access a task stored
	 * in a map using RCU.
	 */
	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_rcu_read_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *local_copy;

		bpf_rcu_read_lock();
		local_copy = global;
		if (local_copy)
			/*
			 * We could also pass local_copy to kfuncs or helper functions here,
			 * as we're guaranteed that local_copy will be valid until we exit
			 * the RCU read region below.
			 */
			bpf_printk("Global task %s is valid", local_copy->comm);
		else
			bpf_printk("No global task found");
		bpf_rcu_read_unlock();

		/* At this point we can no longer reference local_copy. */

		return 0;
	}

----

A BPF program can also look up a task from a pid. This can be useful if the
caller doesn't have a trusted pointer to a ``struct task_struct *`` object that
it can acquire a reference on with bpf_task_acquire().

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_task_from_pid

Here is an example of it being used:

.. code-block:: c

	SEC("tp_btf/task_newtask")
	int BPF_PROG(task_get_pid_example, struct task_struct *task, u64 clone_flags)
	{
		struct task_struct *lookup;

		lookup = bpf_task_from_pid(task->pid);
		if (!lookup)
			/* A task should always be found, as %task is a tracepoint arg. */
			return -ENOENT;

		if (lookup->pid != task->pid) {
			/* bpf_task_from_pid() looks up the task via its
			 * globally-unique pid from the init_pid_ns. Thus,
			 * the pid of the lookup task should always be the
			 * same as the input task.
			 */
			bpf_task_release(lookup);
			return -EINVAL;
		}

		/* bpf_task_from_pid() returns an acquired reference,
		 * so it must be dropped before returning from the
		 * tracepoint handler.
		 */
		bpf_task_release(lookup);
		return 0;
	}

4.2 struct cgroup * kfuncs
--------------------------

``struct cgroup *`` objects also have acquire and release functions:

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_acquire bpf_cgroup_release

These kfuncs are used in exactly the same manner as bpf_task_acquire() and
bpf_task_release() respectively, so we won't provide examples for them.

----

Other kfuncs available for interacting with ``struct cgroup *`` objects are
bpf_cgroup_ancestor() and bpf_cgroup_from_id(), allowing callers to access
the ancestor of a cgroup and find a cgroup by its ID, respectively. Both
return a cgroup kptr.

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_ancestor

.. kernel-doc:: kernel/bpf/helpers.c
   :identifiers: bpf_cgroup_from_id

Eventually, BPF should be updated to allow this to happen with a normal memory
load in the program itself. This is currently not possible without more work in
the verifier. bpf_cgroup_ancestor() can be used as follows:

.. code-block:: c

	/**
	 * Simple tracepoint example that illustrates how a cgroup's
	 * ancestor can be accessed using bpf_cgroup_ancestor().
	 */
	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
	{
		struct cgroup *parent;

		/* The parent cgroup resides at the level before the current cgroup's level. */
		parent = bpf_cgroup_ancestor(cgrp, cgrp->level - 1);
		if (!parent)
			return -ENOENT;

		bpf_printk("Parent id is %d", parent->self.id);

		/* Release the parent cgroup that was acquired above. */
		bpf_cgroup_release(parent);
		return 0;
	}

4.3 struct cpumask * kfuncs
---------------------------

BPF provides a set of kfuncs that can be used to query, allocate, mutate, and
destroy ``struct cpumask *`` objects. Please refer to
:ref:`cpumasks-header-label` for more details.