Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1 |
|
#
55d49f75 |
| 01-Sep-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
The commit c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions for reuse") refactored the bpf_{sk,task,inode}_storage_free() into bpf_local_storage_unlink_nolock(), which was later renamed to bpf_local_storage_destroy(). The commit accidentally passed the "bool uncharge_mem = false" argument to bpf_selem_unlink_storage_nolock(), which stopped the uncharge from happening to the sk->sk_omem_alloc.
This missing uncharge only happens when the sk is going away (during __sk_destruct).
This patch fixes it by always passing "uncharge_mem = true". It is a noop to the task/inode/cgroup storage because they do not have the map_local_storage_(un)charge enabled in the map_ops. A followup patch will be done in bpf-next to remove the uncharge_mem argument.
A selftest is added in the next patch.
Fixes: c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions for reuse") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230901231129.578493-3-martin.lau@linux.dev
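The accounting effect of the flag described in this commit can be sketched in userspace C. This is a minimal illustration only: `struct owner`, `charge()`, `unlink_elem()` and `leftover_after_destroy()` are hypothetical stand-ins invented here, not the kernel's functions, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket: only the accounting counter matters. */
struct owner {
	size_t omem_alloc;	/* bytes currently charged to this owner */
};

/* Charge elem_size bytes to the owner when an element is linked. */
static void charge(struct owner *o, size_t elem_size)
{
	o->omem_alloc += elem_size;
}

/*
 * Unlink one element. When uncharge_mem is false the counter is left
 * untouched, which models the missing uncharge described above.
 */
static void unlink_elem(struct owner *o, size_t elem_size, bool uncharge_mem)
{
	if (uncharge_mem) {
		assert(o->omem_alloc >= elem_size);
		o->omem_alloc -= elem_size;
	}
}

/* Link then unlink n elements; return what is still charged afterwards. */
static size_t leftover_after_destroy(size_t n, size_t elem_size,
				     bool uncharge_mem)
{
	struct owner o = { .omem_alloc = 0 };
	size_t i;

	for (i = 0; i < n; i++)
		charge(&o, elem_size);
	for (i = 0; i < n; i++)
		unlink_elem(&o, elem_size, uncharge_mem);
	return o.omem_alloc;
}
```

Calling `leftover_after_destroy()` with `uncharge_mem == false` leaves every charged byte on the counter, while passing `true` brings it back to zero, which is the behaviour the fix restores for the destroy path.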
|
#
a96a44ab |
| 01-Sep-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
'./test_progs -t test_local_storage' reported a splat:
[ 27.137569] =============================
[ 27.138122] [ BUG: Invalid wait context ]
[ 27.138650] 6.5.0-03980-gd11ae1b16b0a #247 Tainted: G O
[ 27.139542] -----------------------------
[ 27.140106] test_progs/1729 is trying to lock:
[ 27.140713] ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130
[ 27.141834] other info that might help us debug this:
[ 27.142437] context-{5:5}
[ 27.142856] 2 locks held by test_progs/1729:
[ 27.143352] #0: ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40
[ 27.144492] #1: ffff888107deb2c0 (&storage->lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0
[ 27.145855] stack backtrace:
[ 27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G O 6.5.0-03980-gd11ae1b16b0a #247
[ 27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 27.149127] Call Trace:
[ 27.149490] <TASK>
[ 27.149867] dump_stack_lvl+0x130/0x1d0
[ 27.152609] dump_stack+0x14/0x20
[ 27.153131] __lock_acquire+0x1657/0x2220
[ 27.153677] lock_acquire+0x1b8/0x510
[ 27.157908] local_lock_acquire+0x29/0x130
[ 27.159048] obj_cgroup_charge+0xf4/0x3c0
[ 27.160794] slab_pre_alloc_hook+0x28e/0x2b0
[ 27.161931] __kmem_cache_alloc_node+0x51/0x210
[ 27.163557] __kmalloc+0xaa/0x210
[ 27.164593] bpf_map_kzalloc+0xbc/0x170
[ 27.165147] bpf_selem_alloc+0x130/0x510
[ 27.166295] bpf_local_storage_update+0x5aa/0x8e0
[ 27.167042] bpf_fd_sk_storage_update_elem+0xdb/0x1a0
[ 27.169199] bpf_map_update_value+0x415/0x4f0
[ 27.169871] map_update_elem+0x413/0x550
[ 27.170330] __sys_bpf+0x5e9/0x640
[ 27.174065] __x64_sys_bpf+0x80/0x90
[ 27.174568] do_syscall_64+0x48/0xa0
[ 27.175201] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 27.175932] RIP: 0033:0x7effb40e41ad
[ 27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d8
[ 27.179028] RSP: 002b:00007ffe64c21fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 27.180088] RAX: ffffffffffffffda RBX: 00007ffe64c22768 RCX: 00007effb40e41ad
[ 27.181082] RDX: 0000000000000020 RSI: 00007ffe64c22008 RDI: 0000000000000002
[ 27.182030] RBP: 00007ffe64c21ff0 R08: 0000000000000000 R09: 00007ffe64c22788
[ 27.183038] R10: 0000000000000064 R11: 0000000000000202 R12: 0000000000000000
[ 27.184006] R13: 00007ffe64c22788 R14: 00007effb42a1000 R15: 0000000000000000
[ 27.184958] </TASK>
It complains about acquiring a local_lock while holding a raw_spin_lock. It means it should not allocate memory while holding a raw_spin_lock since it is not safe for RT.
raw_spin_lock is needed because bpf_local_storage supports tracing context. In particular for task local storage, it is easy to get a "current" task PTR_TO_BTF_ID in tracing bpf prog. However, task (and cgroup) local storage has already been moved to bpf mem allocator which can be used after raw_spin_lock.
The splat is for the sk storage. The sk (and inode) storage has not been moved to the bpf mem allocator. With or without raw_spin_lock, kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context. However, the local storage helper requires a verifier-accepted sk pointer (PTR_TO_BTF_ID), so it is hypothetical whether that (meaning running a bpf prog in a kzalloc-unsafe context while also holding a verifier-accepted sk pointer) could happen.
This patch avoids kzalloc after raw_spin_lock to silence the splat. There is an existing kzalloc before the raw_spin_lock. At that point, a kzalloc is very likely required because a lookup has just been done. Thus, this patch always does the kzalloc before acquiring the raw_spin_lock and removes the later kzalloc usage after the raw_spin_lock. After this change, there will be a charge and then an uncharge during the syscall bpf_map_update_elem() code path. This patch opts for simplicity and does not continue the old optimization that saved one charge and uncharge.
This issue dates back to the very first commit of bpf_sk_storage, which has been refactored multiple times to create task, inode, and cgroup storage. This patch uses a Fixes tag with a more recent commit that should be easier to backport.
Fixes: b00fa38a9c1c ("bpf: Enable non-atomic allocations in local storage") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230901231129.578493-2-martin.lau@linux.dev
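The allocate-before-lock ordering that this commit settles on can be sketched in userspace C. The sketch assumes a toy `locked` flag in place of the raw_spin_lock and a single-slot `struct storage`; `alloc_elem()` and `storage_update()` are hypothetical names used only for illustration, and none of this is the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical storage with a toy "raw lock" flag and one slot. */
struct storage {
	bool locked;	/* models holding the raw_spin_lock */
	int *slot;	/* existing element, or NULL */
};

/* Allocation helper that refuses to run while the lock is held. */
static int *alloc_elem(const struct storage *s, int value)
{
	int *elem;

	assert(!s->locked);	/* the rule the lockdep splat enforces */
	elem = calloc(1, sizeof(*elem));
	if (elem)
		*elem = value;
	return elem;
}

/*
 * Update the slot. The new element is always allocated up front; under
 * the lock only pointers are swapped. The old element is released after
 * the lock is dropped. Returns 0 on success, -1 on allocation failure.
 */
static int storage_update(struct storage *s, int value)
{
	int *new_elem = alloc_elem(s, value);
	int *old_elem;

	if (!new_elem)
		return -1;

	s->locked = true;
	old_elem = s->slot;
	s->slot = new_elem;
	s->locked = false;

	free(old_elem);	/* free(NULL) is a no-op */
	return 0;
}
```

The trade-off matches the one the message names: every update pays for one allocation and one release even when an element already exists, in exchange for never allocating inside the locked region.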
|
Revision tags: v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
6c3eba1c |
| 13-Jun-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Centralize permissions checks for all BPF map types
This allows more centralized decisions to be made later on, and generally makes it very explicit which maps are privileged and which are not (e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants, as opposed to unprivileged HASH and HASH_PERCPU; now this is explicit and easy to verify).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230613223533.3689589-4-andrii@kernel.org
|
Revision tags: v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24 |
|
#
10fd5f70 |
| 12-Apr-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Handle NULL in bpf_local_storage_free.
During OOM, bpf_local_storage_alloc() may fail to allocate 'storage', and the subsequent call to bpf_local_storage_free() with a NULL pointer will cause a crash like:
[ 271718.917646] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 271719.019620] RIP: 0010:call_rcu+0x2d/0x240
[ 271719.216274] bpf_local_storage_alloc+0x19e/0x1e0
[ 271719.250121] bpf_local_storage_update+0x33b/0x740
Fixes: 7e30a8477b0b ("bpf: Add bpf_local_storage_free()") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230412171252.15635-1-alexei.starovoitov@gmail.com
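The shape of the bug and of the fix can be sketched in userspace C: an error path jumps to a shared cleanup label and hands a possibly-NULL pointer to the free routine, so the free routine has to tolerate NULL. `struct local_storage`, `storage_free()` and `storage_alloc()` below are hypothetical illustrations, and the `simulate_oom` parameter is an assumption added to make the failure reproducible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-owner storage object. */
struct local_storage {
	int placeholder;
};

/* Release the storage; a NULL argument is accepted and ignored. */
static bool storage_free(struct local_storage *storage)
{
	if (!storage)
		return false;	/* nothing to release */
	free(storage);
	return true;
}

/*
 * Allocate a storage object. On failure the shared cleanup label runs
 * with storage == NULL, which is why storage_free() must check for it.
 * Returns 0 on success and -1 on failure.
 */
static int storage_alloc(bool simulate_oom, struct local_storage **out)
{
	struct local_storage *storage;
	int err;

	assert(out != NULL);
	storage = simulate_oom ? NULL : calloc(1, sizeof(*storage));
	if (!storage) {
		err = -1;
		goto fail;
	}
	*out = storage;
	return 0;

fail:
	storage_free(storage);	/* storage may be NULL here */
	return err;
}
```

Putting the NULL check inside the free routine, rather than at each call site, keeps every current and future error path safe without extra branching in the callers.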
|
Revision tags: v6.1.23, v6.1.22 |
|
#
6ae9d5e9 |
| 22-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Use bpf_mem_cache_alloc/free for bpf_local_storage
This patch uses bpf_mem_cache_alloc/free for allocating and freeing bpf_local_storage for task and cgroup storage.
The changes are similar to the previous patch. A few things worth mentioning for bpf_local_storage:
The local_storage is freed when the last selem is deleted. Before deleting a selem from local_storage, it needs to retrieve the local_storage->smap because the bpf_selem_unlink_storage_nolock() may have set it to NULL. Note that local_storage->smap may already be NULL when the selem that created this local_storage has been removed. In this case, call_rcu will be used to free the local_storage. Also, the bpf_ma (true or false) value is needed before calling bpf_local_storage_free(). The bpf_ma can either be obtained from the local_storage->smap (if available) or any of its selem's smap. A new helper check_storage_bpf_ma() is added to obtain bpf_ma for a bpf_local_storage that is being deleted.
When bpf_local_storage_alloc() gets reused memory, all fields are either already at the correct values or will be initialized. 'cache[]' must already be all NULLs. 'list' must be empty. Others will be initialized.
Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230322215246.1675516-4-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
08a7ce38 |
| 22-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem
This patch uses bpf_mem_alloc for the task and cgroup local storage, for which the bpf prog can easily get hold of the storage owner's PTR_TO_BTF_ID. E.g. bpf_get_current_task_btf() can be used in some of the kmalloc code paths, which would cause deadlock/recursion. bpf_mem_cache_alloc is deadlock free and will solve a legit use case in [1].
For sk storage, its batch creation benchmark shows a few percent regression when the sk create/destroy batch size is larger than 32. The sk creation/destruction happens much more often and depends on external traffic. Considering it is hypothetical to be able to cause deadlock with sk storage, it can cross the bridge to bpf_mem_alloc when a legit (i.e. useful) use case comes up.
For inode storage, bpf_local_storage_destroy() is called before waiting for a rcu gp and its memory cannot be reused immediately. inode stays with kmalloc/kfree after the rcu [or tasks_trace] gp.
A 'bool bpf_ma' argument is added to bpf_local_storage_map_alloc(). Only task and cgroup storage have 'bpf_ma == true' which means to use bpf_mem_cache_alloc/free(). This patch only changes selem to use bpf_mem_alloc for task and cgroup. The next patch will change the local_storage to use bpf_mem_alloc also for task and cgroup.
Here are some more details on the changes:
* memory allocation: After bpf_mem_cache_alloc(), the SDATA(selem)->data is zero-ed because bpf_mem_cache_alloc() could return a reused selem. It is to keep the existing bpf_map_kzalloc() behavior. Only SDATA(selem)->data is zero-ed. SDATA(selem)->data is the visible part to the bpf prog. No need to use zero_map_value() to do the zeroing because bpf_selem_free(..., reuse_now = true) ensures no bpf prog is using the selem before returning the selem through bpf_mem_cache_free(). For the internal fields of selem, they will be initialized when linking to the new smap and the new local_storage.
When 'bpf_ma == false', nothing changes in this patch. It will stay with the bpf_map_kzalloc().
* memory free: The bpf_selem_free() and bpf_selem_free_rcu() are modified to handle the bpf_ma == true case.
For the common selem free path where its owner is also being destroyed, the mem is freed in bpf_local_storage_destroy(), the owner (task and cgroup) has gone through a rcu gp. The memory can be reused immediately, so bpf_local_storage_destroy() will call bpf_selem_free(..., reuse_now = true) which will do bpf_mem_cache_free() for immediate reuse consideration.
An exception is the delete elem code path. The delete elem code path is called from the helper bpf_*_storage_delete() and the syscall bpf_map_delete_elem(). This path is an unusual case for local storage because the common use case is to have the local storage stay with its owner for the owner's lifetime, so that the bpf prog and the user space do not have to monitor the owner's destruction. For the delete elem path, the selem cannot be reused immediately because there could be a bpf prog using it. It will call bpf_selem_free(..., reuse_now = false) and it will wait for a rcu tasks trace gp before freeing the elem. The rcu callback is changed to do bpf_mem_cache_raw_free() instead of kfree().
When 'bpf_ma == false', it should be the same as before. __bpf_selem_free() is added to do the kfree_rcu and call_rcu_tasks_trace(). A few words on the 'reuse_now == true' case: when 'reuse_now == true', it is still racing with bpf_local_storage_map_free, which is under rcu protection, so it still needs to wait for a rcu gp instead of calling kfree() directly. Otherwise, the selem may be reused by slab for a totally different struct while the bpf_local_storage_map_free() is still using it (as a rcu reader). For the inode case, there may be other rcu readers also. In short, when bpf_ma == false and reuse_now == true => vanilla rcu.
[1]: https://lore.kernel.org/bpf/20221118190109.1512674-1-namhyung@kernel.org/
Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230322215246.1675516-3-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
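The four combinations of 'bpf_ma' and 'reuse_now' described in this commit message can be summarised as a small decision function. This is a reading aid written as userspace C, not kernel code: the enum, `pick_free_path()` and `free_path_name()` are hypothetical names, and the exact callback chain that runs after a tasks trace grace period is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the free strategies named in the message. */
enum free_path {
	FREE_MEM_CACHE_NOW,		/* bpf_mem_cache_free(), immediate reuse */
	FREE_MEM_CACHE_AFTER_TRACE_GP,	/* tasks trace gp, then bpf_mem_cache_raw_free() */
	FREE_KMALLOC_AFTER_RCU_GP,	/* vanilla rcu gp before the memory goes back to slab */
	FREE_KMALLOC_AFTER_TRACE_GP,	/* tasks trace gp before the memory goes back to slab */
};

/* Map the two booleans onto one of the four strategies. */
static enum free_path pick_free_path(bool bpf_ma, bool reuse_now)
{
	if (bpf_ma)
		return reuse_now ? FREE_MEM_CACHE_NOW
				 : FREE_MEM_CACHE_AFTER_TRACE_GP;
	return reuse_now ? FREE_KMALLOC_AFTER_RCU_GP
			 : FREE_KMALLOC_AFTER_TRACE_GP;
}

/* Human-readable name, handy when logging the chosen strategy. */
static const char *free_path_name(enum free_path p)
{
	static const char *const names[] = {
		[FREE_MEM_CACHE_NOW] = "mem-cache, reuse now",
		[FREE_MEM_CACHE_AFTER_TRACE_GP] = "mem-cache, after tasks trace gp",
		[FREE_KMALLOC_AFTER_RCU_GP] = "kmalloc, after rcu gp",
		[FREE_KMALLOC_AFTER_TRACE_GP] = "kmalloc, after tasks trace gp",
	};

	assert((size_t)p < sizeof(names) / sizeof(names[0]));
	return names[p];
}
```

Only the first strategy hands memory back for immediate reuse; the other three all wait for some grace period, which is why 'reuse_now == true' with 'bpf_ma == false' still goes through vanilla rcu.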
|
Revision tags: v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16 |
|
#
7e30a847 |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Add bpf_local_storage_free()
This patch refactors local_storage freeing logic into bpf_local_storage_free(). It is a preparation work for a later patch that uses bpf_mem_cache_alloc/free. The other kfree(local_storage) cases are also changed to bpf_local_storage_free(..., reuse_now = true).
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-12-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
1288aaa2 |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Add bpf_local_storage_rcu callback
The existing bpf_local_storage_free_rcu is renamed to bpf_local_storage_free_trace_rcu. A new bpf_local_storage_rcu callback is added to do the kfree instead of using kfree_rcu. It is a preparation work for a later patch using bpf_mem_cache_alloc/free.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-11-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c0d63f30 |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Add bpf_selem_free()
This patch refactors the selem freeing logic into bpf_selem_free(). It is a preparation work for a later patch using bpf_mem_cache_alloc/free. The other kfree(selem) cases are also changed to bpf_selem_free(..., reuse_now = true).
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-10-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
f8ccf30c |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Add bpf_selem_free_rcu callback
Add bpf_selem_free_rcu() callback to do the kfree() instead of using kfree_rcu. It is a preparation work for using bpf_mem_cache_alloc/free in a later patch.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-9-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c6099813 |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Remove bpf_selem_free_fields*_rcu
This patch removes the bpf_selem_free_fields*_rcu. The bpf_obj_free_fields() can be done before the call_rcu_tasks_trace() and kfree_rcu(). It is needed when a later patch uses bpf_mem_cache_alloc/free. In bpf hashtab, bpf_obj_free_fields() is also called before calling bpf_mem_cache_free. The discussion can be found in https://lore.kernel.org/bpf/f67021ee-21d9-bfae-6134-4ca542fab843@linux.dev/
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-8-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
a47eabf2 |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage
This patch repurposes the use_trace_rcu to mean whether the freed memory can be reused immediately or not. The use_trace_rcu is renamed to reuse_now. Other than the boolean test being reversed, it should be a no-op.
The following explains the reason for the rename and how it will be used in a later patch.
In a later patch, bpf_mem_cache_alloc/free will be used in the bpf_local_storage. The bpf mem allocator will reuse the freed memory immediately. Some of the free paths in bpf_local_storage do not support memory being reused immediately. These paths are the "delete" elem cases from the bpf_*_storage_delete() helper and the map_delete_elem() syscall. Note that "delete" elem before the owner's (sk/task/cgrp/inode) lifetime has ended is not the common usage for the local storage.
The common free path, bpf_local_storage_destroy(), can reuse the memory immediately. This common path means the storage stays with its owner until the owner is destroyed.
The above-mentioned "delete" elem paths that cannot reuse immediately always have 'use_trace_rcu == true'. The cases that are safe for immediate reuse always have 'use_trace_rcu == false'. Instead of adding another arg in a later patch, this patch repurposes this arg as reuse_now and reverses the test logic.
In a later patch, 'reuse_now == true' will free to the bpf_mem_cache_free() where the memory can be reused immediately. 'reuse_now == false' will go through the call_rcu_tasks_trace().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-7-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fc6652aa |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Remember smap in bpf_local_storage
This patch remembers which smap triggers the allocation of a 'struct bpf_local_storage' object. The local_storage is allocated during the very first selem added to the owner. The smap pointer is needed when using the bpf_mem_cache_free in a later patch because it needs to free to the correct smap's bpf_mem_alloc object.
When a selem is being removed, it needs to check if it is the selem that triggered the creation of the local_storage. If it is, the local_storage->smap pointer will be reset to NULL. This NULL reset is done under the local_storage->lock in bpf_selem_unlink_storage_nolock() when a selem is being removed. Also note that the local_storage may not go away even if local_storage->smap is NULL because there may be other selems still stored in the local_storage.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-6-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
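The bookkeeping described in this commit can be sketched in userspace C. The sketch is a simplification under stated assumptions: `struct smap`, `struct local_storage`, the `nr_selems` counter and the three helpers are hypothetical, the element list is reduced to a count, and the locking that the kernel relies on is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a storage map. */
struct smap {
	int id;
};

/* Hypothetical per-owner storage; the element list is just a count. */
struct local_storage {
	const struct smap *smap;	/* map of the selem that created us, or NULL */
	unsigned int nr_selems;
};

/* The very first selem creates the storage and is remembered. */
static void storage_init(struct local_storage *ls, const struct smap *first)
{
	ls->smap = first;
	ls->nr_selems = 1;
}

/* Later selems from other maps only bump the count. */
static void storage_link(struct local_storage *ls)
{
	ls->nr_selems++;
}

/*
 * Remove the selem that belongs to selem_smap. If that map created the
 * storage, forget it. Returns 1 when the storage is now empty and can
 * itself be freed, 0 otherwise.
 */
static int storage_unlink(struct local_storage *ls,
			  const struct smap *selem_smap)
{
	assert(ls->nr_selems > 0);
	if (ls->smap == selem_smap)
		ls->smap = NULL;
	ls->nr_selems--;
	return ls->nr_selems == 0;
}
```

After the creating selem is unlinked, the storage in this sketch keeps living with `smap == NULL` as long as other elements remain, which mirrors the last sentence of the message.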
|
#
121f31f3 |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Remove the preceding __ from __bpf_selem_unlink_storage
__bpf_selem_unlink_storage takes the spin lock, and there is no name collision either. Having the preceding '__' is confusing when reviewing the later patch.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-5-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
62827d61 |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Remove __bpf_local_storage_map_alloc
bpf_local_storage_map_alloc() is the only caller of __bpf_local_storage_map_alloc(). The remaining logic in bpf_local_storage_map_alloc() is only a one liner setting the smap->cache_idx.
Remove __bpf_local_storage_map_alloc() to simplify code.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-4-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
2ffcb6fc |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Refactor codes into bpf_local_storage_destroy
This patch first renames bpf_local_storage_unlink_nolock to bpf_local_storage_destroy(). It better reflects that it is only used when the storage's owner (sk/task/cgrp/inode) is being kfree().
All callers of bpf_local_storage_destroy() take the spin lock and then free the storage. This patch also moves these two steps into bpf_local_storage_destroy().
This is a preparation work for a later patch that uses bpf_mem_cache_alloc/free in the bpf_local_storage.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-3-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
4cbd23cc |
| 08-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Move a few bpf_local_storage functions to static scope
This patch moves the bpf_local_storage_free_rcu() and bpf_selem_unlink_map() to static because they are not used outside of bpf_local_storage.c.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20230308065936.1550103-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
7490b7f1 |
| 05-Mar-2023 |
Yafang Shao <laoar.shao@gmail.com> |
bpf, net: bpf_local_storage memory usage
A new helper is introduced into the bpf_local_storage map to calculate the memory usage. This helper is also used by other maps like bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage, etc.
Note that currently the dynamically allocated storage elements are not counted in the usage, since counting them would add extra runtime overhead in the element update or delete path. So let's put it aside for now, and implement it in the future when someone really needs it.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
e768e3c5 |
| 03-Mar-2023 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Use separate RCU callbacks for freeing selem
Martin suggested that instead of using a byte in the hole (which he has a use for in his future patch) in bpf_local_storage_elem, we can dispatch a different call_rcu callback based on whether we need to free special fields in bpf_local_storage_elem data. The free path, described in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"), only waits for call_rcu callbacks when there are special (kptrs, etc.) fields in the map value, hence it is necessary that we only access smap in this case.
Therefore, dispatch different RCU callbacks based on whether the BPF map has a valid btf_record, so that smap's btf_record is dereferenced and used only when it is valid.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230303141542.300068-1-memxor@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
Revision tags: v6.1.15 |
|
#
9db44fdd |
| 25-Feb-2023 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Support kptrs in local storage maps
Enable support for kptrs in local storage maps by wiring up the freeing of these kptrs from map value. Freeing of bpf_local_storage_map is only delayed in case there are special fields, therefore bpf_selem_free_* path can also only dereference smap safely in that case. This is recorded using a bool utilizing a hole in bpf_local_storage_elem. It could have been tagged in the pointer value smap using the lowest bit (since alignment > 1), but since there was already a hole I went with the simpler option. Only the map structure freeing is delayed using RCU barriers, as the buckets aren't used when selem is being freed, so they can be freed once all readers of the bucket lists can no longer access it.
Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230225154010.391965-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.1.14, v6.1.13 |
|
#
0a09a2f9 |
| 21-Feb-2023 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Annotate data races in bpf_local_storage
There are a few cases where hlist_node is checked to be unhashed without holding the lock protecting its modification. In this case, one must use hlist_unhashed_lockless to avoid load tearing and KCSAN reports. Fix this by using lockless variant in places not protected by the lock.
Since this is not prompted by any actual KCSAN reports but only from code review, I have not included a fixes tag.
Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230221200646.2500777-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.2, v6.1.12 |
|
#
ddef81b5 |
| 10-Feb-2023 |
Yafang Shao <laoar.shao@gmail.com> |
bpf: use bpf_map_kvcalloc in bpf_local_storage
Introduce new helper bpf_map_kvcalloc() for the memory allocation in bpf_local_storage(). Then the allocation will charge the memory from the map instead of from current, though currently they are the same thing as it is only used in map creation path now. By charging map's memory into the memcg from the map, it will be more clear.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
Revision tags: v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15 |
|
#
552d42a3 |
| 20-Dec-2022 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Reduce smap->elem_size
'struct bpf_local_storage_elem' has an unused 56 byte padding at the end due to struct's cache-line alignment requirement. This padding space is overlapped by storage value contents, so if we use sizeof() to calculate the total size, we overinflate it by 56 bytes. Use offsetof() instead to calculate more exact memory use.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221221013036.3427431-1-martin.lau@linux.dev
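The difference between the sizeof() and offsetof() calculations can be reproduced with a small layout sketch in GNU C. The structs below are hypothetical stand-ins, not the kernel definitions: the `bookkeeping` array, the 64-byte `CACHELINE_SIZE` and both helper functions are assumptions chosen so that the layout has tail padding of the kind the message describes.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_SIZE 64	/* assumed cache-line size */

/* Hypothetical value header followed by the user-visible bytes. */
struct storage_data {
	void *smap;
	unsigned char data[] __attribute__((aligned(8)));
};

/* Hypothetical element: bookkeeping first, cache-line aligned data last. */
struct storage_elem {
	void *bookkeeping[6];
	struct storage_data sdata __attribute__((aligned(CACHELINE_SIZE)));
};

/* Old style: whole struct (including tail padding) plus the value. */
static size_t elem_size_sizeof(size_t value_size)
{
	return sizeof(struct storage_elem) + value_size;
}

/* New style: offset of the first value byte plus the value. */
static size_t elem_size_offsetof(size_t value_size)
{
	size_t size = offsetof(struct storage_elem, sdata) +
		      offsetof(struct storage_data, data) + value_size;

	/* The value overlaps the padding, so this can never be larger. */
	assert(size <= elem_size_sizeof(value_size));
	return size;
}
```

Because the struct is padded out to a multiple of its cache-line alignment while the value bytes actually start right after `smap`, the sizeof()-based figure counts the padding and the value separately; the offsetof()-based figure counts only the bytes that are really needed.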
|
Revision tags: v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79 |
|
#
836e49e1 |
| 14-Nov-2022 |
Xu Kuohai <xukuohai@huawei.com> |
bpf: Do not copy spin lock field from user in bpf_selem_alloc
The bpf_selem_alloc function is used by inode_storage, sk_storage and task_storage maps to set the map value. For these map types, there may be a spin lock in the map value, so if we use memcpy to copy the whole map value from user, the spin lock field may be initialized incorrectly.
Since the spin lock field is zeroed by kzalloc, call copy_map_value instead of memcpy to skip copying the spin lock field to fix it.
Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20221114134720.1057939-2-xukuohai@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
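The idea of copying a value while leaving the lock field alone can be sketched in userspace C. This is not the kernel's copy_map_value(); `copy_value_skip_lock()` and its offset and size parameters are hypothetical, and the sketch assumes a single lock region inside the value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy value_size bytes from src to dst, except for the byte range
 * [lock_off, lock_off + lock_size), which keeps whatever dst already
 * holds (for a freshly zeroed destination, that means zeros).
 */
static void copy_value_skip_lock(void *dst, const void *src,
				 size_t value_size, size_t lock_off,
				 size_t lock_size)
{
	size_t tail_off = lock_off + lock_size;

	assert(tail_off <= value_size);

	/* Bytes before the lock. */
	memcpy(dst, src, lock_off);
	/* Bytes after the lock. */
	memcpy((char *)dst + tail_off, (const char *)src + tail_off,
	       value_size - tail_off);
}
```

With a zero-initialised destination, the lock region stays zeroed no matter what the caller supplied in the source buffer, which is the property the fix depends on.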
|
Revision tags: v6.0.8, v5.15.78 |
|
#
db559117 |
| 03-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Consolidate spin_lock, timer management into btf_record
Now that kptr_off_tab has been refactored into btf_record, and can hold more than one specific field type, accommodate bpf_spin_lock and bpf_timer as well.
While they don't require any more metadata than offset, having all special fields in one place allows us to share the same code for allocated user defined types and handle both map values and these allocated objects in a similar fashion.
As an optimization, we still keep spin_lock_off and timer_off offsets in the btf_record structure, just to avoid having to find the btf_field struct each time their offset is needed. This is mostly needed to manipulate such objects in a map value at runtime. It's ok to hardcode just one offset as more than one field is disallowed.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|