9e67b054 | 18-Oct-2024 |
Chen Ridong <chenridong@huawei.com> |
cgroup/bpf: only cgroup v2 can be attached by bpf programs
[ Upstream commit 2190df6c91373fdec6db9fc07e427084f232f57e ]
Only cgroup v2 can be attached by bpf programs, so this patch introduces that
cgroup/bpf: only cgroup v2 can be attached by bpf programs
[ Upstream commit 2190df6c91373fdec6db9fc07e427084f232f57e ]
Only cgroup v2 can be attached by bpf programs, so this patch introduces that cgroup_bpf_inherit and cgroup_bpf_offline can only be called in cgroup v2, and this can fix the memleak mentioned by commit 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline"), which has been reverted.
Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") Link: https://lore.kernel.org/cgroups/aka2hk5jsel5zomucpwlxsej6iwnfw4qu5jkrmjhyfhesjlfdw@46zxhg5bdnr7/ Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
96226fbe | 27-Jun-2024 |
Chen Ridong <chenridong@huawei.com> |
cgroup/cpuset: Prevent UAF in proc_cpuset_show()
[ Upstream commit 1be59c97c83ccd67a519d8a49486b3a8a73ca28a ]
An UAF can happen when /proc/cpuset is read as reported in [1].
This can be reproduced
cgroup/cpuset: Prevent UAF in proc_cpuset_show()
[ Upstream commit 1be59c97c83ccd67a519d8a49486b3a8a73ca28a ]
An UAF can happen when /proc/cpuset is read as reported in [1].
This can be reproduced by the following methods: 1.add an mdelay(1000) before acquiring the cgroup_lock In the cgroup_path_ns function. 2.$cat /proc/<pid>/cpuset repeatly. 3.$mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/ $umount /sys/fs/cgroup/cpuset/ repeatly.
The race that cause this bug can be shown as below:
(umount) | (cat /proc/<pid>/cpuset) css_release | proc_cpuset_show css_release_work_fn | css = task_get_css(tsk, cpuset_cgrp_id); css_free_rwork_fn | cgroup_path_ns(css->cgroup, ...); cgroup_destroy_root | mutex_lock(&cgroup_mutex); rebind_subsystems | cgroup_free_root | | // cgrp was freed, UAF | cgroup_path_ns_locked(cgrp,..);
When the cpuset is initialized, the root node top_cpuset.css.cgrp will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated &cgroup_root.cgrp. When the umount operation is executed, top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp.
The problem is that when rebinding to cgrp_dfl_root, there are cases where the cgroup_root allocated by setting up the root for cgroup v1 is cached. This could lead to a Use-After-Free (UAF) if it is subsequently freed. The descendant cgroups of cgroup v1 can only be freed after the css is released. However, the css of the root will never be released, yet the cgroup_root should be freed when it is unmounted. This means that obtaining a reference to the css of the root does not guarantee that css.cgrp->root will not be freed.
Fix this problem by using rcu_read_lock in proc_cpuset_show(). As cgroup_root is kfree_rcu after commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe"), css->cgroup won't be freed during the critical section. To call cgroup_path_ns_locked, css_set_lock is needed, so it is safe to replace task_get_css with task_css.
[1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces") Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
78d44b82 | 17-Aug-2023 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
cgroup: Avoid -Wstringop-overflow warnings
Change the notation from pointer-to-array to pointer-to-pointer. With this, we avoid the compiler complaining about trying to access a region of size zero
cgroup: Avoid -Wstringop-overflow warnings
Change the notation from pointer-to-array to pointer-to-pointer. With this, we avoid the compiler complaining about trying to access a region of size zero as an argument during function calls.
This is a workaround to prevent the compiler complaining about accessing an array of size zero when evaluating the arguments of a couple of function calls. See below:
kernel/cgroup/cgroup.c: In function 'find_css_set': kernel/cgroup/cgroup.c:1206:16: warning: 'find_existing_css_set' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] 1206 | cset = find_existing_css_set(old_cset, cgrp, template); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/cgroup/cgroup.c:1206:16: note: referencing argument 3 of type 'struct cgroup_subsys_state *[0]' kernel/cgroup/cgroup.c:1071:24: note: in a call to function 'find_existing_css_set' 1071 | static struct css_set *find_existing_css_set(struct css_set *old_cset, | ^~~~~~~~~~~~~~~~~~~~~
With the change to pointer-to-pointer, the functions are not prevented from being executed, and they will do what they have to do when CGROUP_SUBSYS_COUNT == 0.
Address the following -Wstringop-overflow warnings seen when built with ARM architecture and aspeed_g4_defconfig configuration (notice that under this configuration CGROUP_SUBSYS_COUNT == 0):
kernel/cgroup/cgroup.c:1208:16: warning: 'find_existing_css_set' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:1258:15: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:6089:18: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:6153:18: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=]
This results in no differences in binary output.
Link: https://github.com/KSPP/linux/issues/316 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
82b90b6c | 10-Aug-2023 |
Lu Jialin <lujialin4@huawei.com> |
cgroup:namespace: Remove unused cgroup_namespaces_init()
cgroup_namspace_init() just return 0. Therefore, there is no need to call it during start_kernel. Just remove it.
Fixes: a79a908fd2b0 ("cgro
cgroup:namespace: Remove unused cgroup_namespaces_init()
cgroup_namspace_init() just return 0. Therefore, there is no need to call it during start_kernel. Just remove it.
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces") Signed-off-by: Lu Jialin <lujialin4@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
0437719c | 06-Aug-2023 |
Hao Jia <jiahao.os@bytedance.com> |
cgroup/rstat: Record the cumulative per-cpu time of cgroup and its descendants
The member variable bstat of the structure cgroup_rstat_cpu records the per-cpu time of the cgroup itself, but does not
cgroup/rstat: Record the cumulative per-cpu time of cgroup and its descendants
The member variable bstat of the structure cgroup_rstat_cpu records the per-cpu time of the cgroup itself, but does not include the per-cpu time of its descendants. The per-cpu time including descendants is very useful for calculating the per-cpu usage of cgroups.
Although we can indirectly obtain the total per-cpu time of the cgroup and its descendants by accumulating the per-cpu bstat of each descendant of the cgroup. But after a child cgroup is removed, we will lose its bstat information. This will cause the cumulative value to be non-monotonic, thus affecting the accuracy of cgroup per-cpu usage.
So we add the subtree_bstat variable to record the total per-cpu time of this cgroup and its descendants, which is similar to "cpuacct.usage*" in cgroup v1. And this is also helpful for the migration from cgroup v1 to cgroup v2. After adding this variable, we can obtain the per-cpu time of cgroup and its descendants in user mode through eBPF/drgn, etc. And we are still trying to determine how to expose it in the cgroupfs interface.
Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hao Jia <jiahao.os@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
e7e64a1b | 07-Aug-2023 |
Miaohe Lin <linmiaohe@huawei.com> |
cgroup: clean up if condition in cgroup_pidlist_start()
There's no need to use '<=' when knowing 'l->list[mid] != pid' already. No functional change intended.
Signed-off-by: Miaohe Lin <linmiaohe@h
cgroup: clean up if condition in cgroup_pidlist_start()
There's no need to use '<=' when knowing 'l->list[mid] != pid' already. No functional change intended.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
7f828eac | 03-Aug-2023 |
Miaohe Lin <linmiaohe@huawei.com> |
cgroup: fix obsolete function name in cgroup_destroy_locked()
Since commit e76ecaeef65c ("cgroup: use cgroup_kn_lock_live() in other cgroup kernfs methods"), cgroup_kn_lock_live() is used in cgroup
cgroup: fix obsolete function name in cgroup_destroy_locked()
Since commit e76ecaeef65c ("cgroup: use cgroup_kn_lock_live() in other cgroup kernfs methods"), cgroup_kn_lock_live() is used in cgroup kernfs methods. Update corresponding comment.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
a2c15fec | 01-Aug-2023 |
Miaohe Lin <linmiaohe@huawei.com> |
cgroup: fix obsolete function name above css_free_rwork_fn()
Since commit 8f36aaec9c92 ("cgroup: Use rcu_work instead of explicit rcu and work item"), css_free_work_fn has been renamed to css_free_r
cgroup: fix obsolete function name above css_free_rwork_fn()
Since commit 8f36aaec9c92 ("cgroup: Use rcu_work instead of explicit rcu and work item"), css_free_work_fn has been renamed to css_free_rwork_fn. Update corresponding comment.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
05f76ae9 | 01-Aug-2023 |
Cai Xinchen <caixinchen1@huawei.com> |
cgroup/cpuset: fix kernel-doc
Add kernel-doc of param @rotor to fix warnings:
kernel/cgroup/cpuset.c:4162: warning: Function parameter or member 'rotor' not described in 'cpuset_spread_node' kernel
cgroup/cpuset: fix kernel-doc
Add kernel-doc of param @rotor to fix warnings:
kernel/cgroup/cpuset.c:4162: warning: Function parameter or member 'rotor' not described in 'cpuset_spread_node' kernel/cgroup/cpuset.c:3771: warning: Function parameter or member 'work' not described in 'cpuset_hotplug_workfn'
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
a3fdeeb3 | 19-Jul-2023 |
Miaohe Lin <linmiaohe@huawei.com> |
cgroup: fix obsolete comment above cgroup_create()
Since commit 743210386c03 ("cgroup: use cgrp->kn->id as the cgroup ID"), cgrp is associated with its kernfs_node. Update corresponding comment.
Si
cgroup: fix obsolete comment above cgroup_create()
Since commit 743210386c03 ("cgroup: use cgrp->kn->id as the cgroup ID"), cgrp is associated with its kernfs_node. Update corresponding comment.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|