Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39 |
|
#
3c615294 |
| 14-Jul-2023 |
GONG, Ruiqi <gongruiqi@huaweicloud.com> |
Randomized slab caches for kmalloc()
When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in successful exploitation. Basically, it overwrites the memory area of the vulnerable object by triggering allocations in other subsystems or modules, thereby getting a reference to the targeted memory location. It is usable against various types of vulnerability, including use-after-free (UAF), heap out-of-bounds write, etc.
There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill.
To efficiently prevent heap spraying, we propose the following approach: create multiple copies of the generic slab caches that will never be merged, and use a random one of them at allocation. The random selection is based on the address of the code that calls `kmalloc()`, which means it is static at runtime (rather than determined dynamically on each allocation, which could be bypassed by repeated brute-force spraying). In other words, the randomness of cache selection is with respect to the code address rather than time; i.e. allocations in different code paths will most likely pick different caches, although kmalloc() at any given place will use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed.
Meanwhile, the static random selection is further hardened with a per-boot random seed, which prevents an attacker from finding a usable kmalloc that happens to pick the same cache as the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is static for as long as the system runs, but differs across system startups.
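For illustration, the selection could be sketched like this (identifier names and the copy count are assumptions, not the verbatim patch):

  #include <linux/hash.h>
  #include <linux/log2.h>
  #include <linux/random.h>

  #define RANDOM_KMALLOC_CACHES_NR 16   /* assumed number of copies */

  /* Per-boot seed, drawn once during slab setup. */
  static u64 random_kmalloc_seed __ro_after_init;

  static void __init random_kmalloc_seed_init(void)
  {
          random_kmalloc_seed = get_random_u64();
  }

  /*
   * Derive the cache copy from the caller's code address. A given
   * call site always hashes to the same copy within one boot, while
   * distinct call sites (very likely) land in different copies.
   */
  static unsigned int random_kmalloc_index(unsigned long caller)
  {
          return hash_64(caller ^ random_kmalloc_seed,
                         ilog2(RANDOM_KMALLOC_CACHES_NR));
  }

kmalloc() would then feed its caller's address (_RET_IP_) into this index when selecting among the cache copies.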
The performance overhead has been tested on a 40-core x86 server by comparing the results of `perf bench all` between kernels with and without this patch, based on the latest linux-next kernel, which shows only minor differences. A subset of the benchmarks is listed below:
                 sched/   sched/  syscall/       mem/       mem/
              messaging     pipe     basic     memcpy     memset
                  (sec)    (sec)     (sec)   (GB/sec)   (GB/sec)

control1          0.019    5.459     0.733  15.258789  51.398026
control2          0.019    5.439     0.730  16.009221  48.828125
control3          0.019    5.282     0.735  16.009221  48.828125
control_avg       0.019    5.393     0.733  15.759077  49.684759

experiment1       0.019    5.374     0.741  15.500992  46.502976
experiment2       0.019    5.440     0.746  16.276042  51.398026
experiment3       0.019    5.242     0.752  15.258789  51.398026
experiment_avg    0.019    5.352     0.746  15.678608  49.766343
The memory usage overhead was measured by executing `free` after boot on a QEMU VM with 1GB total memory; as expected, it is positively correlated with the number of cache copies:
            control  4 copies  8 copies  16 copies

total        969.8M    968.2M    968.2M     968.2M
used          20.0M     21.9M     24.1M      26.7M
free         936.9M    933.6M    931.4M     928.6M
available    932.2M    928.8M    926.6M     923.9M
Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Dennis Zhou <dennis@kernel.org> # percpu
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
0c474d31 |
| 12-Jun-2023 |
Catalin Marinas <catalin.marinas@arm.com> |
mm/slab: simplify create_kmalloc_cache() args and make it static
In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead of initialising the kmalloc_caches array directly. With this, create_kmalloc_cache() is now only called from new_kmalloc_cache() in the same file, so make it static. In addition, the useroffset argument is always 0 while usersize is the same as size. Remove them.
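In sketch form, the signature change described above (argument order approximate):

  /* before: reachable from other slab files, with always-redundant user* args */
  struct kmem_cache *create_kmalloc_cache(const char *name, unsigned int size,
                                          slab_flags_t flags,
                                          unsigned int useroffset,
                                          unsigned int usersize);

  /* after: only reachable via new_kmalloc_cache() in the same file */
  static struct kmem_cache *create_kmalloc_cache(const char *name,
                                                 unsigned int size,
                                                 slab_flags_t flags);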
Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
Revision tags: v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25 |
|
#
f7e466e9 |
| 16-Apr-2023 |
David Keisar Schmidt <david.keisarschm@mail.huji.ac.il> |
mm/slab: Replace invocation of weak PRNG
The slab allocator randomization uses the prandom_u32 PRNG. That was added to prevent attackers from obtaining information on the heap state by randomizing the freelist state.
However, this PRNG turned out to be weak, as noted in commit c51f8f88d705. To fix it, we change the invocation of prandom_u32_state to get_random_u32 to ensure the PRNG is strong. Since a modulo operation was applied right after it, we use get_random_u32_below to achieve uniformity.
In addition, we change the freelist_init_state union to a struct, since the rnd_state inside it, which stored the state of prandom_u32, is no longer needed: get_random_u32 maintains its own state.
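As a sketch, the substitution amounts to (not the verbatim diff):

  /* before: weak, seeded PRNG plus a modulo to bound the value */
  rand = prandom_u32_state(&state->rnd_state);
  rand %= count;

  /* after: strong source; get_random_u32_below() returns an unbiased
   * value in [0, count), so the explicit modulo goes away too */
  rand = get_random_u32_below(count);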
Signed-off-by: David Keisar Schmidt <david.keisarschm@mail.huji.ac.il>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
444f20c2 |
| 17-Apr-2023 |
zhaoxinchao <chrisxinchao@outlook.com> |
mm/slab: correct return values in comment for __kmem_cache_create()
__kmem_cache_create() returns 0 on success and non-zero on failure. The comment is wrong in two instances, so fix the first one and remove the second one. Also make the comment non-doc, because it doesn't describe an API function but a SLAB-specific implementation.
Signed-off-by: zhaoxinchao <chrisxinchao@outlook.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v6.1.24 |
|
#
c7b23b68 |
| 13-Apr-2023 |
Yosry Ahmed <yosryahmed@google.com> |
mm: vmscan: refactor updating current->reclaim_state
During reclaim, we keep track of pages reclaimed by means other than LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab, which we stash a pointer to in the current task_struct.
However, we keep track of more than just reclaimed slab pages through this. We also use it for clean file pages dropped through pruned inodes, and for freed xfs buffer pages. Rename reclaimed_slab to reclaimed, and add a helper function that wraps updating it through current, so that future changes to this logic are contained within include/linux/swap.h.
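The helper described above looks roughly like this (a sketch of what lands in include/linux/swap.h):

  /*
   * Account pages freed outside LRU-based reclaim (slab, pruned
   * inodes, xfs buffers) toward the current reclaim pass, if any.
   */
  static inline void mm_account_reclaimed_pages(unsigned long pages)
  {
          if (current->reclaim_state)
                  current->reclaim_state->reclaimed += pages;
  }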
Link: https://lkml.kernel.org/r/20230413104034.1086717-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
Revision tags: v6.1.23, v6.1.22, v6.1.21, v6.1.20 |
|
#
23baf831 |
| 15-Mar-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, treewide: redefine MAX_ORDER sanely
MAX_ORDER is currently defined as the number of orders the page allocator supports: the user can ask the buddy allocator for page orders between 0 and MAX_ORDER-1.
This definition is counter-intuitive and has led to a number of bugs all over the kernel.
Change the definition of MAX_ORDER to be inclusive: the range of orders the user can ask the buddy allocator for is now 0..MAX_ORDER.
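The practical effect on iteration code, sketched (do_something() is a placeholder):

  /* before: MAX_ORDER was exclusive, so a walk over all orders read */
  for (order = 0; order < MAX_ORDER; order++)
          do_something(order);

  /* after: MAX_ORDER itself is a valid order */
  for (order = 0; order <= MAX_ORDER; order++)
          do_something(order);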
[kirill@shutemov.name: fix min() warning]
Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
66a1c22b |
| 21-Mar-2023 |
Geert Uytterhoeven <geert+renesas@glider.be> |
mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP
sh/migor_defconfig:
mm/slab.c: In function ‘slab_memory_callback’:
mm/slab.c:1127:23: error: implicit declaration of function ‘init_cache_node_node’; did you mean ‘drain_cache_node_node’? [-Werror=implicit-function-declaration]
 1127 |                 ret = init_cache_node_node(nid);
      |                       ^~~~~~~~~~~~~~~~~~~~
      |                       drain_cache_node_node
The #ifdef condition protecting the definition of init_cache_node_node() no longer matches the conditions protecting the (multiple) users.
Fix this by syncing the conditions.
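Sketched, the fix widens the guard around the definition so it covers every configuration that has a caller (the exact conditions here are illustrative):

  /* before: guard no longer matched all (multiple) call sites */
  #ifdef CONFIG_SMP
  static int init_cache_node_node(int node);
  #endif

  /* after: definition guarded by the union of its users' conditions */
  #if defined(CONFIG_NUMA) || defined(CONFIG_SMP)
  static int init_cache_node_node(int node);
  #endif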
Fixes: 76af6a054da40553 ("mm/migrate: add CPU hotplug to demotion #ifdef")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/b5bdea22-ed2f-3187-6efe-0c72330270a4@infradead.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11 |
|
#
f5451547 |
| 07-Feb-2023 |
Thomas Gleixner <tglx@linutronix.de> |
mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early
The memory allocators are available during early boot even in the phase where interrupts are disabled and scheduling is not yet possible.
The setup is such that GFP_KERNEL allocations work in this phase without causing might_alloc() splats, because the system state is SYSTEM_BOOTING at that point, which prevents the warnings from triggering.
Most allocation/free functions use local_irq_save()/restore() or a lock variant of that. But kmem_cache_alloc_bulk() and kmem_cache_free_bulk() use local_[lock]_irq_disable()/enable(), which leads to a lockdep warning when interrupts are enabled during the early boot phase.
This went unnoticed so far as there are no early users of these interfaces. The upcoming conversion of the interrupt descriptor store from radix_tree to maple_tree triggered this warning as maple_tree uses the bulk interface.
Cure this by moving the kmem_cache_alloc/free() bulk variants of SLUB and SLAB to local[_lock]_irq_save()/restore().
There is obviously no reclaim possible and required at this point so there is no need to expand this coverage further.
No functional change.
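The shape of the cure, sketched (the bulk-path internals are elided):

  unsigned long irqflags;

  /* before: unconditional disable/enable re-enables interrupts even
   * when the early-boot caller had them disabled, upsetting lockdep */
  local_irq_disable();
  /* ... allocate/free the batch of objects ... */
  local_irq_enable();

  /* after: save/restore preserves the caller's interrupt state */
  local_irq_save(irqflags);
  /* ... allocate/free the batch of objects ... */
  local_irq_restore(irqflags);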
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4 |
|
#
02d65d6f |
| 06-Jan-2023 |
Sidhartha Kumar <sidhartha.kumar@oracle.com> |
mm: introduce folio_is_pfmemalloc
Add a folio equivalent for page_is_pfmemalloc. This removes two instances of page_is_pfmemalloc(folio_page(folio, 0)) so the folio can be used directly.
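The new helper mirrors its page counterpart; roughly:

  /*
   * lru.next has bit 1 set if the folio was allocated from the
   * pfmemalloc reserves -- the same encoding page_is_pfmemalloc()
   * tests on a page.
   */
  static inline bool folio_is_pfmemalloc(const struct folio *folio)
  {
          return (uintptr_t)folio->lru.next & BIT(1);
  }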
Link: https://lkml.kernel.org/r/20230106215251.599222-1-sidhartha.kumar@oracle.com
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
81ce2ebd |
| 11-Jan-2023 |
lvqian <lvqian@nfschina.com> |
mm/slab.c: cleanup is_debug_pagealloc_cache()
Remove the if statement to increase code readability. Also make the function inline, per David.
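Sketched, the cleanup collapses the if/return pattern into a single returned expression:

  /* before */
  static bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
  {
          if (debug_pagealloc_enabled_static() && OFF_SLAB(cachep) &&
              (cachep->size % PAGE_SIZE) == 0)
                  return true;

          return false;
  }

  /* after: inline, condition returned directly */
  static inline bool is_debug_pagealloc_cache(struct kmem_cache *cachep)
  {
          return debug_pagealloc_enabled_static() && OFF_SLAB(cachep) &&
                 (cachep->size % PAGE_SIZE) == 0;
  }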
Signed-off-by: lvqian <lvqian@nfschina.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
c034c6a4 |
| 09-Jan-2023 |
SeongJae Park <sj@kernel.org> |
mm/sl{a,u}b: fix wrong usages of folio_page() for getting head pages
The standard idiom for getting the head page of a given folio is '&folio->page', but some places are wrongly using 'folio_page(folio, 0)' for the purpose. Fix those to use the idiom.
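That is, a one-line substitution at each affected site:

  struct page *page;

  page = folio_page(folio, 0);  /* works, but obscures the intent */
  page = &folio->page;          /* the standard head-page idiom */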
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14 |
|
#
35e3c36d |
| 18-Dec-2022 |
Gou Hao <gouhao@uniontech.com> |
mm/slab: remove unused slab_early_init
'slab_early_init' was introduced by commit e0a42726794f ("[PATCH] mm/slab.c: fix early init assumption"); this flag was used to prevent off-slab caches from being created too early during bootup.
The only user of 'slab_early_init' was removed in commit 3217fd9bdf00 ("mm/slab: make criteria for off slab determination robust and simple").
Signed-off-by: Gou Hao <gouhao@uniontech.com>
Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
cc2e9d2b |
| 28-Dec-2022 |
David Rientjes <rientjes@google.com> |
mm, slab: periodically resched in drain_freelist()
drain_freelist() can be called with a very large number of slabs to free, such as for kmem_cache_shrink(), or depending on various settings of the slab cache when doing periodic reaping.
If there is a potentially long list of slabs to drain, periodically schedule to ensure we aren't saturating the cpu for too long.
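A sketch of the resulting loop shape (the body that unlinks and frees one slab is elided):

  static int drain_freelist(struct kmem_cache *cache,
                            struct kmem_cache_node *n, int tofree)
  {
          int nr_freed = 0;

          while (nr_freed < tofree && !list_empty(&n->slabs_free)) {
                  /* ... take list_lock, unlink one slab, free it ... */
                  nr_freed++;

                  /* don't monopolize the CPU on very long freelists */
                  cond_resched();
          }
          return nr_freed;
  }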
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78 |
|
#
8b881763 |
| 04-Nov-2022 |
Vlastimil Babka <vbabka@suse.cz> |
mm/migrate: make isolate_movable_page() skip slab pages
In the next commit we want to rearrange struct slab fields to allow a larger rcu_head. Afterwards, the page->mapping field will overlap with SLUB's "struct list_head slab_list", where the value of the prev pointer can become LIST_POISON2, which is 0x122 + POISON_POINTER_DELTA. Unfortunately, bit 1 being set can confuse PageMovable() into a false positive and cause a GPF, as reported by lkp [1].
To fix this, make isolate_movable_page() skip pages with the PageSlab flag set. This is a bit tricky as we need to add memory barriers to SLAB and SLUB's page allocation and freeing, and their counterparts to isolate_movable_page().
Based on my RFC from [2]. Added a comment update from Matthew's variant in [3] and, as done there, moved the PageSlab checks to happen before trying to take the page lock.
[1] https://lore.kernel.org/all/208c1757-5edd-fd42-67d4-1940cc43b50f@intel.com/
[2] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/
[3] https://lore.kernel.org/all/YzsVM8eToHUeTP75@casper.infradead.org/
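On the isolation side, the added check boils down to something like this (a sketch; the paired write barriers in the slab allocation/free paths are not shown):

  /* Slab pages may carry list poison where page->mapping lives, so
   * never treat them as movable. Checked before taking the page lock. */
  if (PageSlab(page))
          goto out;
  /* pairs with a write barrier in the slab page alloc/free paths */
  smp_rmb();
  if (!PageMovable(page))
          goto out;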
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
|
#
838de63b |
| 10-Nov-2022 |
Vlastimil Babka <vbabka@suse.cz> |
mm/slab: move and adjust kernel-doc for kmem_cache_alloc
Alexander reports an issue with the kmem_cache_alloc() comment in mm/slab.c:
> The current comment mentioned that the flags only matters if the > cache has no available objects. It's different for the __GFP_ZERO > flag which will ensure that the returned object is always zeroed > in any case.
> I have the feeling I run into this question already two times if > the user need to zero the object or not, but the user does not need > to zero the object afterwards. However another use of __GFP_ZERO > and only zero the object if the cache has no available objects would > also make no sense.
and suggests thus mentioning __GFP_ZERO as the exception. But on closer inspection, the part about flags being only relevant if the cache has no available objects is misleading. The slab user has no reliable way to determine if there are available objects, and e.g. the might_sleep() debug check can be performed even if objects are available, so passing the correct flags for the given allocation context always matters.
Thus remove that sentence completely, and while at it, move the comment from the SLAB-specific mm/slab.c to the common include/linux/slab.h. The comment otherwise refers to the flags description for kmalloc(), so add a __GFP_ZERO comment there and remove a very misleading GFP_HIGHUSER (not applicable to slab) description from there. Mention the kzalloc() and kmem_cache_zalloc() shortcuts.
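With the shortcuts mentioned above, these pairs are equivalent:

  buf = kmalloc(len, GFP_KERNEL | __GFP_ZERO);
  buf = kzalloc(len, GFP_KERNEL);               /* same thing, spelled shorter */

  obj = kmem_cache_alloc(cache, GFP_KERNEL | __GFP_ZERO);
  obj = kmem_cache_zalloc(cache, GFP_KERNEL);   /* ditto for caches */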
Reported-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/all/20221011145413.8025-1-aahringo@redhat.com/
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3 |
|
#
9ce67395 |
| 20-Oct-2022 |
Feng Tang <feng.tang@intel.com> |
mm/slub: only zero requested size of buffer for kzalloc when debug enabled
kzalloc/kmalloc will round up the request size to a fixed size (mostly a power of 2), so the allocated memory could be more than requested. Currently the kzalloc family of APIs will zero all of the allocated memory.
To detect out-of-bounds usage of the extra allocated memory, only zero the requested part, so that a redzone sanity check can be added to the extra space later.
For kzalloc users who will call ksize() later and utilize this extra space, please be aware that the space is not zeroed any more when debug is enabled. (Thanks to Kees Cook's effort to sanitize all ksize() user cases [1], this won't be a big issue).
[1]. https://lore.kernel.org/all/20220922031013.2150682-1-keescook@chromium.org/#r
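Sketched (hook and variable names approximate), the zeroing in the allocation path becomes:

  /* before: zero the whole rounded-up object */
  if (want_zero)
          memset(p, 0, s->object_size);

  /* after: under slub_debug, zero only what was requested, leaving
   * the rounded-up tail free to carry a redzone pattern */
  if (want_zero)
          memset(p, 0, orig_size);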
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
b539ce9f |
| 21-Oct-2022 |
Jiri Kosina <jkosina@suse.cz> |
mm/slab: Annotate kmem_cache_node->list_lock as raw
The list_lock can be taken in hardirq context when do_drain() is being called via IPI on all cores, and therefore lockdep complains about it, because it can't be preempted on PREEMPT_RT.
That's not a real issue, as SLAB can't be built on PREEMPT_RT anyway, but we still want to get rid of the warning on non-PREEMPT_RT builds.
Annotate it therefore as a raw lock in order to get rid of the lockdep warning below.
 =============================
 [ BUG: Invalid wait context ]
 6.1.0-rc1-00134-ge35184f32151 #4 Not tainted
 -----------------------------
 swapper/3/0 is trying to lock:
 ffff8bc88086dc18 (&parent->list_lock){..-.}-{3:3}, at: do_drain+0x57/0xb0
 other info that might help us debug this:
 context-{2:2}
 no locks held by swapper/3/0.
 stack backtrace:
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.1.0-rc1-00134-ge35184f32151 #4
 Hardware name: LENOVO 20K5S22R00/20K5S22R00, BIOS R0IET38W (1.16 ) 05/31/2017
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x6b/0x9d
  __lock_acquire+0x1519/0x1730
  ? build_sched_domains+0x4bd/0x1590
  ? __lock_acquire+0xad2/0x1730
  lock_acquire+0x294/0x340
  ? do_drain+0x57/0xb0
  ? sched_clock_tick+0x41/0x60
  _raw_spin_lock+0x2c/0x40
  ? do_drain+0x57/0xb0
  do_drain+0x57/0xb0
  __flush_smp_call_function_queue+0x138/0x220
  __sysvec_call_function+0x4f/0x210
  sysvec_call_function+0x4b/0x90
  </IRQ>
  <TASK>
  asm_sysvec_call_function+0x16/0x20
 RIP: 0010:mwait_idle+0x5e/0x80
 Code: 31 d2 65 48 8b 04 25 80 ed 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 14 66 90 0f 00 2d 0b 78 46 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 06 fb 0f 1f 44 00 00 65 48 8b 04 25 80 ed 01 00 f0 80 60 02 df
 RSP: 0000:ffffa90940217ee0 EFLAGS: 00000246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9bb9f93a
 RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000001
 R10: ffffa90940217ea8 R11: 0000000000000000 R12: ffffffffffffffff
 R13: 0000000000000000 R14: ffff8bc88127c500 R15: 0000000000000000
  ? default_idle_call+0x1a/0xa0
  default_idle_call+0x4b/0xa0
  do_idle+0x1f1/0x2c0
  ? _raw_spin_unlock_irqrestore+0x56/0x70
  cpu_startup_entry+0x19/0x20
  start_secondary+0x122/0x150
  secondary_startup_64_no_verify+0xce/0xdb
  </TASK>
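The annotation itself is a type switch plus matching locker changes, roughly:

  struct kmem_cache_node {
          raw_spinlock_t list_lock;     /* was: spinlock_t */
          /* ... */
  };

  /* and lockers switch accordingly, e.g. in do_drain(): */
  raw_spin_lock(&n->list_lock);
  free_block(cachep, ac->entry, ac->avail, node, &list);
  raw_spin_unlock(&n->list_lock);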
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v6.0.2, v5.15.74 |
|
#
e36ce448 |
| 14-Oct-2022 |
Hyeonggon Yoo <42.hyeyoo@gmail.com> |
mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator"), SLAB passes large (> PAGE_SIZE * 2) requests to the buddy allocator, like SLUB does.
SLAB has been using kmalloc caches to allocate the freelist_idx_t array for off-slab caches. But after the commit, freelist_size can be bigger than KMALLOC_MAX_CACHE_SIZE.
Instead of using a pointer to a kmalloc cache, use kmalloc_node() and only check whether the kmalloc cache is off-slab during calculate_slab_order(). If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition arises, as the freelist_idx_t array is allocated directly from the buddy allocator.
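Sketch of the resulting allocation (flag plumbing elided):

  /* before: a pointer to a kmalloc cache, cached per kmem_cache --
   * impossible once freelist_size > KMALLOC_MAX_CACHE_SIZE */
  freelist = kmem_cache_alloc_node(cachep->freelist_cache,
                                   local_flags, nodeid);

  /* after: kmalloc_node() routes small sizes to kmalloc caches and
   * large ones straight to the buddy allocator */
  freelist = kmalloc_node(cachep->freelist_size, local_flags, nodeid);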
Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v5.15.73, v6.0.1 |
|
#
a251c17a |
| 05-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32() when possible
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace.
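For each site, that is the mechanical one-liner:

  -       val = prandom_u32();
  +       val = get_random_u32();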
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
Revision tags: v5.15.72, v6.0, v5.15.71 |
|
#
05a94065 |
| 23-Sep-2022 |
Kees Cook <keescook@chromium.org> |
slab: Introduce kmalloc_size_roundup()
In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well, as the vast majority of callers are not expecting to use more memory than what they asked for.
There is, however, one common exception to this: anticipatory resizing of kmalloc allocations. These cases all use ksize() to determine the actual bucket size of a given allocation (e.g. 128 when 126 was asked for). This comes in two styles in the kernel:
1) An allocation has been determined to be too small, and needs to be resized. Instead of the caller choosing its own next best size, it wants to minimize the number of calls to krealloc(), so it just uses ksize() plus some additional bytes, forcing the realloc into the next bucket size, from which it can learn how large it is now. For example:
  data = krealloc(data, ksize(data) + 1, gfp);
  data_len = ksize(data);
2) The minimum size of an allocation is calculated, but since it may grow in the future, just use all the space available in the chosen bucket immediately, to avoid needing to reallocate later. A good example of this is skbuff's allocators:
  data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
  ...
  /* kmalloc(size) might give us more room than requested.
   * Put skb_shared_info exactly at the end of allocated zone,
   * to allow max possible filling before reallocation.
   */
  osize = ksize(data);
  size = SKB_WITH_OVERHEAD(osize);
In both cases, the "how much was actually allocated?" question is answered _after_ the allocation, where the compiler hinting is not in an easy place to make the association any more. This mismatch between the compiler's view of the buffer length and the code's intention about how much it is going to actually use has already caused problems[1]. It is possible to fix this by reordering the use of the "actual size" information.
We can serve the needs of users of ksize() and still have accurate buffer length hinting for the compiler by doing the bucket size calculation _before_ the allocation. Code can instead ask "how large an allocation would I get for a given size?".
Introduce kmalloc_size_roundup(), to serve this function so we can start replacing the "anticipatory resizing" uses of ksize().
[1] https://github.com/ClangBuiltLinux/linux/issues/1599
    https://github.com/KSPP/linux/issues/183
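Callers can then ask the question up front, e.g. (a sketch; error handling abbreviated):

  size_t alloc_size = kmalloc_size_roundup(size);

  data = kmalloc(alloc_size, gfp);
  if (!data)
          return -ENOMEM;
  /* use alloc_size directly; no post-hoc ksize(data) call needed,
   * and the compiler's size hint matches what we actually use */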
[ vbabka@suse.cz: add SLOB version ]
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
Revision tags: v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61 |
|
#
2c1d697f |
| 17-Aug-2022 |
Hyeonggon Yoo <42.hyeyoo@gmail.com> |
mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using
Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc using TRACE_EVENT() macro.
And then this patch does:
- Do not pass a pointer to struct kmem_cache to trace_kmalloc; the gfp flag is enough to know if the allocation is accounted or not.
- Avoid dereferencing s->object_size and s->size when not using the kmem_cache_alloc event.
- Avoid dereferencing s->name when not using the kmem_cache_free event.
- Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB.
Cc: Vasily Averin <vasily.averin@linux.dev>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
11e9734b |
| 17-Aug-2022 |
Hyeonggon Yoo <42.hyeyoo@gmail.com> |
mm/slab_common: unify NUMA and UMA version of tracepoints
Drop the kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and remove the _node postfix from the NUMA version of the tracepoints.
This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node, but at this point maintaining both kmem_alloc and kmem_alloc_node event classes does not make sense at all.
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
26a40990 |
| 17-Aug-2022 |
Hyeonggon Yoo <42.hyeyoo@gmail.com> |
mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()
Despite its name, kmem_cache_alloc[_node]_trace() is a hook for inlined kmalloc. So rename it to kmalloc[_node]_trace().
Move its implementation to slab_common.c by using __kmem_cache_alloc_node(), but keep the CONFIG_TRACING=n variants to save a function call when CONFIG_TRACING=n.
Use __assume_kmalloc_alignment for kmalloc[_node]_trace instead of __assume_slab_alignment. Generally kmalloc has larger alignment requirements.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
b1405135 |
| 17-Aug-2022 |
Hyeonggon Yoo <42.hyeyoo@gmail.com> |
mm/sl[au]b: generalize kmalloc subsystem
Now everything in the kmalloc subsystem can be generalized. Let's do it!
Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(), kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them to slab_common.c.
In the meantime, rename kmalloc_large_node_notrace() to __kmalloc_large_node() and make it static as it's now only called in slab_common.c.
[ feng.tang@intel.com: adjust kfence skip list to include __kmem_cache_free so that kfence kunit tests do not fail ]
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
#
ed4cd17e |
| 17-Aug-2022 |
Hyeonggon Yoo <42.hyeyoo@gmail.com> |
mm/sl[au]b: introduce common alloc/free functions without tracepoint
To unify the kmalloc functions in a later patch, introduce common alloc/free functions that do not have tracepoints.
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|