Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7 |
|
#
c15cdea5 |
| 06-Oct-2023 |
Catalin Marinas <catalin.marinas@arm.com> |
mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
Commit b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") allows architectures with n
mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
Commit b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") allows architectures with non-coherent DMA to define a small ARCH_KMALLOC_MINALIGN (e.g. sizeof(unsigned long long)) and this has been enabled on arm64. With KASAN_HW_TAGS enabled, however, ARCH_SLAB_MINALIGN becomes 16 on arm64 (arch_slab_minalign() dynamically selects it since commit d949a8155d13 ("mm: make minimum slab alignment a runtime property")). This can lead to a situation where kmalloc-8 caches are attempted to be created with a kmem_caches.size aligned to 16. When the cache is mergeable, it can lead to kernel warnings like:
sysfs: cannot create duplicate filename '/kernel/slab/:d-0000016' CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc1-00001-gda98843cd306-dirty #5 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x90/0xe8 show_stack+0x18/0x24 dump_stack_lvl+0x48/0x60 dump_stack+0x18/0x24 sysfs_warn_dup+0x64/0x80 sysfs_create_dir_ns+0xe8/0x108 kobject_add_internal+0x98/0x264 kobject_init_and_add+0x8c/0xd8 sysfs_slab_add+0x12c/0x248 slab_sysfs_init+0x98/0x14c do_one_initcall+0x6c/0x1b0 kernel_init_freeable+0x1c0/0x288 kernel_init+0x24/0x1e0 ret_from_fork+0x10/0x20 kobject: kobject_add_internal failed for :d-0000016 with -EEXIST, don't try to register things with the same name in the same directory. SLUB: Unable to add boot slab dma-kmalloc-8 to sysfs
Limit the __kmalloc_minalign() return value (used to create the kmalloc-* caches) to arch_slab_minalign() so that kmalloc-8 caches are skipped when KASAN_HW_TAGS is enabled (both config and runtime).
Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: stable@vger.kernel.org # 6.5.x Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
Revision tags: v6.5.6, v6.5.5, v6.5.4, v6.5.3 |
|
#
8446a4de |
| 07-Sep-2023 |
David Laight <david.laight@aculab.com> |
slab: kmalloc_size_roundup() must not return 0 for non-zero size
The typical use of kmalloc_size_roundup() is:
ptr = kmalloc(sz = kmalloc_size_roundup(size), ...); if (!ptr) return -ENOMEM.
This
slab: kmalloc_size_roundup() must not return 0 for non-zero size
The typical use of kmalloc_size_roundup() is:
ptr = kmalloc(sz = kmalloc_size_roundup(size), ...); if (!ptr) return -ENOMEM.
This means it is vitally important that the returned value isn't less than the argument even if the argument is insane. In particular if kmalloc_slab() fails or the value is above (MAX_ULONG - PAGE_SIZE) zero is returned and kmalloc() will return its single zero-length buffer ZERO_SIZE_PTR.
Fix this by returning the input size if the size exceeds KMALLOC_MAX_SIZE. kmalloc() will then return NULL as the size really is too big.
kmalloc_slab() should not normally return NULL, unless called too early. Again, returning zero is not the correct action as it can be in some usage scenarios stored to a variable and only later cause kmalloc() return ZERO_SIZE_PTR and subsequent crashes on access. Instead we can simply stop checking the kmalloc_slab() result completely, as calling kmalloc_size_roundup() too early would then result in an immediate crash during boot and the developer noticing an issue in their code.
[vbabka@suse.cz: remove kmalloc_slab() result check, tweak comments and commit log] Fixes: 05a940656e1e ("slab: Introduce kmalloc_size_roundup()") Signed-off-by: David Laight <david.laight@aculab.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
#
46a9ea66 |
| 08-Sep-2023 |
Rafael Aquini <aquini@redhat.com> |
mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy()
After the commit in Fixes:, if a module that created a slab cache does not release all of its allocated objects before dest
mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy()
After the commit in Fixes:, if a module that created a slab cache does not release all of its allocated objects before destroying the cache (at rmmod time), we might end up releasing the kmem_cache object without removing it from the slab_caches list thus corrupting the list as kmem_cache_destroy() ignores the return value from shutdown_cache(), which in turn never removes the kmem_cache object from slabs_list in case __kmem_cache_shutdown() fails to release all of the cache's slabs.
This is easily observable on a kernel built with CONFIG_DEBUG_LIST=y as after that ill release the system will immediately trip on list_add, or list_del, assertions similar to the one shown below as soon as another kmem_cache gets created, or destroyed:
[ 1041.213632] list_del corruption. next->prev should be ffff89f596fb5768, but was 52f1e5016aeee75d. (next=ffff89f595a1b268) [ 1041.219165] ------------[ cut here ]------------ [ 1041.221517] kernel BUG at lib/list_debug.c:62! [ 1041.223452] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 1041.225408] CPU: 2 PID: 1852 Comm: rmmod Kdump: loaded Tainted: G B W OE 6.5.0 #15 [ 1041.228244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023 [ 1041.231212] RIP: 0010:__list_del_entry_valid+0xae/0xb0
Another quick way to trigger this issue, in a kernel with CONFIG_SLUB=y, is to set slub_debug to poison the released objects and then just run cat /proc/slabinfo after removing the module that leaks slab objects, in which case the kernel will panic:
[ 50.954843] general protection fault, probably for non-canonical address 0xa56b6b6b6b6b6b8b: 0000 [#1] PREEMPT SMP PTI [ 50.961545] CPU: 2 PID: 1495 Comm: cat Kdump: loaded Tainted: G B W OE 6.5.0 #15 [ 50.966808] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc37 05/24/2023 [ 50.972663] RIP: 0010:get_slabinfo+0x42/0xf0
This patch fixes this issue by properly checking shutdown_cache()'s return value before taking the kmem_cache_release() branch.
Fixes: 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock") Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
Revision tags: v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43 |
|
#
05ee7741 |
| 01-Aug-2023 |
Petr Tesarik <petr.tesarik.ext@huawei.com> |
swiotlb: make io_tlb_default_mem local to swiotlb.c
SWIOTLB implementation details should not be exposed to the rest of the kernel. This will allow to make changes to the implementation without modi
swiotlb: make io_tlb_default_mem local to swiotlb.c
SWIOTLB implementation details should not be exposed to the rest of the kernel. This will allow to make changes to the implementation without modifying non-swiotlb code.
To avoid breaking existing users, provide helper functions for the few required fields.
As a bonus, using a helper function to initialize struct device allows to get rid of an #ifdef in driver core.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
show more ...
|
Revision tags: v6.1.42, v6.1.41, v6.1.40, v6.1.39 |
|
#
3c615294 |
| 14-Jul-2023 |
GONG, Ruiqi <gongruiqi@huaweicloud.com> |
Randomized slab caches for kmalloc()
When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it play
Randomized slab caches for kmalloc()
When exploiting memory vulnerabilities, "heap spraying" is a common technique targeting those related to dynamic memory allocation (i.e. the "heap"), and it plays an important role in a successful exploitation. Basically, it is to overwrite the memory area of vulnerable object by triggering allocation in other subsystems or modules and therefore getting a reference to the targeted memory location. It's usable on various types of vulnerablity including use after free (UAF), heap out- of-bound write and etc.
There are (at least) two reasons why the heap can be sprayed: 1) generic slab caches are shared among different subsystems and modules, and 2) dedicated slab caches could be merged with the generic ones. Currently these two factors cannot be prevented at a low cost: the first one is a widely used memory allocation mechanism, and shutting down slab merging completely via `slub_nomerge` would be overkill.
To efficiently prevent heap spraying, we propose the following approach: to create multiple copies of generic slab caches that will never be merged, and random one of them will be used at allocation. The random selection is based on the address of code that calls `kmalloc()`, which means it is static at runtime (rather than dynamically determined at each time of allocation, which could be bypassed by repeatedly spraying in brute force). In other words, the randomness of cache selection will be with respect to the code address rather than time, i.e. allocations in different code paths would most likely pick different caches, although kmalloc() at each place would use the same cache copy whenever it is executed. In this way, the vulnerable object and memory allocated in other subsystems and modules will (most probably) be on different slab caches, which prevents the object from being sprayed.
Meanwhile, the static random selection is further enhanced with a per-boot random seed, which prevents the attacker from finding a usable kmalloc that happens to pick the same cache with the vulnerable subsystem/module by analyzing the open source code. In other words, with the per-boot seed, the random selection is static during each time the system starts and runs, but not across different system startups.
The overhead of performance has been tested on a 40-core x86 server by comparing the results of `perf bench all` between the kernels with and without this patch based on the latest linux-next kernel, which shows minor difference. A subset of benchmarks are listed below:
sched/ sched/ syscall/ mem/ mem/ messaging pipe basic memcpy memset (sec) (sec) (sec) (GB/sec) (GB/sec)
control1 0.019 5.459 0.733 15.258789 51.398026 control2 0.019 5.439 0.730 16.009221 48.828125 control3 0.019 5.282 0.735 16.009221 48.828125 control_avg 0.019 5.393 0.733 15.759077 49.684759
experiment1 0.019 5.374 0.741 15.500992 46.502976 experiment2 0.019 5.440 0.746 16.276042 51.398026 experiment3 0.019 5.242 0.752 15.258789 51.398026 experiment_avg 0.019 5.352 0.746 15.678608 49.766343
The overhead of memory usage was measured by executing `free` after boot on a QEMU VM with 1GB total memory, and as expected, it's positively correlated with # of cache copies:
control 4 copies 8 copies 16 copies
total 969.8M 968.2M 968.2M 968.2M used 20.0M 21.9M 24.1M 26.7M free 936.9M 933.6M 931.4M 928.6M available 932.2M 928.8M 926.6M 923.9M
Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Dennis Zhou <dennis@kernel.org> # percpu Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
b035f5a6 |
| 12-Jun-2023 |
Catalin Marinas <catalin.marinas@arm.com> |
mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible
If an architecture opted in to DMA bouncing of unaligned kmalloc() buffers (ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimu
mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible
If an architecture opted in to DMA bouncing of unaligned kmalloc() buffers (ARCH_WANT_KMALLOC_DMA_BOUNCE), reduce the minimum kmalloc() cache alignment below cache-line size to ARCH_KMALLOC_MINALIGN.
Link: https://lkml.kernel.org/r/20230612153201.554742-17-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
963e84b0 |
| 12-Jun-2023 |
Catalin Marinas <catalin.marinas@arm.com> |
mm/slab: limit kmalloc() minimum alignment to dma_get_cache_alignment()
Do not create kmalloc() caches which are not aligned to dma_get_cache_alignment(). There is no functional change since for cu
mm/slab: limit kmalloc() minimum alignment to dma_get_cache_alignment()
Do not create kmalloc() caches which are not aligned to dma_get_cache_alignment(). There is no functional change since for current architectures defining ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN equals ARCH_DMA_MINALIGN (and dma_get_cache_alignment()). On architectures without a specific ARCH_DMA_MINALIGN, dma_get_cache_alignment() is 1, so no change to the kmalloc() caches.
Link: https://lkml.kernel.org/r/20230612153201.554742-5-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
0c474d31 |
| 12-Jun-2023 |
Catalin Marinas <catalin.marinas@arm.com> |
mm/slab: simplify create_kmalloc_cache() args and make it static
In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead of initialising the kmalloc_caches array directly. With t
mm/slab: simplify create_kmalloc_cache() args and make it static
In the slab variant of kmem_cache_init(), call new_kmalloc_cache() instead of initialising the kmalloc_caches array directly. With this, create_kmalloc_cache() is now only called from new_kmalloc_cache() in the same file, so make it static. In addition, the useroffset argument is always 0 while usersize is the same as size. Remove them.
Link: https://lkml.kernel.org/r/20230612153201.554742-4-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Saravana Kannan <saravanak@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
d5bf4857 |
| 13-Jun-2023 |
Vlastimil Babka <vbabka@suse.cz> |
mm/slab_common: use SLAB_NO_MERGE instead of negative refcount
When CONFIG_MEMCG_KMEM is enabled, we disable cache merging for KMALLOC_NORMAL caches so they don't end up merged with a cache that use
mm/slab_common: use SLAB_NO_MERGE instead of negative refcount
When CONFIG_MEMCG_KMEM is enabled, we disable cache merging for KMALLOC_NORMAL caches so they don't end up merged with a cache that uses ad-hoc __GFP_ACCOUNT when allocating. This was implemented by setting the refcount to -1, but now we have a proper SLAB_NO_MERGE flag, so use that instead.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com>
show more ...
|
Revision tags: v6.1.33 |
|
#
b9dad156 |
| 06-Jun-2023 |
Zhen Lei <thunder.leizhen@huawei.com> |
mm/slab_common: reduce an if statement in create_cache()
Move the 'out:' statement block out of the successful path to avoid redundant check on 'err'. The value of 'err' is always zero on success an
mm/slab_common: reduce an if statement in create_cache()
Move the 'out:' statement block out of the successful path to avoid redundant check on 'err'. The value of 'err' is always zero on success and negative on failure.
No functional changes, no performance improvements, just a little more readability.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
Revision tags: v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7 |
|
#
d0bf7d57 |
| 17-Jan-2023 |
Jesper Dangaard Brouer <brouer@redhat.com> |
mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
Allow API users of kmem_cache_create to specify that they don't want any slab merge or aliasing (with similar sized objects). Use this in kfence_test
mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
Allow API users of kmem_cache_create to specify that they don't want any slab merge or aliasing (with similar sized objects). Use this in kfence_test.
The SKB (sk_buff) kmem_cache slab is critical for network performance. Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain performance by amortising the alloc/free cost.
For the bulk API to perform efficiently the slub fragmentation need to be low. Especially for the SLUB allocator, the efficiency of bulk free API depend on objects belonging to the same slab (page).
When running different network performance microbenchmarks, I started to notice that performance was reduced (slightly) when machines had longer uptimes. I believe the cause was 'skbuff_head_cache' got aliased/merged into the general slub for 256 bytes sized objects (with my kernel config, without CONFIG_HARDENED_USERCOPY).
For SKB kmem_cache network stack have reasons for not merging, but it varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE for this kmem_cache.
Another use case for the flag has been described by David Sterba [1]:
> This can be used for more fine grained control over the caches or for > debugging builds where separate slabs can verify that no objects leak.
> The slab_nomerge boot option is too coarse and would need to be > enabled on all testing hosts. There are some other ways how to disable > merging, e.g. a slab constructor but this disables poisoning besides > that it adds additional overhead. Other flags are internal and may > have other semantics.
> A concrete example what motivates the flag. During 'btrfs balance' > slab top reported huge increase in caches like
> 1330095 1330095 100% 0.10K 34105 39 136420K Acpi-ParseExt > 1734684 1734684 100% 0.14K 61953 28 247812K pid_namespace > 8244036 6873075 83% 0.11K 229001 36 916004K khugepaged_mm_slot
> which was confusing and that it's because of slab merging was not the > first idea. After rebooting with slab_nomerge all the caches were > from btrfs_ namespace as expected.
[1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/
[ vbabka@suse.cz: rename to SLAB_NO_MERGE, change the flag value to the one proposed by David so it does not collide with internal SLAB/SLUB flags, write a comment for the flag, expand changelog, drop the skbuff part to be handled spearately ]
Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/ Reported-by: David Sterba <dsterba@suse.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
show more ...
|
#
ffe4dfe0 |
| 16-Apr-2023 |
David Keisar Schmidt <david.keisarschm@mail.huji.ac.il> |
mm/slab_common: Replace invocation of weak PRNG
The Slab allocator randomization inside slab_common.c uses the prandom_u32 PRNG. That was added to prevent attackers to obtain information on the heap
mm/slab_common: Replace invocation of weak PRNG
The Slab allocator randomization inside slab_common.c uses the prandom_u32 PRNG. That was added to prevent attackers to obtain information on the heap state.
However, this PRNG turned out to be weak, as noted in commit c51f8f88d705 To fix it, we have changed the invocation of prandom_u32_state to get_random_u32 to ensure the PRNG is strong.
Since a modulo operation is applied right after that, in the Fisher-Yates shuffle, we used get_random_u32_below, to achieve uniformity.
Signed-off-by: David Keisar Schmidt <david.keisarschm@mail.huji.ac.il> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
#
ae65a521 |
| 02-Mar-2023 |
Vlastimil Babka <vbabka@suse.cz> |
mm/slab: document kfree() as allowed for kmem_cache_alloc() objects
This will make it easier to free objects in situations when they can come from either kmalloc() or kmem_cache_alloc(), and also al
mm/slab: document kfree() as allowed for kmem_cache_alloc() objects
This will make it easier to free objects in situations when they can come from either kmalloc() or kmem_cache_alloc(), and also allow kfree_rcu() for freeing objects from kmem_cache_alloc().
For the SLAB and SLUB allocators this was always possible so with SLOB gone, we can document it as supported.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org>
show more ...
|
#
de4d6089 |
| 28-Feb-2023 |
Vlastimil Babka <vbabka@suse.cz> |
mm/slab: remove CONFIG_SLOB code from slab common code
CONFIG_SLOB has been removed from Kconfig. Remove code and #ifdef's specific to SLOB in the slab headers and common code.
Signed-off-by: Vlast
mm/slab: remove CONFIG_SLOB code from slab common code
CONFIG_SLOB has been removed from Kconfig. Remove code and #ifdef's specific to SLOB in the slab headers and common code.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
show more ...
|
Revision tags: v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17 |
|
#
bbc61844 |
| 04-Jan-2023 |
Feng Tang <feng.tang@intel.com> |
mm/kasan: simplify and refine kasan_cache code
struct 'kasan_cache' has a member 'is_kmalloc' indicating whether its host kmem_cache is a kmalloc cache. With newly introduced is_kmalloc_cache() hel
mm/kasan: simplify and refine kasan_cache code
struct 'kasan_cache' has a member 'is_kmalloc' indicating whether its host kmem_cache is a kmalloc cache. With newly introduced is_kmalloc_cache() helper, 'is_kmalloc' and its related function can be replaced and removed.
Also 'kasan_cache' is only needed by KASAN generic mode, and not by SW/HW tag modes, so refine its protection macro accordingly, suggested by Andrey Konoval.
Link: https://lkml.kernel.org/r/20230104060605.930910-2-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70 |
|
#
38931d89 |
| 22-Sep-2022 |
Kees Cook <keescook@chromium.org> |
mm: Make ksize() a reporting-only function
With all "silently resizing" callers of ksize() refactored, remove the logic in ksize() that would allow it to be used to effectively change the size of an
mm: Make ksize() a reporting-only function
With all "silently resizing" callers of ksize() refactored, remove the logic in ksize() that would allow it to be used to effectively change the size of an allocation (bypassing __alloc_size hints, etc). Users wanting this feature need to either use kmalloc_size_roundup() before an allocation, or use krealloc() directly.
For kfree_sensitive(), move the unpoisoning logic inline. Replace the some of the partially open-coded ksize() in __do_krealloc with ksize() now that it doesn't perform unpoisoning.
Adjust the KUnit tests to match the new ksize() behavior. Execution tested with:
$ ./tools/testing/kunit/kunit.py run \ --kconfig_add CONFIG_KASAN=y \ --kconfig_add CONFIG_KASAN_GENERIC=y \ --arch x86_64 kasan
Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: linux-mm@kvack.org Cc: kasan-dev@googlegroups.com Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Enhanced-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
show more ...
|
#
2f7c1c13 |
| 15-Nov-2022 |
Vlastimil Babka <vbabka@suse.cz> |
mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY
Distinguishing kmalloc(__GFP_RECLAIMABLE) can help against fragmentation by grouping pages by mobility, but on tiny systems the extra
mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY
Distinguishing kmalloc(__GFP_RECLAIMABLE) can help against fragmentation by grouping pages by mobility, but on tiny systems the extra memory overhead of separate set of kmalloc-rcl caches will probably be worse, and mobility grouping likely disabled anyway.
Thus with CONFIG_SLUB_TINY, don't create kmalloc-rcl caches and use the regular ones.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com>
show more ...
|
#
346907ce |
| 16-Nov-2022 |
Vlastimil Babka <vbabka@suse.cz> |
mm, slab: ignore hardened usercopy parameters when disabled
With CONFIG_HARDENED_USERCOPY not enabled, there are no __check_heap_object() checks happening that would use the struct kmem_cache userof
mm, slab: ignore hardened usercopy parameters when disabled
With CONFIG_HARDENED_USERCOPY not enabled, there are no __check_heap_object() checks happening that would use the struct kmem_cache useroffset and usersize fields. Yet the fields are still initialized, preventing merging of otherwise compatible caches.
Also the fields contribute to struct kmem_cache size unnecessarily when unused. Thus #ifdef them out completely when CONFIG_HARDENED_USERCOPY is disabled. In kmem_dump_obj() print object_size instead of usersize, as that's actually the intention.
In a quick virtme boot test, this has reduced the number of caches in /proc/slabinfo from 131 to 111.
Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com>
show more ...
|
#
946fa0db |
| 20-Oct-2022 |
Feng Tang <feng.tang@intel.com> |
mm/slub: extend redzone check to extra allocated kmalloc space than requested
kmalloc will round up the request size to a fixed size (mostly power of 2), so there could be a extra space than what is
mm/slub: extend redzone check to extra allocated kmalloc space than requested
kmalloc will round up the request size to a fixed size (mostly power of 2), so there could be a extra space than what is requested, whose size is the actual buffer size minus original request size.
To better detect out of bound access or abuse of this space, add redzone sanity check for it.
In current kernel, some kmalloc user already knows the existence of the space and utilizes it after calling 'ksize()' to know the real size of the allocated buffer. So we skip the sanity check for objects which have been called with ksize(), as treating them as legitimate users. Kees Cook is working on sanitizing all these user cases, by using kmalloc_size_roundup() to avoid ambiguous usages. And after this is done, this special handling for ksize() can be removed.
In some cases, the free pointer could be saved inside the latter part of object data area, which may overlap the redzone part(for small sizes of kmalloc objects). As suggested by Hyeonggon Yoo, force the free pointer to be in meta data area when kmalloc redzone debug is enabled, to make all kmalloc objects covered by redzone check.
Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
#
c18c20f1 |
| 07-Nov-2022 |
Vlastimil Babka <vbabka@suse.cz> |
mm, slab: remove duplicate kernel-doc comment for ksize()
Akira reports:
> "make htmldocs" reports duplicate C declaration of ksize() as follows:
> /linux/Documentation/core-api/mm-api:43: ./mm/sl
mm, slab: remove duplicate kernel-doc comment for ksize()
Akira reports:
> "make htmldocs" reports duplicate C declaration of ksize() as follows:
> /linux/Documentation/core-api/mm-api:43: ./mm/slab_common.c:1428: WARNING: Duplicate C declaration, also defined at core-api/mm-api:212. > Declaration is '.. c:function:: size_t ksize (const void *objp)'.
> This is due to the kernel-doc comment for ksize() declaration added in > include/linux/slab.h by commit 05a940656e1e ("slab: Introduce > kmalloc_size_roundup()").
There is an older kernel-doc comment for ksize() definition in mm/slab_common.c, which is not only duplicated, but also contradicts the new one - the additional storage discovered by ksize() should not be used by callers anymore. Delete the old kernel-doc.
Reported-by: Akira Yokosawa <akiyks@gmail.com> Link: https://lore.kernel.org/all/d33440f6-40cf-9747-3340-e54ffaf7afb8@gmail.com/ Fixes: 05a940656e1e ("slab: Introduce kmalloc_size_roundup()") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
#
32868715 |
| 05-Nov-2022 |
Kees Cook <keescook@chromium.org> |
mm/slab_common: Restore passing "caller" for tracing
The "caller" argument was accidentally being ignored in a few places that were recently refactored. Restore these "caller" arguments, instead of
mm/slab_common: Restore passing "caller" for tracing
The "caller" argument was accidentally being ignored in a few places that were recently refactored. Restore these "caller" arguments, instead of _RET_IP_.
Fixes: 11e9734bcb6a ("mm/slab_common: unify NUMA and UMA version of tracepoints") Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
#
eb4940d4 |
| 04-Nov-2022 |
Vlastimil Babka <vbabka@suse.cz> |
mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace()
For !CONFIG_TRACING kernels, the kmalloc() implementation tries (in cases where the allocation size is build-time constant) to save
mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace()
For !CONFIG_TRACING kernels, the kmalloc() implementation tries (in cases where the allocation size is build-time constant) to save a function call, by inlining kmalloc_trace() to a kmem_cache_alloc() call.
However since commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") this path now fails to pass the original request size to be eventually recorded (for kmalloc caches with debugging enabled).
We could adjust the code to call __kmem_cache_alloc_node() as the CONFIG_TRACING variant, but that would as a result inline a call with 5 parameters, bloating the kmalloc() call sites. The cost of extra function call (to kmalloc_trace()) seems like a lesser evil.
It also appears that the !CONFIG_TRACING variant is incompatible with upcoming hardening efforts [1] so it's easier if we just remove it now. Kernels with no tracing are rare these days and the benefit is dubious anyway.
[1] https://lore.kernel.org/linux-mm/20221101222520.never.109-kees@kernel.org/T/#m20ecf14390e406247bde0ea9cce368f469c539ed
Link: https://lore.kernel.org/all/097d8fba-bd10-a312-24a3-a4068c4f424c@suse.cz/ Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
#
a2076201 |
| 31-Oct-2022 |
Lukas Bulwahn <lukas.bulwahn@gmail.com> |
mm/slab_common: repair kernel-doc for __ksize()
Commit 445d41d7a7c1 ("Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next") resolved a conflict of two concurrent changes to __ksize()
mm/slab_common: repair kernel-doc for __ksize()
Commit 445d41d7a7c1 ("Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-next") resolved a conflict of two concurrent changes to __ksize().
However, it did not adjust the kernel-doc comment of __ksize(), while the name of the argument to __ksize() was renamed.
Hence, ./scripts/ kernel-doc -none mm/slab_common.c warns about it.
Adjust the kernel-doc comment for __ksize() for make W=1 happiness.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
#
05a94065 |
| 23-Sep-2022 |
Kees Cook <keescook@chromium.org> |
slab: Introduce kmalloc_size_roundup()
In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's abi
slab: Introduce kmalloc_size_roundup()
In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well, as the vast majority of callers are not expecting to use more memory than what they asked for.
There is, however, one common exception to this: anticipatory resizing of kmalloc allocations. These cases all use ksize() to determine the actual bucket size of a given allocation (e.g. 128 when 126 was asked for). This comes in two styles in the kernel:
1) An allocation has been determined to be too small, and needs to be resized. Instead of the caller choosing its own next best size, it wants to minimize the number of calls to krealloc(), so it just uses ksize() plus some additional bytes, forcing the realloc into the next bucket size, from which it can learn how large it is now. For example:
data = krealloc(data, ksize(data) + 1, gfp); data_len = ksize(data);
2) The minimum size of an allocation is calculated, but since it may grow in the future, just use all the space available in the chosen bucket immediately, to avoid needing to reallocate later. A good example of this is skbuff's allocators:
data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); ... /* kmalloc(size) might give us more room than requested. * Put skb_shared_info exactly at the end of allocated zone, * to allow max possible filling before reallocation. */ osize = ksize(data); size = SKB_WITH_OVERHEAD(osize);
In both cases, the "how much was actually allocated?" question is answered _after_ the allocation, where the compiler hinting is not in an easy place to make the association any more. This mismatch between the compiler's view of the buffer length and the code's intention about how much it is going to actually use has already caused problems[1]. It is possible to fix this by reordering the use of the "actual size" information.
We can serve the needs of users of ksize() and still have accurate buffer length hinting for the compiler by doing the bucket size calculation _before_ the allocation. Code can instead ask "how large an allocation would I get for a given size?".
Introduce kmalloc_size_roundup(), to serve this function so we can start replacing the "anticipatory resizing" uses of ksize().
[1] https://github.com/ClangBuiltLinux/linux/issues/1599 https://github.com/KSPP/linux/issues/183
[ vbabka@suse.cz: add SLOB version ]
Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
#
9ed9cac1 |
| 23-Sep-2022 |
Kees Cook <keescook@chromium.org> |
slab: Remove __malloc attribute from realloc functions
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instea
slab: Remove __malloc attribute from realloc functions
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed.
Thanks to Geert Uytterhoeven <geert@linux-m68k.org> for reporting build failures with gcc-8 in earlier version which tried to remove the #ifdef. While the "alloc_size" attribute is available on all GCC versions, I forgot that it gets disabled explicitly by the kernel in GCC < 9.1 due to misbehaviors. Add a note to the compiler_attributes.h entry for it.
Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Marco Elver <elver@google.com> Cc: linux-mm@kvack.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|