Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12 |
|
#
b5bf39cd |
| 12-Jan-2024 |
Alison Schofield <alison.schofield@intel.com> |
x86/numa: Fix the address overlap check in numa_fill_memblks()
[ Upstream commit 9b99c17f7510bed2adbe17751fb8abddba5620bc ]
numa_fill_memblks() fills in the gaps in numa_meminfo memblks over a physical address range. To do so, it first creates a list of existing memblks that overlap that address range. The issue is that it is off by one when comparing to the end of the address range, so memblks that do not overlap are selected.
The impact of selecting a memblk that does not actually overlap is that an existing memblk may be filled when the expected action is to do nothing and return NUMA_NO_MEMBLK to the caller. The caller can then add a new NUMA node and memblk.
Replace the broken open-coded search for address overlap with the memblock helper memblock_addrs_overlap(). Update the kernel doc and in-code comments.
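For reference, a sketch of the helper's shape (the actual definition lives in the memblock headers); the strict '<' comparison on both ends is what the open-coded search got wrong:

    static inline bool memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
                                              phys_addr_t base2, phys_addr_t size2)
    {
            /* a range that ends exactly at another's start does not overlap */
            return base1 < (base2 + size2) && base2 < (base1 + size1);
    }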
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/10a3e6109c34c21a8dd4c513cf63df63481a2b07.1705085543.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
|
#
e791a345 |
| 18-Jan-2024 |
Yajun Deng <yajun.deng@linux.dev> |
memblock: fix crash when reserved memory is not added to memory
[ Upstream commit 6a9531c3a88096a26cf3ac582f7ec44f94a7dcb2 ]
After commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") the nid of a reserved region is used by init_reserved_page() (with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y) to access the node structure. In many cases the nid of the reserved memory is not set, and this causes a crash.
When the nid of a reserved region is not set, fall back to early_pfn_to_nid(), so that the nid of first_online_node will be passed to init_reserved_page().
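A minimal sketch of the fallback, assuming the loop in memmap_init_reserved_pages() has this shape:

    for_each_reserved_mem_region(region) {
            nid = memblock_get_region_node(region);
            /* nid was never set for this region: fall back to the early lookup */
            if (nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)
                    nid = early_pfn_to_nid(PFN_DOWN(region->base));
            reserve_bootmem_region(region->base, region->base + region->size, nid);
    }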
Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Link: https://lore.kernel.org/r/20240118061853.2652295-1-yajun.deng@linux.dev [rppt: massaged the commit message] Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
|
Revision tags: v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43 |
|
#
0db31d63 |
| 02-Aug-2023 |
Ma Wupeng <mawupeng1@huawei.com> |
mm: disable kernelcore=mirror when no mirror memory
On a system with kernelcore=mirror enabled but no mirrored memory reported by EFI, the kernel can OOM during startup, since all memory besides zone DMA is in the movable zone and this prevents the kernel from using it.
Zone DMA/DMA32 initialization is independent of mirrored memory and their max pfn is set in zone_sizes_init(). Since the kernel can fall back to zone DMA/DMA32 if there is no memory in zone Normal, these zones are treated as mirrored memory regardless of their actual memory attributes.
To solve this problem, disable kernelcore=mirror when no real mirrored memory exists.
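A minimal sketch of the check, assuming it sits in the zone-movable setup path and uses the memblock_has_mirror() helper:

    if (mirrored_kernelcore && !memblock_has_mirror()) {
            pr_warn("The system has no mirror memory, ignore kernelcore=mirror.\n");
            goto out;
    }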
Link: https://lkml.kernel.org/r/20230802072328.2107981-1-mawupeng1@huawei.com Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Suggested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Suggested-by: Mike Rapoport <rppt@kernel.org> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Levi Yun <ppbuk5246@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
c442a957 |
| 28-Jul-2023 |
Mike Rapoport (IBM) <rppt@kernel.org> |
Revert "mm,memblock: reset memblock.reserved to system init state to prevent UAF"
This reverts commit 9e46e4dcd9d6cd88342b028dbfa5f4fb7483d39c.
kbuild reports a warning in memblock_remove_region() because of a false positive caused by partial reset of the memblock state.
Doing the full reset will remove the false positives, but will allow late use of memblock_free() to go unnoticed, so it is better to revert the offending commit.
WARNING: CPU: 0 PID: 1 at mm/memblock.c:352 memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1))
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00001-g9e46e4dcd9d6 #2
RIP: 0010:memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1))
Call Trace:
 memblock_discard (kbuild/src/x86_64/mm/memblock.c:383)
 page_alloc_init_late (kbuild/src/x86_64/include/linux/find.h:208 kbuild/src/x86_64/include/linux/nodemask.h:266 kbuild/src/x86_64/mm/mm_init.c:2405)
 kernel_init_freeable (kbuild/src/x86_64/init/main.c:1325 kbuild/src/x86_64/init/main.c:1546)
 kernel_init (kbuild/src/x86_64/init/main.c:1439)
 ret_from_fork (kbuild/src/x86_64/arch/x86/kernel/process.c:145)
 ret_from_fork_asm (kbuild/src/x86_64/arch/x86/entry/entry_64.S:298)
Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202307271656.447aa17e-oliver.sang@intel.com Signed-off-by: "Mike Rapoport (IBM)" <rppt@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
Revision tags: v6.1.42, v6.1.41, v6.1.40 |
|
#
9e46e4dc |
| 19-Jul-2023 |
Rik van Riel <riel@surriel.com> |
mm,memblock: reset memblock.reserved to system init state to prevent UAF
The memblock_discard function frees the memblock.reserved.regions array, which is good.
However, if a subsequent memblock_free (or memblock_phys_free) comes in later, for example from ima_free_kexec_buffer, that will result in a use-after-free bug in memblock_isolate_range.
When running a kernel with CONFIG_KASAN enabled, this will cause a kernel panic very early in boot. Without CONFIG_KASAN, there is a chance that memblock_isolate_range might scribble on memory that is now in use by somebody else.
Avoid those issues by making sure that memblock_discard points memblock.reserved.regions back at the static buffer.
If memblock_free is called after memblock memory is discarded, that will print a warning in memblock_remove_region.
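A minimal sketch of the reset at the end of memblock_discard() (from this patch, which the commit above reverts):

    /* point reserved.regions back at the static init buffer so a stray
     * late memblock_free() warns instead of touching freed memory */
    memblock.reserved.regions = memblock_reserved_init_regions;
    memblock.reserved.cnt = 1;
    memblock_remove_region(&memblock.reserved, 0);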
Signed-off-by: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20230719154137.732d8525@imladris.surriel.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
Revision tags: v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35 |
|
#
61167ad5 |
| 18-Jun-2023 |
Yajun Deng <yajun.deng@linux.dev> |
mm: pass nid to reserve_bootmem_region()
early_pfn_to_nid() is called frequently in init_reserved_page(); it returns the node id of the PFN. These PFNs probably come from the same memory region and therefore have the same node id, so it's not necessary to call early_pfn_to_nid() for each PFN.
Pass nid to reserve_bootmem_region() and drop the call to early_pfn_to_nid() in init_reserved_page(). Also, set nid on all reserved pages before doing this, as some reserved memory regions may not have the nid set.
The most beneficial function is memmap_init_reserved_pages() if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.
The following data was tested on an x86 machine with 190GB of RAM.
before: memmap_init_reserved_pages() 67ms
after: memmap_init_reserved_pages() 20ms
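A minimal sketch of the reworked memmap_init_reserved_pages(), assuming this two-pass shape:

    /* first pass: make sure every reserved region carries a nid */
    for_each_mem_region(region) {
            nid = memblock_get_region_node(region);
            memblock_set_node(region->base, region->size, &memblock.reserved, nid);
    }

    /* second pass: hand the region's nid down instead of doing a per-PFN
     * early_pfn_to_nid() lookup in init_reserved_page() */
    for_each_reserved_mem_region(region) {
            nid = memblock_get_region_node(region);
            reserve_bootmem_region(region->base, region->base + region->size, nid);
    }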
Link: https://lkml.kernel.org/r/20230619023406.424298-1-yajun.deng@linux.dev Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
Revision tags: v6.1.34, v6.1.33 |
|
#
a668968f |
| 06-Jun-2023 |
Haifeng Xu <haifeng.xu@shopee.com> |
mm/memory_hotplug: remove reset_node_managed_pages() in hotadd_init_pgdat()
Managed pages have already been set to 0 in free_area_init_core_hotplug(), via zone_init_internals() on each zone. It's pointless to reset them again.
Furthermore, reset_node_managed_pages() no longer needs to be exposed outside of mm/memblock.c. Remove the declaration in include/linux/memblock.h and define it as static.
In addition to this, the only caller of reset_node_managed_pages() is reset_all_zones_managed_pages(), which is annotated with __init, so it should be safe to also mark reset_node_managed_pages() as __init.
Link: https://lkml.kernel.org/r/20230607024548.1240-1-haifeng.xu@shopee.com Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Suggested-by: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
dcdfdd40 |
| 06-Jun-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm: Add support for unaccepted memory
UEFI Specification version 2.9 introduces the concept of memory acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific to the Virtual Machine platform.
There are several ways the kernel can deal with unaccepted memory:
1. Accept all the memory during boot. It is easy to implement and it doesn't have runtime cost once the system is booted. The downside is very long boot time.
Accept can be parallelized to multiple CPUs to keep it manageable (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate memory bandwidth and does not scale beyond that point.
2. Accept a block of memory on the first use. It requires more infrastructure and changes in page allocator to make it work, but it provides good boot time.
On-demand memory accept means latency spikes every time kernel steps onto a new memory block. The spikes will go away once workload data set size gets stabilized or all memory gets accepted.
3. Accept all memory in background. Introduce a thread (or multiple threads) that accepts memory proactively. It will minimize the time the system experiences latency spikes on memory allocation while keeping boot time low.
This approach cannot function on its own. It is an extension of #2: background memory acceptance requires a functional scheduler, but the page allocator may need to tap into unaccepted memory before that.
The downside of the approach is that these threads also steal CPU cycles and memory bandwidth from the user's workload and may hurt user experience.
Implement #1 and #2 for now. #2 is the default. Some workloads may want to use #1 with accept_memory=eager in kernel command line. #3 can be implemented later based on user's demands.
Support of unaccepted memory requires a few changes in core-mm code:
- memblock accepts memory on allocation. It serves early boot memory allocations and doesn't limit them to pre-accepted pool of memory.
- page allocator accepts memory on the first allocation of the page. When kernel runs out of accepted memory, it accepts memory until the high watermark is reached. It helps to minimize fragmentation.
EFI code will provide two helpers if the platform supports unaccepted memory:
- accept_memory() makes a range of physical addresses accepted.
- range_contains_unaccepted_memory() checks whether anything within the range of physical addresses requires acceptance.
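A minimal sketch of how the two helpers combine for accept-on-first-use; the wrapper name is hypothetical, only the two helpers come from the commit message:

    /* hypothetical wrapper, for illustration only */
    static void accept_range_if_needed(phys_addr_t start, phys_addr_t end)
    {
            if (range_contains_unaccepted_memory(start, end))
                    accept_memory(start, end);      /* platform-specific protocol */
    }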
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mike Rapoport <rppt@linux.ibm.com> # memblock Link: https://lore.kernel.org/r/20230606142637.5171-2-kirill.shutemov@linux.intel.com
|
Revision tags: v6.1.32 |
|
#
de649e7f |
| 01-Jun-2023 |
Yuwei Guan <ssawgyw@gmail.com> |
memblock: Update nid info in memblock debugfs
The node id for memblock reserved regions can be wrong, so let's show 'x' for reg->nid == MAX_NUMNODES in debugfs to keep the output aligned.
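A minimal sketch of the debugfs print, assuming this shape:

    #ifdef CONFIG_NUMA
            /* reserved regions may have no valid nid; print 'x' to keep columns aligned */
            if (memblock_get_region_node(reg) == MAX_NUMNODES)
                    seq_printf(m, "%4s ", "x");
            else
                    seq_printf(m, "%4d ", memblock_get_region_node(reg));
    #endif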
Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org> Co-developed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Yuwei Guan <ssawgyw@gmail.com> Link: https://lore.kernel.org/r/20230601133149.37160-1-ssawgyw@gmail.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
Revision tags: v6.1.31, v6.1.30 |
|
#
493f349e |
| 19-May-2023 |
Yuwei Guan <ssawgyw@gmail.com> |
memblock: Add flags and nid info in memblock debugfs
Currently, the memblock debugfs can display the count of memblock_type and the base and end of the reg. However, when memblock_mark_*() or memblock_set_node() is executed on some range, the information in the existing debugfs cannot make it clear why the address is not consecutive.
For example, cat /sys/kernel/debug/memblock/memory
 0: 0x0000000080000000..0x00000000901fffff
 1: 0x0000000090200000..0x00000000905fffff
 2: 0x0000000090600000..0x0000000092ffffff
 3: 0x0000000093000000..0x00000000973fffff
 4: 0x0000000097400000..0x00000000b71fffff
 5: 0x00000000c0000000..0x00000000dfffffff
 6: 0x00000000e2500000..0x00000000f87fffff
 7: 0x00000000f8800000..0x00000000fa7fffff
 8: 0x00000000fa800000..0x00000000fd3effff
 9: 0x00000000fd3f0000..0x00000000fd3fefff
10: 0x00000000fd3ff000..0x00000000fd7fffff
11: 0x00000000fd800000..0x00000000fd901fff
12: 0x00000000fd902000..0x00000000fd909fff
13: 0x00000000fd90a000..0x00000000fd90bfff
14: 0x00000000fd90c000..0x00000000ffffffff
15: 0x0000000880000000..0x0000000affffffff
So we can add flags and nid to this debugfs.
For example, cat /sys/kernel/debug/memblock/memory
 0: 0x0000000080000000..0x00000000901fffff 0 NONE
 1: 0x0000000090200000..0x00000000905fffff 0 NOMAP
 2: 0x0000000090600000..0x0000000092ffffff 0 NONE
 3: 0x0000000093000000..0x00000000973fffff 0 NOMAP
 4: 0x0000000097400000..0x00000000b71fffff 0 NONE
 5: 0x00000000c0000000..0x00000000dfffffff 0 NONE
 6: 0x00000000e2500000..0x00000000f87fffff 0 NONE
 7: 0x00000000f8800000..0x00000000fa7fffff 0 NOMAP
 8: 0x00000000fa800000..0x00000000fd3effff 0 NONE
 9: 0x00000000fd3f0000..0x00000000fd3fefff 0 NOMAP
10: 0x00000000fd3ff000..0x00000000fd7fffff 0 NONE
11: 0x00000000fd800000..0x00000000fd901fff 0 NOMAP
12: 0x00000000fd902000..0x00000000fd909fff 0 NONE
13: 0x00000000fd90a000..0x00000000fd90bfff 0 NOMAP
14: 0x00000000fd90c000..0x00000000ffffffff 0 NONE
15: 0x0000000880000000..0x0000000affffffff 0 NONE
Signed-off-by: Yuwei Guan <ssawgyw@gmail.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20230519105321.333-1-ssawgyw@gmail.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
Revision tags: v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3 |
|
#
fc493f83 |
| 23-Apr-2023 |
Claudio Migliorelli <claudio.migliorelli@mail.polimi.it> |
Fix some coding style errors in memblock.c
This patch removes the initialization of some static variables to 0 and `false` in the memblock source file, according to the coding style guidelines.
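Illustrative of the pattern (not necessarily the exact lines changed): static storage is zero-initialized by the language, so the explicit initializers are redundant:

    -static int memblock_debug __initdata_memblock = 0;
    +static int memblock_debug __initdata_memblock;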
Signed-off-by: Claudio Migliorelli <claudio.migliorelli@mail.polimi.it> Link: https://lore.kernel.org/r/87r0sa7mm8.fsf@mail.polimi.it Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
Revision tags: v6.1.25, v6.1.24, v6.1.23 |
|
#
59f876fb |
| 06-Apr-2023 |
Kirill A. Shutemov <kirill@shutemov.name> |
mm: avoid passing 0 to __ffs()
23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") results in various boot failures (hang) on arm targets Debug messages reveal the reason.
########### MAX_ORDE
mm: avoid passing 0 to __ffs()
23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") results in various boot failures (hang) on arm targets Debug messages reveal the reason.
########### MAX_ORDER=10 start=0 __ffs(start)=-1 min()=10 min_t=-1 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If start==0, __ffs(start) returns 0xfffffff or (as int) -1, which min_t() interprets as such, while min() apparently uses the returned unsigned long value. Obviously a negative order isn't received well by the rest of the code.
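A minimal sketch of the guarded computation, assuming the fix lands in __free_pages_memory():

    if (start)
            order = min_t(int, MAX_ORDER, __ffs(start));
    else
            order = MAX_ORDER;      /* __ffs(0) is undefined; 0 is MAX_ORDER-aligned */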
[akpm@linux-foundation.org: fix comment, per Mike] Link: https://lkml.kernel.org/r/ZDBa7HWZK69dKKzH@kernel.org Link: https://lkml.kernel.org/r/20230406072529.vupqyrzqnhyozeyh@box.shutemov.name Fixes: 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lkml.kernel.org/r/9460377a-38aa-4f39-ad57-fb73725f92db@roeck-us.net Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
Revision tags: v6.1.22, v6.1.21, v6.1.20 |
|
#
23baf831 |
| 15-Mar-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, treewide: redefine MAX_ORDER sanely
MAX_ORDER is currently defined as the number of orders the page allocator supports: a user can ask the buddy allocator for a page order between 0 and MAX_ORDER-1.
This definition is counter-intuitive and has led to a number of bugs all over the kernel.
Change the definition of MAX_ORDER to be inclusive: the range of orders a user can ask the buddy allocator for is now 0..MAX_ORDER.
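In loop terms, the change looks like this (illustrative):

    /* before: MAX_ORDER was exclusive */
    for (order = 0; order < MAX_ORDER; order++)
            ...

    /* after: MAX_ORDER is inclusive */
    for (order = 0; order <= MAX_ORDER; order++)
            ...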
[kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
Revision tags: v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11 |
|
#
647037ad |
| 07-Feb-2023 |
Aaron Thompson <dev@aaront.org> |
Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
This reverts commit 115d9d77bb0f9152c60b6e8646369fa7f6167593.
The pages being freed by memblock_free_late() have already been initialized, but if they are in the deferred init range, __free_one_page() might access nearby uninitialized pages when trying to coalesce buddies. This can, for example, trigger this BUG:
BUG: unable to handle page fault for address: ffffe964c02580c8
RIP: 0010:__list_del_entry_valid+0x3f/0x70
<TASK>
 __free_one_page+0x139/0x410
 __free_pages_ok+0x21d/0x450
 memblock_free_late+0x8c/0xb9
 efi_free_boot_services+0x16b/0x25c
 efi_enter_virtual_mode+0x403/0x446
 start_kernel+0x678/0x714
 secondary_startup_64_no_verify+0xd2/0xdb
</TASK>
A proper fix will be more involved so revert this change for the time being.
Fixes: 115d9d77bb0f ("mm: Always release pages to the buddy allocator in memblock_free_late().") Signed-off-by: Aaron Thompson <dev@aaront.org> Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
Revision tags: v6.1.10, v6.1.9 |
|
#
2fe03412 |
| 29-Jan-2023 |
Peng Zhang <zhangpeng.00@bytedance.com> |
memblock: Avoid useless checks in memblock_merge_regions().
memblock_merge_regions() is called after regions have been modified to merge the neighboring compatible regions. That will check all regions but most checks are useless.
Most of the time we only insert one or a few new regions, or modify one or a few regions. At this time, we don't need to check all the regions. We only need to check the changed regions, because other not related regions cannot be merged.
Add two parameters to memblock_merge_regions() to indicate the lower and upper boundary to scan.
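A minimal sketch of the new signature, with callers passing the index range they just touched (assumed shape):

    static void memblock_merge_regions(struct memblock_type *type,
                                       unsigned long start_rgn,
                                       unsigned long end_rgn);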
Debug code that counts the number of total iterations in memblock_merge_regions(), like for instance
void memblock_merge_regions(struct memblock_type *type)
{
	static int iteration_count = 0;
	static int max_nr_regions = 0;

	max_nr_regions = max(max_nr_regions, (int)type->cnt);
	...
	while () {
		iteration_count++;
		...
	}
	pr_info("iteration_count: %d max_nr_regions %d",
		iteration_count, max_nr_regions);
}
Produces the following numbers on a physical machine with 1T of memory:
before: [2.472243] iteration_count: 45410 max_nr_regions 178
after:  [2.470869] iteration_count: 923 max_nr_regions 176
The actual startup speed seems to change little, but it does reduce the scan overhead.
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Link: https://lore.kernel.org/r/20230129090034.12310-3-zhangpeng.00@bytedance.com [rppt: massaged the changelog] Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
#
ad500fb2 |
| 29-Jan-2023 |
Peng Zhang <zhangpeng.00@bytedance.com> |
memblock: Make a boundary tighter in memblock_add_range().
When type->cnt * 2 + 1 is less than or equal to type->max, there are enough empty regions to insert into.
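A minimal sketch of the resulting fast-path test in memblock_add_range() (assumed shape): in the worst case the new range overlaps every existing region, needing at most type->cnt + 1 extra slots, i.e. 2 * type->cnt + 1 in total:

    if (type->cnt * 2 + 1 <= type->max)
            insert = true;  /* enough slots even in the worst case; skip the counting pass */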
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Link: https://lore.kernel.org/r/20230129090034.12310-2-zhangpeng.00@bytedance.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
Revision tags: v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4 |
|
#
115d9d77 |
| 06-Jan-2023 |
Aaron Thompson <dev@aaront.org> |
mm: Always release pages to the buddy allocator in memblock_free_late().
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages() only releases pages to the buddy allocator if they are not in the deferred range. This is correct for free pages (as defined by for_each_free_mem_pfn_range_in_zone()) because free pages in the deferred range will be initialized and released as part of the deferred init process. memblock_free_pages() is called by memblock_free_late(), which is used to free reserved ranges after memblock_free_all() has run. All pages in reserved ranges have been initialized at that point, and accordingly, those pages are not touched by the deferred init process. This means that currently, if the pages that memblock_free_late() intends to release are in the deferred range, they will never be released to the buddy allocator. They will forever be reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(), which is also correct for free pages but is not correct for reserved pages. KMSAN metadata for reserved pages is initialized by kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called for free pages, and memblock_free_late() should call __free_pages_core() directly instead.
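A minimal sketch of that change in memblock_free_late() (assumed shape):

    for (; cursor < end; cursor++) {
            /* reserved pages are fully initialized by this point, so they
             * can be released straight to the buddy allocator */
            __free_pages_core(pfn_to_page(cursor), 0);
            totalram_pages_inc();
    }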
One case where this issue can occur in the wild is EFI boot on x86_64. The x86 EFI code reserves all EFI boot services memory ranges via memblock_reserve() and frees them later via memblock_free_late() (efi_reserve_boot_services() and efi_free_boot_services(), respectively). If any of those ranges happens to fall within the deferred init range, the pages will not be released and that memory will be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone DMA
    spanned 4095
    present 3999
    managed 3840
  Node 0, zone DMA32
    spanned 246652
    present 245868
    managed 178867
v6.2-rc2 + patch:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone DMA
    spanned 4095
    present 3999
    managed 3840
  Node 0, zone DMA32
    spanned 246652
    present 245868
    managed 222816  # +43,949 pages
Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set") Signed-off-by: Aaron Thompson <dev@aaront.org> Link: https://lore.kernel.org/r/01010185892de53e-e379acfb-7044-4b24-b30a-e2657c1ba989-000000@us-west-2.amazonses.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
Revision tags: v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14 |
|
#
fa81ab49 |
| 16-Dec-2022 |
Miaoqian Lin <linmq006@gmail.com> |
memblock: Fix doc for memblock_phys_free
memblock_phys_free() is the counterpart to memblock_phys_alloc(). Replace memblock_alloc_xx() with memblock_phys_alloc_xx() in the doc to keep consistency.
Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20221216100304.688209-1-linmq006@gmail.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
|
Revision tags: v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66 |
|
#
5f7fa13f |
| 07-Sep-2022 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: add pageblock_align() macro
Add pageblock_align() macro and use it to simplify code.
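The macro boils down to a one-liner over pageblock_nr_pages (sketch, assuming the conventional ALIGN() helper):

    #define pageblock_align(pfn)	ALIGN((pfn), pageblock_nr_pages)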
Link: https://lkml.kernel.org/r/20220907060844.126891-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
4f9bc69a |
| 07-Sep-2022 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
mm: reuse pageblock_start/end_pfn() macro
Move pageblock_start_pfn()/pageblock_end_pfn() into pageblock-flags.h so they can be used somewhere else, not only in compaction; also use ALIGN_DOWN() instead of round_down() to pair with ALIGN(), which should be the same for pageblock usage.
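A sketch of the moved macros (assumed shape):

    #define pageblock_start_pfn(pfn)	ALIGN_DOWN((pfn), pageblock_nr_pages)
    #define pageblock_end_pfn(pfn)	ALIGN((pfn) + 1, pageblock_nr_pages)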
Link: https://lkml.kernel.org/r/20220907060844.126891-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
Revision tags: v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48 |
|
#
450d0e74 |
| 15-Jun-2022 |
Zhou Guanghui <zhouguanghui1@huawei.com> |
memblock,arm64: expand the static memblock memory table
In a system (Huawei Ascend ARM64 SoC) using HBM, a multi-bit ECC error occurs and the BIOS will mark the corresponding area (for example, 2 MB) as unusable. When the system restarts next time, these areas are either not reported or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a larger number of memblocks.
For example, if the EFI_UNUSABLE_MEMORY type is reported:
...
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...
The EFI memory map is parsed to construct the memblock arrays before the memblock arrays can be resized. As a result, memory regions beyond INIT_MEMBLOCK_REGIONS are lost.
Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace INIT_MEMBLOCK_REGIONS in defining the size of the static memblock.memory array.
Allow overriding the memblock.memory array size with an architecture-defined INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 set INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.
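A minimal sketch of the override hook in mm/memblock.c (assumed shape), with architectures free to define a larger value:

    #ifndef INIT_MEMBLOCK_MEMORY_REGIONS
    #define INIT_MEMBLOCK_MEMORY_REGIONS	INIT_MEMBLOCK_REGIONS
    #endif

    static struct memblock_region
            memblock_memory_init_regions[INIT_MEMBLOCK_MEMORY_REGIONS] __initdata_memblock;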
Link: https://lkml.kernel.org/r/20220615102742.96450-1-zhouguanghui1@huawei.com Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Darren Hart <darren@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> [arm64] Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Xu Qiang <xuqiang36@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
28e1a8f4 |
| 15-Jun-2022 |
Jinyu Tang <tjytimi@163.com> |
memblock: avoid some repeat when add new range
The worst case is that the new memory range overlaps all existing regions, which requires type->cnt + 1 empty struct memblock_region slots in the type->regions array. So if type->cnt + 1 + type->cnt is less than type->max, we can insert regions directly rather than calculating the needed amount before the insertion. And because of the merge operation at the end of the function, type->cnt will increase slowly in many cases.
This change avoids an unnecessary repeat traversal of memblock ranges in many cases when adding a new memory range.
Signed-off-by: Jinyu Tang <tjytimi@163.com> [rppt: massaged comment and changelog text] Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
|
Revision tags: v5.15.47 |
|
#
c200d900 |
| 10-Jun-2022 |
Patrick Wang <patrick.wang.shcn@gmail.com> |
mm: kmemleak: remove kmemleak_not_leak_phys() and the min_count argument to kmemleak_alloc_phys()
Patch series "mm: kmemleak: store objects allocated with physical address separately and check when scan", v4.
The kmemleak_*_phys() interface uses "min_low_pfn" and "max_low_pfn" to check addresses. But on some architectures, kmemleak_*_phys() is called before those two variables are initialized. The following steps will be taken:
1) Add OBJECT_PHYS flag and rbtree for the objects allocated with physical address
2) Store physical address in objects if allocated with OBJECT_PHYS
3) Check the boundary when scanning instead of in kmemleak_*_phys()
This patch set will solve: https://lore.kernel.org/r/20220527032504.30341-1-yee.lee@mediatek.com https://lore.kernel.org/r/9dd08bb5-f39e-53d8-f88d-bec598a08c93@gmail.com
v3: https://lore.kernel.org/r/20220609124950.1694394-1-patrick.wang.shcn@gmail.com v2: https://lore.kernel.org/r/20220603035415.1243913-1-patrick.wang.shcn@gmail.com v1: https://lore.kernel.org/r/20220531150823.1004101-1-patrick.wang.shcn@gmail.com
This patch (of 4):
Remove the unused kmemleak_not_leak_phys() function, and remove the min_count argument to the kmemleak_alloc_phys() function, assuming it is 0.
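For callers, the signature change looks like this (illustrative):

    /* before: explicit min_count argument */
    kmemleak_alloc_phys(base, size, 0, 0);

    /* after: min_count dropped, assumed to be 0 */
    kmemleak_alloc_phys(base, size, 0);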
Link: https://lkml.kernel.org/r/20220611035551.1823303-1-patrick.wang.shcn@gmail.com Link: https://lkml.kernel.org/r/20220611035551.1823303-2-patrick.wang.shcn@gmail.com Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Yee Lee <yee.lee@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
902c2d91 |
| 14-Jun-2022 |
Ma Wupeng <mawupeng1@huawei.com> |
memblock: Disable mirror feature if kernelcore is not specified
If the system has some mirrored memory and the mirror feature is not specified in the boot parameters, the basic mirror feature will be enabled, and this will lead to the following situations:
- memblock memory allocation prefers mirrored regions. This may have some unexpected influence on numa affinity.
- contiguous memory will be split into several parts if parts of it are mirrored memory, via memblock_mark_mirror().
To fix this, the variable mirrored_kernelcore will be checked in memblock_mark_mirror(). Mark mirrored memory with flag MEMBLOCK_MIRROR only if kernelcore=mirror is added to the kernel parameters.
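A minimal sketch of the guard (assumed shape):

    int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
    {
            /* honor the mirror flag only when kernelcore=mirror was requested */
            if (!mirrored_kernelcore)
                    return 0;

            system_has_some_mirror = true;
            return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR);
    }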
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220614092156.1972846-6-mawupeng1@huawei.com Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
#
14d9a675 |
| 14-Jun-2022 |
Ma Wupeng <mawupeng1@huawei.com> |
mm: Ratelimited mirrored memory related warning messages
If the system has mirrored memory, memblock will try to allocate mirrored memory first and fall back to non-mirrored memory when that fails; but with limited mirrored memory, or with some NUMA nodes lacking mirrored memory, lots of warning messages about memblock allocation will occur.
This patch ratelimits the warning messages to avoid a very long print during bootup.
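A minimal sketch of the ratelimited fallback warning in the allocation path (assumed shape):

    if (flags & MEMBLOCK_MIRROR) {
            flags &= ~MEMBLOCK_MIRROR;
            /* one line per burst instead of one per failed attempt */
            pr_warn_ratelimited("Could not allocate %pap bytes of mirrored memory\n",
                                &size);
            goto again;
    }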
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20220614092156.1972846-3-mawupeng1@huawei.com Acked-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|