81ac1e88 | 09-Aug-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
module: make waiting for a concurrent module loader interruptible
[ Upstream commit 2124d84db293ba164059077944e6b429ba530495 ]
The recursive aes-arm-bs module load situation reported by Russell Kin
module: make waiting for a concurrent module loader interruptible
[ Upstream commit 2124d84db293ba164059077944e6b429ba530495 ]
The recursive aes-arm-bs module load situation reported by Russell King is getting fixed in the crypto layer, but this in the meantime fixes the "recursive load hangs forever" by just making the waiting for the first module load be interruptible.
This should now match the old behavior before commit 9b9879fc0327 ("modules: catch concurrent module loads, treat them as idempotent"), which used the different "wait for module to be ready" code in module_patient_check_exists().
End result: a recursive module load will still block, but now a signal will interrupt it and fail the second module load, at which point the first module will successfully complete loading.
Fixes: 9b9879fc0327 ("modules: catch concurrent module loads, treat them as idempotent") Cc: Russell King <linux@armlinux.org.uk> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
a419beac | 29-Aug-2023 |
Andrea Righi <andrea.righi@canonical.com> |
module/decompress: use vmalloc() for zstd decompression workspace
Using kmalloc() to allocate the decompression workspace for zstd may trigger the following warning when large modules are loaded (i.
module/decompress: use vmalloc() for zstd decompression workspace
Using kmalloc() to allocate the decompression workspace for zstd may trigger the following warning when large modules are loaded (i.e., xfs):
[ 2.961884] WARNING: CPU: 1 PID: 254 at mm/page_alloc.c:4453 __alloc_pages+0x2c3/0x350 ... [ 2.989033] Call Trace: [ 2.989841] <TASK> [ 2.990614] ? show_regs+0x6d/0x80 [ 2.991573] ? __warn+0x89/0x160 [ 2.992485] ? __alloc_pages+0x2c3/0x350 [ 2.993520] ? report_bug+0x17e/0x1b0 [ 2.994506] ? handle_bug+0x51/0xa0 [ 2.995474] ? exc_invalid_op+0x18/0x80 [ 2.996469] ? asm_exc_invalid_op+0x1b/0x20 [ 2.997530] ? module_zstd_decompress+0xdc/0x2a0 [ 2.998665] ? __alloc_pages+0x2c3/0x350 [ 2.999695] ? module_zstd_decompress+0xdc/0x2a0 [ 3.000821] __kmalloc_large_node+0x7a/0x150 [ 3.001920] __kmalloc+0xdb/0x170 [ 3.002824] module_zstd_decompress+0xdc/0x2a0 [ 3.003857] module_decompress+0x37/0xc0 [ 3.004688] init_module_from_file+0xd0/0x100 [ 3.005668] idempotent_init_module+0x11c/0x2b0 [ 3.006632] __x64_sys_finit_module+0x64/0xd0 [ 3.007568] do_syscall_64+0x59/0x90 [ 3.008373] ? ksys_read+0x73/0x100 [ 3.009395] ? exit_to_user_mode_prepare+0x30/0xb0 [ 3.010531] ? syscall_exit_to_user_mode+0x37/0x60 [ 3.011662] ? do_syscall_64+0x68/0x90 [ 3.012511] ? do_syscall_64+0x68/0x90 [ 3.013364] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
However, continuous physical memory does not seem to be required in module_zstd_decompress(), so use vmalloc() instead, to prevent the warning and avoid potential failures at loading compressed modules.
Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
2abcc4b5 | 01-Aug-2023 |
James Morse <james.morse@arm.com> |
module: Expose module_init_layout_section()
module_init_layout_section() choses whether the core module loader considers a section as init or not. This affects the placement of the exit section when
module: Expose module_init_layout_section()
module_init_layout_section() choses whether the core module loader considers a section as init or not. This affects the placement of the exit section when module unloading is disabled. This code will never run, so it can be free()d once the module has been initialised.
arm and arm64 need to count the number of PLTs they need before applying relocations based on the section name. The init PLTs are stored separately so they can be free()d. arm and arm64 both use within_module_init() to decide which list of PLTs to use when applying the relocation.
Because within_module_init()'s behaviour changes when module unloading is disabled, both architecture would need to take this into account when counting the PLTs.
Today neither architecture does this, meaning when module unloading is disabled there are insufficient PLTs in the init section to load some modules, resulting in warnings: | WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc | Modules linked in: crct10dif_common | CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208 | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 | pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : module_emit_plt_entry+0x184/0x1cc | lr : module_emit_plt_entry+0x94/0x1cc | sp : ffffffc0803bba60 [...] | Call trace: | module_emit_plt_entry+0x184/0x1cc | apply_relocate_add+0x2bc/0x8e4 | load_module+0xe34/0x1bd4 | init_module_from_file+0x84/0xc0 | __arm64_sys_finit_module+0x1b8/0x27c | invoke_syscall.constprop.0+0x5c/0x104 | do_el0_svc+0x58/0x160 | el0_svc+0x38/0x110 | el0t_64_sync_handler+0xc0/0xc4 | el0t_64_sync+0x190/0x194
Instead of duplicating module_init_layout_section()s logic, expose it.
Reported-by: Adam Johnston <adam.johnston@arm.com> Fixes: 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()") Cc: stable@vger.kernel.org Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
ff09f6fd | 21-Jul-2023 |
Palmer Dabbelt <palmer@rivosinc.com> |
modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols
Trying to restrict the '$'-prefix change to RISC-V caused some fallout, so let's just treat all those symbols as special.
Fixes:
modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols
Trying to restrict the '$'-prefix change to RISC-V caused some fallout, so let's just treat all those symbols as special.
Fixes: c05780ef3c190 ("module: Ignore RISC-V mapping symbols too") Link: https://lore.kernel.org/all/20230712015747.77263-1-wangkefeng.wang@huawei.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
9b9879fc | 29-May-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
modules: catch concurrent module loads, treat them as idempotent
This is the new-and-improved attempt at avoiding huge memory load spikes when the user space boot sequence tries to load hundreds (or
modules: catch concurrent module loads, treat them as idempotent
This is the new-and-improved attempt at avoiding huge memory load spikes when the user space boot sequence tries to load hundreds (or even thousands) of redundant duplicate modules in parallel.
See commit 9828ed3f695a ("module: error out early on concurrent load of the same module file") for background and an earlier failed attempt that was reverted.
That earlier attempt just said "concurrently loading the same module is silly, just open the module file exclusively and return -ETXTBSY if somebody else is already loading it".
While it is true that concurrent module loads of the same module is silly, the reason that earlier attempt then failed was that the concurrently loaded module would often be a prerequisite for another module.
Thus failing to load the prerequisite would then cause cascading failures of the other modules, rather than just short-circuiting that one unnecessary module load.
At the same time, we still really don't want to load the contents of the same module file hundreds of times, only to then wait for an eventually successful load, and have everybody else return -EEXIST.
As a result, this takes another approach, and treats concurrent module loads from the same file as "idempotent" in the inode. So if one module load is ongoing, we don't start a new one, but instead just wait for the first one to complete and return the same return value as it did.
So unlike the first attempt, this does not return early: the intent is not to speed up the boot, but to avoid a thundering herd problem in allocating memory (both physical and virtual) for a module more than once.
Also note that this does change behavior: it used to be that when you had concurrent loads, you'd have one "winner" that would return success, and everybody else would return -EEXIST.
In contrast, this idempotent logic goes all Oprah on the problem, and says "You are a winner! And you are a winner! We are ALL winners". But since there's no possible actual real semantic difference between "you loaded the module" and "somebody else already loaded the module", this is more of a feel-good change than an actual honest-to-goodness semantic change.
Of course, any true Johnny-come-latelies that don't get caught in the concurrency filter will still return -EEXIST. It's no different from not even getting a seat at an Oprah taping. That's life.
See the long thread on the kernel mailing list about this all, which includes some numbers for memory use before and after the patch.
Link: https://lore.kernel.org/lkml/20230524213620.3509138-1-mcgrof@kernel.org/ Reviewed-by: Johan Hovold <johan@kernel.org> Tested-by: Johan Hovold <johan@kernel.org> Tested-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Rudi Heitbaum <rudi@heitbaum..com> Tested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
fadb74f9 | 01-Jun-2023 |
Lucas De Marchi <lucas.demarchi@intel.com> |
module/decompress: Fix error checking on zstd decompression
While implementing support for in-kernel decompression in kmod, finit_module() was returning a very suspicious value:
finit_module(3, ""
module/decompress: Fix error checking on zstd decompression
While implementing support for in-kernel decompression in kmod, finit_module() was returning a very suspicious value:
finit_module(3, "", MODULE_INIT_COMPRESSED_FILE) = 18446744072717407296
It turns out the check for module_get_next_page() failing is wrong, and hence the decompression was not really taking place. Invert the condition to fix it.
Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression") Cc: stable@kernel.org Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
db3e33dd | 28-May-2023 |
Song Liu <song@kernel.org> |
module: fix module load for ia64
Frank reported boot regression in ia64 as:
ELILO v3.16 for EFI/IA-64 .. Uncompressing Linux... done Loading file AC100221.initrd.img...done [ 0.000000] Linux ver
module: fix module load for ia64
Frank reported boot regression in ia64 as:
ELILO v3.16 for EFI/IA-64 .. Uncompressing Linux... done Loading file AC100221.initrd.img...done [ 0.000000] Linux version 6.4.0-rc3 (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu May 25 15:52:20 CEST 2023 [ 0.000000] efi: EFI v1.1 by HP [ 0.000000] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fe2a000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fe28000 [ 0.000000] PCDP: v3 at 0x3fe28000 [ 0.000000] earlycon: uart8250 at MMIO 0x00000000f4050000 (options '9600n8') [ 0.000000] printk: bootconsole [uart8250] enabled [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x000000003FE2A000 000028 (v02 HP ) [ 0.000000] ACPI: XSDT 0x000000003FE2A02C 0000CC (v01 HP rx2620 00000000 HP 00000000) [...] [ 3.793350] Run /init as init process Loading, please wait... Starting systemd-udevd version 252.6-1 [ 3.951100] ------------[ cut here ]------------ [ 3.951100] WARNING: CPU: 6 PID: 140 at kernel/module/main.c:1547 __layout_sections+0x370/0x3c0 [ 3.949512] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.951100] Modules linked in: [ 3.951100] CPU: 6 PID: 140 Comm: (udev-worker) Not tainted 6.4.0-rc3 #1 [ 3.956161] (udev-worker)[142]: Oops 11003706212352 [1] [ 3.951774] Hardware name: hp server rx2620 , BIOS 04.29 11/30/2007 [ 3.951774] [ 3.951774] Call Trace: [ 3.958339] Unable to handle kernel paging request at virtual address 1000000000000000 [ 3.956161] Modules linked in: [ 3.951774] [<a0000001000156d0>] show_stack.part.0+0x30/0x60 [ 3.951774] sp=e000000183a67b20 bsp=e000000183a61628 [ 3.956161] [ 3.956161]
which bisect to module_memory change [1].
Debug showed that ia64 uses some special sections:
__layout_sections: section .got (sh_flags 10000002) matched to MOD_INVALID __layout_sections: section .sdata (sh_flags 10000003) matched to MOD_INVALID __layout_sections: section .sbss (sh_flags 10000003) matched to MOD_INVALID
All these sections are loaded to module core memory before [1].
Fix ia64 boot by loading these sections to MOD_DATA (core rw data).
[1] commit ac3b43283923 ("module: replace module_layout with module_memory")
Fixes: ac3b43283923 ("module: replace module_layout with module_memory") Reported-by: Frank Scheiner <frank.scheiner@web.de> Closes: https://lists.debian.org/debian-ia64/2023/05/msg00010.html Closes: https://marc.info/?l=linux-ia64&m=168509859125505 Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Song Liu <song@kernel.org> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
cb0b50b8 | 09-May-2023 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
module: Remove preempt_disable() from module reference counting.
The preempt_disable() section in module_put() was added in commit e1783a240f491 ("module: Use this_cpu_xx to dynamically allocate
module: Remove preempt_disable() from module reference counting.
The preempt_disable() section in module_put() was added in commit e1783a240f491 ("module: Use this_cpu_xx to dynamically allocate counters")
while the per-CPU counter were switched to another API. The API requires that during the RMW operation the CPU remained the same.
This counting API was later replaced with atomic_t in commit 2f35c41f58a97 ("module: Replace module_ref with atomic_t refcnt")
Since this atomic_t replacement there is no need to keep preemption disabled while the reference counter is modified.
Remove preempt_disable() from module_put(), __module_get() and try_module_get().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
8660484e | 14-Apr-2023 |
Luis Chamberlain <mcgrof@kernel.org> |
module: add debugging auto-load duplicate module support
The finit_module() system call can in the worst case use up to more than twice of a module's size in virtual memory. Duplicate finit_module()
module: add debugging auto-load duplicate module support
The finit_module() system call can in the worst case use up to more than twice of a module's size in virtual memory. Duplicate finit_module() system calls are non fatal, however they unnecessarily strain virtual memory during bootup and in the worst case can cause a system to fail to boot. This is only known to currently be an issue on systems with larger number of CPUs.
To help debug this situation we need to consider the different sources for finit_module(). Requests from the kernel that rely on module auto-loading, ie, the kernel's *request_module() API, are one source of calls. Although modprobe checks to see if a module is already loaded prior to calling finit_module() there is a small race possible allowing userspace to trigger multiple modprobe calls racing against modprobe and this not seeing the module yet loaded.
This adds debugging support to the kernel module auto-loader (*request_module() calls) to easily detect duplicate module requests. To aid with possible bootup failure issues incurred by this, it will converge duplicates requests to a single request. This avoids any possible strain on virtual memory during bootup which could be incurred by duplicate module autoloading requests.
Folks debugging virtual memory abuse on bootup can and should enable this to see what pr_warn()s come on, to see if module auto-loading is to blame for their wores. If they see duplicates they can further debug this by enabling the module.enable_dups_trace kernel parameter or by enabling CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE.
Current evidence seems to point to only a few duplicates for module auto-loading. And so the source for other duplicates creating heavy virtual memory pressure due to larger number of CPUs should becoming from another place (likely udev).
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
a81b1fc8 | 18-Apr-2023 |
Arnd Bergmann <arnd@arndb.de> |
module: stats: fix invalid_mod_bytes typo
This was caught by randconfig builds but does not show up in build testing without CONFIG_MODULE_DECOMPRESS:
kernel/module/stats.c: In function 'mod_stat_b
module: stats: fix invalid_mod_bytes typo
This was caught by randconfig builds but does not show up in build testing without CONFIG_MODULE_DECOMPRESS:
kernel/module/stats.c: In function 'mod_stat_bump_invalid': kernel/module/stats.c:229:42: error: 'invalid_mod_byte' undeclared (first use in this function); did you mean 'invalid_mod_bytes'? 229 | atomic_long_add(info->compressed_len, &invalid_mod_byte); | ^~~~~~~~~~~~~~~~ | invalid_mod_bytes
Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
9f5cab17 | 17-Apr-2023 |
Tom Rix <trix@redhat.com> |
module: remove use of uninitialized variable len
clang build reports kernel/module/stats.c:307:34: error: variable 'len' is uninitialized when used here [-Werror,-Wuninitialized] len = scn
module: remove use of uninitialized variable len
clang build reports kernel/module/stats.c:307:34: error: variable 'len' is uninitialized when used here [-Werror,-Wuninitialized] len = scnprintf(buf + 0, size - len, ^~~ At the start of this sequence, neither the '+ 0', nor the '- len' are needed. So remove them and fix using 'len' uninitalized.
Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
719ccd80 | 17-Apr-2023 |
Arnd Bergmann <arnd@arndb.de> |
module: fix building stats for 32-bit targets
The new module statistics code mixes 64-bit types and wordsized 'long' variables, which leads to build failures on 32-bit architectures:
kernel/module/
module: fix building stats for 32-bit targets
The new module statistics code mixes 64-bit types and wordsized 'long' variables, which leads to build failures on 32-bit architectures:
kernel/module/stats.c: In function 'read_file_mod_stats': kernel/module/stats.c:291:29: error: passing argument 1 of 'atomic64_read' from incompatible pointer type [-Werror=incompatible-pointer-types] 291 | total_size = atomic64_read(&total_mod_size); x86_64-linux-ld: kernel/module/stats.o: in function `read_file_mod_stats': stats.c:(.text+0x2b2): undefined reference to `__udivdi3'
To fix this, the code has to use one of the two types consistently.
Change them all to word-size types here.
Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|