Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8 |
|
#
410063c9 |
| 18-Dec-2023 |
Shuai Xue <xueshuai@linux.alibaba.com> |
ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
[ Upstream commit a70297d2213253853e95f5b49651f924990c6d3b ]
There are two major types of uncorrected recoverable (U
ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
[ Upstream commit a70297d2213253853e95f5b49651f924990c6d3b ]
There are two major types of uncorrected recoverable (UCR) errors :
- Synchronous error: The error is detected and raised at the point of the consumption in the execution flow, e.g. when a CPU tries to access a poisoned cache line. The CPU will take a synchronous error exception such as Synchronous External Abort (SEA) on Arm64 and Machine Check Exception (MCE) on X86. OS requires to take action (for example, offline failure page/kill failure thread) to recover this uncorrectable error.
- Asynchronous error: The error is detected out of processor execution context, e.g. when an error is detected by a background scrubber. Some data in the memory are corrupted. But the data have not been consumed. OS is optional to take action to recover this uncorrectable error.
When APEI firmware first is enabled, a platform may describe one error source for the handling of synchronous errors (e.g. MCE or SEA notification ), or for handling asynchronous errors (e.g. SCI or External Interrupt notification). In other words, we can distinguish synchronous errors by APEI notification. For synchronous errors, kernel will kill the current process which accessing the poisoned page by sending SIGBUS with BUS_MCEERR_AR. In addition, for asynchronous errors, kernel will notify the process who owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in early kill mode. However, the GHES driver always sets mf_flags to 0 so that all synchronous errors are handled as asynchronous errors in memory failure.
To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous events.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Tested-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5 |
|
#
b7765b0a |
| 20-Sep-2023 |
Shiju Jose <shiju.jose@huawei.com> |
ACPI: APEI: Fix AER info corruption when error status data has multiple sections
[ Upstream commit e2abc47a5a1a9f641e7cacdca643fdd40729bf6e ]
ghes_handle_aer() passes AER data to the PCI core for l
ACPI: APEI: Fix AER info corruption when error status data has multiple sections
[ Upstream commit e2abc47a5a1a9f641e7cacdca643fdd40729bf6e ]
ghes_handle_aer() passes AER data to the PCI core for logging and recovery by calling aer_recover_queue() with a pointer to struct aer_capability_regs.
The problem was that aer_recover_queue() queues the pointer directly without copying the aer_capability_regs data. The pointer was to the ghes->estatus buffer, which could be reused before aer_recover_work_func() reads the data.
To avoid this problem, allocate a new aer_capability_regs structure from the ghes_estatus_pool, copy the AER data from the ghes->estatus buffer into it, pass a pointer to the new struct to aer_recover_queue(), and free it after aer_recover_work_func() has processed it.
Reported-by: Bjorn Helgaas <helgaas@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30 |
|
#
9368aa18 |
| 19-May-2023 |
Li Yang <leoyang.li@nxp.com> |
APEI: GHES: correctly return NULL for ghes_get_devices()
Since 315bada690e0 ("EDAC: Check for GHES preference in the chipset-specific EDAC drivers"), vendor specific EDAC driver will not probe corre
APEI: GHES: correctly return NULL for ghes_get_devices()
Since 315bada690e0 ("EDAC: Check for GHES preference in the chipset-specific EDAC drivers"), vendor specific EDAC driver will not probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device is present. Make ghes_get_devices() return NULL when the GHES device list is empty to fix the problem.
Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module") Signed-off-by: Li Yang <leoyang.li@nxp.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
43482701 |
| 27-May-2023 |
Miaohe Lin <linmiaohe@huawei.com> |
ACPI: APEI: GHES: Remove unused ghes_estatus_pool_size_request()
ghes_estatus_pool_size_request() is unused now, so remove it.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> [ rjw: Subject and ch
ACPI: APEI: GHES: Remove unused ghes_estatus_pool_size_request()
ghes_estatus_pool_size_request() is unused now, so remove it.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
Revision tags: v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15 |
|
#
292a089d |
| 20-Dec-2022 |
Steven Rostedt (Google) <rostedt@goodmis.org> |
treewide: Convert del_timer*() to timer_shutdown*()
Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called
treewide: Convert del_timer*() to timer_shutdown*()
Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed.
The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case.
This was created by using a coccinelle script and the following commands:
$ cat timer.cocci @@ expression ptr, slab; identifier timer, rfield; @@ ( - del_timer(&ptr->timer); + timer_shutdown(&ptr->timer); | - del_timer_sync(&ptr->timer); + timer_shutdown_sync(&ptr->timer); ) ... when strict when != ptr->timer ( kfree_rcu(ptr, rfield); | kmem_cache_free(slab, ptr); | kfree(ptr); )
$ spatch timer.cocci . > /tmp/t.patch $ patch -p1 < /tmp/t.patch
Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ] Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ] Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4 |
|
#
dd3fa54b |
| 24-Oct-2022 |
Ard Biesheuvel <ardb@kernel.org> |
apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
Some documentation first, about how this machinery works:
It seems, the intent of the GHES error records cache is to c
apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
Some documentation first, about how this machinery works:
It seems, the intent of the GHES error records cache is to collect already reported errors - see the ghes_estatus_cached() checks. There's even a sentence trying to say what this does:
/* * GHES error status reporting throttle, to report more kinds of * errors, instead of just most frequently occurred errors. */
New elements are added to the cache this way:
if (!ghes_estatus_cached(estatus)) { if (ghes_print_estatus(NULL, ghes->generic, estatus)) ghes_estatus_cache_add(ghes->generic, estatus);
The intent being, once this new error record is reported, it gets cached so that it doesn't get reported for a while due to too many, same-type error records getting reported in burst-like scenarios. I.e., new, unreported error types can have a higher chance of getting reported.
Now, the loop in ghes_estatus_cache_add() is trying to pick out the oldest element in there. Meaning, something which got reported already but a long while ago, i.e., a LRU-type scheme.
And the cmpxchg() is there presumably to make sure when that selected element slot_cache is removed, it really *is* that element that gets removed and not one which replaced it in the meantime.
Now, ghes_estatus_cache_add() selects a slot, and either succeeds in replacing its contents with a pointer to a newly cached item, or it just gives up and frees the new item again, without attempting to select another slot even if one might be available.
Since only inserting new items is being done here, the race can only cause a failure if the selected slot was updated with another new item concurrently, which means that it is arbitrary which of those two items gets dropped.
And "dropped" here means, the item doesn't get added to the cache so the next time it is seen, it'll get reported again and an insertion attempt will be done again. Eventually, it'll get inserted and all those times when the insertion fails, the item will get reported although the cache is supposed to prevent that and "ratelimit" those repeated error records. Not a big deal in any case.
This means the cmpxchg() and the special case are not necessary. Therefore, just drop the existing item unconditionally.
Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock section since there is no actually dereferencing the pointer at all.
[ bp: - Flesh out and summarize what was discussed on the thread now that that cache contraption is understood; - Touch up code style. ]
Co-developed-by: Jia He <justin.he@arm.com> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
Revision tags: v6.0.3, v6.0.2, v5.15.74 |
|
#
36006ccb |
| 13-Oct-2022 |
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> |
ACPI: APEI: Drop unsetting driver data on remove
Since commit 0998d0631001 ("device-core: Ensure drvdata = NULL when no driver is bound") the driver core cares for cleaning driver data, so don't do
ACPI: APEI: Drop unsetting driver data on remove
Since commit 0998d0631001 ("device-core: Ensure drvdata = NULL when no driver is bound") the driver core cares for cleaning driver data, so don't do it in the driver, too.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
0d2aa70b |
| 18-Oct-2022 |
Ard Biesheuvel <ardb@kernel.org> |
apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
Some documentation first, about how this machinery works:
It seems, the intent of the GHES error records cache is to c
apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()
Some documentation first, about how this machinery works:
It seems, the intent of the GHES error records cache is to collect already reported errors - see the ghes_estatus_cached() checks. There's even a sentence trying to say what this does:
/* * GHES error status reporting throttle, to report more kinds of * errors, instead of just most frequently occurred errors. */
New elements are added to the cache this way:
if (!ghes_estatus_cached(estatus)) { if (ghes_print_estatus(NULL, ghes->generic, estatus)) ghes_estatus_cache_add(ghes->generic, estatus);
The intent being, once this new error record is reported, it gets cached so that it doesn't get reported for a while due to too many, same-type error records getting reported in burst-like scenarios. I.e., new, unreported error types can have a higher chance of getting reported.
Now, the loop in ghes_estatus_cache_add() is trying to pick out the oldest element in there. Meaning, something which got reported already but a long while ago, i.e., a LRU-type scheme.
And the cmpxchg() is there presumably to make sure when that selected element slot_cache is removed, it really *is* that element that gets removed and not one which replaced it in the meantime.
Now, ghes_estatus_cache_add() selects a slot, and either succeeds in replacing its contents with a pointer to a newly cached item, or it just gives up and frees the new item again, without attempting to select another slot even if one might be available.
Since only inserting new items is being done here, the race can only cause a failure if the selected slot was updated with another new item concurrently, which means that it is arbitrary which of those two items gets dropped.
And "dropped" here means, the item doesn't get added to the cache so the next time it is seen, it'll get reported again and an insertion attempt will be done again. Eventually, it'll get inserted and all those times when the insertion fails, the item will get reported although the cache is supposed to prevent that and "ratelimit" those repeated error records. Not a big deal in any case.
This means the cmpxchg() and the special case are not necessary. Therefore, just drop the existing item unconditionally.
Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock section since there is no actually dereferencing the pointer at all.
[ bp: - Flesh out and summarize what was discussed on the thread now that that cache contraption is understood; - Touch up code style. ]
Co-developed-by: Jia He <justin.he@arm.com> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
show more ...
|
Revision tags: v5.15.73, v6.0.1 |
|
#
802e7f1d |
| 09-Oct-2022 |
Jia He <justin.he@arm.com> |
EDAC/ghes: Make ghes_edac a proper module
Commit
dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
introduced a bug leading to ghes_edac_register() to be invoked before
EDAC/ghes: Make ghes_edac a proper module
Commit
dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
introduced a bug leading to ghes_edac_register() to be invoked before edac_init(). Because at that time the bus "edac" hadn't been even registered, this created sysfs nodes as /devices/mc0 instead of /sys/devices/system/edac/mc/mc0 on an Ampere eMag server.
Fix this by turning ghes_edac into a proper module.
The list of GHES devices returned is not protected from being modified concurrently but it is pretty static as it gets created only during GHES init and latter is not a module so...
[ bp: Massage. ]
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") Co-developed-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-5-justin.he@arm.com
show more ...
|
#
9057a3f7 |
| 09-Oct-2022 |
Jia He <justin.he@arm.com> |
EDAC/ghes: Prepare to make ghes_edac a proper module
To make ghes_edac a proper module, prepare to decouple its dependencies from GHES.
Move the ghes_edac.force_load parameter to ghes.c in order to
EDAC/ghes: Prepare to make ghes_edac a proper module
To make ghes_edac a proper module, prepare to decouple its dependencies from GHES.
Move the ghes_edac.force_load parameter to ghes.c in order to properly control whether ghes_edac should be force-loaded: In ghes_edac_register() it is too late to set the module flag.
Introduce a helper ghes_get_devices(), which returns the list of GHES devices which got probed when the platform-check passes on the system.
The previous force_load check is not needed in ghes_edac_unregister() since it will be checked in the module's init function of ghes_edac later.
[ bp: Massage. ]
Suggested-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-4-justin.he@arm.com
show more ...
|
#
8e40612f |
| 09-Oct-2022 |
Jia He <justin.he@arm.com> |
EDAC/ghes: Add a notifier for reporting memory errors
In order to make it a proper module and disentangle it from facilities, add a notifier for reporting memory errors. Use an atomic notifier becau
EDAC/ghes: Add a notifier for reporting memory errors
In order to make it a proper module and disentangle it from facilities, add a notifier for reporting memory errors. Use an atomic notifier because calls sites like ghes_proc_in_irq() run in interrupt context.
[ bp: Massage commit message. ]
Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-3-justin.he@arm.com
show more ...
|
#
43d27483 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
Revision tags: v5.15.72, v6.0, v5.15.71 |
|
#
415fed69 |
| 24-Sep-2022 |
Shuai Xue <xueshuai@linux.alibaba.com> |
ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
If an error is detected as a result of user-space process accessing a corrupt memory location, the CPU may take an abort. Then
ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
If an error is detected as a result of user-space process accessing a corrupt memory location, the CPU may take an abort. Then the platform firmware reports kernel via NMI like notifications, e.g. NOTIFY_SEA, NOTIFY_SOFTWARE_DELEGATED, etc.
For NMI like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") keep track of whether memory_failure() work was queued, and make task_work pending to flush out the queue so that the work is processed before return to user-space.
The code use init_mm to check whether the error occurs in user space:
if (current->mm != &init_mm)
The condition is always true, becase _nobody_ ever has "init_mm" as a real VM any more.
In addition to abort, errors can also be signaled as asynchronous exceptions, such as interrupt and SError. In such case, the interrupted current process could be any kind of thread. When a kernel thread is interrupted, the work ghes_kick_task_work deferred to task_work will never be processed because entry_handler returns to call ret_to_kernel() instead of ret_to_user(). Consequently, the estatus_node alloced from ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed. After around 200 allocations in our platform, the ghes_estatus_pool will run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a result, the event failed to be processed.
sdei: event 805 on CPU 113 failed with error: -2
Finally, a lot of unhandled events may cause platform firmware to exceed some threshold and reboot.
The condition should generally just do
if (current->mm)
as described in active_mm.rst documentation.
Then if an asynchronous error is detected when a kernel thread is running, (e.g. when detected by a background scrubber), do not add task_work to it as the original patch intends to do.
Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
Revision tags: v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26 |
|
#
27e932a3 |
| 27-Feb-2022 |
Shuai Xue <xueshuai@linux.alibaba.com> |
ACPI: APEI: rename ghes_init() with an "acpi_" prefix
ghes_init() sticks out in acpi_init() because it is the only functions without an "acpi_" prefix.
Rename ghes_init with an "acpi_" prefix, then
ACPI: APEI: rename ghes_init() with an "acpi_" prefix
ghes_init() sticks out in acpi_init() because it is the only functions without an "acpi_" prefix.
Rename ghes_init with an "acpi_" prefix, then all looks fine.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
#
dc4e8c07 |
| 27-Feb-2022 |
Shuai Xue <xueshuai@linux.alibaba.com> |
ACPI: APEI: explicit init of HEST and GHES in apci_init()
From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus memory pool") was merged, ghes_init() relies on acpi_hest_init() to m
ACPI: APEI: explicit init of HEST and GHES in apci_init()
From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage the estatus memory pool. On the other hand, ghes_init() relies on sdei_init() to detect the SDEI version and (un)register events. The dependencies are as follows:
ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init() ghes_init() => sdei_init()
HEST is not PCI-specific and initcall ordering is implicit and not well-defined within a level.
Based on above, remove acpi_hest_init() from acpi_pci_root_init() and convert ghes_init() and sdei_init() from initcalls to explicit calls in the following order:
acpi_hest_init() ghes_init() sdei_init()
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
Revision tags: v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15 |
|
#
3ad6fd77 |
| 26-Oct-2021 |
Tony Luck <tony.luck@intel.com> |
x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
SGX EPC pages do not have a "struct page" associated with them so the pfn_valid() sanity check fails and results in a warning message to
x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
SGX EPC pages do not have a "struct page" associated with them so the pfn_valid() sanity check fails and results in a warning message to the console.
Add an additional check to skip the warning if the address of the error is in an SGX EPC page.
Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-8-tony.luck@intel.com
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c50ec157 |
| 05-Oct-2022 |
Ashish Kalra <ashish.kalra@amd.com> |
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causin
ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|