#
8e84b2d1 |
| 09-Aug-2017 |
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> |
powerpc/powernv: Add support to set power-shifting-ratio
This patch adds support to set power-shifting-ratio which hints the firmware how to distribute/throttle power between different entities in a
powerpc/powernv: Add support to set power-shifting-ratio
This patch adds support to set power-shifting-ratio which hints the firmware how to distribute/throttle power between different entities in a system (e.g CPU v/s GPU). This ratio is used by OCC for power capping algorithm.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
cb8b340d |
| 09-Aug-2017 |
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> |
powerpc/powernv: Add support for powercap framework
Adds a generic powercap framework to change the system powercap inband through OPAL-OCC command/response interface.
Signed-off-by: Shilpasri G Bh
powerpc/powernv: Add support for powercap framework
Adds a generic powercap framework to change the system powercap inband through OPAL-OCC command/response interface.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
8f95faaa |
| 18-Jul-2017 |
Madhavan Srinivasan <maddy@linux.vnet.ibm.com> |
powerpc/powernv: Detect and create IMC device
Code to create platform device for the In-Memory Collection (IMC) counters. Platform devices are created based on the IMC compatibility. New header file
powerpc/powernv: Detect and create IMC device
Code to create platform device for the In-Memory Collection (IMC) counters. Platform devices are created based on the IMC compatibility. New header file created to contain the data structures and macros needed for In-Memory Collection (IMC) counter pmu devices.
The device tree for IMC counters starts at the node "imc-counters". This node contains all the IMC PMU nodes and event nodes for these IMC PMUs. Device probe() parses the device to locate three possible IMC device types (Nest/Core/Thread). Function then branch to parse each unit nodes to populate vital information such as device memory sizes, event nodes information, base address for reserve memory access (if any) and so on. Simple bare-minimum shutdown function added which only "stops" the engines.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Fix build with CONFIG_PERF_EVENTS=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
a70b487b |
| 17-Jul-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
In commit 1c0eaf0f56d6 ("powerpc/powernv: Tell OPAL about our MMU mode on POWER9"), we added additional flags to the OPAL
powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
In commit 1c0eaf0f56d6 ("powerpc/powernv: Tell OPAL about our MMU mode on POWER9"), we added additional flags to the OPAL call to configure CPUs at boot.
These flags only work on Power9 firmwares, and worse can cause boot failures on Power8 machines, so we check for CPU_FTR_ARCH_300 (aka POWER9) before adding the extra flags.
Unfortunately we forgot that opal_configure_cores() is called before the CPU feature checks are dynamically patched, meaning the check always returns true.
We definitely need to do something to make the CPU feature checks less prone to bugs like this, but for now the minimal fix is to use early_cpu_has_feature().
Reported-and-tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Fixes: 1c0eaf0f56d6 ("powerpc/powernv: Tell OPAL about our MMU mode on POWER9") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
1c0eaf0f |
| 30-Jun-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/powernv: Tell OPAL about our MMU mode on POWER9
That will allow OPAL to configure the CPU in an optimal way.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
powerpc/powernv: Tell OPAL about our MMU mode on POWER9
That will allow OPAL to configure the CPU in an optimal way.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
5af50993 |
| 05-Apr-2017 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is nec
KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller
This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively.
This has been lightly tested on an actual system, including PCI pass-through with a TG3 device.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
83c49190 |
| 26-Apr-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/powernv: Fix missing attr initialisation in opal_export_attrs()
In opal_export_attrs() we dynamically allocate some bin_attributes. They're allocated with kmalloc() and although we initialis
powerpc/powernv: Fix missing attr initialisation in opal_export_attrs()
In opal_export_attrs() we dynamically allocate some bin_attributes. They're allocated with kmalloc() and although we initialise most of the fields, we don't initialise write() or mmap(), and in particular we don't initialise the lockdep related fields in the embedded struct attribute.
This leads to a lockdep warning at boot:
BUG: key c0000000f11906d8 not in .data! WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3136 lockdep_init_map+0x28c/0x2a0 ... Call Trace: lockdep_init_map+0x288/0x2a0 (unreliable) __kernfs_create_file+0x8c/0x170 sysfs_add_file_mode_ns+0xc8/0x240 __machine_initcall_powernv_opal_init+0x60c/0x684 do_one_initcall+0x60/0x1c0 kernel_init_freeable+0x2f4/0x3d4 kernel_init+0x24/0x160 ret_from_kernel_thread+0x5c/0xb0
Fix it by kzalloc'ing the attr, which fixes the uninitialised write() and mmap(), and calling sysfs_bin_attr_init() on it to initialise the lockdep fields.
Fixes: 11fe909d2362 ("powerpc/powernv: Add OPAL exports attributes to sysfs") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
11fe909d |
| 29-Mar-2017 |
Matt Brown <matthew.brown.dev@gmail.com> |
powerpc/powernv: Add OPAL exports attributes to sysfs
New versions of OPAL have a device node /ibm,opal/firmware/exports, each property of which describes a range of memory in OPAL that Linux might
powerpc/powernv: Add OPAL exports attributes to sysfs
New versions of OPAL have a device node /ibm,opal/firmware/exports, each property of which describes a range of memory in OPAL that Linux might want to export to userspace for debugging.
This patch adds a sysfs file under 'opal/exports' for each property found there, and makes it read-only by root.
Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> [mpe: Drop counting of props, rename to attr, free on sysfs error, c'log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
63f44d65 |
| 03-Apr-2017 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/book3s: Print task info if we take a machine check in user mode
For an MCE (Machine Check Exception) that hits while in user mode MSR(PR=1), print the task info to the console MCE error log.
powerpc/book3s: Print task info if we take a machine check in user mode
For an MCE (Machine Check Exception) that hits while in user mode MSR(PR=1), print the task info to the console MCE error log. This may help to identify an application that triggered the MCE.
After this patch the MCE console looks like:
Severe Machine check interrupt [Recovered] NIP: [0000000010039778] PID: 762 Comm: ebizzy Initiator: CPU Error type: SLB [Multihit] Effective address: 0000000010039778
Severe Machine check interrupt [Not recovered] NIP: [0000000010039778] PID: 763 Comm: ebizzy Initiator: CPU Error type: UE [Page table walk ifetch] Effective address: 0000000010039778 ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
1363875b |
| 27-Feb-2017 |
Nicholas Piggin <npiggin@gmail.com> |
powerpc/64s: fix handling of non-synchronous machine checks
A synchronous machine check is an exception raised by the attempt to execute the current instruction. If the error can't be corrected, it
powerpc/64s: fix handling of non-synchronous machine checks
A synchronous machine check is an exception raised by the attempt to execute the current instruction. If the error can't be corrected, it can make sense to SIGBUS the currently running process.
In other cases, the error condition is not related to the current instruction, so killing the current process is not the right thing to do.
Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no practical change. It will be used to handle POWER9 asynchronous machine checks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
1d0761d2 |
| 13-Dec-2016 |
Alistair Popple <alistair@popple.id.au> |
powerpc/powernv: Initialise nest mmu
POWER9 contains an off core mmu called the nest mmu (NMMU). This is used by other hardware units on the chip to translate virtual addresses into real addresses.
powerpc/powernv: Initialise nest mmu
POWER9 contains an off core mmu called the nest mmu (NMMU). This is used by other hardware units on the chip to translate virtual addresses into real addresses. The unit attempting an address translation provides the majority of the context required for the translation request except for the base address of the partition table (ie. the PTCR) which needs to be programmed into the NMMU.
This patch adds a call to OPAL to set the PTCR for the nest mmu in opal_init().
Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
ffe6d810 |
| 20-Nov-2016 |
Paul Mackerras <paulus@ozlabs.org> |
powerpc/powernv: Define real-mode versions of OPAL XICS accessors
This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi() and opal_int_set_mfrr(), for use by KVM real-mode code.
It a
powerpc/powernv: Define real-mode versions of OPAL XICS accessors
This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi() and opal_int_set_mfrr(), for use by KVM real-mode code.
It also exports opal_int_set_mfrr() so that the modular part of KVM can use it to send IPIs.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18 |
|
#
9e4f51bd |
| 10-Aug-2016 |
Jack Miller <jack@codezen.org> |
powerpc/powernv: Simplify searching for compatible device nodes
This condenses the opal node searching into a single function that finds all compatible nodes, instead of just searching the ibm,opal
powerpc/powernv: Simplify searching for compatible device nodes
This condenses the opal node searching into a single function that finds all compatible nodes, instead of just searching the ibm,opal children, for ipmi, flash, and prd similar to how opal-i2c nodes are found.
Signed-off-by: Jack Miller <jack@codezen.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: v4.4.17 |
|
#
c74dd88e |
| 09-Aug-2016 |
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> |
powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
When machine check occurs with MSR(RI=0), it means MC interrupt is unrecoverable and kernel goes down to panic path. But the console m
powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
When machine check occurs with MSR(RI=0), it means MC interrupt is unrecoverable and kernel goes down to panic path. But the console message still shows it as recovered. This patch fixes the MCE console messages.
Fixes: 36df96f8acaf ("powerpc/book3s: Decode and save machine check event.") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: openbmc-4.4-20160804-1, v4.4.16, v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4 |
|
#
d3cbff1b |
| 05-Jul-2016 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc: Put exception configuration in a common place
The various calls to establish exception endianness and AIL are now done from a single point using already established CPU and FW feature bits
powerpc: Put exception configuration in a common place
The various calls to establish exception endianness and AIL are now done from a single point using already established CPU and FW feature bits to decide what to do.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
a203658b |
| 03-Jul-2016 |
Benjamin Herrenschmidt <benh@kernel.crashing.org> |
powerpc/opal: Wake up kopald polling thread before waiting for events
On some environments (prototype machines, some simulators, etc...) there is no functional interrupt source to signal completion,
powerpc/opal: Wake up kopald polling thread before waiting for events
On some environments (prototype machines, some simulators, etc...) there is no functional interrupt source to signal completion, so we rely on the fairly slow OPAL heartbeat.
In a number of cases, the calls complete very quickly or even immediately. We've observed that it helps a lot to wakeup the OPAL heartbeat thread before waiting for event in those cases, it will call OPAL immediately to collect completions for anything that finished fast enough.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
43a1dd9b |
| 28-Jun-2016 |
Suraj Jitindar Singh <sjitindarsingh@gmail.com> |
powerpc/powernv: Add driver for operator panel on FSP machines
Implement new character device driver to allow access from user space to the operator panel display present on IBM Power Systems machin
powerpc/powernv: Add driver for operator panel on FSP machines
Implement new character device driver to allow access from user space to the operator panel display present on IBM Power Systems machines with FSPs.
This will allow status information to be presented on the display which is visible to a user.
The driver implements a character buffer which a user can read/write by accessing the device (/dev/op_panel). This buffer is then displayed on the operator panel display. Any attempt to write past the last character position will have no effect and attempts to write more characters than the size of the display will be truncated. The device may only be accessed by a single process at a time.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1, v4.4.11, openbmc-20160518-1, v4.6, v4.4.10, openbmc-20160511-1, openbmc-20160505-1, v4.4.9, v4.4.8, v4.4.7, openbmc-20160329-2, openbmc-20160329-1, openbmc-20160321-1, v4.4.6, v4.5, v4.4.5, v4.4.4, v4.4.3, openbmc-20160222-1, v4.4.2, openbmc-20160212-1, openbmc-20160210-1 |
|
#
9b4fffa1 |
| 09-Feb-2016 |
Andrew Donnellan <andrew.donnellan@au1.ibm.com> |
powerpc/powernv: new function to access OPAL msglog
Currently, the OPAL msglog/console buffer is exposed as a sysfs file, with the sysfs read handler responsible for retrieving the log from the OPAL
powerpc/powernv: new function to access OPAL msglog
Currently, the OPAL msglog/console buffer is exposed as a sysfs file, with the sysfs read handler responsible for retrieving the log from the OPAL buffer. We'd like to be able to use it in xmon as well.
Refactor the OPAL msglog code to create a new function, opal_msglog_copy(), that copies to an arbitrary buffer. Separate the initialisation code into generic memcons init and sysfs file creation.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: openbmc-20160202-2, openbmc-20160202-1, v4.4.1, openbmc-20160127-1, openbmc-20160120-1, v4.4 |
|
#
dc3799bb |
| 21-Dec-2015 |
Andrew Donnellan <andrew.donnellan@au1.ibm.com> |
powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
Fix off-by-one error in opal_mce_check_early_recovery() when checking whether the NIP falls within OPAL space.
Signed-
powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
Fix off-by-one error in opal_mce_check_early_recovery() when checking whether the NIP falls within OPAL space.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: openbmc-20151217-1, openbmc-20151210-1, openbmc-20151202-1 |
|
#
affddff6 |
| 27-Nov-2015 |
Russell Currey <ruscur@russell.cc> |
powerpc/powernv: Add a kmsg_dumper that flushes console output on panic
On BMC machines, console output is controlled by the OPAL firmware and is only flushed when its pollers are called. When the
powerpc/powernv: Add a kmsg_dumper that flushes console output on panic
On BMC machines, console output is controlled by the OPAL firmware and is only flushed when its pollers are called. When the kernel is in a panic state, it no longer calls these pollers and thus console output does not completely flush, causing some output from the panic to be lost.
Output is only actually lost when the kernel is configured to not power off or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL flushes the console buffer as part of its power down routines. Before this patch, however, only partial output would be printed during the timeout wait.
This patch adds a new kmsg_dumper which gets called at panic time to ensure panic output is not lost. It accomplishes this by calling OPAL_CONSOLE_FLUSH in the OPAL API, and if that is not available, the pollers are called enough times to (hopefully) completely flush the buffer.
The flushing mechanism will only affect output printed at and before the kmsg_dump call in kernel/panic.c:panic(). As such, the "end Kernel panic" message may still be truncated as follows:
>Call Trace: >[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable) >[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4 >[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c >[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254 >[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350 >[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130 >[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac >---[ end Kernel panic - not
This functionality is implemented as a kmsg_dumper as it seems to be the most sensible way to introduce platform-specific functionality to the panic function.
Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
e4d54f71 |
| 09-Dec-2015 |
Stewart Smith <stewart@linux.vnet.ibm.com> |
powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is just OPALv3, with nobody ever expecting anything on pre-OP
powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is just OPALv3, with nobody ever expecting anything on pre-OPALv3 to be cared about or supported by mainline kernels.
So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL exclusively.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
7261aafc |
| 09-Dec-2015 |
Stewart Smith <stewart@linux.vnet.ibm.com> |
powerpc/powernv: Remove OPALv2 firmware define and references
OPALv2 only ever existed in the lab and didn't escape to the world. All OPAL systems in the wild are OPALv3.
The probability of there b
powerpc/powernv: Remove OPALv2 firmware define and references
OPALv2 only ever existed in the lab and didn't escape to the world. All OPAL systems in the wild are OPALv3.
The probability of there being an OPALv2 system still powered on anywhere inside IBM is approximately zero, let alone anyone expecting to run mainline kernels.
So, start to remove references to OPALv2.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
786842b6 |
| 09-Dec-2015 |
Stewart Smith <stewart@linux.vnet.ibm.com> |
powerpc/powernv: panic() on OPAL < V3
The OpenPower Abstraction Layer firmware went through a couple of iterations in the lab before being released. What we now know as OPAL advertises itself as OPA
powerpc/powernv: panic() on OPAL < V3
The OpenPower Abstraction Layer firmware went through a couple of iterations in the lab before being released. What we now know as OPAL advertises itself as OPALv3.
OPALv2 and OPALv1 never made it outside the lab, and the possibility of anyone at all ever building a mainline kernel today and expecting it to boot on such hardware is zero.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
#
98da62b7 |
| 10-Dec-2015 |
Stewart Smith <stewart@linux.vnet.ibm.com> |
powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type
When running on newer OPAL firmware that supports sending extra OPAL_MSG types, we would print a warning on *every* message received.
This
powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type
When running on newer OPAL firmware that supports sending extra OPAL_MSG types, we would print a warning on *every* message received.
This could be a problem for kernels that don't support OPAL_MSG_OCC on machines that are running real close to thermal limits and the OCC is throttling the chip. For a kernel that is paying attention to the message queue, we could get these notifications quite often.
Conceivably, future message types could also come fairly often, and printing that we didn't understand them 10,000 times provides no further information than printing them once.
Cc: stable@vger.kernel.org Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|
Revision tags: openbmc-20151123-1, openbmc-20151118-1, openbmc-20151104-1, v4.3, openbmc-20151102-1, openbmc-20151028-1 |
|
#
f2dd80ec |
| 23-Sep-2015 |
Daniel Axtens <dja@axtens.net> |
powerpc/powernv: Panic on unhandled Machine Check
All unrecovered machine check errors on PowerNV should cause an immediate panic. There are 2 reasons that this is the right policy: it's not safe to
powerpc/powernv: Panic on unhandled Machine Check
All unrecovered machine check errors on PowerNV should cause an immediate panic. There are 2 reasons that this is the right policy: it's not safe to continue, and we're already trying to reboot.
Firstly, if we go through the recovery process and do not successfully recover, we can't be sure about the state of the machine, and it is not safe to recover and proceed.
Linux knows about the following sources of Machine Check Errors: - Uncorrectable Errors (UE) - Effective - Real Address Translation (ERAT) - Segment Lookaside Buffer (SLB) - Translation Lookaside Buffer (TLB) - Unknown/Unrecognised
In the SLB, TLB and ERAT cases, we can further categorise these as parity errors, multihit errors or unknown/unrecognised.
We can handle SLB errors by flushing and reloading the SLB. We can handle TLB and ERAT multihit errors by flushing the TLB. (It appears we may not handle TLB and ERAT parity errors: I will investigate further and send a followup patch if appropriate.)
This leaves us with uncorrectable errors. Uncorrectable errors are usually the result of ECC memory detecting an error that it cannot correct, but they also crop up in the context of PCI cards failing during DMA writes, and during CAPI error events.
There are several types of UE, and there are 3 places a UE can occur: Skiboot, the kernel, and userspace. For Skiboot errors, we have the facility to make some recoverable. For userspace, we can simply kill (SIGBUS) the affected process. We have no meaningful way to deal with UEs in kernel space or in unrecoverable sections of Skiboot.
Currently, these unrecovered UEs fall through to machine_check_expection() in traps.c, which calls die(), which OOPSes and sends SIGBUS to the process. This sometimes allows us to stumble onwards. For example we've seen UEs kill the kernel eehd and khugepaged. However, the process killed could have held a lock, or it could have been a more important process, etc: we can no longer make any assertions about the state of the machine. Similarly if we see a UE in skiboot (and again we've seen this happen), we're not in a position where we can make any assertions about the state of the machine.
Likewise, for unknown or unrecognised errors, we're not able to say anything about the state of the machine.
Therefore, if we have an unrecovered MCE, the most appropriate thing to do is to panic.
The second reason is that since e784b6499d9c ("powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors."), we attempt a special OPAL reboot on an unhandled MCE. This is so the hardware can record error data for later debugging.
The comments in that commit assert that we are heading down the panic path anyway. At the moment this is not always true. With UEs in kernel space, for instance, they are marked as recoverable by the hardware, so if the attempt to reboot failed (e.g. old Skiboot), we wouldn't panic() but would simply die() and OOPS. It doesn't make sense to be staggering on if we've just tried to reboot: we should panic().
Explicitly panic() on unrecovered MCEs on PowerNV. Update the comments appropriately.
This fixes some hangs following EEH events on cxlflash setups.
Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show more ...
|