| 087820e3 | 25-Mar-2020 |
Greg Kurz <groug@kaod.org> |
spapr: Drop CAS reboot flag
The CAS reboot flag is false by default and all the locations that could set it to true have been dropped. This means that all code blocks depending on the flag being set are dead code, and the other code blocks should always be executed.
Just do that and drop the now unneeded CAS reboot flag. Fix a comment along the way to make checkpatch happy.
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158514994893.478799.11772512888322840990.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
|
| edfdbf9c | 16-Mar-2020 |
Nicholas Piggin <npiggin@gmail.com> |
ppc/spapr: Add FWNMI System Reset state
The FWNMI option must deliver system reset interrupts to their registered address, and there are a few constraints on the handler addresses specified in PAPR. Add the system reset address state and checks.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20200316142613.121089-4-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
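The check itself can be pictured with a minimal sketch like the one below, assuming the PAPR constraint is that both registered handlers must sit within the RMA and below 32 MiB so they are reachable in real mode; the helper and field names are illustrative, not necessarily those used by the patch:

    static bool fwnmi_handler_addr_valid(SpaprMachineState *spapr,
                                         target_ulong sreset_addr,
                                         target_ulong mce_addr)
    {
        /* Assumed constraint: handlers must live in the low 32 MiB of the RMA */
        hwaddr limit = MIN(spapr->rma_size, 32 * MiB);

        return sreset_addr < limit && mce_addr < limit;
    }

The "ibm,nmi-register" handler would then return a parameter error to the guest when this check fails and record the addresses in machine state otherwise.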
|
| 8af7e1fe | 16-Mar-2020 |
Nicholas Piggin <npiggin@gmail.com> |
ppc/spapr: Change FWNMI names
The option is called "FWNMI", and it involves more than just machine checks; conversely, machine checks can be delivered without the FWNMI option. Rename various things to reflect that.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20200316142613.121089-3-npiggin@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
|
| 91335a5e | 21-Jan-2020 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Rename DT functions to newer naming convention
In the spapr code we've been gradually moving towards a convention that functions which create pieces of the device tree are called spapr_dt_*(). This patch speeds that along by renaming most of the functions that don't yet match that convention so that they do.
For now we leave the *_dt_populate() functions, which are actual methods used in the DRCClass::dt_populate method.
While we're there we remove a few comments that don't really say anything useful.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org>
|
| 4dba8722 | 15-Mar-2020 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
spapr/rtas: Reserve space for RTAS blob and log
At the moment SLOF reserves space for RTAS and instantiates the RTAS blob, which is a 20-byte binary blob calling a hypercall. The rest of the RTAS area is a log which SLOF has no idea about but QEMU does.
This moves RTAS sizing to QEMU, which overrides the size from SLOF. The only remaining problem is that SLOF copies the number of bytes it reserved (2KB for now), so QEMU needs to reserve at least this much; SLOF will be fixed separately to check that the rtas-size from QEMU is enough for those 20 bytes of the H_RTAS hcall.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200316011841.99970-1-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
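As a rough sketch of what moving the sizing to QEMU can look like, the machine code computes the reserved size (blob plus log, never less than what SLOF currently copies) and advertises it via the rtas-size device tree property; the constants and helper name below are assumptions for illustration:

    #define RTAS_BLOB_SIZE       20      /* instantiated blob: an H_RTAS trampoline */
    #define RTAS_ERROR_LOG_MAX   2048    /* log area QEMU writes after the blob */
    #define SLOF_RTAS_COPY_SIZE  2048    /* SLOF currently copies this many bytes */

    static void spapr_dt_rtas_size(void *fdt, int rtas_node)
    {
        uint32_t rtas_size = MAX(RTAS_BLOB_SIZE + RTAS_ERROR_LOG_MAX,
                                 SLOF_RTAS_COPY_SIZE);

        /* SLOF honours rtas-size from the device tree when instantiating RTAS */
        _FDT(fdt_setprop_cell(fdt, rtas_node, "rtas-size", rtas_size));
    }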
|
| 395a20d3 | 10-Mar-2020 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
ppc/spapr: Move GPRs setup to one place
At the moment "pseries" starts in SLOF, which only expects the FDT blob pointer in r3. As we are going to introduce OpenFirmware support in QEMU, we will be booting OF clients directly; these expect a stack pointer in r1, and Linux looks at r3/r4 for the initramdisk location (vmlinux can find this from the device tree, but zImage from distro kernels cannot).
This extends spapr_cpu_set_entry_state() to take more registers. This should cause no behavioral change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200310050733.29805-2-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
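A minimal sketch of the extended helper, assuming the new parameters simply mirror the registers mentioned above (the exact prototype in the patch may differ):

    void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip,
                                   target_ulong r1, target_ulong r3,
                                   target_ulong r4)
    {
        CPUPPCState *env = &cpu->env;

        env->nip = nip;       /* entry point: SLOF, an OF client, or the kernel */
        env->gpr[1] = r1;     /* stack pointer expected by OF clients */
        env->gpr[3] = r3;     /* FDT blob for SLOF, or initrd base for Linux */
        env->gpr[4] = r4;     /* initrd size for Linux */
    }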
|
| 1052ab67 | 19-Feb-2020 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Don't clamp RMA to 16GiB on new machine types
In spapr_machine_init() we clamp the size of the RMA to 16GiB and the comment saying why doesn't make a whole lot of sense. In fact, this was done because the real mode handling code elsewhere limited the RMA in TCG mode to the maximum value configurable in LPCR[RMLS], 16GiB.
But:
* Actually LPCR[RMLS] has been able to encode a 256GiB size for a very long time, we just didn't implement it properly in the softmmu
* LPCR[RMLS] shouldn't really be relevant anyway, it only was because we used to abuse the RMOR based translation mode in order to handle the fact that we're not modelling the hypervisor parts of the cpu
We've now removed those limitations in the modelling so the 16GiB clamp no longer serves a function. However, we can't just remove the limit universally: that would break migration to earlier qemu versions, where the 16GiB RMLS limit still applies, no matter how bad the reasons for it are.
So, we replace the 16GiB clamp, with a clamp to a limit defined in the machine type class. We set it to 16 GiB for machine types 4.2 and earlier, but set it to 0 meaning unlimited for the new 5.0 machine type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
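The shape of the change can be sketched as follows, with the class field name assumed for illustration; the clamp only applies when the machine class sets a non-zero limit, and the versioned class options keep the old behaviour:

    /* Called from spapr_machine_init() in this sketch */
    static void spapr_clamp_rma(SpaprMachineState *spapr, SpaprMachineClass *smc)
    {
        /* 0 means "no limit", which is what the 5.0 machine type uses */
        if (smc->rma_limit) {
            spapr->rma_size = MIN(spapr->rma_size, smc->rma_limit);
        }
    }

    static void spapr_machine_4_2_class_options(MachineClass *mc)
    {
        SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);

        spapr_machine_5_0_class_options(mc);
        smc->rma_limit = 16 * GiB;   /* keep the historical clamp for 4.2 and older */
    }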
|
| 8897ea5a | 27-Nov-2019 |
David Gibson <david@gibson.dropbear.id.au> |
spapr: Don't attempt to clamp RMA to VRMA constraint
The Real Mode Area (RMA) is the part of memory which a guest can access when in real (MMU off) mode. Of course, for a guest under KVM, the MMU isn't really turned off, it's just in a special translation mode - Virtual Real Mode Area (VRMA) - which looks like real mode in guest mode.
The mechanics of how this works when using the hash MMU (HPT) put a constraint on the size of the RMA, which depends on the size of the HPT. So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we advertise to the guest based on this VRMA limit.
There are several things wrong with this:
1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum of Node 0 memory size and the VRMA limit. That will *often* work the same as clamping, but there can be other constraints on RMA size which supersede Node 0 memory size. We have real bugs caused by this (currently worked around in the guest kernel)
2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where we're past the point that we can actually advertise an RMA limit to the guest
3) But most fundamentally, the VRMA limit depends on host configuration (page size) which shouldn't be visible to the guest, but this partially exposes it. This can cause problems with migration in certain edge cases, although we will mostly get away with it.
In practice, this clamping is almost never applied anyway. With 64kiB pages and the normal rules for sizing of the HPT, the theoretical VRMA limit will be 4x(guest memory size) and so never hit. It will hit with 4kiB pages, where it will be (guest memory size)/4. However all mainstream distro kernels for POWER have used a 64kiB page size for at least 10 years.
So, simply replace this logic with a check that the RMA we've calculated based only on guest visible configuration will fit within the host implied VRMA limit. This can break if running HPT guests on a host kernel with 4kiB page size. As noted that's very rare. There also exist several possible workarounds:
* Change the host kernel to use 64kiB pages
* Use radix MMU (RPT) guests instead of HPT
* Use 64kiB hugepages on the host to back guest memory
* Increase the guest memory size so that the RMA hits one of the fixed limits before the RMA limit. This is relatively easy on POWER8 which has a 16GiB limit, harder on POWER9 which has a 1TiB limit.
* Use a guest NUMA configuration which artificially constrains the RMA within the VRMA limit (the RMA must always fit within Node 0).
Previously, on KVM, we also temporarily reduced the rma_size to 256M so that we'd load the kernel and initrd safely, regardless of the VRMA limit. This was a) confusing, b) could significantly limit the size of images we could load and c) introduced a behavioural difference between KVM and TCG. So we remove that as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org>
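The replacement check can be sketched roughly like this, assuming a helper that derives the host-implied VRMA limit from the HPT size (names are illustrative):

    static void spapr_check_rma_vs_vrma(SpaprMachineState *spapr)
    {
        /* Host-implied limit, derived from the HPT size (helper name assumed) */
        hwaddr vrma_limit = kvmppc_vrma_limit(spapr->htab_shift);

        if (spapr->rma_size > vrma_limit) {
            error_report("Unable to create %" PRIu64 "MiB RMA "
                         "(VRMA only allows %" PRIu64 "MiB)",
                         (uint64_t)(spapr->rma_size / MiB),
                         (uint64_t)(vrma_limit / MiB));
            exit(EXIT_FAILURE);
        }
    }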
|
| 4b63db12 | 14-Feb-2020 |
Greg Kurz <groug@kaod.org> |
spapr: Don't use spapr_drc_needed() in CAS code
We currently don't support hotplug of devices between boot and CAS. If this happens a CAS reboot is triggered. We detect this during CAS using the spapr_drc_needed() function which is essentially a VMStateDescription .needed callback. Even if the condition for CAS reboot happens to be the same as for DRC migration, it looks wrong to piggyback a migration helper for this.
Introduce a helper with a slightly more explicit name and use it in both the CAS and DRC migration code. Since a subsequent patch will enhance this helper to cover the case of hot unplug, let's go for spapr_drc_transient(). While here, convert spapr_hotplugged_dev_before_cas() to the "transient" wording as well.
This doesn't change any behaviour.
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158169248180.3465937.9531405453362718771.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
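A sketch of what the renamed helper could look like, assuming a DRC counts as transient whenever its state differs from the quiescent state expected for its attached/empty condition (field names follow the existing DRC code):

    bool spapr_drc_transient(SpaprDrc *drc)
    {
        SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);

        /* A DRC is transient while a plug or unplug operation is in flight,
         * i.e. its state is not the expected long-term state for whether a
         * device is currently attached. */
        return drc->state != (drc->dev ? drck->ready_state : drck->empty_state);
    }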
|
| b5fca656 | 09-Feb-2020 |
Shivaprasad G Bhat <sbhat@linux.ibm.com> |
spapr: Add Hcalls to support PAPR NVDIMM device
This patch implements a few of the necessary hcalls for nvdimm support.
PAPR semantics are such that each NVDIMM device comprises multiple SCM (Storage Class Memory) blocks. The guest requests the hypervisor to bind each of the SCM blocks of the NVDIMM device using hcalls. There can be SCM block unbind requests in case of driver errors or unplug (not supported now) use cases. The NVDIMM label reads/writes are done through hcalls.
Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind, unbind, and queries using hcalls on those blocks can come independently. This doesn't fit well into the QEMU device semantics, where the map/unmap are done at the (whole) device/object level granularity. The patch doesn't actually bind/unbind on hcalls but lets that happen at the device_add/del phase itself instead.
The guest kernel makes bind/unbind requests for the virtual NVDIMM device at the region level granularity. Without interleaving, each virtual NVDIMM device is presented as a separate guest physical address range. So there is no way a partial bind/unbind request can come for the vNVDIMM in an hcall for a subset of SCM blocks of a virtual NVDIMM. Hence it is safe to do the bind/unbind of everything during device_add/del.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Message-Id: <158131059899.2897.11515211602702956854.stgit@lep8c.aus.stglabs.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
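To illustrate the point that bind/unbind is a no-op at hcall time, here is a heavily simplified sketch of a bind handler; the hcall name comes from PAPR, while the validation details and registration helper are assumptions:

    static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                       target_ulong opcode, target_ulong *args)
    {
        uint32_t drc_index = args[0];
        SpaprDrc *drc = spapr_drc_by_index(drc_index);

        /* The DRC must exist and have an NVDIMM plugged into it */
        if (!drc || !drc->dev) {
            return H_PARAMETER;
        }

        /* Nothing to map here: the SCM blocks were already mapped into guest
         * physical address space when the NVDIMM was added with device_add. */
        return H_SUCCESS;
    }

    static void spapr_scm_register(void)
    {
        spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
    }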
|
| b28f0188 | 19-Feb-2020 |
Igor Mammedov <imammedo@redhat.com> |
ppc/{ppc440_bamboo, sam460ex}: use memdev for RAM
memory_region_allocate_system_memory() API is going away, so replace it with a memdev-allocated MemoryRegion. The latter is initialized by generic code, so the board only needs to opt in to the memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing the RAM memory region.
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200219160953.13771-67-imammedo@redhat.com>
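For a board, the opt-in amounts to something like the sketch below (the RAM id string and default size are placeholders, not the values used for bamboo or sam460ex):

    static void board_init(MachineState *machine)
    {
        /* machine->ram has already been allocated by generic code from the
         * memdev backend; the board just maps it. */
        memory_region_add_subregion(get_system_memory(), 0, machine->ram);
    }

    static void board_class_init(MachineClass *mc)
    {
        mc->init = board_init;
        mc->default_ram_id = "board.ram";     /* placeholder id */
        mc->default_ram_size = 128 * MiB;     /* placeholder default */
    }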
|
| 2500fb42 | 30-Jan-2020 |
Aravinda Prasad <arawinda.p@gmail.com> |
migration: Include migration support for machine check handling
This patch includes migration support for machine check handling. In particular, it blocks VM migration requests until the machine check error handling is complete, as these errors are specific to the source hardware and are irrelevant on the target hardware.
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [Do not set FWNMI cap in post_load, now its done in .apply hook] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Message-Id: <20200130184423.20519-7-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
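This presumably builds on QEMU's generic migration-blocker API; a hedged sketch of the idea, with the spapr-side function names assumed:

    static Error *fwnmi_migration_blocker;

    /* Called when a machine check starts being delivered to the guest */
    static void fwnmi_mce_block_migration(void)
    {
        Error *local_err = NULL;

        error_setg(&fwnmi_migration_blocker,
                   "A machine check is being handled; migration is deferred");
        if (migrate_add_blocker(fwnmi_migration_blocker, &local_err) < 0) {
            /* Not fatal: we only lose the protection against migrating mid-MCE */
            error_report_err(local_err);
            error_free(fwnmi_migration_blocker);
            fwnmi_migration_blocker = NULL;
        }
    }

    /* Called once the guest signals completion via "ibm,nmi-interlock" */
    static void fwnmi_mce_unblock_migration(void)
    {
        if (fwnmi_migration_blocker) {
            migrate_del_blocker(fwnmi_migration_blocker);
            error_free(fwnmi_migration_blocker);
            fwnmi_migration_blocker = NULL;
        }
    }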
|
| f03496bc | 30-Jan-2020 |
Aravinda Prasad <arawinda.p@gmail.com> |
ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
This patch adds support in QEMU to handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls.
The machine check notification address is saved when the OS issues the "ibm,nmi-register" RTAS call.
This patch also handles the case when multiple processors experience machine checks at or about the same time by handling the "ibm,nmi-interlock" call. In such cases, as per PAPR, subsequent processors serialize waiting for the first processor to issue the "ibm,nmi-interlock" call. The second processor that also received a machine check error waits until the first processor is done reading the error log. The first processor issues the "ibm,nmi-interlock" call when the error log is consumed.
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [Register fwnmi RTAS calls in core_rtas_register_types() where other RTAS calls are registered] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Message-Id: <20200130184423.20519-6-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
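The interlock can be pictured with the sketch below; the RTAS handler signature follows the usual spapr convention, while the machine-state field names and the condition variable are assumptions about the implementation:

    static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                       uint32_t token, uint32_t nargs,
                                       target_ulong args,
                                       uint32_t nret, target_ulong rets)
    {
        if (spapr->fwnmi_machine_check_addr == -1) {
            /* "ibm,nmi-register" has not been called by the guest yet */
            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
            return;
        }

        if (spapr->fwnmi_machine_check_interlock != cpu->vcpu_id) {
            /* Only the CPU whose error log was delivered may release the lock */
            rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
            return;
        }

        /* The error log has been consumed; let the next pending CPU proceed */
        spapr->fwnmi_machine_check_interlock = -1;
        qemu_cond_signal(&spapr->fwnmi_machine_check_interlock_cond);
        rtas_st(rets, 0, RTAS_OUT_SUCCESS);
    }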
|
| 81fe70e4 | 30-Jan-2020 |
Aravinda Prasad <arawinda.p@gmail.com> |
target/ppc: Build rtas error log upon an MCE
Upon a machine check exception (MCE) in a guest address space, KVM causes a guest exit to enable QEMU to build and pass the error to the guest in the PAPR defined rtas error log format.
This patch builds the rtas error log, copies it to the rtas_addr and then invokes the guest registered machine check handler. The handler in the guest takes suitable action(s) depending on the type and criticality of the error. For example, if an error is unrecoverable memory corruption in an application inside the guest, then the guest kernel sends a SIGBUS to the application. For recoverable errors, the guest performs recovery actions and logs the error.
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [Assume SLOF has allocated enough room for rtas error log] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200130184423.20519-5-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
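The delivery step can be sketched roughly as follows, assuming the error log is placed just past the RTAS blob and the handler address saved by "ibm,nmi-register" is kept in machine state (the offset and field names are illustrative):

    static void spapr_deliver_machine_check(SpaprMachineState *spapr,
                                            PowerPCCPU *cpu,
                                            const void *elog, size_t elog_len)
    {
        CPUPPCState *env = &cpu->env;
        hwaddr log_addr = spapr->rtas_addr + RTAS_ERROR_LOG_OFFSET;

        /* Copy the PAPR rtas error log into the RTAS area */
        cpu_physical_memory_write(log_addr, elog, elog_len);

        /* Enter the guest's registered handler as a real-mode machine check:
         * r3 points at the error log and the MSR is cleared. */
        env->gpr[3] = log_addr;
        env->msr = 0;
        env->nip = spapr->guest_machine_check_addr;
    }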
|
| 9ac703ac | 30-Jan-2020 |
Aravinda Prasad <arawinda.p@gmail.com> |
target/ppc: Handle NMI guest exit
Memory errors such as bit flips that cannot be corrected by hardware are passed on to the kernel for handling. If the memory address in error belongs to the guest, then the guest kernel is responsible for taking suitable action. Patch [1] enhances KVM to exit the guest with exit reason set to KVM_EXIT_NMI in such cases. This patch handles the KVM_EXIT_NMI exit.
[1] https://www.spinics.net/lists/kvm-ppc/msg12637.html (e20bbd3d and related commits)
Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200130184423.20519-4-ganeshgr@linux.ibm.com> [dwg: #ifdefs to fix compile for 32-bit target] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
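The handling boils down to a new KVM_EXIT_NMI case in kvm_arch_handle_exit() that calls a small helper along the lines of the sketch below; the disposition flags come from the KVM UAPI headers, and the spapr entry point name is an assumption:

    static int kvm_handle_nmi(PowerPCCPU *cpu, struct kvm_run *run)
    {
        bool recovered = (run->flags & KVM_RUN_PPC_NMI_DISP_MASK)
                         == KVM_RUN_PPC_NMI_DISP_FULLY_RECOV;

        /* Make the vCPU registers visible to QEMU before building the log */
        cpu_synchronize_state(CPU(cpu));

        /* Hand the event to the spapr FWNMI machine check code */
        spapr_mce_req_event(cpu, recovered);

        return 0;
    }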
|