===================================
pSeries family boards (``pseries``)
===================================

The Power machine para-virtualized environment described by the Linux on Power
Architecture Reference ([LoPAR]_) document is called pSeries. This environment
is also known as sPAPR, System p guests, or simply Power Linux guests (although
it is capable of running other operating systems, such as AIX).

Even though pSeries is designed to behave as a guest environment, it is also
capable of acting as a hypervisor OS, providing, in that role, nested
virtualization capabilities.

Supported devices
=================

* Multi-processor support for many Power processor generations:

  - POWER7, POWER7+
  - POWER8, POWER8NVL
  - POWER9
  - Power10
  - Power11
  - POWER5+ is also supported, provided that a suitable kernel and userspace
    are used.

* Interrupt controllers:

  - XICS (POWER8)
  - XIVE, supported on:

    - POWER9
    - Power10
    - Power11

* vPHB PCIe host bridge.
* vscsi and vnet devices, compatible with the same devices available on a
  PowerVM hypervisor with VIOS managing LPARs.
* Virtio-based devices.
* PCIe device pass-through.

Missing devices
===============

* SPICE support.

Firmware
========

The pSeries platform in QEMU comes with two firmware implementations:

`SLOF <https://github.com/aik/SLOF>`_ (Slimline Open Firmware) is an
implementation of the `IEEE 1275-1994, Standard for Boot (Initialization
Configuration) Firmware: Core Requirements and Practices
<https://standards.ieee.org/standard/1275-1994.html>`_.

SLOF performs bus scanning and PCI resource allocation, and provides the client
interface to boot from block devices and the network.

QEMU includes a prebuilt image of SLOF, which is updated when a more recent
version is required.

VOF (Virtual Open Firmware) is a minimalistic firmware that is enabled with
``-machine pseries,x-vof=on``. When enabled, the firmware acts as a slim
shim and QEMU implements parts of the IEEE 1275 Open Firmware interface.

VOF does not have device drivers and does not do PCI resource allocation; it
relies on ``-kernel`` being used with a Linux kernel recent enough (v5.4+) to
perform PCI resource assignment itself. It is ideal for use with petitboot.

Booting via ``-kernel`` supports the following:

+-------------------+-------------------+------------------+
| kernel            | pseries,x-vof=off | pseries,x-vof=on |
+===================+===================+==================+
| vmlinux BE        |     ✓             |     ✓            |
+-------------------+-------------------+------------------+
| vmlinux LE        |     ✓             |     ✓            |
+-------------------+-------------------+------------------+
| zImage.pseries BE |     ✓¹            |     ✓¹           |
+-------------------+-------------------+------------------+
| zImage.pseries LE |     ✓             |     ✓            |
+-------------------+-------------------+------------------+

¹ The machine option ``kernel-addr=0`` must be set.
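
As an illustration, the commands below sketch booting a pSeries guest directly
from a kernel image under each firmware. The file names ``vmlinux`` and
``zImage.pseries`` are placeholders for locally built images, and the
``console=hvc0`` kernel argument is an assumption about the guest
configuration; adjust both to your setup.

.. code-block:: bash

  # SLOF (the default firmware) booting a vmlinux image passed with -kernel.
  qemu-system-ppc64 -M pseries -nographic \
      -kernel vmlinux -append "console=hvc0"

  # VOF instead of SLOF: the guest kernel (v5.4+) assigns PCI resources itself.
  qemu-system-ppc64 -M pseries,x-vof=on -nographic \
      -kernel vmlinux -append "console=hvc0"

  # Big-endian zImage.pseries: kernel-addr=0 is required (see the table).
  qemu-system-ppc64 -M pseries,x-vof=on,kernel-addr=0 -nographic \
      -kernel zImage.pseries -append "console=hvc0"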

Build directions
================

.. code-block:: bash

  ./configure --target-list=ppc64-softmmu && make

Running instructions
====================

The pSeries machine type can be selected by running QEMU with the following
options:

.. code-block:: bash

  qemu-system-ppc64 -M pseries <other QEMU arguments>

sPAPR devices
=============

The sPAPR specification defines a set of para-virtualized devices, which are
also supported by the pSeries machine in QEMU and can be instantiated with the
``-device`` option:

* ``spapr-vlan``: a virtual network interface.
* ``spapr-vscsi``: a virtual SCSI disk interface.
* ``spapr-rng``: a pseudo-device for passing random number generator data to the
  guest (see the `H_RANDOM hypercall feature
  <https://wiki.qemu.org/Features/HRandomHypercall>`_ for details).
* ``spapr-vty``: a virtual teletype.
* ``spapr-pci-host-bridge``: a PCI host bridge.
* ``tpm-spapr``: a Trusted Platform Module (TPM).
* ``spapr-tpm-proxy``: a TPM proxy.

These are compatible with the devices historically available for use when
running the IBM PowerVM hypervisor with LPARs.

However, since these devices were originally specified with another
hypervisor and non-Linux guests in mind, you should use their virtio
counterparts (for instance virtio-net, virtio-blk/virtio-scsi and virtio-rng)
instead where possible, since they will most likely give you better performance
with Linux guests in a QEMU environment.
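
For illustration, the sketch below attaches a network interface, a disk and a
random number generator, first through the sPAPR devices and then through their
virtio counterparts. The image name ``disk.qcow2`` and the IDs (``net0``,
``hd0``, ``scsi0``, ``rng0``) are hypothetical placeholders; the user-mode
network backend and ``/dev/urandom`` as entropy source are assumptions chosen
for simplicity.

.. code-block:: bash

  # sPAPR para-virtualized devices (PowerVM-compatible).
  qemu-system-ppc64 -M pseries -nographic \
      -netdev user,id=net0 -device spapr-vlan,netdev=net0 \
      -drive if=none,id=hd0,file=disk.qcow2,format=qcow2 \
      -device spapr-vscsi,id=scsi0 -device scsi-hd,drive=hd0,bus=scsi0.0 \
      -object rng-random,id=rng0,filename=/dev/urandom \
      -device spapr-rng,rng=rng0

  # Virtio counterparts, usually the better choice for Linux guests.
  qemu-system-ppc64 -M pseries -nographic \
      -netdev user,id=net0 -device virtio-net-pci,netdev=net0 \
      -drive if=none,id=hd0,file=disk.qcow2,format=qcow2 \
      -device virtio-blk-pci,drive=hd0 \
      -object rng-random,id=rng0,filename=/dev/urandom \
      -device virtio-rng-pci,rng=rng0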

The pSeries machine in QEMU is always instantiated with the following devices:

* An NVRAM device (``spapr-nvram``).
* A virtual teletype (``spapr-vty``).
* A PCI host bridge (``spapr-pci-host-bridge``).

Hence, there is no need to add them manually, unless you use the
``-nodefaults`` command line option in QEMU.
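
A minimal sketch of re-adding one of these devices by hand: with
``-nodefaults``, a console can be provided by attaching a ``spapr-vty`` to a
character device. The chardev ID ``con0`` is a hypothetical name, and only the
console is re-created here.

.. code-block:: bash

  # -nodefaults drops the default devices, so add the virtual teletype back
  # explicitly and connect it to QEMU's standard input/output.
  qemu-system-ppc64 -M pseries -nodefaults -display none \
      -chardev stdio,id=con0 \
      -device spapr-vty,chardev=con0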

To make the contents of the default ``spapr-nvram`` device persistent, you need
to specify a PFLASH device when starting QEMU: either use
``-drive if=pflash,file=<filename>,format=raw`` to set the default PFLASH
device, or specify one with an ID
(``-drive if=none,file=<filename>,format=raw,id=pfid``) and pass that ID to the
NVRAM device with ``-global spapr-nvram.drive=pfid``.
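
Putting this together, a persistent NVRAM setup could be sketched as follows.
The file name ``nvram.img`` and its 64 KiB size are assumptions made for the
example; the ``pfid`` ID matches the text above.

.. code-block:: bash

  # Create a raw image file to back the NVRAM contents.
  qemu-img create -f raw nvram.img 64K

  # Variant 1: use the image as the default PFLASH device.
  qemu-system-ppc64 -M pseries -nographic \
      -drive if=pflash,file=nvram.img,format=raw

  # Variant 2: give the drive an ID and pass it to spapr-nvram explicitly.
  qemu-system-ppc64 -M pseries -nographic \
      -drive if=none,file=nvram.img,format=raw,id=pfid \
      -global spapr-nvram.drive=pfid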

sPAPR specification
-------------------

The main source of documentation on the sPAPR standard is the [LoPAR]_ document.
However, documentation specific to QEMU's implementation of the specification
can also be found in the QEMU documentation:

.. toctree::
   :maxdepth: 1

   ../../specs/ppc-spapr-hotplug.rst
   ../../specs/ppc-spapr-hcalls.rst
   ../../specs/ppc-spapr-numa.rst
   ../../specs/ppc-spapr-uv-hcalls.rst
   ../../specs/ppc-spapr-xive.rst

Switching between the KVM-PR and KVM-HV kernel module
=====================================================

Currently, there are two implementations of KVM on Power, ``kvm_hv.ko`` and
``kvm_pr.ko``.

If a host supports both KVM modes, and both KVM kernel modules are loaded, it is
possible to switch between the two modes with the ``kvm-type`` parameter:

* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=PR`` to use the
  ``kvm_pr.ko`` kernel module.
* Use ``qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=HV`` to use ``kvm_hv.ko``
  instead.
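
Before choosing a ``kvm-type``, it can help to check which modules are loaded
on the host. The function below is a hypothetical helper, not part of QEMU; it
assumes that loaded modules appear as directories under ``/sys/module``
(``kvm_hv`` and ``kvm_pr``), as is usual on Linux.

.. code-block:: bash

  # Print the kvm-type values usable on this host ("HV", "PR", or "HV PR"),
  # based on which KVM-on-Power modules are currently loaded.
  kvm_types_loaded() {
      types=""
      if [ -d /sys/module/kvm_hv ]; then types="$types HV"; fi
      if [ -d /sys/module/kvm_pr ]; then types="$types PR"; fi
      echo "${types# }"    # drop the leading space
  }

  loaded=$(kvm_types_loaded)
  echo "Usable kvm-type values: ${loaded:-none}"

If a module is missing, it can usually be loaded with ``modprobe kvm_hv`` or
``modprobe kvm_pr`` on a host that supports it.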

KVM-PR
------

KVM-PR uses the so-called **PR**\ oblem state of the PPC CPUs to run the guests,
i.e. the virtual machine is run in user mode and all privileged instructions
trap and have to be emulated by the host. That means you can run KVM-PR inside
a pSeries guest (or a PowerVM LPAR for that matter), which is where it
originated, as historically (prior to POWER7) it was not possible to run Linux
in hypervisor mode on a Power processor (this function was restricted to
PowerVM, the IBM proprietary hypervisor).

Because all privileged instructions are trapped, guests that use a lot of
privileged instructions run quite slowly with KVM-PR. On the other hand, because
of that, this kernel module can run on almost any PPC hardware, and is able to
emulate many different guest CPUs. This module can even be used to run other
PowerPC guests like an emulated PowerMac.

As KVM-PR can be run inside a pSeries guest, it can also provide nested
virtualization capabilities (i.e. running a guest from within a guest).

Note that, as KVM-HV provides much better execution performance, maintenance
work in recent years has focused mostly on it, and maintenance for KVM-PR has
been minimal.

To run KVM-PR guests on POWER9 processors, QEMU must be started with the
``kernel_irqchip=off`` machine option.
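
For example, a KVM-PR guest on a POWER9 host might be started as follows; any
further guest options (disks, network) are left out of this sketch.

.. code-block:: bash

  # Select KVM-PR and disable the in-kernel interrupt controller (POWER9).
  qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=PR,kernel_irqchip=off \
      -nographic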

KVM-HV
------

KVM-HV uses the hypervisor mode of more recent Power processors, which allows
direct access to the bare metal hardware. Although POWER7 had this capability,
it was only from POWER8 onwards that this was officially supported by IBM.

Originally, KVM-HV was only available when running on a PowerNV platform (a.k.a.
Power bare metal). Although it runs on a PowerNV platform, it can only be used
to start pSeries guests. As the pSeries guest doesn't have access to the
hypervisor mode of the Power CPU, it wasn't possible to run KVM-HV on a guest.
This limitation has been lifted, and now it is possible to run KVM-HV inside
pSeries guests as well, making nested virtualization possible with KVM-HV.

As KVM-HV has access to privileged instructions, guests that use a lot of these
can run much faster than with KVM-PR. On the other hand, the guest CPU has to be
of the same type as the host CPU, e.g. it is not possible to specify an
embedded PPC CPU for the guest with KVM-HV. However, it is at least possible to
run the guest in a backward-compatibility mode of a previous CPU generation,
e.g. you can run a POWER7 guest on a POWER8 host by using
``-cpu POWER8,compat=power7`` as a parameter to QEMU.
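
As a sketch of nested KVM-HV, the outer (L1) pSeries guest has to be allowed
to use the hypervisor facilities of its host. This example assumes that QEMU's
``cap-nested-hv`` sPAPR capability is the switch for that, and that the L0 host
kernel supports nested KVM-HV; check both against your QEMU and kernel
versions.

.. code-block:: bash

  # On the L0 host: start the L1 pSeries guest with nested KVM-HV enabled.
  qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=HV,cap-nested-hv=on \
      -nographic

  # Inside the L1 guest: load kvm_hv and start an L2 pSeries guest.
  modprobe kvm_hv
  qemu-system-ppc64 -M pseries,accel=kvm,kvm-type=HV -nographic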

Modules support
===============

As noted in the sections above, each module can run in a different
environment. The following table shows in which environments each module can
run. As long as you are in a supported environment, you can run KVM-PR or KVM-HV
nested. Combinations not shown in the table are not available.

+--------------+------------+------+-------------------+----------+--------+
| Platform     | Host type  | Bits | Page table format | KVM-HV   | KVM-PR |
+==============+============+======+===================+==========+========+
| PowerNV      | bare metal | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | yes      | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | yes      | no     |
+--------------+------------+------+-------------------+----------+--------+
| pSeries [1]_ | PowerNV    | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | yes [2]_ | no     |
|              +------------+------+-------------------+----------+--------+
|              | PowerVM    | 32   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix             | N/A      | N/A    |
|              |            +------+-------------------+----------+--------+
|              |            | 64   | hash              | no       | yes    |
|              |            |      +-------------------+----------+--------+
|              |            |      | radix [3]_        | no       | yes    |
+--------------+------------+------+-------------------+----------+--------+

.. [1] On POWER9 DD2.1 processors, the page table format on the host and guest
   must be the same.

.. [2] KVM-HV cannot run nested on POWER8 machines.

.. [3] Introduced on Power10 machines.
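
The table can also be read as a simple lookup. The shell function below is a
hypothetical helper, not part of QEMU, that encodes the rows above; its
arguments mirror the table columns, and the footnote caveats (POWER9 DD2.1,
POWER8, Power10) are not checked.

.. code-block:: bash

  # kvm_modes PLATFORM HOST BITS MMU
  #   PLATFORM: PowerNV or pSeries
  #   HOST:     bare-metal (for PowerNV), PowerNV or PowerVM (for pSeries)
  #   BITS:     32 or 64
  #   MMU:      hash or radix
  # Prints the usable KVM modules ("HV", "PR", "HV PR") or "none".
  kvm_modes() {
      case "$1/$2/$3/$4" in
          */*/32/radix)                echo "none" ;;   # N/A rows
          PowerNV/bare-metal/32/hash)  echo "PR" ;;
          PowerNV/bare-metal/64/hash)  echo "HV PR" ;;
          PowerNV/bare-metal/64/radix) echo "HV" ;;
          pSeries/PowerNV/32/hash|pSeries/PowerNV/64/hash)
                                       echo "PR" ;;
          pSeries/PowerNV/64/radix)    echo "HV" ;;     # not on POWER8
          pSeries/PowerVM/32/hash|pSeries/PowerVM/64/hash)
                                       echo "PR" ;;
          pSeries/PowerVM/64/radix)    echo "PR" ;;     # Power10 and later
          *)                           echo "none" ;;   # not in the table
      esac
  }

  kvm_modes pSeries PowerVM 64 radix    # prints: PR

On a running Linux system, the platform and MMU mode can typically be read from
the ``platform`` and ``MMU`` lines of ``/proc/cpuinfo``.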


.. _power-papr-protected-execution-facility-pef:

POWER (PAPR) Protected Execution Facility (PEF)
-----------------------------------------------

Protected Execution Facility (PEF), also known as Secure Guest support,
is a feature found on IBM POWER9 and POWER10 processors.

If a suitable firmware including an Ultravisor is installed, it adds
an extra memory protection mode to the CPU.  The Ultravisor manages a
pool of secure memory which cannot be accessed by the hypervisor.

When this feature is enabled in QEMU, a guest can use ultracalls to
enter "secure mode".  This transfers most of its memory to secure
memory, where it cannot be eavesdropped on by a compromised hypervisor.

Launching
^^^^^^^^^

To launch a guest which will be permitted to enter PEF secure mode::

  $ qemu-system-ppc64 \
      -object pef-guest,id=pef0 \
      -machine confidential-guest-support=pef0 \
      ...

Live Migration
^^^^^^^^^^^^^^

Live migration is not yet implemented for PEF guests.  For
consistency, QEMU currently prevents migration if the PEF feature is
enabled, whether or not the guest has actually entered secure mode.


Maintainer contact information
==============================

Cédric Le Goater <clg@kaod.org>

Daniel Henrique Barboza <danielhb413@gmail.com>

.. [LoPAR] `Linux on Power Architecture Reference document (LoPAR) revision
   2.9 <https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200812.pdf>`_.
308