.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
========

Virtualization on 64-bit Power Book3S platforms is based on the PAPR
specification [1]_, which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **QEMU/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host.
  It implements only a subset of the PAPR specification, called LoPAPR [2]_.

On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0) and must
issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or for other services managed by the
hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the
guest. The guest issues a hcall with the necessary input operands. After
performing the privileged operation, the hypervisor returns a status code
and output operands back to the guest.

HCALL ABI
=========

The ABI specification for a hcall between a pseries guest and PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context
is done via the **HVSC** instruction, which expects the opcode for the hcall to
be set in *r3* and any in-arguments for the hcall to be provided in registers
*r4-r12*. If values have to be passed through a memory buffer, the data stored
in that buffer should be in Big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the
'HVSC' instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments, any out
values stored in a memory buffer will be in Big-endian byte order.
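
The Big-endian buffer rule above can be illustrated with a small, stand-alone
sketch. The helper names ``put_be64`` and ``get_be64`` are made up for this
example and are not part of any kernel API; kernel code would normally rely on
the existing ``cpu_to_be64()`` and ``be64_to_cpu()`` helpers instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value into buf in big-endian byte order. */
static void put_be64(uint8_t *buf, uint64_t val)
{
	for (size_t i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}

/* Load a 64-bit big-endian value from buf, whatever the host endianness. */
static uint64_t get_be64(const uint8_t *buf)
{
	uint64_t val = 0;

	for (size_t i = 0; i < 8; i++)
		val = (val << 8) | buf[i];

	return val;
}
```

Because the helpers work byte by byte, they behave the same on a little-endian
and a big-endian guest.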

Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch specific header [4]_, to issue hcalls from the Linux kernel
running as a pseries guest.

Register Conventions
====================

Any hcall should follow the same register convention as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The
table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
|   r0     |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
|   r1     |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
|   r2     |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
|   r3     |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
|  r4-r10  |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
|   r11    |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
|   r12    |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
|   r13    |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
|  r14-r31 |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
|    LR    |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
|   CTR    |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
|   XER    |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
|  CR0-1   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR2-4   |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR5-7   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  Others  |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs
etc. that are available for use by LPARs as Dynamic Resources (DR). When a DR
is allocated to an LPAR, PHYP creates a data-structure called Dynamic Resource
Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an opaque
32-bit number called DRC-Index. The DRC-index value is provided to the LPAR via
the device-tree, where it is present as an attribute in the device tree node
associated with the DR.

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return-value in *r3*,
indicating success or failure of the hcall. In case of a failure an error code
indicates the cause of the error. These codes are defined and documented in the
arch specific header [4]_.

In some cases a hcall can potentially take a long time and needs to be issued
multiple times in order to be completely serviced. These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor hasn't finished
servicing the hcall yet.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a non *H_CONTINUE*
return value.
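
The *continue-token* protocol described above can be sketched as a retry loop.
This is a minimal user-space illustration, not kernel code: every ``demo_``
name is hypothetical, the hypervisor is replaced by a mock function, and the
status values are illustrative only (the authoritative definitions are in the
arch specific header [4]_).

```c
#include <stdint.h>

/* Illustrative status codes; check hvcall.h for the real definitions. */
enum { DEMO_H_SUCCESS = 0, DEMO_H_CONTINUE = 18 };

/*
 * Hypothetical shape of a long-running hcall: takes the continue-token,
 * writes the next token to *token_out and returns a status code.
 */
typedef long (*demo_hcall_fn)(uint64_t token_in, uint64_t *token_out);

/* Reissue the hcall until the status is no longer H_CONTINUE. */
static long demo_hcall_until_done(demo_hcall_fn hcall, unsigned int *calls)
{
	uint64_t token = 0;	/* the initial call always passes 0 */
	long rc;

	*calls = 0;
	do {
		/* feed the token returned by the previous call back in */
		rc = hcall(token, &token);
		(*calls)++;
	} while (rc == DEMO_H_CONTINUE);

	return rc;
}

/* Mock hypervisor: asks to be called again twice, then succeeds. */
static long demo_mock_hcall(uint64_t token_in, uint64_t *token_out)
{
	if (token_in < 2) {
		*token_out = token_in + 1;
		return DEMO_H_CONTINUE;
	}
	*token_out = 0;
	return DEMO_H_SUCCESS;
}
```

The guest treats the token as opaque: it never inspects or modifies it, it only
passes back whatever the previous call returned.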

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values please look into the arch specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N-bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided
buffer. The metadata area stores configuration information such as label
information, bad-blocks etc. The metadata area is located out-of-band of the
NVDIMM storage area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N-bytes from the provided buffer to the
metadata area associated with it, at the specified offset.

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. In
case *targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.
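
As a rough illustration of the H_SCM_BIND_MEM inputs, the sketch below prepares
the arguments for a first call that lets the hypervisor choose the target
address. The structure, its field names and the ``demo_`` helpers are
assumptions made for this example and do not exist in the kernel; treating the
upper bound of the block range as exclusive is likewise an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel from the text: ask the hypervisor to pick the target address. */
#define DEMO_BIND_ANY_ADDR 0xFFFFFFFFFFFFFFFFULL

/* Hypothetical container for the H_SCM_BIND_MEM input operands. */
struct demo_bind_args {
	uint32_t drc_index;	/* opaque 32-bit DRC-Index of the NVDIMM */
	uint64_t first_block;	/* startingScmBlockIndex */
	uint64_t num_blocks;	/* numScmBlocksToBind */
	uint64_t target_addr;	/* targetLogicalMemoryAddress */
	uint64_t token;		/* continue-token, 0 on the first call */
};

/* Build the arguments for an initial call with no preferred address. */
static struct demo_bind_args demo_bind_args_init(uint32_t drc_index,
						 uint64_t first_block,
						 uint64_t num_blocks)
{
	struct demo_bind_args args = {
		.drc_index = drc_index,
		.first_block = first_block,
		.num_blocks = num_blocks,
		.target_addr = DEMO_BIND_ANY_ADDR,
		.token = 0,
	};

	return args;
}

/* True when the hypervisor is expected to assign the address itself. */
static bool demo_bind_hv_picks_addr(const struct demo_bind_args *args)
{
	return args->target_addr == DEMO_BIND_ANY_ADDR;
}

/* One past the last block requested (exclusive end, by assumption). */
static uint64_t demo_bind_end_block(const struct demo_bind_args *args)
{
	return args->first_block + args->num_blocks;
}
```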

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM Block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return which DRC Index and SCM block is mapped
to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM blocks
belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM
identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return the info on predictive failure and overall health of
the PMEM device. The asserted bits in the health-bitmap indicate one or more
states (described in the table below) of the PMEM device, and
health-bit-valid-bitmap indicates which bits in health-bitmap are valid. The
bits are reported in reverse bit ordering (bit 0 is the most-significant bit);
for example, a value of 0xC400000000000000 indicates bits 0, 1, and 5 are
valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
|  Bit |               Definition                                              |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+
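
The reverse bit ordering used by the two H_SCM_HEALTH bitmaps can be decoded
along these lines. This is a stand-alone sketch with hypothetical ``demo_``
helper names; kernel code would more likely use the existing ``PPC_BIT()``
macro for the same purpose. Bit numbers must be in the range 0 to 63.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask for bit n when bit 0 is the most-significant bit of a 64-bit word. */
static uint64_t demo_health_bit(unsigned int n)
{
	return 1ULL << (63 - n);
}

/* A flag counts only if it is both marked valid and asserted. */
static bool demo_health_flag_set(uint64_t bitmap, uint64_t valid,
				 unsigned int n)
{
	uint64_t mask = demo_health_bit(n);

	return (valid & mask) && (bitmap & mask);
}
```

With the example value from the text, ``demo_health_bit(0) |
demo_health_bit(1) | demo_health_bit(5)`` evaluates to 0xC400000000000000.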

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

**H_SCM_FLUSH**

| Input: *drcIndex, continue-token*
| Out: *continue-token*
| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a long time and the hcall needs
to be issued multiple times in order to be completely serviced. The
*continue-token* from the output must be passed in the argument list of
subsequent hcalls to the hypervisor until the hcall is completely serviced,
at which point H_SUCCESS or an error is returned by the hypervisor.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture