.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
=========

Virtualization on 64-bit Power Book3S platforms is based on the PAPR
specification [1]_, which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR-compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host,
  though it only implements a subset of the PAPR specification called
  LoPAPR [2]_.

On the PPC64 architecture, a guest kernel running on top of a PAPR hypervisor
is called a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0)
and must issue hypercalls to the hypervisor whenever it needs to perform an
action that is hypervisor privileged [3]_ or to use other services managed by
the hypervisor.

Hence a hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the
guest. The guest issues a hcall with the necessary input operands.
The hypervisor, after performing the privileged operation, returns a status
code and output operands back to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and a PAPR
hypervisor is covered in section 14.5.3 of ref [2]_. The switch to the
hypervisor context is done via the instruction **HVCS**, which expects the
opcode for the hcall to be set in *r3* and any in-arguments for the hcall to
be provided in registers *r4-r12*. If values have to be passed through a
memory buffer, the data stored in that buffer should be in big-endian byte
order.

Once control returns to the guest after the hypervisor has serviced the
'HVCS' instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments, any out
values stored in a memory buffer will be in big-endian byte order.

The powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch-specific header [4]_, to issue hcalls from the Linux kernel
running as a pseries guest.

Register Conventions
====================

Any hcall should follow the same register conventions as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_.
The table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
| r0       |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
| r1       |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
| r2       |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
| r3       |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
| r4-r10   |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
| r11      |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
| r12      |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
| r13      |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
| r14-r31  |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
| LR       |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
| CTR      |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
| XER      |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
| CR0-1    |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| CR2-4    |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| CR5-7    |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
| Others   |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

A PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs etc.
available for use by LPARs as Dynamic Resources (DR). When a DR is allocated to
an LPAR, PHYP creates a data structure called a Dynamic Resource Connector (DRC)
to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number
called the DRC-Index.
The DRC-Index value is provided to the LPAR via the device tree, where it is
present as an attribute in the device tree node associated with the DR.

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return value in *r3*,
indicating success or failure of the hcall. In case of a failure, an error code
indicates the cause of the error. These codes are defined and documented in the
arch-specific header [4]_.

In some cases a hcall can potentially take a long time and needs to be issued
multiple times in order to be completely serviced. These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet finished
servicing the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a non *H_CONTINUE*
return value.

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP.
For the corresponding opcode values, see the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided buffer.
The metadata area stores configuration information such as label information,
bad blocks etc. The metadata area is located out-of-band of the NVDIMM storage
area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N bytes to the metadata area
associated with it, at the specified offset, from the provided buffer.
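
Data exchanged through memory buffers, such as the metadata read with
H_SCM_READ_METADATA, is in big-endian byte order, so a little-endian guest has
to convert multi-byte fields before using them. The following is a minimal
userspace sketch of that conversion for an 8-byte field. The helper name
``be64_load`` is hypothetical and only illustrates the byte order; kernel code
would normally rely on helpers such as ``be64_to_cpu()`` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): assemble a 64-bit value from
 * 8 bytes stored in big-endian order, i.e. most significant byte first.
 * The result is the same on little-endian and big-endian hosts.
 */
uint64_t be64_load(const uint8_t *buf)
{
	uint64_t val = 0;
	size_t i;

	assert(buf != NULL);

	/* Shift in one byte at a time, starting with the most significant. */
	for (i = 0; i < 8; i++)
		val = (val << 8) | buf[i];

	return val;
}
```

For example, the buffer bytes ``00 00 00 00 00 00 12 34`` decode to the value
``0x1234`` regardless of the endianness of the guest.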

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM blocks
belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM
identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on predictive failure and the overall
health of the PMEM device. The asserted bits in the health-bitmap indicate one
or more states (described in the table below) of the PMEM device, and the
health-bit-valid-bitmap indicates which bits in the health-bitmap are valid.
The bits are reported in reverse bit ordering; for example, a value of
0xC400000000000000 indicates that bits 0, 1, and 5 are valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
|  Bit | Definition                                                            |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

**H_SCM_FLUSH**

| Input: *drcIndex, continue-token*
| Out: *continue-token*
| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a long time and the hcall needs
to be issued multiple times in order to be completely serviced. The
*continue-token* from the output must be passed in the argument list of
subsequent hcalls to the hypervisor until the hcall is completely serviced,
at which point H_SUCCESS or an error is returned by the hypervisor.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture