.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
=========

Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
specification [1]_ which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
  It only implements a subset of the PAPR specification called LoPAPR [2]_.

On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0) and must
issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or for other services managed by the
hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the guest.
The guest issues a hcall with the necessary input operands. The hypervisor,
after performing the privileged operation, returns a status code and output
operands back to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context
is done via the instruction **HVCS**, which expects the opcode for the hcall to
be set in *r3* and any in-arguments for the hcall to be provided in registers
*r4-r12*. If values have to be passed through a memory buffer, the data stored
in that buffer should be in Big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the
'HVCS' instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments,
any out values stored in a memory buffer will be in Big-endian byte order.

Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
in an arch-specific header [4]_ to issue hcalls from the linux kernel
running as a pseries guest.

Register Conventions
====================

Any hcall should follow the same register convention as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_.
The table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
|   r0     |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
|   r1     |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
|   r2     |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
|   r3     |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
|   r4-r10 |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
|   r11    |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
|   r12    |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
|   r13    |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
|   r14-r31|    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
|    LR    |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
|   CTR    |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
|   XER    |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
|  CR0-1   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR2-4   |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR5-7   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  Others  |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor refers to shared hardware resources, such as PCI devices
and NVDIMMs, that are available for use by LPARs as Dynamic Resources (DR).
When a DR is allocated to an LPAR, PHYP creates a data-structure called a
Dynamic Resource Connector (DRC) to manage LPAR access. An LPAR refers to a
DRC via an opaque 32-bit number called the DRC-Index.
The DRC-Index value is provided to the LPAR via the device-tree,
where it is present as an attribute in the device tree node associated with the
DR.

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return-value in *r3*
indicating success or failure of the hcall. In case of a failure an error code
indicates the cause of the error. These codes are defined and documented in the
arch-specific header [4]_.

In some cases a hcall can potentially take a long time and needs to be issued
multiple times in order to be completely serviced. These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet finished
servicing the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a non *H_CONTINUE*
return value.

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP.
For the corresponding opcode values, see the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N-bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided buffer.
The metadata area stores configuration information such as label information,
bad-blocks etc. The metadata area is located out-of-band of the NVDIMM storage
area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
associated with it, at the specified offset and from the provided buffer.
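
The metadata hcalls above exchange data through a memory buffer, and the
HCALL ABI section requires values passed through such a buffer to be in
Big-endian byte order. The following is a minimal user-space sketch of storing
and loading a 64-bit value in that byte order, independent of the host
endianness. The helper names *demo_put_be64* and *demo_get_be64* are
hypothetical and exist only for this illustration; kernel code would normally
rely on the kernel's own byte-order helpers such as *cpu_to_be64()* and
*be64_to_cpu()*.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store val into buf[0..7], most significant byte first. */
static void demo_put_be64(uint8_t *buf, uint64_t val)
{
	assert(buf != NULL);
	for (size_t i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}

/* Hypothetical helper: load a Big-endian 64-bit value from buf[0..7]. */
static uint64_t demo_get_be64(const uint8_t *buf)
{
	uint64_t val = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < 8; i++)
		val = (val << 8) | buf[i];
	return val;
}
```

Because the helpers work byte by byte, they behave the same on little-endian
and Big-endian guests.
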

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* then the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.
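
*H_SCM_BIND_MEM* takes a *continue-token*, so it follows the calling pattern
described under HCALL Return-values: pass 0 on the first call and feed back
the returned token until the status is no longer *H_CONTINUE*. The sketch
below models that loop in plain user-space C. Everything prefixed with
*demo_* or *DEMO_* is hypothetical: the mock hypervisor, its chunk size and
the status values are placeholders, and the real status codes are the ones
defined in the arch-specific header [4]_.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder status codes for this sketch only; see hvcall.h for real ones. */
#define DEMO_H_SUCCESS  0
#define DEMO_H_CONTINUE 1

/* Hypothetical mock hypervisor state: binds a few blocks per call. */
struct demo_hyp {
	uint64_t blocks_total;	/* blocks the guest asked to bind */
	uint64_t blocks_done;	/* blocks bound so far */
	unsigned int calls;	/* number of hcalls serviced */
};

/* Mock hcall: services at most 4 blocks, then asks the guest to call again. */
static long demo_bind_hcall(struct demo_hyp *hyp, uint64_t token_in,
			    uint64_t *token_out)
{
	const uint64_t chunk = 4;

	hyp->calls++;
	/* In this mock the token is simply the progress made so far. */
	assert(token_in == hyp->blocks_done);

	hyp->blocks_done += chunk;
	if (hyp->blocks_done >= hyp->blocks_total) {
		hyp->blocks_done = hyp->blocks_total;
		*token_out = 0;
		return DEMO_H_SUCCESS;
	}
	*token_out = hyp->blocks_done;
	return DEMO_H_CONTINUE;
}

/* Guest side: continue-token is 0 initially, then whatever was returned. */
static long demo_bind_all(struct demo_hyp *hyp)
{
	uint64_t token = 0;
	long rc;

	do {
		rc = demo_bind_hcall(hyp, token, &token);
	} while (rc == DEMO_H_CONTINUE);

	return rc;
}
```

The guest treats the token as opaque: it never inspects or modifies the value,
it only hands back what the previous call returned.
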

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM Block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM blocks
belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM
identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on predictive failure and overall health
of the PMEM device. The asserted bits in the health-bitmap indicate one or more
states (described in the table below) of the PMEM device, and the
health-bit-valid-bitmap indicates which bits in the health-bitmap are valid.
The bits are reported in reverse bit ordering, that is, bit 0 is the most
significant bit. For example, a value of 0xC400000000000000 indicates that
bits 0, 1, and 5 are valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
| Bit  | Definition                                                            |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
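
As an illustration of the reverse bit ordering used by the *H_SCM_HEALTH*
bitmaps described earlier, the sketch below builds a mask for bit *n* counted
from the most significant end and reports a health flag only when it is also
marked valid. The *demo_* names are hypothetical helpers written for this
example; kernel code may prefer an existing macro such as *PPC_BIT()* for the
same purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask for bit n in reverse ordering: bit 0 is the most significant bit. */
static uint64_t demo_health_bit(unsigned int n)
{
	assert(n < 64);
	return UINT64_C(1) << (63 - n);
}

/* A flag counts only if it is asserted in health AND marked in valid. */
static bool demo_health_flag_set(uint64_t health, uint64_t valid,
				 unsigned int n)
{
	uint64_t mask = demo_health_bit(n);

	return (valid & mask) != 0 && (health & mask) != 0;
}
```

With these helpers, the example value 0xC400000000000000 from the
*H_SCM_HEALTH* description equals the OR of the masks for bits 0, 1 and 5.
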