============================
Transactional Memory support
============================

POWER kernel support for this feature is currently limited to supporting
its use by user programs. It is not currently used by the kernel itself.

This file aims to sum up how it is supported by Linux and what behaviour you
can expect from your user programs.


Basic overview
==============

Hardware Transactional Memory is supported on POWER8 processors, and is a
feature that enables a different form of atomic memory access. Several new
instructions are presented to delimit transactions; transactions are
guaranteed to either complete atomically or roll back and undo any partial
changes.

A simple transaction looks like this::

  begin_move_money:
    tbegin
    beq   abort_handler

    ld    r4, SAVINGS_ACCT(r3)
    ld    r5, CURRENT_ACCT(r3)
    subi  r5, r5, 1
    addi  r4, r4, 1
    std   r4, SAVINGS_ACCT(r3)
    std   r5, CURRENT_ACCT(r3)

    tend

    b     continue

  abort_handler:
    ... test for odd failures ...

    /* Retry the transaction if it failed because it conflicted with
     * someone else: */
    b     begin_move_money


The 'tbegin' instruction denotes the start point, and 'tend' the end point.
Between these points the processor is in 'Transactional' state; any memory
references will complete in one go if there are no conflicts with other
transactional or non-transactional accesses within the system.
In this example, the transaction completes as though it were normal
straight-line code IF no other processor has touched SAVINGS_ACCT(r3) or
CURRENT_ACCT(r3); an atomic move of money from the current account to the
savings account has been performed. Even though the normal ld/std
instructions are used (note no lwarx/stwcx), either *both* SAVINGS_ACCT(r3)
and CURRENT_ACCT(r3) will be updated, or neither will be updated.

If, in the meantime, there is a conflict with the locations accessed by the
transaction, the transaction will be aborted by the CPU. Register and memory
state will roll back to that at the 'tbegin', and control will continue from
'tbegin+4'. The branch to abort_handler will be taken this second time; the
abort handler can check the cause of the failure, and retry.

Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPSCR
and a few other status/flag regs; see the ISA for details.

Causes of transaction aborts
============================

- Conflicts with cache lines used by other processors
- Signals
- Context switches
- See the ISA for full documentation of everything that will abort transactions.


Syscalls
========

Syscalls made from within an active transaction will not be performed and the
transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
| TM_CAUSE_PERSISTENT.

Syscalls made from within a suspended transaction are performed as normal and
the transaction is not explicitly doomed by the kernel. However, what the
kernel does to perform the syscall may result in the transaction being doomed
by the hardware. The syscall is performed in suspended mode so any side
effects will be persistent, independent of transaction success or failure. No
guarantees are provided by the kernel about which syscalls will affect
transaction success.

Care must be taken when relying on syscalls to abort during active transactions
if the calls are made via a library. Libraries may cache values (which may
give the appearance of success) or perform operations that cause transaction
failure before entering the kernel (which may produce different failure codes).
Examples are glibc's getpid() and lazy symbol resolution.
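
As a rough illustration of how an abort handler might recognise the
TM_CAUSE_SYSCALL | TM_CAUSE_PERSISTENT failure code, the sketch below decodes
the top byte of a TEXASR value (TEXASR[0:7], see "Failure cause codes used by
kernel"). The helper names are hypothetical, and the numeric values of the two
TM_CAUSE_* constants are assumptions made for this sketch; real programs should
take the definitions from the kernel headers. Reading TEXASR itself is
platform-specific and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; use the kernel header definitions. */
#define TM_CAUSE_PERSISTENT 0x01
#define TM_CAUSE_SYSCALL    0xd8

/* TEXASR[0:7] in IBM bit numbering is the most significant byte. */
static inline uint8_t tm_failure_code(uint64_t texasr)
{
    return (uint8_t)(texasr >> 56);
}

/* Bit 7 of the failure code (its least significant bit) means "persistent". */
static inline bool tm_failure_is_persistent(uint64_t texasr)
{
    return (tm_failure_code(texasr) & TM_CAUSE_PERSISTENT) != 0;
}

/* The cause with the persistent bit masked off, e.g. TM_CAUSE_SYSCALL. */
static inline uint8_t tm_failure_cause(uint64_t texasr)
{
    return tm_failure_code(texasr) & (uint8_t)~TM_CAUSE_PERSISTENT;
}
```

A handler that sees tm_failure_is_persistent() return true would normally give
up on retrying the same transaction, since the same failure is expected again.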


Signals
=======

Delivery of signals (both sync and async) during transactions provides a second
thread state (ucontext/mcontext) to represent the second transactional register
state. Signal delivery 'treclaim's to capture both register states, so signals
abort transactions. The usual ucontext_t passed to the signal handler
represents the checkpointed/original register state; the signal appears to have
arisen at 'tbegin+4'.

If the sighandler ucontext has uc_link set, a second ucontext has been
delivered. For future compatibility the MSR.TS field should be checked to
determine the transactional state; if it shows that a transaction was in
progress, the second ucontext in uc->uc_link represents the active
transactional registers at the point of the signal.

For 64-bit processes, uc->uc_mcontext.regs->msr is a full 64-bit MSR and its TS
field shows the transactional mode.

For 32-bit processes, the mcontext's MSR register is only 32 bits; the top 32
bits are stored in the MSR of the second ucontext, i.e. in
uc->uc_link->uc_mcontext.regs->msr. The top word contains the transactional
state TS.
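
The bit manipulation described above can be sketched in portable C. The
function names are hypothetical, and the bit positions of the TS field (bits 33
and 34 counting from the least significant bit) are an assumption of this
sketch; check them against the kernel's MSR definitions before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed positions of the MSR TS field, counting from the LSB as bit 0. */
#define MSR_TS_S    (1ULL << 33)            /* TS: suspended */
#define MSR_TS_T    (1ULL << 34)            /* TS: transactional */
#define MSR_TS_MASK (MSR_TS_S | MSR_TS_T)

/*
 * 32-bit process: the low word comes from the first ucontext's MSR and the
 * high word from the MSR stored in the second ucontext (uc_link).
 */
static inline uint64_t msr_from_halves(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* True if TS is non-zero, i.e. the thread was transactional or suspended. */
static inline bool msr_tm_active(uint64_t msr)
{
    return (msr & MSR_TS_MASK) != 0;
}
```

A 64-bit process would pass its full MSR straight to msr_tm_active() without
combining any halves.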

However, basic signal handlers don't need to be aware of transactions
and simply returning from the handler will deal with things correctly.

Transaction-aware signal handlers can read the transactional register state
from the second ucontext. This will be necessary for crash handlers to
determine, for example, the address of the instruction causing the SIGSEGV.

Example signal handler::

    void crash_handler(int sig, siginfo_t *si, void *uc)
    {
      ucontext_t *ucp = uc;
      ucontext_t *transactional_ucp = ucp->uc_link;

      if (transactional_ucp) {
        u64 msr = ucp->uc_mcontext.regs->msr;
        /* May have transactional ucontext! */
    #ifndef __powerpc64__
        msr |= ((u64)transactional_ucp->uc_mcontext.regs->msr) << 32;
    #endif
        if (MSR_TM_ACTIVE(msr)) {
          /* Yes, we crashed during a transaction. Oops. */
          fprintf(stderr, "Transaction to be restarted at 0x%llx, but "
                          "crashy instruction was at 0x%llx\n",
                          (unsigned long long)ucp->uc_mcontext.regs->nip,
                          (unsigned long long)transactional_ucp->uc_mcontext.regs->nip);
        }
      }

      /* The faulting data address lives in the register set, not in ucp. */
      fix_the_problem(ucp->uc_mcontext.regs->dar);
    }

When in an active transaction that takes a signal, we need to be careful with
the stack. It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend. In this case, the stack is part of the checkpointed
transactional memory state. If we write over this non-transactionally or in
suspended mode, we are in trouble because if we get a TM abort, the program
counter and stack pointer will be back at the tbegin but our in-memory stack
won't be valid anymore.

To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state. This ensures that the signal context (written while TM is suspended)
will be written below the stack required for the rollback.
The transaction is aborted
because of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.

For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.

Any transaction initiated inside a sighandler and suspended on return
from the sighandler to the kernel will get reclaimed and discarded.

Failure cause codes used by kernel
==================================

These are defined in <asm/reg.h>, and distinguish different reasons why the
kernel aborted a transaction:

 ====================== ================================
 TM_CAUSE_RESCHED       Thread was rescheduled.
 TM_CAUSE_TLBI          Software TLB invalidation.
 TM_CAUSE_FAC_UNAV      FP/VEC/VSX unavailable trap.
 TM_CAUSE_SYSCALL       Syscall from active transaction.
 TM_CAUSE_SIGNAL        Signal delivered.
 TM_CAUSE_MISC          Currently unused.
 TM_CAUSE_ALIGNMENT     Alignment fault.
 TM_CAUSE_EMULATE       Emulation that touched memory.
 ====================== ================================

These can be checked by the user program's abort handler as TEXASR[0:7].
If bit 7 is set, it indicates that the error is considered persistent. For
example, a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED
will not.

GDB
===

GDB and ptrace are not currently TM-aware. If one stops during a transaction,
it looks like the transaction has just started (the checkpointed state is
presented). The transaction cannot then be continued and will take the failure
handler route. Furthermore, the second (transactional) register state will be
inaccessible. GDB can currently be used on programs using TM, but not sensibly
in parts within transactions.

POWER9
======

TM on POWER9 has issues with storing the complete register state. This
is described in this commit::

    commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
    Author: Paul Mackerras <paulus@ozlabs.org>
    Date:   Wed Mar 21 21:32:01 2018 +1100
    KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9

To account for this, different POWER9 chips have TM enabled in
different ways.

On POWER9N DD2.01 and below, TM is disabled.
i.e. HWCAP2[PPC_FEATURE2_HTM] is not set.

On POWER9N DD2.1, TM is configured by firmware to always abort a
transaction when TM suspend occurs. So tsuspend will cause a
transaction to be aborted and rolled back. Kernel exceptions will also
cause the transaction to be aborted and rolled back and the exception
will not occur. If userspace constructs a sigcontext that enables TM
suspend, the sigcontext will be rejected by the kernel. This mode is
advertised to users with HWCAP2[PPC_FEATURE2_HTM_NO_SUSPEND] set.
HWCAP2[PPC_FEATURE2_HTM] is not set in this mode.

On POWER9N DD2.2 and above, KVM and POWERVM emulate TM for guests (as
described in commit 4bb3c7a0208f), hence TM is enabled for guests,
i.e. HWCAP2[PPC_FEATURE2_HTM] is set for guest userspace. Guests that
make heavy use of TM suspend (tsuspend or kernel suspend) will trap
into the hypervisor and hence will suffer a performance degradation.
Host userspace has TM disabled, i.e. HWCAP2[PPC_FEATURE2_HTM] is not set
(although we may enable it at some point in the future if we bring the
emulation into host userspace context switching).

POWER9C DD1.2 and above are only available with POWERVM and hence
Linux only runs as a guest.
On these systems TM is emulated like on
POWER9N DD2.2.

Guest migration from POWER8 to POWER9 will work with POWER9N DD2.2 and
POWER9C DD1.2. Since earlier POWER9 processors don't support TM
emulation, migration from POWER8 to POWER9 is not supported there.

Kernel implementation
=====================

h/rfid mtmsrd quirk
-------------------

As defined in the ISA, rfid has a quirk which is useful in early
exception handling. When in a userspace transaction and we enter the
kernel via some exception, MSR will end up as TM=0 and TS=01 (i.e. TM
off but TM suspended). Regularly the kernel will want to change bits in
the MSR and will perform an rfid to do this. In this case rfid can
have SRR1 TM=0 and TS=00 (i.e. TM off and non-transactional) and the
resulting MSR will retain TM=0 and TS=01 from before (i.e. stay in
suspend). This is a quirk in the architecture as this would normally
be a transition from TS=01 to TS=00 (i.e. suspend -> non-transactional)
which is an illegal transition.

This quirk is described in the architecture in the definition of rfid
with these lines::

  if (MSR[29:31] ¬= 0b010 | SRR1[29:31] ¬= 0b000) then
     MSR[29:31] <- SRR1[29:31]

hrfid and mtmsrd have the same quirk.

The Linux kernel uses this quirk in its early exception handling.
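
To make the rule above concrete, the following is a small C model of the
TS/TM update only. It is a sketch, not kernel code: the function name is
hypothetical, the field is assumed to occupy bits 34..32 counting from the
least significant bit (MSR[29:31] in IBM numbering), and every other MSR bit
that a real rfid would also load from SRR1 is deliberately left untouched.

```c
#include <stdint.h>

/* MSR[29:31] (TS, then TM) assumed to sit at bits 34..32 from the LSB. */
#define MSR_TS_TM_SHIFT 32
#define MSR_TS_TM_MASK  (0x7ULL << MSR_TS_TM_SHIFT)

/*
 * Model of the quoted rule: the TS/TM field is copied from SRR1 unless the
 * current MSR has TS=01, TM=0 (0b010) and SRR1 asks for TS=00, TM=0 (0b000).
 */
static inline uint64_t rfid_ts_tm_update(uint64_t msr, uint64_t srr1)
{
    uint64_t cur = (msr & MSR_TS_TM_MASK) >> MSR_TS_TM_SHIFT;
    uint64_t req = (srr1 & MSR_TS_TM_MASK) >> MSR_TS_TM_SHIFT;

    if (cur != 0x2 || req != 0x0)
        msr = (msr & ~MSR_TS_TM_MASK) | (req << MSR_TS_TM_SHIFT);

    return msr;
}
```

With the current field at 0b010 and SRR1 requesting 0b000, the function returns
the MSR unchanged, which is the "stay in suspend" behaviour described above;
any other combination copies the field from SRR1.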