============================
Transactional Memory support
============================

POWER kernel support for this feature is currently limited to supporting
its use by user programs.  It is not currently used by the kernel itself.

This file aims to sum up how it is supported by Linux and what behaviour you
can expect from your user programs.


Basic overview
==============

Hardware Transactional Memory is supported on POWER8 processors, and is a
feature that enables a different form of atomic memory access.  Several new
instructions are provided to delimit transactions; transactions are
guaranteed to either complete atomically or roll back and undo any partial
changes.

A simple transaction looks like this::

  begin_move_money:
    tbegin
    beq   abort_handler

    ld    r4, SAVINGS_ACCT(r3)
    ld    r5, CURRENT_ACCT(r3)
    subi  r5, r5, 1
    addi  r4, r4, 1
    std   r4, SAVINGS_ACCT(r3)
    std   r5, CURRENT_ACCT(r3)

    tend

    b     continue

  abort_handler:
    ... test for odd failures ...

    /* Retry the transaction if it failed because it conflicted with
     * someone else: */
    b     begin_move_money


The 'tbegin' instruction denotes the start point, and 'tend' the end point.
Between these points the processor is in 'Transactional' state; any memory
references will complete in one go if there are no conflicts with other
transactional or non-transactional accesses within the system.  In this
example, the transaction completes as though it were normal straight-line code
IF no other processor has touched SAVINGS_ACCT(r3) or CURRENT_ACCT(r3); an
atomic move of money from the current account to the savings account has been
performed.  Even though the normal ld/std instructions are used (note no
lwarx/stwcx), either *both* SAVINGS_ACCT(r3) and CURRENT_ACCT(r3) will be
updated, or neither will be updated.

If, in the meantime, there is a conflict with the locations accessed by the
transaction, the transaction will be aborted by the CPU.  Register and memory
state will roll back to that at the 'tbegin', and control will continue from
'tbegin+4'.  The branch to abort_handler will be taken this second time; the
abort handler can check the cause of the failure, and retry.

Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPSCR
and a few other status/flag regs; see the ISA for details.

Causes of transaction aborts
============================

- Conflicts with cache lines used by other processors
- Signals
- Context switches

See the ISA for full documentation of everything that will abort transactions.


Syscalls
========

Syscalls made from within an active transaction will not be performed and the
transaction will be doomed by the kernel with the failure code
TM_CAUSE_SYSCALL | TM_CAUSE_PERSISTENT.

Syscalls made from within a suspended transaction are performed as normal and
the transaction is not explicitly doomed by the kernel.  However, what the
kernel does to perform the syscall may result in the transaction being doomed
by the hardware.  The syscall is performed in suspended mode so any side
effects will be persistent, independent of transaction success or failure.  No
guarantees are provided by the kernel about which syscalls will affect
transaction success.

Care must be taken when relying on syscalls to abort during active transactions
if the calls are made via a library.  Libraries may cache values (which may
give the appearance of success) or perform operations that cause transaction
failure before entering the kernel (which may produce different failure codes).
Examples are glibc's getpid() and lazy symbol resolution.


Signals
=======

Delivery of signals (both sync and async) during transactions provides a second
thread state (ucontext/mcontext) to represent the second transactional register
state.  Signal delivery 'treclaim's to capture both register states, so signals
abort transactions.  The usual ucontext_t passed to the signal handler
represents the checkpointed/original register state; the signal appears to have
arisen at 'tbegin+4'.

If the sighandler ucontext has uc_link set, a second ucontext has been
delivered.  For future compatibility the MSR.TS field should be checked to
determine the transactional state; if it shows that a transaction was active,
the second ucontext in uc->uc_link represents the active transactional
registers at the point of the signal.

For 64-bit processes, uc->uc_mcontext.regs->msr is a full 64-bit MSR and its TS
field shows the transactional mode.

For 32-bit processes, the mcontext's MSR register is only 32 bits; the top 32
bits are stored in the MSR of the second ucontext, i.e. in
uc->uc_link->uc_mcontext.regs->msr.  The top word contains the transactional
state TS.

However, basic signal handlers don't need to be aware of transactions
and simply returning from the handler will deal with things correctly.

Transaction-aware signal handlers can read the transactional register state
from the second ucontext.  This will be necessary for crash handlers to
determine, for example, the address of the instruction causing the SIGSEGV.

Example signal handler::

    void crash_handler(int sig, siginfo_t *si, void *uc)
    {
      ucontext_t *ucp = uc;
      ucontext_t *transactional_ucp = ucp->uc_link;

      if (transactional_ucp) {
        u64 msr = ucp->uc_mcontext.regs->msr;
        /* May have transactional ucontext! */
  #ifndef __powerpc64__
        /* 32-bit: the top word of the MSR lives in the second ucontext. */
        msr |= ((u64)transactional_ucp->uc_mcontext.regs->msr) << 32;
  #endif
        if (MSR_TM_ACTIVE(msr)) {
          /* Yes, we crashed during a transaction.  Oops. */
          fprintf(stderr, "Transaction to be restarted at 0x%llx, but "
                          "crashy instruction was at 0x%llx\n",
                          ucp->uc_mcontext.regs->nip,
                          transactional_ucp->uc_mcontext.regs->nip);
        }
      }

      fix_the_problem(ucp->uc_mcontext.regs->dar);
    }
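
The MSR handling in the handler above can be isolated into a small helper.
The following is a minimal sketch, not kernel or libc code: every EX_-prefixed
name is hypothetical, and the bit positions (TM at bit 32, TS at bits 33-34,
counted from the least significant bit of the 64-bit MSR) are assumed to match
the kernel's MSR_TM_LG, MSR_TS_S_LG and MSR_TS_T_LG definitions and should be
checked against your kernel headers.

```c
#include <stdint.h>

/* Assumed bit positions, counted from the least significant bit. */
#define EX_MSR_TM      (UINT64_C(1) << 32)  /* TM facility enabled   */
#define EX_MSR_TS_S    (UINT64_C(1) << 33)  /* TS = 0b01: suspended  */
#define EX_MSR_TS_T    (UINT64_C(1) << 34)  /* TS = 0b10: transactional */
#define EX_MSR_TS_MASK (EX_MSR_TS_S | EX_MSR_TS_T)

enum ex_ts_state {
    EX_TS_NONE,           /* TS = 0b00: not in a transaction */
    EX_TS_SUSPENDED,      /* TS = 0b01 */
    EX_TS_TRANSACTIONAL,  /* TS = 0b10 */
    EX_TS_RESERVED        /* TS = 0b11 */
};

/*
 * Rebuild the full 64-bit MSR of a 32-bit process: 'low' is the MSR from
 * the first ucontext, 'high' is the MSR word from the second ucontext.
 */
static uint64_t ex_msr_from_halves(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Decode the TS field of a full 64-bit MSR value. */
static enum ex_ts_state ex_decode_ts(uint64_t msr)
{
    switch (msr & EX_MSR_TS_MASK) {
    case 0:
        return EX_TS_NONE;
    case EX_MSR_TS_S:
        return EX_TS_SUSPENDED;
    case EX_MSR_TS_T:
        return EX_TS_TRANSACTIONAL;
    default:
        return EX_TS_RESERVED;
    }
}
```

A 64-bit handler would pass uc->uc_mcontext.regs->msr straight to
ex_decode_ts(); a 32-bit handler would first combine the two words with
ex_msr_from_halves().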

When in an active transaction that takes a signal, we need to be careful with
the stack.  It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend.  In this case, the stack is part of the checkpointed
transactional memory state.  If we write over this non-transactionally or in
suspend, we are in trouble because if we get a TM abort, the program counter and
stack pointer will be back at the tbegin but our in-memory stack won't be valid
anymore.

To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state.  This ensures that the signal context (written while TM is suspended)
will be written below the stack required for the rollback.  The transaction is
aborted because of the treclaim, so any memory written between the tbegin and
the signal will be rolled back anyway.

For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.

Any transaction initiated inside a sighandler and suspended on return
from the sighandler to the kernel will get reclaimed and discarded.

Failure cause codes used by kernel
==================================

These are defined in <asm/reg.h>, and distinguish different reasons why the
kernel aborted a transaction:

 ====================== ================================
 TM_CAUSE_RESCHED       Thread was rescheduled.
 TM_CAUSE_TLBI          Software TLB invalid.
 TM_CAUSE_FAC_UNAV      FP/VEC/VSX unavailable trap.
 TM_CAUSE_SYSCALL       Syscall from active transaction.
 TM_CAUSE_SIGNAL        Signal delivered.
 TM_CAUSE_MISC          Currently unused.
 TM_CAUSE_ALIGNMENT     Alignment fault.
 TM_CAUSE_EMULATE       Emulation that touched memory.
 ====================== ================================

These can be checked by the user program's abort handler as TEXASR[0:7].  If
bit 7 is set, it indicates that the error is considered persistent.  For example,
a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.
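
As an illustration of how an abort handler might use TEXASR[0:7], the sketch
below extracts the failure code from a TEXASR value and decides whether a
retry is worthwhile.  It is a hedged example rather than kernel code: the
EX_-prefixed names and the retry limit are hypothetical, and it relies on the
ISA's bit numbering, in which bit 0 is the most significant bit of the 64-bit
register, so TEXASR[0:7] is the top byte and bit 7 is that byte's lowest bit.
Reading TEXASR itself (an mfspr on POWER) is left out so that the logic stays
portable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 of TEXASR[0:7], i.e. the lowest bit of the failure code byte. */
#define EX_TM_CAUSE_PERSISTENT 0x01u

/* TEXASR[0:7] is the most significant byte of the 64-bit register. */
static uint8_t ex_texasr_failure_code(uint64_t texasr)
{
    return (uint8_t)(texasr >> 56);
}

/* True if the failure is flagged as persistent (retrying will not help). */
static bool ex_failure_is_persistent(uint8_t code)
{
    return (code & EX_TM_CAUSE_PERSISTENT) != 0;
}

/* The cause with the persistent flag masked off, for comparison against
 * the TM_CAUSE_* values. */
static uint8_t ex_failure_cause(uint8_t code)
{
    return (uint8_t)(code & ~EX_TM_CAUSE_PERSISTENT);
}

/*
 * Hypothetical retry policy: retry transient failures a bounded number of
 * times, and fall back to a non-transactional path otherwise.
 */
static bool ex_should_retry(uint64_t texasr, unsigned attempts,
                            unsigned max_attempts)
{
    uint8_t code = ex_texasr_failure_code(texasr);

    return !ex_failure_is_persistent(code) && attempts < max_attempts;
}
```

An abort handler would typically compare ex_failure_cause() against the
TM_CAUSE_* values from the kernel headers and take a fallback path (for
example a lock) once ex_should_retry() returns false.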

GDB
===

GDB and ptrace are not currently TM-aware.  If one stops during a transaction,
it looks like the transaction has just started (the checkpointed state is
presented).  The transaction cannot then be continued and will take the failure
handler route.  Furthermore, the second (transactional) register state will be
inaccessible.  GDB can currently be used on programs using TM, but not sensibly
in parts within transactions.

POWER9
======

TM on POWER9 has issues with storing the complete register state. This
is described in this commit::

    commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
    Author: Paul Mackerras <paulus@ozlabs.org>
    Date:   Wed Mar 21 21:32:01 2018 +1100
    KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9

To account for this, different POWER9 chips have TM enabled in
different ways.

On POWER9N DD2.01 and below, TM is disabled, i.e.
HWCAP2[PPC_FEATURE2_HTM] is not set.

On POWER9N DD2.1 TM is configured by firmware to always abort a
transaction when TM suspend occurs. So tsuspend will cause a
transaction to be aborted and rolled back. Kernel exceptions will also
cause the transaction to be aborted and rolled back and the exception
will not occur. If userspace constructs a sigcontext that enables TM
suspend, the sigcontext will be rejected by the kernel. This mode is
advertised to users with HWCAP2[PPC_FEATURE2_HTM_NO_SUSPEND] set.
HWCAP2[PPC_FEATURE2_HTM] is not set in this mode.

On POWER9N DD2.2 and above, KVM and POWERVM emulate TM for guests (as
described in commit 4bb3c7a0208f), hence TM is enabled for guests,
i.e. HWCAP2[PPC_FEATURE2_HTM] is set for guest userspace. Guests that
make heavy use of TM suspend (tsuspend or kernel suspend) will trap
into the hypervisor and hence will suffer a performance degradation.
Host userspace has TM disabled, i.e. HWCAP2[PPC_FEATURE2_HTM] is not
set (although we may enable it at some point in the future if we bring
the emulation into host userspace context switching).

POWER9C DD1.2 and above are only available with POWERVM and hence
Linux only runs as a guest. On these systems TM is emulated like on
POWER9N DD2.2.

Guest migration from POWER8 to POWER9 will work with POWER9N DD2.2 and
POWER9C DD1.2. Since earlier POWER9 processors don't support TM
emulation, migration from POWER8 to POWER9 is not supported there.
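
A user program can tell these modes apart at run time from the HWCAP2
auxiliary vector entry.  The sketch below is one possible way to do so and is
not taken from the kernel: the EX_-prefixed names are hypothetical, and the
two bit values are assumed to mirror PPC_FEATURE2_HTM and
PPC_FEATURE2_HTM_NO_SUSPEND from the kernel's <asm/cputable.h>; on a powerpc
build, include that header and use its definitions instead.

```c
#include <sys/auxv.h>

/* Assumed copies of the kernel's HWCAP2 bits (see <asm/cputable.h>). */
#define EX_PPC_FEATURE2_HTM            0x40000000UL
#define EX_PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000UL

enum ex_htm_mode {
    EX_HTM_UNAVAILABLE,  /* neither bit set: do not use TM */
    EX_HTM_NO_SUSPEND,   /* TM usable, but suspend aborts the transaction */
    EX_HTM_FULL          /* TM usable, including suspend */
};

/* Classify a raw HWCAP2 value according to the modes described above. */
static enum ex_htm_mode ex_htm_mode_from_hwcap2(unsigned long hwcap2)
{
    if (hwcap2 & EX_PPC_FEATURE2_HTM)
        return EX_HTM_FULL;
    if (hwcap2 & EX_PPC_FEATURE2_HTM_NO_SUSPEND)
        return EX_HTM_NO_SUSPEND;
    return EX_HTM_UNAVAILABLE;
}

/*
 * Query the running process.  The result is only meaningful on powerpc,
 * because HWCAP2 bits are architecture-specific.
 */
static enum ex_htm_mode ex_htm_mode_current(void)
{
    return ex_htm_mode_from_hwcap2(getauxval(AT_HWCAP2));
}
```

Code that depends on tsuspend should require EX_HTM_FULL; code that only uses
plain transactions can also run in the EX_HTM_NO_SUSPEND mode.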

Kernel implementation
=====================

h/rfid mtmsrd quirk
-------------------

As defined in the ISA, rfid has a quirk which is useful in early
exception handling. When in a userspace transaction and we enter the
kernel via some exception, MSR will end up as TM=0 and TS=01 (i.e. TM
off but TM suspended). Regularly the kernel will want to change bits in
the MSR and will perform an rfid to do this. In this case rfid can
have SRR1 TM=0 and TS=00 (i.e. TM off and non-transactional) and the
resulting MSR will retain TM=0 and TS=01 from before (i.e. stay in
suspend). This is a quirk in the architecture as this would normally
be a transition from TS=01 to TS=00 (i.e. suspend -> non-transactional)
which is an illegal transition.

This quirk is described in the architecture in the definition of rfid
with these lines::

  if (MSR 29:31 ¬= 0b010 | SRR1 29:31 ¬= 0b000) then
     MSR 29:31 <- SRR1 29:31

hrfid and mtmsrd have the same quirk.

The Linux kernel uses this quirk in its early exception handling.
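
The rule quoted from the ISA can be modelled in a few lines of C.  This is
only a sketch of that one condition: the function name is hypothetical, it
ignores every other MSR bit that rfid updates, and it does not model the
checks for illegal transitions.  It assumes the ISA's bit numbering, so that
MSR 29:31 (TS followed by TM) occupies bits 34:32 when counted from the least
significant bit.

```c
#include <stdint.h>

#define EX_TSTM_SHIFT 32
#define EX_TSTM_MASK  (UINT64_C(7) << EX_TSTM_SHIFT)

/*
 * Model of the rfid rule for MSR 29:31 only:
 *
 *   if (MSR 29:31 != 0b010 | SRR1 29:31 != 0b000) then
 *      MSR 29:31 <- SRR1 29:31
 *
 * Returns the MSR value with the TS/TM field updated (or retained).
 */
static uint64_t ex_rfid_update_ts_tm(uint64_t msr, uint64_t srr1)
{
    uint64_t cur = (msr >> EX_TSTM_SHIFT) & 7;   /* current TS || TM   */
    uint64_t req = (srr1 >> EX_TSTM_SHIFT) & 7;  /* requested TS || TM */

    /* Suspended with TM off (0b010) and a request of 0b000: keep the
     * current field, which is the quirk.  Otherwise copy from SRR1. */
    if (cur != 0x2 || req != 0x0)
        msr = (msr & ~EX_TSTM_MASK) | (req << EX_TSTM_SHIFT);

    return msr;
}
```

With msr holding TS=01, TM=0 and srr1 holding TS=00, TM=0, the function
returns msr unchanged, which corresponds to the "stay in suspend" behaviour
described above.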