.. SPDX-License-Identifier: GPL-2.0

.. _kernel_hacking_locktypes:

==========================
Lock types and their rules
==========================

Introduction
============

The kernel provides a variety of locking primitives which can be divided
into two categories:

 - Sleeping locks
 - Spinning locks

This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.


Lock categories
===============

Sleeping locks
--------------

Sleeping locks can only be acquired in preemptible task context.

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock(). Furthermore, it is also necessary to evaluate the debugging
versions of these primitives. In short, don't acquire sleeping locks from
other contexts unless there is no other option.

Sleeping lock types:

 - mutex
 - rt_mutex
 - semaphore
 - rw_semaphore
 - ww_mutex
 - percpu_rw_semaphore

On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

 - spinlock_t
 - rwlock_t

Spinning locks
--------------

 - raw_spinlock_t
 - bit spinlocks

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t

Spinning locks implicitly disable preemption and the lock / unlock functions
can have suffixes which apply further protections:

 ===================  ====================================================
 _bh()                Disable / enable bottom halves (soft interrupts)
 _irq()               Disable / enable interrupts
 _irqsave/restore()   Save and disable / restore interrupt disabled state
 ===================  ====================================================

Owner semantics
===============

The aforementioned lock types except semaphores have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.


rtmutex
=======

RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.

PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels. Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts. This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.


semaphore
=========

semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate serialization and wait mechanisms, such
as mutexes and completions.

semaphores and PREEMPT_RT
-------------------------

PREEMPT_RT does not change the semaphore implementation because counting
semaphores have no concept of owners, thus preventing PREEMPT_RT from
providing priority inheritance for semaphores. After all, an unknown
owner cannot be boosted. As a consequence, blocking on semaphores can
result in priority inversion.


rw_semaphore
============

rw_semaphore is a multiple readers and single writer lock mechanism.
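
For illustration, a minimal usage sketch follows; the protected list, its
element type and the functions are hypothetical::

  struct foo {
          struct list_head node;
          int id;
  };

  static LIST_HEAD(foo_list);
  static DECLARE_RWSEM(foo_rwsem);

  void foo_dump(void)                     /* reader side */
  {
          struct foo *f;

          down_read(&foo_rwsem);          /* readers may run concurrently */
          list_for_each_entry(f, &foo_list, node)
                  pr_info("foo %d\n", f->id);
          up_read(&foo_rwsem);
  }

  void foo_add(struct foo *f)             /* writer side */
  {
          down_write(&foo_rwsem);         /* excludes readers and writers */
          list_add(&f->node, &foo_list);
          up_write(&foo_rwsem);
  }

Both down_read() and down_write() may sleep, so this only works in
preemptible task context.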

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independently of the kernel configuration.

rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers. In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.


raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t
--------------

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels. Use raw_spinlock_t only in real critical
core code, low-level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state. raw_spinlock_t can sometimes also be used when the
critical section is tiny, thus avoiding RT-mutex overhead.
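
For illustration, a sketch of the kind of low-level hardware access
raw_spinlock_t is meant for; the lock, the register write helper and its
arguments are hypothetical::

  static DEFINE_RAW_SPINLOCK(hw_lock);

  void hw_write_reg(void __iomem *base, u32 offset, u32 val)
  {
          unsigned long flags;

          /* Tiny critical section, spins with interrupts disabled
             even on PREEMPT_RT kernels. */
          raw_spin_lock_irqsave(&hw_lock, flags);
          writel(val, base + offset);
          raw_spin_unlock_irqrestore(&hw_lock, flags);
  }

The critical section neither sleeps nor takes a spinlock_t, which keeps it
valid on PREEMPT_RT kernels as well.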

spinlock_t
----------

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.

spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
based on rt_mutex which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.

   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
   preemption disabled. The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.

PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate. Non-PREEMPT_RT kernels
   avoid migration by disabling preemption. PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations. Non-PREEMPT_RT
   kernels leave task state untouched. However, PREEMPT_RT must change
   task state if the task blocks during acquisition. Therefore, it saves
   the current task state before blocking and the corresponding lock wakeup
   restores it, as shown below::

    task->state = TASK_INTERRUPTIBLE
    lock()
    block()
    task->saved_state = task->state
    task->state = TASK_UNINTERRUPTIBLE
    schedule()
    lock wakeup
    task->state = task->saved_state

   Other types of wakeups would normally unconditionally set the task state
   to RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available. Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING. Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case setting it to RUNNING::

    task->state = TASK_INTERRUPTIBLE
    lock()
    block()
    task->saved_state = task->state
    task->state = TASK_UNINTERRUPTIBLE
    schedule()
    non lock wakeup
    task->saved_state = TASK_RUNNING

    lock wakeup
    task->state = task->saved_state

   This ensures that the real wakeup cannot be lost.


rwlock_t
========

rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.

rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its lock,
   thus starving even high-priority writers. In contrast, because readers
   can grant their priority to a writer, a preempted low-priority writer
   will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.


PREEMPT_RT caveats
==================

spinlock_t and rwlock_t
-----------------------

These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications. For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

  local_irq_disable();
  spin_lock(&lock);

and is fully equivalent to::

  spin_lock_irq(&lock);

The same applies to rwlock_t and the _irqsave() suffix variants.

On a PREEMPT_RT kernel this code sequence breaks because the RT-mutex
requires a fully preemptible context. Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts. In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers a
local_lock mechanism. Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.
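
For illustration, a sketch of a local_lock protecting per-CPU data that
would otherwise rely on interrupt disabling; the per-CPU structure, its
fields and the function are hypothetical::

  struct foo_pcpu {
          local_lock_t lock;
          unsigned long events;
  };

  static DEFINE_PER_CPU(struct foo_pcpu, foo_pcpu) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  void foo_count_event(void)
  {
          unsigned long flags;

          /* Non-PREEMPT_RT: disables interrupts on this CPU.
             PREEMPT_RT: acquires a per-CPU lock and leaves interrupts
             enabled, while keeping the task on this CPU. */
          local_lock_irqsave(&foo_pcpu.lock, flags);
          __this_cpu_inc(foo_pcpu.events);
          local_unlock_irqrestore(&foo_pcpu.lock, flags);
  }

On PREEMPT_RT kernels the local_lock maps to a per-CPU spinlock_t, so the
usual spinlock_t rules apply inside the critical section.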

raw_spinlock_t
--------------

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t. For example, the critical section must avoid
allocating memory. Thus, on a non-PREEMPT_RT kernel the following code
works perfectly::

  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts. However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption on PREEMPT_RT kernels::

  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);


bit spinlocks
-------------

PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
small to accommodate an RT-mutex. Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site. In contrast,
usage-site changes are not needed for the spinlock_t substitution.
Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.


Lock type nesting rules
=======================

The most basic rules are:

 - Lock types of the same lock category (sleeping, spinning) can nest
   arbitrarily as long as they respect the general lock ordering rules to
   prevent deadlocks.

 - Sleeping lock types cannot nest inside spinning lock types.

 - Spinning lock types can nest inside sleeping lock types.

These constraints apply both in PREEMPT_RT and otherwise.

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping means that they cannot be acquired while
holding a raw spinlock. This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t and rwlock_t
  3) raw_spinlock_t and bit spinlocks

Lockdep will complain if these constraints are violated, both in
PREEMPT_RT and otherwise.
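
For illustration, a sketch that respects this nesting ordering; the locks,
the variables they protect and the function are hypothetical::

  static DEFINE_MUTEX(cfg_mutex);         /* 1) sleeping lock   */
  static DEFINE_SPINLOCK(state_lock);     /* 2) spinlock_t      */
  static DEFINE_RAW_SPINLOCK(hw_lock);    /* 3) raw_spinlock_t  */

  static int cfg_val, state_val, hw_val;

  void update_all(int val)
  {
          mutex_lock(&cfg_mutex);          /* outermost: sleeping lock */
          cfg_val = val;

          spin_lock(&state_lock);          /* nests inside the mutex */
          state_val = val;

          raw_spin_lock(&hw_lock);         /* innermost: raw spinlock */
          hw_val = val;
          raw_spin_unlock(&hw_lock);

          spin_unlock(&state_lock);
          mutex_unlock(&cfg_mutex);
  }

Acquiring these locks in the opposite order, for example taking cfg_mutex
while holding hw_lock, would nest a sleeping lock inside a spinning lock
and is reported by lockdep in both PREEMPT_RT and non-PREEMPT_RT kernels.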