.. SPDX-License-Identifier: GPL-2.0

.. _kernel_hacking_locktypes:

==========================
Lock types and their rules
==========================

Introduction
============

The kernel provides a variety of locking primitives which can be divided
into two categories:

 - Sleeping locks
 - Spinning locks

This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.


Lock categories
===============

Sleeping locks
--------------

Sleeping locks can only be acquired in preemptible task context.

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock().  Furthermore, it is also necessary to evaluate the debugging
versions of these primitives.  In short, don't acquire sleeping locks from
other contexts unless there is no other option.
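
A brief sketch of the usual pattern in preemptible task context; the lock,
the data and the functions below are hypothetical::

  #include <linux/mutex.h>

  static DEFINE_MUTEX(example_mutex);   /* hypothetical sleeping lock */
  static int example_data;              /* data it protects */

  /* Preemptible task context: sleeping while waiting is fine. */
  static void example_update(int value)
  {
          mutex_lock(&example_mutex);
          example_data = value;
          mutex_unlock(&example_mutex);
  }

  /*
   * mutex_trylock() does not sleep, but see the caveats above before
   * considering it from other contexts.
   */
  static bool example_try_update(int value)
  {
          if (!mutex_trylock(&example_mutex))
                  return false;
          example_data = value;
          mutex_unlock(&example_mutex);
          return true;
  }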

Sleeping lock types:

 - mutex
 - rt_mutex
 - semaphore
 - rw_semaphore
 - ww_mutex
 - percpu_rw_semaphore

On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

 - spinlock_t
 - rwlock_t

Spinning locks
--------------

 - raw_spinlock_t
 - bit spinlocks

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t

Spinning locks implicitly disable preemption and the lock / unlock functions
can have suffixes which apply further protections:

 ===================  ====================================================
 _bh()                Disable / enable bottom halves (soft interrupts)
 _irq()               Disable / enable interrupts
 _irqsave/restore()   Save and disable / restore interrupt disabled state
 ===================  ====================================================
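
For example, a brief sketch of these variants on a non-PREEMPT_RT kernel;
the lock is hypothetical::

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(example_lock);   /* hypothetical lock */

  static void example(void)
  {
          unsigned long flags;

          /* Plain variant: disables preemption only. */
          spin_lock(&example_lock);
          spin_unlock(&example_lock);

          /* _bh(): additionally disables bottom halves (softirqs). */
          spin_lock_bh(&example_lock);
          spin_unlock_bh(&example_lock);

          /*
           * _irq(): additionally disables interrupts; the unlock
           * unconditionally re-enables them, so use it only when
           * interrupts are known to be enabled on entry.
           */
          spin_lock_irq(&example_lock);
          spin_unlock_irq(&example_lock);

          /*
           * _irqsave() / _irqrestore(): saves and restores the previous
           * interrupt state, safe regardless of the state on entry.
           */
          spin_lock_irqsave(&example_lock, flags);
          spin_unlock_irqrestore(&example_lock, flags);
  }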

Owner semantics
===============

The aforementioned lock types except semaphores have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.
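
A brief sketch of that interface; everything below except
down_read_non_owner() and up_read_non_owner() is hypothetical::

  #include <linux/rwsem.h>

  static DECLARE_RWSEM(example_rwsem);

  /* The reader side is acquired in one context ... */
  static void example_submit(void)
  {
          down_read_non_owner(&example_rwsem);
          /* hand over to another context, e.g. an I/O completion path */
  }

  /* ... and may be released from another context. */
  static void example_complete(void)
  {
          up_read_non_owner(&example_rwsem);
  }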


rtmutex
=======

RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.

PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts.  This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.
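
The rt_mutex interface mirrors that of regular mutexes.  A brief sketch
with a hypothetical lock::

  #include <linux/rtmutex.h>

  static DEFINE_RT_MUTEX(example_rtmutex);   /* hypothetical lock */

  static void example(void)
  {
          rt_mutex_lock(&example_rtmutex);
          /*
           * Critical section: a blocked higher-priority waiter boosts
           * the priority of this task until the lock is released.
           */
          rt_mutex_unlock(&example_rtmutex);
  }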


semaphore
=========

semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate serialization and wait mechanisms, such
as mutexes and completions.
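
For example, where a semaphore might previously have been used to wait for
an event, a completion expresses that intent directly.  A brief sketch with
hypothetical names::

  #include <linux/completion.h>

  static DECLARE_COMPLETION(example_done);

  /* Waiting side: sleeps until the event is signalled. */
  static void example_wait(void)
  {
          wait_for_completion(&example_done);
  }

  /* Signalling side, e.g. another task or an interrupt handler. */
  static void example_signal(void)
  {
          complete(&example_done);
  }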

semaphores and PREEMPT_RT
-------------------------

PREEMPT_RT does not change the semaphore implementation because counting
semaphores have no concept of owners, thus preventing PREEMPT_RT from
providing priority inheritance for semaphores.  After all, an unknown
owner cannot be boosted.  As a consequence, blocking on semaphores can
result in priority inversion.


rw_semaphore
============

rw_semaphore is a multiple readers and single writer lock mechanism.

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independent of the kernel configuration.

rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers.  In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.


raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t
--------------

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
core code, low-level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state.  raw_spinlock_t can sometimes also be used when the
critical section is tiny, thus avoiding RT-mutex overhead.
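
A brief sketch of typical raw_spinlock_t usage; the lock and the hardware
access are hypothetical::

  #include <linux/spinlock.h>

  static DEFINE_RAW_SPINLOCK(example_hw_lock);

  /*
   * Truly atomic on all kernels, including PREEMPT_RT: preemption and,
   * with _irqsave(), interrupts are disabled while the lock is held.
   */
  static void example_hw_access(void)
  {
          unsigned long flags;

          raw_spin_lock_irqsave(&example_hw_lock, flags);
          /* tiny critical section touching hardware state */
          raw_spin_unlock_irqrestore(&example_hw_lock, flags);
  }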

spinlock_t
----------

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.

spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
based on rt_mutex which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.
   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
   preemption enabled.  The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.

PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
   kernels leave task state untouched.  However, PREEMPT_RT must change
   task state if the task blocks during acquisition.  Therefore, it saves
   the current task state before blocking and the corresponding lock wakeup
   restores it, as shown below::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        lock wakeup
                                          task->state = task->saved_state

   Other types of wakeups would normally unconditionally set the task state
   to RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available.  Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING.  Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case setting it to RUNNING::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        non lock wakeup
                                          task->saved_state = TASK_RUNNING

                                        lock wakeup
                                          task->state = task->saved_state

   This ensures that the real wakeup cannot be lost.


rwlock_t
========

rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.
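
A brief usage sketch with a hypothetical lock and data::

  #include <linux/spinlock.h>

  static DEFINE_RWLOCK(example_rwlock);
  static int example_data;

  /* Multiple readers may hold the lock concurrently. */
  static int example_read(void)
  {
          int value;

          read_lock(&example_rwlock);
          value = example_data;
          read_unlock(&example_rwlock);
          return value;
  }

  /* A writer excludes readers and other writers. */
  static void example_write(int value)
  {
          write_lock(&example_rwlock);
          example_data = value;
          write_unlock(&example_rwlock);
  }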

rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its lock,
   thus starving even high-priority writers.  In contrast, because readers
   can grant their priority to a writer, a preempted low-priority writer
   will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.


PREEMPT_RT caveats
==================

spinlock_t and rwlock_t
-----------------------

These changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications.  For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

   local_irq_disable();
   spin_lock(&lock);

and is fully equivalent to::

   spin_lock_irq(&lock);

The same applies to rwlock_t and the _irqsave() suffix variants.

On a PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
a fully preemptible context.  Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts.  In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers a
local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.
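
A brief sketch of the local_lock pattern; the per-CPU structure and its
fields are hypothetical::

  #include <linux/local_lock.h>
  #include <linux/percpu.h>

  struct example_pcpu {
          local_lock_t lock;
          int counter;
  };

  static DEFINE_PER_CPU(struct example_pcpu, example_pcpu) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  static void example(void)
  {
          /*
           * On non-PREEMPT_RT kernels this disables preemption; on
           * PREEMPT_RT kernels it acquires a per-CPU spinlock_t, which
           * also pins the task to the current CPU.
           */
          local_lock(&example_pcpu.lock);
          this_cpu_inc(example_pcpu.counter);
          local_unlock(&example_pcpu.lock);
  }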


raw_spinlock_t
--------------

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t, for example, the critical section must avoid
allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
works perfectly::

  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts.  However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption on PREEMPT_RT kernels::

  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);


bit spinlocks
-------------

PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
small to accommodate an RT-mutex.  Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
usage-site changes are not needed for the spinlock_t substitution.
Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.
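
A brief sketch of the bit spinlock interface; the state word and the
chosen bit are hypothetical::

  #include <linux/bit_spinlock.h>

  #define EXAMPLE_LOCK_BIT  0          /* hypothetical lock bit */

  static unsigned long example_state;  /* flags word embedding the lock */

  static void example(void)
  {
          /*
           * Spins on a single bit and behaves like raw_spinlock_t even
           * on PREEMPT_RT, so keep the critical section truly atomic.
           */
          bit_spin_lock(EXAMPLE_LOCK_BIT, &example_state);
          /* tiny critical section */
          bit_spin_unlock(EXAMPLE_LOCK_BIT, &example_state);
  }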


Lock type nesting rules
=======================

The most basic rules are:

  - Lock types of the same lock category (sleeping, spinning) can nest
    arbitrarily as long as they respect the general lock ordering rules to
    prevent deadlocks.

  - Sleeping lock types cannot nest inside spinning lock types.

  - Spinning lock types can nest inside sleeping lock types.

These constraints apply both in PREEMPT_RT and otherwise.

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping means that they cannot be acquired while
holding a raw spinlock.  This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t and rwlock_t
  3) raw_spinlock_t and bit spinlocks

Lockdep will complain if these constraints are violated, both in
PREEMPT_RT and otherwise.
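
For example, the following nesting order is valid on all kernel
configurations; the locks are hypothetical::

  #include <linux/mutex.h>
  #include <linux/spinlock.h>

  static DEFINE_MUTEX(example_mutex);            /* 1) sleeping lock */
  static DEFINE_SPINLOCK(example_spinlock);      /* 2) spinlock_t */
  static DEFINE_RAW_SPINLOCK(example_rawlock);   /* 3) raw_spinlock_t */

  static void example(void)
  {
          mutex_lock(&example_mutex);
          spin_lock(&example_spinlock);
          raw_spin_lock(&example_rawlock);

          raw_spin_unlock(&example_rawlock);
          spin_unlock(&example_spinlock);
          mutex_unlock(&example_mutex);
  }

Reversing any pair, for example acquiring example_mutex while holding
example_spinlock, violates the nesting rules and will trigger a lockdep
complaint.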