xref: /openbmc/linux/Documentation/locking/mutex-design.rst (revision 4b4193256c8d3bc3a5397b5cd9494c2ad386317d)
1387b1468SMauro Carvalho Chehab=======================
2387b1468SMauro Carvalho ChehabGeneric Mutex Subsystem
3387b1468SMauro Carvalho Chehab=======================
4387b1468SMauro Carvalho Chehab
5387b1468SMauro Carvalho Chehabstarted by Ingo Molnar <mingo@redhat.com>
6387b1468SMauro Carvalho Chehab
7387b1468SMauro Carvalho Chehabupdated by Davidlohr Bueso <davidlohr@hp.com>
8387b1468SMauro Carvalho Chehab
9387b1468SMauro Carvalho ChehabWhat are mutexes?
10387b1468SMauro Carvalho Chehab-----------------
11387b1468SMauro Carvalho Chehab
12387b1468SMauro Carvalho ChehabIn the Linux kernel, mutexes refer to a particular locking primitive
13387b1468SMauro Carvalho Chehabthat enforces serialization on shared memory systems, and not only to
14387b1468SMauro Carvalho Chehabthe generic term referring to 'mutual exclusion' found in academia
15387b1468SMauro Carvalho Chehabor similar theoretical text books. Mutexes are sleeping locks which
16387b1468SMauro Carvalho Chehabbehave similarly to binary semaphores, and were introduced in 2006[1]
17387b1468SMauro Carvalho Chehabas an alternative to these. This new data structure provided a number
18387b1468SMauro Carvalho Chehabof advantages, including simpler interfaces, and at that time smaller
19387b1468SMauro Carvalho Chehabcode (see Disadvantages).
20387b1468SMauro Carvalho Chehab
210288199eSAlexander A. Klimov[1] https://lwn.net/Articles/164802/
22387b1468SMauro Carvalho Chehab
23387b1468SMauro Carvalho ChehabImplementation
24387b1468SMauro Carvalho Chehab--------------
25387b1468SMauro Carvalho Chehab
26387b1468SMauro Carvalho ChehabMutexes are represented by 'struct mutex', defined in include/linux/mutex.h
27387b1468SMauro Carvalho Chehaband implemented in kernel/locking/mutex.c. These locks use an atomic variable
28387b1468SMauro Carvalho Chehab(->owner) to keep track of the lock state during its lifetime.  Field owner
29387b1468SMauro Carvalho Chehabactually contains `struct task_struct *` to the current lock owner and it is
30387b1468SMauro Carvalho Chehabtherefore NULL if not currently owned. Since task_struct pointers are aligned
31*7d64394bSRandy Dunlapto at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g.,
32387b1468SMauro Carvalho Chehabif waiter list is non-empty).  In its most basic form it also includes a
33387b1468SMauro Carvalho Chehabwait-queue and a spinlock that serializes access to it. Furthermore,
34387b1468SMauro Carvalho ChehabCONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner MCS lock (->osq), described
35387b1468SMauro Carvalho Chehabbelow in (ii).
36387b1468SMauro Carvalho Chehab
37387b1468SMauro Carvalho ChehabWhen acquiring a mutex, there are three possible paths that can be
38387b1468SMauro Carvalho Chehabtaken, depending on the state of the lock:
39387b1468SMauro Carvalho Chehab
40387b1468SMauro Carvalho Chehab(i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner with
41387b1468SMauro Carvalho Chehab    the current task. This only works in the uncontended case (cmpxchg() checks
42387b1468SMauro Carvalho Chehab    against 0UL, so all 3 state bits above have to be 0). If the lock is
43387b1468SMauro Carvalho Chehab    contended it goes to the next possible path.
44387b1468SMauro Carvalho Chehab
45387b1468SMauro Carvalho Chehab(ii) midpath: aka optimistic spinning, tries to spin for acquisition
46387b1468SMauro Carvalho Chehab     while the lock owner is running and there are no other tasks ready
47387b1468SMauro Carvalho Chehab     to run that have higher priority (need_resched). The rationale is
48387b1468SMauro Carvalho Chehab     that if the lock owner is running, it is likely to release the lock
49387b1468SMauro Carvalho Chehab     soon. The mutex spinners are queued up using MCS lock so that only
50387b1468SMauro Carvalho Chehab     one spinner can compete for the mutex.
51387b1468SMauro Carvalho Chehab
52387b1468SMauro Carvalho Chehab     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
53387b1468SMauro Carvalho Chehab     with the desirable properties of being fair and with each cpu trying
54387b1468SMauro Carvalho Chehab     to acquire the lock spinning on a local variable. It avoids expensive
55387b1468SMauro Carvalho Chehab     cacheline bouncing that common test-and-set spinlock implementations
56387b1468SMauro Carvalho Chehab     incur. An MCS-like lock is specially tailored for optimistic spinning
57387b1468SMauro Carvalho Chehab     for sleeping lock implementation. An important feature of the customized
58387b1468SMauro Carvalho Chehab     MCS lock is that it has the extra property that spinners are able to exit
59387b1468SMauro Carvalho Chehab     the MCS spinlock queue when they need to reschedule. This further helps
60387b1468SMauro Carvalho Chehab     avoid situations where MCS spinners that need to reschedule would continue
61387b1468SMauro Carvalho Chehab     waiting to spin on mutex owner, only to go directly to slowpath upon
62387b1468SMauro Carvalho Chehab     obtaining the MCS lock.
63387b1468SMauro Carvalho Chehab
64387b1468SMauro Carvalho Chehab
65387b1468SMauro Carvalho Chehab(iii) slowpath: last resort, if the lock is still unable to be acquired,
66387b1468SMauro Carvalho Chehab      the task is added to the wait-queue and sleeps until woken up by the
67387b1468SMauro Carvalho Chehab      unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.
68387b1468SMauro Carvalho Chehab
69387b1468SMauro Carvalho ChehabWhile formally kernel mutexes are sleepable locks, it is path (ii) that
70387b1468SMauro Carvalho Chehabmakes them more practically a hybrid type. By simply not interrupting a
71387b1468SMauro Carvalho Chehabtask and busy-waiting for a few cycles instead of immediately sleeping,
72387b1468SMauro Carvalho Chehabthe performance of this lock has been seen to significantly improve a
73387b1468SMauro Carvalho Chehabnumber of workloads. Note that this technique is also used for rw-semaphores.
74387b1468SMauro Carvalho Chehab
75387b1468SMauro Carvalho ChehabSemantics
76387b1468SMauro Carvalho Chehab---------
77387b1468SMauro Carvalho Chehab
78387b1468SMauro Carvalho ChehabThe mutex subsystem checks and enforces the following rules:
79387b1468SMauro Carvalho Chehab
80387b1468SMauro Carvalho Chehab    - Only one task can hold the mutex at a time.
81387b1468SMauro Carvalho Chehab    - Only the owner can unlock the mutex.
82387b1468SMauro Carvalho Chehab    - Multiple unlocks are not permitted.
83387b1468SMauro Carvalho Chehab    - Recursive locking/unlocking is not permitted.
84387b1468SMauro Carvalho Chehab    - A mutex must only be initialized via the API (see below).
85387b1468SMauro Carvalho Chehab    - A task may not exit with a mutex held.
86387b1468SMauro Carvalho Chehab    - Memory areas where held locks reside must not be freed.
87387b1468SMauro Carvalho Chehab    - Held mutexes must not be reinitialized.
88387b1468SMauro Carvalho Chehab    - Mutexes may not be used in hardware or software interrupt
89387b1468SMauro Carvalho Chehab      contexts such as tasklets and timers.
90387b1468SMauro Carvalho Chehab
91387b1468SMauro Carvalho ChehabThese semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled.
92387b1468SMauro Carvalho ChehabIn addition, the mutex debugging code also implements a number of other
93387b1468SMauro Carvalho Chehabfeatures that make lock debugging easier and faster:
94387b1468SMauro Carvalho Chehab
95387b1468SMauro Carvalho Chehab    - Uses symbolic names of mutexes, whenever they are printed
96387b1468SMauro Carvalho Chehab      in debug output.
97387b1468SMauro Carvalho Chehab    - Point-of-acquire tracking, symbolic lookup of function names,
98387b1468SMauro Carvalho Chehab      list of all locks held in the system, printout of them.
99387b1468SMauro Carvalho Chehab    - Owner tracking.
100387b1468SMauro Carvalho Chehab    - Detects self-recursing locks and prints out all relevant info.
101387b1468SMauro Carvalho Chehab    - Detects multi-task circular deadlocks and prints out all affected
102387b1468SMauro Carvalho Chehab      locks and tasks (and only those tasks).
103387b1468SMauro Carvalho Chehab
104387b1468SMauro Carvalho Chehab
105387b1468SMauro Carvalho ChehabInterfaces
106387b1468SMauro Carvalho Chehab----------
107387b1468SMauro Carvalho ChehabStatically define the mutex::
108387b1468SMauro Carvalho Chehab
109387b1468SMauro Carvalho Chehab   DEFINE_MUTEX(name);
110387b1468SMauro Carvalho Chehab
111387b1468SMauro Carvalho ChehabDynamically initialize the mutex::
112387b1468SMauro Carvalho Chehab
113387b1468SMauro Carvalho Chehab   mutex_init(mutex);
114387b1468SMauro Carvalho Chehab
115387b1468SMauro Carvalho ChehabAcquire the mutex, uninterruptible::
116387b1468SMauro Carvalho Chehab
117387b1468SMauro Carvalho Chehab   void mutex_lock(struct mutex *lock);
118387b1468SMauro Carvalho Chehab   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
119387b1468SMauro Carvalho Chehab   int  mutex_trylock(struct mutex *lock);
120387b1468SMauro Carvalho Chehab
121387b1468SMauro Carvalho ChehabAcquire the mutex, interruptible::
122387b1468SMauro Carvalho Chehab
123387b1468SMauro Carvalho Chehab   int mutex_lock_interruptible_nested(struct mutex *lock,
124387b1468SMauro Carvalho Chehab				       unsigned int subclass);
125387b1468SMauro Carvalho Chehab   int mutex_lock_interruptible(struct mutex *lock);
126387b1468SMauro Carvalho Chehab
127387b1468SMauro Carvalho ChehabAcquire the mutex, interruptible, if dec to 0::
128387b1468SMauro Carvalho Chehab
129387b1468SMauro Carvalho Chehab   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
130387b1468SMauro Carvalho Chehab
131387b1468SMauro Carvalho ChehabUnlock the mutex::
132387b1468SMauro Carvalho Chehab
133387b1468SMauro Carvalho Chehab   void mutex_unlock(struct mutex *lock);
134387b1468SMauro Carvalho Chehab
135387b1468SMauro Carvalho ChehabTest if the mutex is taken::
136387b1468SMauro Carvalho Chehab
137387b1468SMauro Carvalho Chehab   int mutex_is_locked(struct mutex *lock);
138387b1468SMauro Carvalho Chehab
139387b1468SMauro Carvalho ChehabDisadvantages
140387b1468SMauro Carvalho Chehab-------------
141387b1468SMauro Carvalho Chehab
142387b1468SMauro Carvalho ChehabUnlike its original design and purpose, 'struct mutex' is among the largest
143387b1468SMauro Carvalho Chehablocks in the kernel. E.g: on x86-64 it is 32 bytes, where 'struct semaphore'
144387b1468SMauro Carvalho Chehabis 24 bytes and rw_semaphore is 40 bytes. Larger structure sizes mean more CPU
145387b1468SMauro Carvalho Chehabcache and memory footprint.
146387b1468SMauro Carvalho Chehab
147387b1468SMauro Carvalho ChehabWhen to use mutexes
148387b1468SMauro Carvalho Chehab-------------------
149387b1468SMauro Carvalho Chehab
150387b1468SMauro Carvalho ChehabUnless the strict semantics of mutexes are unsuitable and/or the critical
151387b1468SMauro Carvalho Chehabregion prevents the lock from being shared, always prefer them to any other
152387b1468SMauro Carvalho Chehablocking primitive.
153