1387b1468SMauro Carvalho Chehab======================= 2387b1468SMauro Carvalho ChehabGeneric Mutex Subsystem 3387b1468SMauro Carvalho Chehab======================= 4387b1468SMauro Carvalho Chehab 5387b1468SMauro Carvalho Chehabstarted by Ingo Molnar <mingo@redhat.com> 6387b1468SMauro Carvalho Chehab 7387b1468SMauro Carvalho Chehabupdated by Davidlohr Bueso <davidlohr@hp.com> 8387b1468SMauro Carvalho Chehab 9387b1468SMauro Carvalho ChehabWhat are mutexes? 10387b1468SMauro Carvalho Chehab----------------- 11387b1468SMauro Carvalho Chehab 12387b1468SMauro Carvalho ChehabIn the Linux kernel, mutexes refer to a particular locking primitive 13387b1468SMauro Carvalho Chehabthat enforces serialization on shared memory systems, and not only to 14387b1468SMauro Carvalho Chehabthe generic term referring to 'mutual exclusion' found in academia 15387b1468SMauro Carvalho Chehabor similar theoretical text books. Mutexes are sleeping locks which 16387b1468SMauro Carvalho Chehabbehave similarly to binary semaphores, and were introduced in 2006[1] 17387b1468SMauro Carvalho Chehabas an alternative to these. This new data structure provided a number 18387b1468SMauro Carvalho Chehabof advantages, including simpler interfaces, and at that time smaller 19387b1468SMauro Carvalho Chehabcode (see Disadvantages). 20387b1468SMauro Carvalho Chehab 210288199eSAlexander A. Klimov[1] https://lwn.net/Articles/164802/ 22387b1468SMauro Carvalho Chehab 23387b1468SMauro Carvalho ChehabImplementation 24387b1468SMauro Carvalho Chehab-------------- 25387b1468SMauro Carvalho Chehab 26387b1468SMauro Carvalho ChehabMutexes are represented by 'struct mutex', defined in include/linux/mutex.h 27387b1468SMauro Carvalho Chehaband implemented in kernel/locking/mutex.c. These locks use an atomic variable 28387b1468SMauro Carvalho Chehab(->owner) to keep track of the lock state during its lifetime. Field owner 29387b1468SMauro Carvalho Chehabactually contains `struct task_struct *` to the current lock owner and it is 30387b1468SMauro Carvalho Chehabtherefore NULL if not currently owned. Since task_struct pointers are aligned 31*7d64394bSRandy Dunlapto at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g., 32387b1468SMauro Carvalho Chehabif waiter list is non-empty). In its most basic form it also includes a 33387b1468SMauro Carvalho Chehabwait-queue and a spinlock that serializes access to it. Furthermore, 34387b1468SMauro Carvalho ChehabCONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner MCS lock (->osq), described 35387b1468SMauro Carvalho Chehabbelow in (ii). 36387b1468SMauro Carvalho Chehab 37387b1468SMauro Carvalho ChehabWhen acquiring a mutex, there are three possible paths that can be 38387b1468SMauro Carvalho Chehabtaken, depending on the state of the lock: 39387b1468SMauro Carvalho Chehab 40387b1468SMauro Carvalho Chehab(i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner with 41387b1468SMauro Carvalho Chehab the current task. This only works in the uncontended case (cmpxchg() checks 42387b1468SMauro Carvalho Chehab against 0UL, so all 3 state bits above have to be 0). If the lock is 43387b1468SMauro Carvalho Chehab contended it goes to the next possible path. 44387b1468SMauro Carvalho Chehab 45387b1468SMauro Carvalho Chehab(ii) midpath: aka optimistic spinning, tries to spin for acquisition 46387b1468SMauro Carvalho Chehab while the lock owner is running and there are no other tasks ready 47387b1468SMauro Carvalho Chehab to run that have higher priority (need_resched). The rationale is 48387b1468SMauro Carvalho Chehab that if the lock owner is running, it is likely to release the lock 49387b1468SMauro Carvalho Chehab soon. The mutex spinners are queued up using MCS lock so that only 50387b1468SMauro Carvalho Chehab one spinner can compete for the mutex. 51387b1468SMauro Carvalho Chehab 52387b1468SMauro Carvalho Chehab The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock 53387b1468SMauro Carvalho Chehab with the desirable properties of being fair and with each cpu trying 54387b1468SMauro Carvalho Chehab to acquire the lock spinning on a local variable. It avoids expensive 55387b1468SMauro Carvalho Chehab cacheline bouncing that common test-and-set spinlock implementations 56387b1468SMauro Carvalho Chehab incur. An MCS-like lock is specially tailored for optimistic spinning 57387b1468SMauro Carvalho Chehab for sleeping lock implementation. An important feature of the customized 58387b1468SMauro Carvalho Chehab MCS lock is that it has the extra property that spinners are able to exit 59387b1468SMauro Carvalho Chehab the MCS spinlock queue when they need to reschedule. This further helps 60387b1468SMauro Carvalho Chehab avoid situations where MCS spinners that need to reschedule would continue 61387b1468SMauro Carvalho Chehab waiting to spin on mutex owner, only to go directly to slowpath upon 62387b1468SMauro Carvalho Chehab obtaining the MCS lock. 63387b1468SMauro Carvalho Chehab 64387b1468SMauro Carvalho Chehab 65387b1468SMauro Carvalho Chehab(iii) slowpath: last resort, if the lock is still unable to be acquired, 66387b1468SMauro Carvalho Chehab the task is added to the wait-queue and sleeps until woken up by the 67387b1468SMauro Carvalho Chehab unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE. 68387b1468SMauro Carvalho Chehab 69387b1468SMauro Carvalho ChehabWhile formally kernel mutexes are sleepable locks, it is path (ii) that 70387b1468SMauro Carvalho Chehabmakes them more practically a hybrid type. By simply not interrupting a 71387b1468SMauro Carvalho Chehabtask and busy-waiting for a few cycles instead of immediately sleeping, 72387b1468SMauro Carvalho Chehabthe performance of this lock has been seen to significantly improve a 73387b1468SMauro Carvalho Chehabnumber of workloads. Note that this technique is also used for rw-semaphores. 74387b1468SMauro Carvalho Chehab 75387b1468SMauro Carvalho ChehabSemantics 76387b1468SMauro Carvalho Chehab--------- 77387b1468SMauro Carvalho Chehab 78387b1468SMauro Carvalho ChehabThe mutex subsystem checks and enforces the following rules: 79387b1468SMauro Carvalho Chehab 80387b1468SMauro Carvalho Chehab - Only one task can hold the mutex at a time. 81387b1468SMauro Carvalho Chehab - Only the owner can unlock the mutex. 82387b1468SMauro Carvalho Chehab - Multiple unlocks are not permitted. 83387b1468SMauro Carvalho Chehab - Recursive locking/unlocking is not permitted. 84387b1468SMauro Carvalho Chehab - A mutex must only be initialized via the API (see below). 85387b1468SMauro Carvalho Chehab - A task may not exit with a mutex held. 86387b1468SMauro Carvalho Chehab - Memory areas where held locks reside must not be freed. 87387b1468SMauro Carvalho Chehab - Held mutexes must not be reinitialized. 88387b1468SMauro Carvalho Chehab - Mutexes may not be used in hardware or software interrupt 89387b1468SMauro Carvalho Chehab contexts such as tasklets and timers. 90387b1468SMauro Carvalho Chehab 91387b1468SMauro Carvalho ChehabThese semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled. 92387b1468SMauro Carvalho ChehabIn addition, the mutex debugging code also implements a number of other 93387b1468SMauro Carvalho Chehabfeatures that make lock debugging easier and faster: 94387b1468SMauro Carvalho Chehab 95387b1468SMauro Carvalho Chehab - Uses symbolic names of mutexes, whenever they are printed 96387b1468SMauro Carvalho Chehab in debug output. 97387b1468SMauro Carvalho Chehab - Point-of-acquire tracking, symbolic lookup of function names, 98387b1468SMauro Carvalho Chehab list of all locks held in the system, printout of them. 99387b1468SMauro Carvalho Chehab - Owner tracking. 100387b1468SMauro Carvalho Chehab - Detects self-recursing locks and prints out all relevant info. 101387b1468SMauro Carvalho Chehab - Detects multi-task circular deadlocks and prints out all affected 102387b1468SMauro Carvalho Chehab locks and tasks (and only those tasks). 103387b1468SMauro Carvalho Chehab 104387b1468SMauro Carvalho Chehab 105387b1468SMauro Carvalho ChehabInterfaces 106387b1468SMauro Carvalho Chehab---------- 107387b1468SMauro Carvalho ChehabStatically define the mutex:: 108387b1468SMauro Carvalho Chehab 109387b1468SMauro Carvalho Chehab DEFINE_MUTEX(name); 110387b1468SMauro Carvalho Chehab 111387b1468SMauro Carvalho ChehabDynamically initialize the mutex:: 112387b1468SMauro Carvalho Chehab 113387b1468SMauro Carvalho Chehab mutex_init(mutex); 114387b1468SMauro Carvalho Chehab 115387b1468SMauro Carvalho ChehabAcquire the mutex, uninterruptible:: 116387b1468SMauro Carvalho Chehab 117387b1468SMauro Carvalho Chehab void mutex_lock(struct mutex *lock); 118387b1468SMauro Carvalho Chehab void mutex_lock_nested(struct mutex *lock, unsigned int subclass); 119387b1468SMauro Carvalho Chehab int mutex_trylock(struct mutex *lock); 120387b1468SMauro Carvalho Chehab 121387b1468SMauro Carvalho ChehabAcquire the mutex, interruptible:: 122387b1468SMauro Carvalho Chehab 123387b1468SMauro Carvalho Chehab int mutex_lock_interruptible_nested(struct mutex *lock, 124387b1468SMauro Carvalho Chehab unsigned int subclass); 125387b1468SMauro Carvalho Chehab int mutex_lock_interruptible(struct mutex *lock); 126387b1468SMauro Carvalho Chehab 127387b1468SMauro Carvalho ChehabAcquire the mutex, interruptible, if dec to 0:: 128387b1468SMauro Carvalho Chehab 129387b1468SMauro Carvalho Chehab int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); 130387b1468SMauro Carvalho Chehab 131387b1468SMauro Carvalho ChehabUnlock the mutex:: 132387b1468SMauro Carvalho Chehab 133387b1468SMauro Carvalho Chehab void mutex_unlock(struct mutex *lock); 134387b1468SMauro Carvalho Chehab 135387b1468SMauro Carvalho ChehabTest if the mutex is taken:: 136387b1468SMauro Carvalho Chehab 137387b1468SMauro Carvalho Chehab int mutex_is_locked(struct mutex *lock); 138387b1468SMauro Carvalho Chehab 139387b1468SMauro Carvalho ChehabDisadvantages 140387b1468SMauro Carvalho Chehab------------- 141387b1468SMauro Carvalho Chehab 142387b1468SMauro Carvalho ChehabUnlike its original design and purpose, 'struct mutex' is among the largest 143387b1468SMauro Carvalho Chehablocks in the kernel. E.g: on x86-64 it is 32 bytes, where 'struct semaphore' 144387b1468SMauro Carvalho Chehabis 24 bytes and rw_semaphore is 40 bytes. Larger structure sizes mean more CPU 145387b1468SMauro Carvalho Chehabcache and memory footprint. 146387b1468SMauro Carvalho Chehab 147387b1468SMauro Carvalho ChehabWhen to use mutexes 148387b1468SMauro Carvalho Chehab------------------- 149387b1468SMauro Carvalho Chehab 150387b1468SMauro Carvalho ChehabUnless the strict semantics of mutexes are unsuitable and/or the critical 151387b1468SMauro Carvalho Chehabregion prevents the lock from being shared, always prefer them to any other 152387b1468SMauro Carvalho Chehablocking primitive. 153