1e790a4ceSJonathan Corbet====================================== 2e790a4ceSJonathan Corbetvlocks for Bare-Metal Mutual Exclusion 3e790a4ceSJonathan Corbet====================================== 4e790a4ceSJonathan Corbet 5e790a4ceSJonathan CorbetVoting Locks, or "vlocks" provide a simple low-level mutual exclusion 6e790a4ceSJonathan Corbetmechanism, with reasonable but minimal requirements on the memory 7e790a4ceSJonathan Corbetsystem. 8e790a4ceSJonathan Corbet 9e790a4ceSJonathan CorbetThese are intended to be used to coordinate critical activity among CPUs 10e790a4ceSJonathan Corbetwhich are otherwise non-coherent, in situations where the hardware 11e790a4ceSJonathan Corbetprovides no other mechanism to support this and ordinary spinlocks 12e790a4ceSJonathan Corbetcannot be used. 13e790a4ceSJonathan Corbet 14e790a4ceSJonathan Corbet 15e790a4ceSJonathan Corbetvlocks make use of the atomicity provided by the memory system for 16e790a4ceSJonathan Corbetwrites to a single memory location. To arbitrate, every CPU "votes for 17e790a4ceSJonathan Corbetitself", by storing a unique number to a common memory location. The 18e790a4ceSJonathan Corbetfinal value seen in that memory location when all the votes have been 19e790a4ceSJonathan Corbetcast identifies the winner. 20e790a4ceSJonathan Corbet 21e790a4ceSJonathan CorbetIn order to make sure that the election produces an unambiguous result 22e790a4ceSJonathan Corbetin finite time, a CPU will only enter the election in the first place if 23e790a4ceSJonathan Corbetno winner has been chosen and the election does not appear to have 24e790a4ceSJonathan Corbetstarted yet. 25e790a4ceSJonathan Corbet 26e790a4ceSJonathan Corbet 27e790a4ceSJonathan CorbetAlgorithm 28e790a4ceSJonathan Corbet--------- 29e790a4ceSJonathan Corbet 30e790a4ceSJonathan CorbetThe easiest way to explain the vlocks algorithm is with some pseudo-code:: 31e790a4ceSJonathan Corbet 32e790a4ceSJonathan Corbet 33e790a4ceSJonathan Corbet int currently_voting[NR_CPUS] = { 0, }; 34e790a4ceSJonathan Corbet int last_vote = -1; /* no votes yet */ 35e790a4ceSJonathan Corbet 36e790a4ceSJonathan Corbet bool vlock_trylock(int this_cpu) 37e790a4ceSJonathan Corbet { 38e790a4ceSJonathan Corbet /* signal our desire to vote */ 39e790a4ceSJonathan Corbet currently_voting[this_cpu] = 1; 40e790a4ceSJonathan Corbet if (last_vote != -1) { 41e790a4ceSJonathan Corbet /* someone already volunteered himself */ 42e790a4ceSJonathan Corbet currently_voting[this_cpu] = 0; 43e790a4ceSJonathan Corbet return false; /* not ourself */ 44e790a4ceSJonathan Corbet } 45e790a4ceSJonathan Corbet 46e790a4ceSJonathan Corbet /* let's suggest ourself */ 47e790a4ceSJonathan Corbet last_vote = this_cpu; 48e790a4ceSJonathan Corbet currently_voting[this_cpu] = 0; 49e790a4ceSJonathan Corbet 50e790a4ceSJonathan Corbet /* then wait until everyone else is done voting */ 51e790a4ceSJonathan Corbet for_each_cpu(i) { 52e790a4ceSJonathan Corbet while (currently_voting[i] != 0) 53e790a4ceSJonathan Corbet /* wait */; 54e790a4ceSJonathan Corbet } 55e790a4ceSJonathan Corbet 56e790a4ceSJonathan Corbet /* result */ 57e790a4ceSJonathan Corbet if (last_vote == this_cpu) 58e790a4ceSJonathan Corbet return true; /* we won */ 59e790a4ceSJonathan Corbet return false; 60e790a4ceSJonathan Corbet } 61e790a4ceSJonathan Corbet 62e790a4ceSJonathan Corbet bool vlock_unlock(void) 63e790a4ceSJonathan Corbet { 64e790a4ceSJonathan Corbet last_vote = -1; 65e790a4ceSJonathan Corbet } 66e790a4ceSJonathan Corbet 67e790a4ceSJonathan Corbet 68e790a4ceSJonathan CorbetThe currently_voting[] array provides a way for the CPUs to determine 69e790a4ceSJonathan Corbetwhether an election is in progress, and plays a role analogous to the 70e790a4ceSJonathan Corbet"entering" array in Lamport's bakery algorithm [1]. 71e790a4ceSJonathan Corbet 72e790a4ceSJonathan CorbetHowever, once the election has started, the underlying memory system 73e790a4ceSJonathan Corbetatomicity is used to pick the winner. This avoids the need for a static 74e790a4ceSJonathan Corbetpriority rule to act as a tie-breaker, or any counters which could 75e790a4ceSJonathan Corbetoverflow. 76e790a4ceSJonathan Corbet 77e790a4ceSJonathan CorbetAs long as the last_vote variable is globally visible to all CPUs, it 78e790a4ceSJonathan Corbetwill contain only one value that won't change once every CPU has cleared 79e790a4ceSJonathan Corbetits currently_voting flag. 80e790a4ceSJonathan Corbet 81e790a4ceSJonathan Corbet 82e790a4ceSJonathan CorbetFeatures and limitations 83e790a4ceSJonathan Corbet------------------------ 84e790a4ceSJonathan Corbet 85e790a4ceSJonathan Corbet * vlocks are not intended to be fair. In the contended case, it is the 86e790a4ceSJonathan Corbet _last_ CPU which attempts to get the lock which will be most likely 87e790a4ceSJonathan Corbet to win. 88e790a4ceSJonathan Corbet 89e790a4ceSJonathan Corbet vlocks are therefore best suited to situations where it is necessary 90e790a4ceSJonathan Corbet to pick a unique winner, but it does not matter which CPU actually 91e790a4ceSJonathan Corbet wins. 92e790a4ceSJonathan Corbet 93e790a4ceSJonathan Corbet * Like other similar mechanisms, vlocks will not scale well to a large 94e790a4ceSJonathan Corbet number of CPUs. 95e790a4ceSJonathan Corbet 96e790a4ceSJonathan Corbet vlocks can be cascaded in a voting hierarchy to permit better scaling 97e790a4ceSJonathan Corbet if necessary, as in the following hypothetical example for 4096 CPUs:: 98e790a4ceSJonathan Corbet 99e790a4ceSJonathan Corbet /* first level: local election */ 100e790a4ceSJonathan Corbet my_town = towns[(this_cpu >> 4) & 0xf]; 101e790a4ceSJonathan Corbet I_won = vlock_trylock(my_town, this_cpu & 0xf); 102e790a4ceSJonathan Corbet if (I_won) { 103e790a4ceSJonathan Corbet /* we won the town election, let's go for the state */ 104e790a4ceSJonathan Corbet my_state = states[(this_cpu >> 8) & 0xf]; 105e790a4ceSJonathan Corbet I_won = vlock_lock(my_state, this_cpu & 0xf)); 106e790a4ceSJonathan Corbet if (I_won) { 107e790a4ceSJonathan Corbet /* and so on */ 108e790a4ceSJonathan Corbet I_won = vlock_lock(the_whole_country, this_cpu & 0xf]; 109e790a4ceSJonathan Corbet if (I_won) { 110e790a4ceSJonathan Corbet /* ... */ 111e790a4ceSJonathan Corbet } 112e790a4ceSJonathan Corbet vlock_unlock(the_whole_country); 113e790a4ceSJonathan Corbet } 114e790a4ceSJonathan Corbet vlock_unlock(my_state); 115e790a4ceSJonathan Corbet } 116e790a4ceSJonathan Corbet vlock_unlock(my_town); 117e790a4ceSJonathan Corbet 118e790a4ceSJonathan Corbet 119e790a4ceSJonathan CorbetARM implementation 120e790a4ceSJonathan Corbet------------------ 121e790a4ceSJonathan Corbet 122e790a4ceSJonathan CorbetThe current ARM implementation [2] contains some optimisations beyond 123e790a4ceSJonathan Corbetthe basic algorithm: 124e790a4ceSJonathan Corbet 125e790a4ceSJonathan Corbet * By packing the members of the currently_voting array close together, 126e790a4ceSJonathan Corbet we can read the whole array in one transaction (providing the number 127e790a4ceSJonathan Corbet of CPUs potentially contending the lock is small enough). This 128e790a4ceSJonathan Corbet reduces the number of round-trips required to external memory. 129e790a4ceSJonathan Corbet 130e790a4ceSJonathan Corbet In the ARM implementation, this means that we can use a single load 131e790a4ceSJonathan Corbet and comparison:: 132e790a4ceSJonathan Corbet 133e790a4ceSJonathan Corbet LDR Rt, [Rn] 134e790a4ceSJonathan Corbet CMP Rt, #0 135e790a4ceSJonathan Corbet 136e790a4ceSJonathan Corbet ...in place of code equivalent to:: 137e790a4ceSJonathan Corbet 138e790a4ceSJonathan Corbet LDRB Rt, [Rn] 139e790a4ceSJonathan Corbet CMP Rt, #0 140e790a4ceSJonathan Corbet LDRBEQ Rt, [Rn, #1] 141e790a4ceSJonathan Corbet CMPEQ Rt, #0 142e790a4ceSJonathan Corbet LDRBEQ Rt, [Rn, #2] 143e790a4ceSJonathan Corbet CMPEQ Rt, #0 144e790a4ceSJonathan Corbet LDRBEQ Rt, [Rn, #3] 145e790a4ceSJonathan Corbet CMPEQ Rt, #0 146e790a4ceSJonathan Corbet 147e790a4ceSJonathan Corbet This cuts down on the fast-path latency, as well as potentially 148e790a4ceSJonathan Corbet reducing bus contention in contended cases. 149e790a4ceSJonathan Corbet 150e790a4ceSJonathan Corbet The optimisation relies on the fact that the ARM memory system 151e790a4ceSJonathan Corbet guarantees coherency between overlapping memory accesses of 152e790a4ceSJonathan Corbet different sizes, similarly to many other architectures. Note that 153e790a4ceSJonathan Corbet we do not care which element of currently_voting appears in which 154e790a4ceSJonathan Corbet bits of Rt, so there is no need to worry about endianness in this 155e790a4ceSJonathan Corbet optimisation. 156e790a4ceSJonathan Corbet 157e790a4ceSJonathan Corbet If there are too many CPUs to read the currently_voting array in 158*d56b699dSBjorn Helgaas one transaction then multiple transactions are still required. The 159e790a4ceSJonathan Corbet implementation uses a simple loop of word-sized loads for this 160e790a4ceSJonathan Corbet case. The number of transactions is still fewer than would be 161e790a4ceSJonathan Corbet required if bytes were loaded individually. 162e790a4ceSJonathan Corbet 163e790a4ceSJonathan Corbet 164e790a4ceSJonathan Corbet In principle, we could aggregate further by using LDRD or LDM, but 165e790a4ceSJonathan Corbet to keep the code simple this was not attempted in the initial 166e790a4ceSJonathan Corbet implementation. 167e790a4ceSJonathan Corbet 168e790a4ceSJonathan Corbet 169e790a4ceSJonathan Corbet * vlocks are currently only used to coordinate between CPUs which are 170e790a4ceSJonathan Corbet unable to enable their caches yet. This means that the 171e790a4ceSJonathan Corbet implementation removes many of the barriers which would be required 172e790a4ceSJonathan Corbet when executing the algorithm in cached memory. 173e790a4ceSJonathan Corbet 174e790a4ceSJonathan Corbet packing of the currently_voting array does not work with cached 175e790a4ceSJonathan Corbet memory unless all CPUs contending the lock are cache-coherent, due 176e790a4ceSJonathan Corbet to cache writebacks from one CPU clobbering values written by other 177e790a4ceSJonathan Corbet CPUs. (Though if all the CPUs are cache-coherent, you should be 178e790a4ceSJonathan Corbet probably be using proper spinlocks instead anyway). 179e790a4ceSJonathan Corbet 180e790a4ceSJonathan Corbet 181e790a4ceSJonathan Corbet * The "no votes yet" value used for the last_vote variable is 0 (not 182e790a4ceSJonathan Corbet -1 as in the pseudocode). This allows statically-allocated vlocks 183e790a4ceSJonathan Corbet to be implicitly initialised to an unlocked state simply by putting 184e790a4ceSJonathan Corbet them in .bss. 185e790a4ceSJonathan Corbet 186e790a4ceSJonathan Corbet An offset is added to each CPU's ID for the purpose of setting this 187e790a4ceSJonathan Corbet variable, so that no CPU uses the value 0 for its ID. 188e790a4ceSJonathan Corbet 189e790a4ceSJonathan Corbet 190e790a4ceSJonathan CorbetColophon 191e790a4ceSJonathan Corbet-------- 192e790a4ceSJonathan Corbet 193e790a4ceSJonathan CorbetOriginally created and documented by Dave Martin for Linaro Limited, for 194e790a4ceSJonathan Corbetuse in ARM-based big.LITTLE platforms, with review and input gratefully 195e790a4ceSJonathan Corbetreceived from Nicolas Pitre and Achin Gupta. Thanks to Nicolas for 196e790a4ceSJonathan Corbetgrabbing most of this text out of the relevant mail thread and writing 197e790a4ceSJonathan Corbetup the pseudocode. 198e790a4ceSJonathan Corbet 199e790a4ceSJonathan CorbetCopyright (C) 2012-2013 Linaro Limited 200e790a4ceSJonathan CorbetDistributed under the terms of Version 2 of the GNU General Public 201e790a4ceSJonathan CorbetLicense, as defined in linux/COPYING. 202e790a4ceSJonathan Corbet 203e790a4ceSJonathan Corbet 204e790a4ceSJonathan CorbetReferences 205e790a4ceSJonathan Corbet---------- 206e790a4ceSJonathan Corbet 207e790a4ceSJonathan Corbet[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming 208e790a4ceSJonathan Corbet Problem", Communications of the ACM 17, 8 (August 1974), 453-455. 209e790a4ceSJonathan Corbet 210e790a4ceSJonathan Corbet https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm 211e790a4ceSJonathan Corbet 212e790a4ceSJonathan Corbet[2] linux/arch/arm/common/vlock.S, www.kernel.org. 213