1DOCUMENTATION FOR LOCKED COUNTERS (aka QemuLockCnt) 2=================================================== 3 4QEMU often uses reference counts to track data structures that are being 5accessed and should not be freed. For example, a loop that invoke 6callbacks like this is not safe: 7 8 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { 9 if (ioh->revents & G_IO_OUT) { 10 ioh->fd_write(ioh->opaque); 11 } 12 } 13 14QLIST_FOREACH_SAFE protects against deletion of the current node (ioh) 15by stashing away its "next" pointer. However, ioh->fd_write could 16actually delete the next node from the list. The simplest way to 17avoid this is to mark the node as deleted, and remove it from the 18list in the above loop: 19 20 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { 21 if (ioh->deleted) { 22 QLIST_REMOVE(ioh, next); 23 g_free(ioh); 24 } else { 25 if (ioh->revents & G_IO_OUT) { 26 ioh->fd_write(ioh->opaque); 27 } 28 } 29 } 30 31If however this loop must also be reentrant, i.e. it is possible that 32ioh->fd_write invokes the loop again, some kind of counting is needed: 33 34 walking_handlers++; 35 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { 36 if (ioh->deleted) { 37 if (walking_handlers == 1) { 38 QLIST_REMOVE(ioh, next); 39 g_free(ioh); 40 } 41 } else { 42 if (ioh->revents & G_IO_OUT) { 43 ioh->fd_write(ioh->opaque); 44 } 45 } 46 } 47 walking_handlers--; 48 49One may think of using the RCU primitives, rcu_read_lock() and 50rcu_read_unlock(); effectively, the RCU nesting count would take 51the place of the walking_handlers global variable. Indeed, 52reference counting and RCU have similar purposes, but their usage in 53general is complementary: 54 55- reference counting is fine-grained and limited to a single data 56 structure; RCU delays reclamation of *all* RCU-protected data 57 structures; 58 59- reference counting works even in the presence of code that keeps 60 a reference for a long time; RCU critical sections in principle 61 should be kept short; 62 63- reference counting is often applied to code that is not thread-safe 64 but is reentrant; in fact, usage of reference counting in QEMU predates 65 the introduction of threads by many years. RCU is generally used to 66 protect readers from other threads freeing memory after concurrent 67 modifications to a data structure. 68 69- reclaiming data can be done by a separate thread in the case of RCU; 70 this can improve performance, but also delay reclamation undesirably. 71 With reference counting, reclamation is deterministic. 72 73This file documents QemuLockCnt, an abstraction for using reference 74counting in code that has to be both thread-safe and reentrant. 75 76 77QemuLockCnt concepts 78-------------------- 79 80A QemuLockCnt comprises both a counter and a mutex; it has primitives 81to increment and decrement the counter, and to take and release the 82mutex. The counter notes how many visits to the data structures are 83taking place (the visits could be from different threads, or there could 84be multiple reentrant visits from the same thread). The basic rules 85governing the counter/mutex pair then are the following: 86 87- Data protected by the QemuLockCnt must not be freed unless the 88 counter is zero and the mutex is taken. 89 90- A new visit cannot be started while the counter is zero and the 91 mutex is taken. 92 93Most of the time, the mutex protects all writes to the data structure, 94not just frees, though there could be cases where this is not necessary. 95 96Reads, instead, can be done without taking the mutex, as long as the 97readers and writers use the same macros that are used for RCU, for 98example atomic_rcu_read, atomic_rcu_set, QLIST_FOREACH_RCU, etc. This is 99because the reads are done outside a lock and a set or QLIST_INSERT_HEAD 100can happen concurrently with the read. The RCU API ensures that the 101processor and the compiler see all required memory barriers. 102 103This could be implemented simply by protecting the counter with the 104mutex, for example: 105 106 // (1) 107 qemu_mutex_lock(&walking_handlers_mutex); 108 walking_handlers++; 109 qemu_mutex_unlock(&walking_handlers_mutex); 110 111 ... 112 113 // (2) 114 qemu_mutex_lock(&walking_handlers_mutex); 115 if (--walking_handlers == 0) { 116 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { 117 if (ioh->deleted) { 118 QLIST_REMOVE(ioh, next); 119 g_free(ioh); 120 } 121 } 122 } 123 qemu_mutex_unlock(&walking_handlers_mutex); 124 125Here, no frees can happen in the code represented by the ellipsis. 126If another thread is executing critical section (2), that part of 127the code cannot be entered, because the thread will not be able 128to increment the walking_handlers variable. And of course 129during the visit any other thread will see a nonzero value for 130walking_handlers, as in the single-threaded code. 131 132Note that it is possible for multiple concurrent accesses to delay 133the cleanup arbitrarily; in other words, for the walking_handlers 134counter to never become zero. For this reason, this technique is 135more easily applicable if concurrent access to the structure is rare. 136 137However, critical sections are easy to forget since you have to do 138them for each modification of the counter. QemuLockCnt ensures that 139all modifications of the counter take the lock appropriately, and it 140can also be more efficient in two ways: 141 142- it avoids taking the lock for many operations (for example 143 incrementing the counter while it is non-zero); 144 145- on some platforms, one can implement QemuLockCnt to hold the lock 146 and the mutex in a single word, making the fast path no more expensive 147 than simply managing a counter using atomic operations (see 148 docs/devel/atomics.txt). This can be very helpful if concurrent access to 149 the data structure is expected to be rare. 150 151 152Using the same mutex for frees and writes can still incur some small 153inefficiencies; for example, a visit can never start if the counter is 154zero and the mutex is taken---even if the mutex is taken by a write, 155which in principle need not block a visit of the data structure. 156However, these are usually not a problem if any of the following 157assumptions are valid: 158 159- concurrent access is possible but rare 160 161- writes are rare 162 163- writes are frequent, but this kind of write (e.g. appending to a 164 list) has a very small critical section. 165 166For example, QEMU uses QemuLockCnt to manage an AioContext's list of 167bottom halves and file descriptor handlers. Modifications to the list 168of file descriptor handlers are rare. Creation of a new bottom half is 169frequent and can happen on a fast path; however: 1) it is almost never 170concurrent with a visit to the list of bottom halves; 2) it only has 171three instructions in the critical path, two assignments and a smp_wmb(). 172 173 174QemuLockCnt API 175--------------- 176 177The QemuLockCnt API is described in include/qemu/thread.h. 178 179 180QemuLockCnt usage 181----------------- 182 183This section explains the typical usage patterns for QemuLockCnt functions. 184 185Setting a variable to a non-NULL value can be done between 186qemu_lockcnt_lock and qemu_lockcnt_unlock: 187 188 qemu_lockcnt_lock(&xyz_lockcnt); 189 if (!xyz) { 190 new_xyz = g_new(XYZ, 1); 191 ... 192 atomic_rcu_set(&xyz, new_xyz); 193 } 194 qemu_lockcnt_unlock(&xyz_lockcnt); 195 196Accessing the value can be done between qemu_lockcnt_inc and 197qemu_lockcnt_dec: 198 199 qemu_lockcnt_inc(&xyz_lockcnt); 200 if (xyz) { 201 XYZ *p = atomic_rcu_read(&xyz); 202 ... 203 /* Accesses can now be done through "p". */ 204 } 205 qemu_lockcnt_dec(&xyz_lockcnt); 206 207Freeing the object can similarly use qemu_lockcnt_lock and 208qemu_lockcnt_unlock, but you also need to ensure that the count 209is zero (i.e. there is no concurrent visit). Because qemu_lockcnt_inc 210takes the QemuLockCnt's lock, the count cannot become non-zero while 211the object is being freed. Freeing an object looks like this: 212 213 qemu_lockcnt_lock(&xyz_lockcnt); 214 if (!qemu_lockcnt_count(&xyz_lockcnt)) { 215 g_free(xyz); 216 xyz = NULL; 217 } 218 qemu_lockcnt_unlock(&xyz_lockcnt); 219 220If an object has to be freed right after a visit, you can combine 221the decrement, the locking and the check on count as follows: 222 223 qemu_lockcnt_inc(&xyz_lockcnt); 224 if (xyz) { 225 XYZ *p = atomic_rcu_read(&xyz); 226 ... 227 /* Accesses can now be done through "p". */ 228 } 229 if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) { 230 g_free(xyz); 231 xyz = NULL; 232 qemu_lockcnt_unlock(&xyz_lockcnt); 233 } 234 235QemuLockCnt can also be used to access a list as follows: 236 237 qemu_lockcnt_inc(&io_handlers_lockcnt); 238 QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) { 239 if (ioh->revents & G_IO_OUT) { 240 ioh->fd_write(ioh->opaque); 241 } 242 } 243 244 if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) { 245 QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { 246 if (ioh->deleted) { 247 QLIST_REMOVE(ioh, next); 248 g_free(ioh); 249 } 250 } 251 qemu_lockcnt_unlock(&io_handlers_lockcnt); 252 } 253 254Again, the RCU primitives are used because new items can be added to the 255list during the walk. QLIST_FOREACH_RCU ensures that the processor and 256the compiler see the appropriate memory barriers. 257 258An alternative pattern uses qemu_lockcnt_dec_if_lock: 259 260 qemu_lockcnt_inc(&io_handlers_lockcnt); 261 QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) { 262 if (ioh->deleted) { 263 if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) { 264 QLIST_REMOVE(ioh, next); 265 g_free(ioh); 266 qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt); 267 } 268 } else { 269 if (ioh->revents & G_IO_OUT) { 270 ioh->fd_write(ioh->opaque); 271 } 272 } 273 } 274 qemu_lockcnt_dec(&io_handlers_lockcnt); 275 276Here you can use qemu_lockcnt_dec instead of qemu_lockcnt_dec_and_lock, 277because there is no special task to do if the count goes from 1 to 0. 278