Locked Counters (aka ``QemuLockCnt``)
=====================================

QEMU often uses reference counts to track data structures that are being
accessed and should not be freed. For example, a loop that invokes
callbacks like this is not safe::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
by stashing away its ``next`` pointer. However, ``ioh->fd_write`` could
actually delete the next node from the list. The simplest way to
avoid this is to mark the node as deleted, and remove it from the
list in the above loop::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            QLIST_REMOVE(ioh, next);
            g_free(ioh);
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }

If however this loop must also be reentrant, i.e. it is possible that
``ioh->fd_write`` invokes the loop again, some kind of counting is needed::

    walking_handlers++;
    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (walking_handlers == 1) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    walking_handlers--;

One may think of using the RCU primitives, ``rcu_read_lock()`` and
``rcu_read_unlock()``; effectively, the RCU nesting count would take
the place of the ``walking_handlers`` global variable.
Indeed, reference counting and RCU have similar purposes, but their usage
in general is complementary:

- reference counting is fine-grained and limited to a single data
  structure; RCU delays reclamation of *all* RCU-protected data
  structures;

- reference counting works even in the presence of code that keeps
  a reference for a long time; RCU critical sections in principle
  should be kept short;

- reference counting is often applied to code that is not thread-safe
  but is reentrant; in fact, usage of reference counting in QEMU predates
  the introduction of threads by many years. RCU is generally used to
  protect readers from other threads freeing memory after concurrent
  modifications to a data structure.

- reclaiming data can be done by a separate thread in the case of RCU;
  this can improve performance, but also delay reclamation undesirably.
  With reference counting, reclamation is deterministic.

This file documents ``QemuLockCnt``, an abstraction for using reference
counting in code that has to be both thread-safe and reentrant.


``QemuLockCnt`` concepts
------------------------

A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
to increment and decrement the counter, and to take and release the
mutex. The counter notes how many visits to the data structures are
taking place (the visits could be from different threads, or there could
be multiple reentrant visits from the same thread). The basic rules
governing the counter/mutex pair then are the following:

- Data protected by the ``QemuLockCnt`` must not be freed unless the
  counter is zero and the mutex is taken.

- A new visit cannot be started while the counter is zero and the
  mutex is taken.

Most of the time, the mutex protects all writes to the data structure,
not just frees, though there could be cases where this is not necessary.

Reads, instead, can be done without taking the mutex, as long as the
readers and writers use the same macros that are used for RCU, for
example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
etc. This is because the reads are done outside a lock and a set
or ``QLIST_INSERT_HEAD`` can happen concurrently with the read. The
RCU API ensures that the processor and the compiler see all required
memory barriers.

This could be implemented simply by protecting the counter with the
mutex, for example::

    // (1)
    qemu_mutex_lock(&walking_handlers_mutex);
    walking_handlers++;
    qemu_mutex_unlock(&walking_handlers_mutex);

    ...

    // (2)
    qemu_mutex_lock(&walking_handlers_mutex);
    if (--walking_handlers == 0) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
    }
    qemu_mutex_unlock(&walking_handlers_mutex);

Here, no frees can happen in the code represented by the ellipsis.
If another thread is executing critical section (2), that part of
the code cannot be entered, because the thread will not be able
to increment the ``walking_handlers`` variable. And of course
during the visit any other thread will see a nonzero value for
``walking_handlers``, as in the single-threaded code.

Note that it is possible for multiple concurrent accesses to delay
the cleanup arbitrarily; in other words, for the ``walking_handlers``
counter to never become zero. For this reason, this technique is
more easily applicable if concurrent access to the structure is rare.

However, critical sections are easy to forget since you have to do
them for each modification of the counter.
``QemuLockCnt`` ensures that
all modifications of the counter take the lock appropriately, and it
can also be more efficient in two ways:

- it avoids taking the lock for many operations (for example
  incrementing the counter while it is non-zero);

- on some platforms, one can implement ``QemuLockCnt`` to hold the lock
  and the counter in a single word, making the fast path no more expensive
  than simply managing a counter using atomic operations (see
  :doc:`atomics`). This can be very helpful if concurrent access to
  the data structure is expected to be rare.

Using the same mutex for frees and writes can still incur some small
inefficiencies; for example, a visit can never start if the counter is
zero and the mutex is taken -- even if the mutex is taken by a write,
which in principle need not block a visit of the data structure.
However, these are usually not a problem if any of the following
assumptions are valid:

- concurrent access is possible but rare

- writes are rare

- writes are frequent, but this kind of write (e.g. appending to a
  list) has a very small critical section.

For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
bottom halves and file descriptor handlers. Modifications to the list
of file descriptor handlers are rare. Creation of a new bottom half is
frequent and can happen on a fast path; however: 1) it is almost never
concurrent with a visit to the list of bottom halves; 2) it only has
three instructions in the critical path, two assignments and a ``smp_wmb()``.


``QemuLockCnt`` API
-------------------

.. kernel-doc:: include/qemu/lockcnt.h


``QemuLockCnt`` usage
---------------------

This section explains the typical usage patterns for ``QemuLockCnt`` functions.

Setting a variable to a non-NULL value can be done between
``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!xyz) {
        new_xyz = g_new(XYZ, 1);
        ...
        qatomic_rcu_set(&xyz, new_xyz);
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

Accessing the value can be done between ``qemu_lockcnt_inc`` and
``qemu_lockcnt_dec``::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p". */
    }
    qemu_lockcnt_dec(&xyz_lockcnt);

Freeing the object can similarly use ``qemu_lockcnt_lock`` and
``qemu_lockcnt_unlock``, but you also need to ensure that the count
is zero (i.e. there is no concurrent visit). Because ``qemu_lockcnt_inc``
takes the ``QemuLockCnt``'s lock, the count cannot become non-zero while
the object is being freed. Freeing an object looks like this::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!qemu_lockcnt_count(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

If an object has to be freed right after a visit, you can combine
the decrement, the locking and the check on count as follows::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p". */
    }
    if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
        qemu_lockcnt_unlock(&xyz_lockcnt);
    }

``QemuLockCnt`` can also be used to access a list as follows::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_RCU(ioh, &io_handlers, next) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

    if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
        qemu_lockcnt_unlock(&io_handlers_lockcnt);
    }

Again, the RCU primitives are used because new items can be added to the
list during the walk. ``QLIST_FOREACH_RCU`` ensures that the processor and
the compiler see the appropriate memory barriers.

An alternative pattern uses ``qemu_lockcnt_dec_if_lock``::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
                qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    qemu_lockcnt_dec(&io_handlers_lockcnt);

Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
because there is no special task to do if the count goes from 1 to 0.