Locked Counters (aka ``QemuLockCnt``)
======================================

QEMU often uses reference counts to track data structures that are being
accessed and should not be freed.  For example, a loop that invokes
callbacks like this is not safe::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
by stashing away its ``next`` pointer.  However, ``ioh->fd_write`` could
actually delete the next node from the list.  The simplest way to
avoid this is to mark the node as deleted, and remove it from the
list in the above loop::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            QLIST_REMOVE(ioh, next);
            g_free(ioh);
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }

If, however, this loop must also be reentrant, i.e. it is possible that
``ioh->fd_write`` invokes the loop again, some kind of counting is needed::

    walking_handlers++;
    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (walking_handlers == 1) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    walking_handlers--;

One may think of using the RCU primitives, ``rcu_read_lock()`` and
``rcu_read_unlock()``; effectively, the RCU nesting count would take
the place of the ``walking_handlers`` global variable.
Indeed, reference counting and RCU have similar purposes, but their usage
in general is complementary:

- reference counting is fine-grained and limited to a single data
  structure; RCU delays reclamation of *all* RCU-protected data
  structures;

- reference counting works even in the presence of code that keeps
  a reference for a long time; RCU critical sections in principle
  should be kept short;

- reference counting is often applied to code that is not thread-safe
  but is reentrant; in fact, usage of reference counting in QEMU predates
  the introduction of threads by many years.  RCU is generally used to
  protect readers from other threads freeing memory after concurrent
  modifications to a data structure.

- reclaiming data can be done by a separate thread in the case of RCU;
  this can improve performance, but also delay reclamation undesirably.
  With reference counting, reclamation is deterministic.

This file documents ``QemuLockCnt``, an abstraction for using reference
counting in code that has to be both thread-safe and reentrant.


``QemuLockCnt`` concepts
-------------------------

A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
to increment and decrement the counter, and to take and release the
mutex.  The counter notes how many visits to the data structures are
taking place (the visits could be from different threads, or there could
be multiple reentrant visits from the same thread).  The basic rules
governing the counter/mutex pair then are the following:

- Data protected by the ``QemuLockCnt`` must not be freed unless the
  counter is zero and the mutex is taken.

- A new visit cannot be started while the counter is zero and the
  mutex is taken.

Most of the time, the mutex protects all writes to the data structure,
not just frees, though there could be cases where this is not necessary.

Reads, instead, can be done without taking the mutex, as long as the
readers and writers use the same macros that are used for RCU, for
example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
etc.  This is because the reads are done outside a lock, and a set or
``QLIST_INSERT_HEAD`` can happen concurrently with the read.  The RCU API
ensures that the processor and the compiler see all required memory
barriers.
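
To make this concrete, here is a sketch (not taken verbatim from QEMU's
code) of the write and read sides for the ``io_handlers`` list used in
the examples above.  The locking itself is only indicated by comments,
because the ``QemuLockCnt`` API is introduced below;
``QLIST_INSERT_HEAD_RCU`` is the RCU-friendly insertion macro from
``include/qemu/rcu_queue.h``::

    /* Writer: runs while holding the mutex of the counter/mutex pair.
     * The node is fully initialized before it is published; the RCU
     * insertion macro issues the memory barrier that orders the
     * initialization before the pointer update.
     */
    ioh->deleted = false;
    ...
    QLIST_INSERT_HEAD_RCU(&io_handlers, ioh, next);

    /* Reader: runs with the counter raised but without the mutex, so it
     * may observe the insertion above at any point during the walk.
     */
    QLIST_FOREACH_RCU(ioh, &io_handlers, next) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }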

The counter/mutex pair could be implemented simply by protecting the
counter with the mutex, for example::

    // (1)
    qemu_mutex_lock(&walking_handlers_mutex);
    walking_handlers++;
    qemu_mutex_unlock(&walking_handlers_mutex);

    ...

    // (2)
    qemu_mutex_lock(&walking_handlers_mutex);
    if (--walking_handlers == 0) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
    }
    qemu_mutex_unlock(&walking_handlers_mutex);

Here, no frees can happen in the code represented by the ellipsis.
If another thread is executing critical section (2), that part of
the code cannot be entered, because the thread will not be able
to increment the ``walking_handlers`` variable.  And of course
during the visit any other thread will see a nonzero value for
``walking_handlers``, as in the single-threaded code.

Note that it is possible for multiple concurrent accesses to delay
the cleanup arbitrarily; in other words, for the ``walking_handlers``
counter to never become zero.  For this reason, this technique is
more easily applicable if concurrent access to the structure is rare.

However, critical sections are easy to forget since you have to do
them for each modification of the counter.  ``QemuLockCnt`` ensures that
all modifications of the counter take the lock appropriately, and it
can also be more efficient in two ways:

- it avoids taking the lock for many operations (for example
  incrementing the counter while it is non-zero);

- on some platforms, one can implement ``QemuLockCnt`` to hold the counter
  and the mutex in a single word, making the fast path no more expensive
  than simply managing a counter using atomic operations (see
  :doc:`atomics`).  This can be very helpful if concurrent access to
  the data structure is expected to be rare; a sketch of such a
  single-word layout follows this list.
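
The following is a rough illustration of that second point, not QEMU's
actual implementation (the real code in ``util/lockcnt.c`` also has to
deal with waiters and with releasing the lock): the low bit of a single
word acts as the mutex and the remaining bits hold the counter, so the
common case of starting a visit is one compare-and-swap.  The names
``LOCK_BIT``, ``COUNT_STEP``, ``lockcnt_word`` and ``lockcnt_inc_fast``
are made up for the example::

    #define LOCK_BIT    1u          /* low bit: the "mutex" is taken       */
    #define COUNT_STEP  2u          /* remaining bits: the visit counter   */

    static unsigned lockcnt_word;   /* counter and lock packed together    */

    /* Fast path for starting a visit.  It only fails -- forcing the caller
     * into a slow path that waits for the lock -- when the counter is zero
     * and the lock bit is set, which is exactly the rule stated earlier.
     */
    static bool lockcnt_inc_fast(void)
    {
        unsigned val = qatomic_read(&lockcnt_word);

        if (val == LOCK_BIT) {
            return false;   /* counter == 0 and mutex taken: cannot start */
        }
        return qatomic_cmpxchg(&lockcnt_word, val, val + COUNT_STEP) == val;
    }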

Using the same mutex for frees and writes can still incur some small
inefficiencies; for example, a visit can never start if the counter is
zero and the mutex is taken -- even if the mutex is taken by a write,
which in principle need not block a visit of the data structure.
However, these are usually not a problem if any of the following
assumptions are valid:

- concurrent access is possible but rare

- writes are rare

- writes are frequent, but this kind of write (e.g. appending to a
  list) has a very small critical section.

For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
bottom halves and file descriptor handlers.  Modifications to the list
of file descriptor handlers are rare.  Creation of a new bottom half is
frequent and can happen on a fast path; however: 1) it is almost never
concurrent with a visit to the list of bottom halves; 2) it only has
three instructions in the critical path, two assignments and a ``smp_wmb()``.


``QemuLockCnt`` API
--------------------

.. kernel-doc:: include/qemu/lockcnt.h


``QemuLockCnt`` usage
----------------------

This section explains the typical usage patterns for ``QemuLockCnt`` functions.

Setting a variable to a non-NULL value can be done between
``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!xyz) {
        new_xyz = g_new(XYZ, 1);
        ...
        qatomic_rcu_set(&xyz, new_xyz);
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

Accessing the value can be done between ``qemu_lockcnt_inc`` and
``qemu_lockcnt_dec``::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p".  */
    }
    qemu_lockcnt_dec(&xyz_lockcnt);

Freeing the object can similarly use ``qemu_lockcnt_lock`` and
``qemu_lockcnt_unlock``, but you also need to ensure that the count
is zero (i.e. there is no concurrent visit).  Because ``qemu_lockcnt_inc``
takes the ``QemuLockCnt``'s lock, the count cannot become non-zero while
the object is being freed.
Freeing an object looks like this::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!qemu_lockcnt_count(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

If an object has to be freed right after a visit, you can combine
the decrement, the locking and the check on count as follows::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p".  */
    }
    if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
        qemu_lockcnt_unlock(&xyz_lockcnt);
    }

``QemuLockCnt`` can also be used to access a list as follows::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_RCU(ioh, &io_handlers, next) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

    if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
        qemu_lockcnt_unlock(&io_handlers_lockcnt);
    }

Again, the RCU primitives are used because new items can be added to the
list during the walk.  ``QLIST_FOREACH_RCU`` ensures that the processor and
the compiler see the appropriate memory barriers.

An alternative pattern uses ``qemu_lockcnt_dec_if_lock``::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
                qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    qemu_lockcnt_dec(&io_handlers_lockcnt);

Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
because there is no special task to do if the count goes from 1 to 0.
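
Finally, a ``QemuLockCnt`` must be initialized before use and can be
destroyed once no visits are possible.  The sketch below shows the
expected lifecycle around the ``xyz`` examples; the
``xyz_subsystem_init``/``xyz_subsystem_cleanup`` function names are
invented for the example, while ``qemu_lockcnt_init``,
``qemu_lockcnt_destroy`` and ``qemu_lockcnt_count`` are part of the API
documented above::

    static QemuLockCnt xyz_lockcnt;
    static XYZ *xyz;

    static void xyz_subsystem_init(void)
    {
        qemu_lockcnt_init(&xyz_lockcnt);
    }

    /* Called when no further visits can start, e.g. at shutdown.  */
    static void xyz_subsystem_cleanup(void)
    {
        qemu_lockcnt_lock(&xyz_lockcnt);
        assert(qemu_lockcnt_count(&xyz_lockcnt) == 0);
        g_free(xyz);
        xyz = NULL;
        qemu_lockcnt_unlock(&xyz_lockcnt);
        qemu_lockcnt_destroy(&xyz_lockcnt);
    }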