Locked Counters (aka ``QemuLockCnt``)
=====================================

QEMU often uses reference counts to track data structures that are being
accessed and should not be freed.  For example, a loop that invokes
callbacks like this is not safe::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
by stashing away its ``next`` pointer.  However, ``ioh->fd_write`` could
actually delete the next node from the list.  The simplest way to
avoid this is to only mark the node as deleted instead of freeing it
right away, and to remove it from the list in the above loop::

    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            QLIST_REMOVE(ioh, next);
            g_free(ioh);
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }

If, however, this loop must also be reentrant, i.e. ``ioh->fd_write`` may
invoke the loop again, some kind of counting is needed::

    walking_handlers++;
    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (walking_handlers == 1) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    walking_handlers--;
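
A self-contained sketch of this counting scheme follows.  It is not QEMU
code: the ``Handler`` type, the helper and callback names, and the
hand-rolled singly linked list (standing in for the ``QLIST`` macros) are
assumptions made so that the example compiles on its own::

    #include <assert.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /* Hypothetical handler node; a plain linked list stands in for QLIST. */
    typedef struct Handler Handler;
    struct Handler {
        Handler *next;
        bool deleted;
        void (*cb)(Handler *self);
    };

    static Handler *handlers;      /* list head */
    static int walking_handlers;   /* nesting depth of walk_handlers() */
    static int calls;              /* callbacks invoked so far */

    static Handler *handler_add(void (*cb)(Handler *self))
    {
        Handler *h = calloc(1, sizeof(*h));
        assert(h != NULL);
        h->cb = cb;
        h->next = handlers;
        handlers = h;
        return h;
    }

    static void walk_handlers(void)
    {
        Handler **link = &handlers;

        walking_handlers++;
        while (*link) {
            Handler *h = *link;
            if (h->deleted) {
                if (walking_handlers == 1) {
                    /* Outermost walk: no nested walk can still reach h. */
                    *link = h->next;
                    free(h);
                    continue;
                }
            } else {
                calls++;
                h->cb(h);
            }
            link = &h->next;
        }
        walking_handlers--;
    }

    /* Example callbacks: deletion only marks the node. */
    static void cb_delete_next(Handler *self)
    {
        if (self->next) {
            self->next->deleted = true;
        }
    }

    static void cb_delete_self_and_reenter(Handler *self)
    {
        self->deleted = true;
        walk_handlers();
    }

    static void cb_noop(Handler *self)
    {
        (void)self;
    }

With three handlers added so that the list reads ``a -> b -> c``, where
``a`` uses ``cb_delete_self_and_reenter`` and ``b`` uses ``cb_delete_next``,
the first outermost walk frees ``c`` but leaves ``a`` on the list, marked as
deleted, because ``a`` was marked only after the walk had reached it.  The
next outermost walk reclaims it.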

One may think of using the RCU primitives, ``rcu_read_lock()`` and
``rcu_read_unlock()``; effectively, the RCU nesting count would take
the place of the ``walking_handlers`` global variable.  Indeed,
reference counting and RCU have similar purposes, but their usage in
general is complementary:

- reference counting is fine-grained and limited to a single data
  structure; RCU delays reclamation of *all* RCU-protected data
  structures;

- reference counting works even in the presence of code that keeps
  a reference for a long time; RCU critical sections in principle
  should be kept short;

- reference counting is often applied to code that is not thread-safe
  but is reentrant; in fact, usage of reference counting in QEMU predates
  the introduction of threads by many years.  RCU is generally used to
  protect readers from other threads freeing memory after concurrent
  modifications to a data structure;

- reclaiming data can be done by a separate thread in the case of RCU;
  this can improve performance, but also delay reclamation undesirably.
  With reference counting, reclamation is deterministic.

This file documents ``QemuLockCnt``, an abstraction for using reference
counting in code that has to be both thread-safe and reentrant.


``QemuLockCnt`` concepts
------------------------

A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
to increment and decrement the counter, and to take and release the
mutex.  The counter records how many visits to the data structure are
taking place (the visits could be from different threads, or there could
be multiple reentrant visits from the same thread).  The basic rules
governing the counter/mutex pair are the following:

- Data protected by the ``QemuLockCnt`` must not be freed unless the
  counter is zero and the mutex is taken.

- A new visit cannot be started while the counter is zero and the
  mutex is taken.

Most of the time, the mutex protects all writes to the data structure,
not just frees, though there could be cases where this is not necessary.

Reads, on the other hand, can be done without taking the mutex, as long
as the readers and writers use the same macros that are used for RCU, for
example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
etc.  This is because the reads are done outside a lock, so a
``qatomic_rcu_set`` or ``QLIST_INSERT_HEAD`` can happen concurrently with
the read.  The RCU API ensures that the processor and the compiler see
all required memory barriers.
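
The ordering requirement can be illustrated outside QEMU with C11
atomics.  In the sketch below a release store and an acquire load stand
in for ``qatomic_rcu_set`` and ``qatomic_rcu_read``; this is only an
analogy for the publish/read pairing, not a statement about how the QEMU
macros are implemented, and the ``Config`` type and function names are
hypothetical::

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdlib.h>

    /* Hypothetical payload that a writer publishes to lock-free readers. */
    typedef struct Config {
        int a;
        int b;
    } Config;

    static _Atomic(Config *) current_config;

    /*
     * Writer: initialize the object completely, then publish the pointer.
     * The release store keeps the field writes from being reordered after
     * the pointer becomes visible.  The previous object is deliberately
     * not freed here; reclamation is what the counter/mutex rules govern.
     */
    static void config_publish(int a, int b)
    {
        Config *c = malloc(sizeof(*c));
        assert(c != NULL);
        c->a = a;
        c->b = b;
        atomic_store_explicit(&current_config, c, memory_order_release);
    }

    /*
     * Reader: load the pointer once and use only that copy.  The acquire
     * load pairs with the release store above, so a non-NULL pointer
     * always refers to fully initialized fields.
     */
    static int config_sum(void)
    {
        Config *c = atomic_load_explicit(&current_config,
                                         memory_order_acquire);
        return c ? c->a + c->b : -1;
    }

A reader that runs before any publication sees ``NULL`` and returns -1; a
reader that runs afterwards sees both fields as written.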

These rules could be implemented simply by protecting the counter with
the mutex, for example::

    // (1)
    qemu_mutex_lock(&walking_handlers_mutex);
    walking_handlers++;
    qemu_mutex_unlock(&walking_handlers_mutex);

    ...

    // (2)
    qemu_mutex_lock(&walking_handlers_mutex);
    if (--walking_handlers == 0) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
    }
    qemu_mutex_unlock(&walking_handlers_mutex);

Here, no frees can happen in the code represented by the ellipsis.
If another thread is executing critical section (2), the code
represented by the ellipsis cannot be entered, because the visiting
thread will not be able to increment the ``walking_handlers`` variable
until that critical section is over.  And of course during the visit any
other thread will see a nonzero value for ``walking_handlers``, as in the
single-threaded code.

Note that it is possible for multiple concurrent accesses to delay
the cleanup arbitrarily; in other words, for the ``walking_handlers``
counter to never become zero.  For this reason, this technique is
more easily applicable if concurrent access to the structure is rare.
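
The two critical sections can be packaged as a pair of helpers.  The
following is a minimal sketch using a pthread mutex; ``Node``,
``visit_begin``, ``visit_end`` and ``node_add`` are hypothetical names, a
hand-rolled list replaces ``QLIST``, and the RCU-style accessors that
concurrent readers would need are left out to keep the example short::

    #include <assert.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct Node {
        struct Node *next;
        bool deleted;
    } Node;

    static pthread_mutex_t walking_mutex = PTHREAD_MUTEX_INITIALIZER;
    static int walking;     /* number of visits, protected by walking_mutex */
    static Node *nodes;     /* list head */
    static int reclaimed;   /* nodes freed so far, for inspection */

    /* Writes to the list take the same mutex as the counter. */
    static Node *node_add(void)
    {
        Node *n = calloc(1, sizeof(*n));
        assert(n != NULL);
        pthread_mutex_lock(&walking_mutex);
        n->next = nodes;
        nodes = n;
        pthread_mutex_unlock(&walking_mutex);
        return n;
    }

    /* (1) Start a visit. */
    static void visit_begin(void)
    {
        pthread_mutex_lock(&walking_mutex);
        walking++;
        pthread_mutex_unlock(&walking_mutex);
    }

    /* (2) End a visit; the last visitor frees the nodes marked deleted. */
    static void visit_end(void)
    {
        pthread_mutex_lock(&walking_mutex);
        if (--walking == 0) {
            Node **link = &nodes;
            while (*link) {
                Node *n = *link;
                if (n->deleted) {
                    *link = n->next;
                    free(n);
                    reclaimed++;
                } else {
                    link = &n->next;
                }
            }
        }
        pthread_mutex_unlock(&walking_mutex);
    }

If two visits are in progress and one of them marks a node as deleted,
the first ``visit_end()`` frees nothing; the node is reclaimed only by the
``visit_end()`` that brings the counter back to zero.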

However, the critical sections are easy to forget, since every
modification of the counter needs one.  ``QemuLockCnt`` ensures that
all modifications of the counter take the lock appropriately, and it
can also be more efficient in two ways:

- it avoids taking the lock for many operations (for example
  incrementing the counter while it is non-zero);

- on some platforms, one can implement ``QemuLockCnt`` to hold the counter
  and the mutex in a single word, making the fast path no more expensive
  than simply managing a counter using atomic operations (see
  :doc:`atomics`).  This can be very helpful if concurrent access to
  the data structure is expected to be rare.
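
The first point can be illustrated with a compare-and-swap loop that
increments a counter only while it is non-zero, so that the 0 -> 1
transition is left to a slow path that takes the lock.  This is a sketch
written with C11 ``<stdatomic.h>`` rather than QEMU's ``qatomic_*``
helpers, and ``inc_if_nonzero`` is a hypothetical name::

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    /*
     * Try to increment *count without taking any lock.  Returns false,
     * leaving the counter untouched, if the counter is zero; the caller
     * would then fall back to incrementing under the mutex.
     */
    static bool inc_if_nonzero(atomic_uint *count)
    {
        assert(count != NULL);

        unsigned old = atomic_load(count);
        while (old != 0) {
            if (atomic_compare_exchange_weak(count, &old, old + 1)) {
                return true;
            }
            /* The failed exchange reloaded "old"; retry with it. */
        }
        return false;
    }

Because the exchange only succeeds against a non-zero value, a thread
that holds the mutex and has observed a zero counter knows that no visit
can start through this path.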


Using the same mutex for frees and writes can still incur some small
inefficiencies; for example, a visit can never start if the counter is
zero and the mutex is taken -- even if the mutex is taken by a write,
which in principle need not block a visit of the data structure.
However, these are usually not a problem if any of the following
assumptions are valid:

- concurrent access is possible but rare

- writes are rare

- writes are frequent, but this kind of write (e.g. appending to a
  list) has a very small critical section.

For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
bottom halves and file descriptor handlers.  Modifications to the list
of file descriptor handlers are rare.  Creation of a new bottom half is
frequent and can happen on a fast path; however: 1) it is almost never
concurrent with a visit to the list of bottom halves; 2) it only has
three instructions in the critical path, two assignments and a ``smp_wmb()``.


``QemuLockCnt`` API
-------------------

.. kernel-doc:: include/qemu/lockcnt.h


``QemuLockCnt`` usage
---------------------

This section explains the typical usage patterns for ``QemuLockCnt`` functions.

Setting a variable to a non-NULL value can be done between
``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!xyz) {
        new_xyz = g_new(XYZ, 1);
        ...
        qatomic_rcu_set(&xyz, new_xyz);
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

Accessing the value can be done between ``qemu_lockcnt_inc`` and
``qemu_lockcnt_dec``::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p".  */
    }
    qemu_lockcnt_dec(&xyz_lockcnt);

Freeing the object can similarly use ``qemu_lockcnt_lock`` and
``qemu_lockcnt_unlock``, but you also need to ensure that the count
is zero (i.e. there is no concurrent visit).  Because ``qemu_lockcnt_inc``
takes the ``QemuLockCnt``'s lock when the count is zero, the count cannot
become non-zero while the object is being freed.  Freeing an object looks
like this::

    qemu_lockcnt_lock(&xyz_lockcnt);
    if (!qemu_lockcnt_count(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
    }
    qemu_lockcnt_unlock(&xyz_lockcnt);

If an object has to be freed right after a visit, you can combine
the decrement, the locking and the check on count as follows::

    qemu_lockcnt_inc(&xyz_lockcnt);
    if (xyz) {
        XYZ *p = qatomic_rcu_read(&xyz);
        ...
        /* Accesses can now be done through "p".  */
    }
    if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
        g_free(xyz);
        xyz = NULL;
        qemu_lockcnt_unlock(&xyz_lockcnt);
    }
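
To make the behavior of the increment and of the combined
decrement-and-lock more concrete, here is a toy model built from a
pthread mutex and a C11 atomic counter.  ``ToyLockCnt`` and the ``toy_*``
functions are invented for this sketch; it follows the rules described in
this document and should not be read as QEMU's actual implementation::

    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct ToyLockCnt {
        pthread_mutex_t mutex;
        atomic_uint count;
    } ToyLockCnt;

    static void toy_init(ToyLockCnt *lc)
    {
        pthread_mutex_init(&lc->mutex, NULL);
        atomic_init(&lc->count, 0);
    }

    /* Start a visit. */
    static void toy_inc(ToyLockCnt *lc)
    {
        unsigned old = atomic_load(&lc->count);

        /* Fast path: a non-zero counter is incremented without the lock. */
        while (old != 0) {
            if (atomic_compare_exchange_weak(&lc->count, &old, old + 1)) {
                return;
            }
        }
        /* Slow path: 0 -> 1 only happens with the mutex held. */
        pthread_mutex_lock(&lc->mutex);
        atomic_fetch_add(&lc->count, 1);
        pthread_mutex_unlock(&lc->mutex);
    }

    /*
     * End a visit.  Returns true, with the mutex held, if this was the
     * last visit; the caller may then free data and must unlock.
     */
    static bool toy_dec_and_lock(ToyLockCnt *lc)
    {
        unsigned old = atomic_load(&lc->count);

        assert(old > 0);
        /* Never go from 1 to 0 without holding the mutex. */
        while (old > 1) {
            if (atomic_compare_exchange_weak(&lc->count, &old, old - 1)) {
                return false;
            }
        }
        pthread_mutex_lock(&lc->mutex);
        if (atomic_fetch_sub(&lc->count, 1) == 1) {
            return true;
        }
        /* Another visit started in the meantime; nothing to free. */
        pthread_mutex_unlock(&lc->mutex);
        return false;
    }

With two visits in progress, the first ``toy_dec_and_lock()`` returns
false without holding the mutex, and the second returns true with the
mutex held and the counter at zero.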

``QemuLockCnt`` can also be used to access a list as follows::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_RCU(ioh, &io_handlers, next) {
        if (ioh->revents & G_IO_OUT) {
            ioh->fd_write(ioh->opaque);
        }
    }

    if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
            if (ioh->deleted) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
            }
        }
        qemu_lockcnt_unlock(&io_handlers_lockcnt);
    }

Again, the RCU primitives are used because new items can be added to the
list during the walk.  ``QLIST_FOREACH_RCU`` ensures that the processor and
the compiler see the appropriate memory barriers.

An alternative pattern uses ``qemu_lockcnt_dec_if_lock``::

    qemu_lockcnt_inc(&io_handlers_lockcnt);
    QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
        if (ioh->deleted) {
            if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
                QLIST_REMOVE(ioh, next);
                g_free(ioh);
                qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
            }
        } else {
            if (ioh->revents & G_IO_OUT) {
                ioh->fd_write(ioh->opaque);
            }
        }
    }
    qemu_lockcnt_dec(&io_handlers_lockcnt);

Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
because deleted nodes are already reclaimed inside the loop and so there
is no special task to do if the count goes from 1 to 0.
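
The pairing of ``qemu_lockcnt_dec_if_lock`` with
``qemu_lockcnt_inc_and_unlock`` can be pictured with the same kind of toy
model, built from a pthread mutex and a C11 atomic counter.  As before,
``ToyLockCnt`` and the ``toy_*`` names are invented for illustration and
this is a sketch of the behavior used in the loop above, not QEMU's
implementation::

    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct ToyLockCnt {
        pthread_mutex_t mutex;
        atomic_uint count;
    } ToyLockCnt;

    static void toy_init(ToyLockCnt *lc)
    {
        pthread_mutex_init(&lc->mutex, NULL);
        atomic_init(&lc->count, 0);
    }

    /*
     * Called by a visitor (so the counter is at least 1).  If this is
     * the only visit, drop the counter to zero and return true with the
     * mutex held; otherwise leave the counter alone and return false.
     */
    static bool toy_dec_if_lock(ToyLockCnt *lc)
    {
        assert(atomic_load(&lc->count) > 0);

        if (atomic_load(&lc->count) > 1) {
            return false;
        }
        pthread_mutex_lock(&lc->mutex);
        if (atomic_fetch_sub(&lc->count, 1) == 1) {
            return true;
        }
        /* Another visit started in the meantime: undo and give up. */
        atomic_fetch_add(&lc->count, 1);
        pthread_mutex_unlock(&lc->mutex);
        return false;
    }

    /* Resume the visit after a successful toy_dec_if_lock(). */
    static void toy_inc_and_unlock(ToyLockCnt *lc)
    {
        atomic_fetch_add(&lc->count, 1);
        pthread_mutex_unlock(&lc->mutex);
    }

Between a successful ``toy_dec_if_lock()`` and the matching
``toy_inc_and_unlock()`` the counter is zero and the mutex is held, which
is exactly the condition under which data may be freed.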
279