xref: /openbmc/qemu/docs/devel/lockcnt.rst (revision f774a677507966222624a9b2859f06ede7608100)
1362dbb4fSPeter MaydellLocked Counters (aka ``QemuLockCnt``)
2362dbb4fSPeter Maydell=====================================
3362dbb4fSPeter Maydell
4362dbb4fSPeter MaydellQEMU often uses reference counts to track data structures that are being
5362dbb4fSPeter Maydellaccessed and should not be freed.  For example, a loop that invoke
6362dbb4fSPeter Maydellcallbacks like this is not safe::
7362dbb4fSPeter Maydell
8362dbb4fSPeter Maydell    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
9362dbb4fSPeter Maydell        if (ioh->revents & G_IO_OUT) {
10362dbb4fSPeter Maydell            ioh->fd_write(ioh->opaque);
11362dbb4fSPeter Maydell        }
12362dbb4fSPeter Maydell    }
13362dbb4fSPeter Maydell
14362dbb4fSPeter Maydell``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
15362dbb4fSPeter Maydellby stashing away its ``next`` pointer.  However, ``ioh->fd_write`` could
16362dbb4fSPeter Maydellactually delete the next node from the list.  The simplest way to
17362dbb4fSPeter Maydellavoid this is to mark the node as deleted, and remove it from the
18362dbb4fSPeter Maydelllist in the above loop::
19362dbb4fSPeter Maydell
20362dbb4fSPeter Maydell    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
21362dbb4fSPeter Maydell        if (ioh->deleted) {
22362dbb4fSPeter Maydell            QLIST_REMOVE(ioh, next);
23362dbb4fSPeter Maydell            g_free(ioh);
24362dbb4fSPeter Maydell        } else {
25362dbb4fSPeter Maydell            if (ioh->revents & G_IO_OUT) {
26362dbb4fSPeter Maydell                ioh->fd_write(ioh->opaque);
27362dbb4fSPeter Maydell            }
28362dbb4fSPeter Maydell        }
29362dbb4fSPeter Maydell    }
30362dbb4fSPeter Maydell
31362dbb4fSPeter MaydellIf however this loop must also be reentrant, i.e. it is possible that
32362dbb4fSPeter Maydell``ioh->fd_write`` invokes the loop again, some kind of counting is needed::
33362dbb4fSPeter Maydell
34362dbb4fSPeter Maydell    walking_handlers++;
35362dbb4fSPeter Maydell    QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
36362dbb4fSPeter Maydell        if (ioh->deleted) {
37362dbb4fSPeter Maydell            if (walking_handlers == 1) {
38362dbb4fSPeter Maydell                QLIST_REMOVE(ioh, next);
39362dbb4fSPeter Maydell                g_free(ioh);
40362dbb4fSPeter Maydell            }
41362dbb4fSPeter Maydell        } else {
42362dbb4fSPeter Maydell            if (ioh->revents & G_IO_OUT) {
43362dbb4fSPeter Maydell                ioh->fd_write(ioh->opaque);
44362dbb4fSPeter Maydell            }
45362dbb4fSPeter Maydell        }
46362dbb4fSPeter Maydell    }
47362dbb4fSPeter Maydell    walking_handlers--;
48362dbb4fSPeter Maydell
49362dbb4fSPeter MaydellOne may think of using the RCU primitives, ``rcu_read_lock()`` and
50362dbb4fSPeter Maydell``rcu_read_unlock()``; effectively, the RCU nesting count would take
51362dbb4fSPeter Maydellthe place of the walking_handlers global variable.  Indeed,
52362dbb4fSPeter Maydellreference counting and RCU have similar purposes, but their usage in
53362dbb4fSPeter Maydellgeneral is complementary:
54362dbb4fSPeter Maydell
55362dbb4fSPeter Maydell- reference counting is fine-grained and limited to a single data
56362dbb4fSPeter Maydell  structure; RCU delays reclamation of *all* RCU-protected data
57362dbb4fSPeter Maydell  structures;
58362dbb4fSPeter Maydell
59362dbb4fSPeter Maydell- reference counting works even in the presence of code that keeps
60362dbb4fSPeter Maydell  a reference for a long time; RCU critical sections in principle
61362dbb4fSPeter Maydell  should be kept short;
62362dbb4fSPeter Maydell
63362dbb4fSPeter Maydell- reference counting is often applied to code that is not thread-safe
64362dbb4fSPeter Maydell  but is reentrant; in fact, usage of reference counting in QEMU predates
65362dbb4fSPeter Maydell  the introduction of threads by many years.  RCU is generally used to
66362dbb4fSPeter Maydell  protect readers from other threads freeing memory after concurrent
67362dbb4fSPeter Maydell  modifications to a data structure.
68362dbb4fSPeter Maydell
69362dbb4fSPeter Maydell- reclaiming data can be done by a separate thread in the case of RCU;
70362dbb4fSPeter Maydell  this can improve performance, but also delay reclamation undesirably.
71362dbb4fSPeter Maydell  With reference counting, reclamation is deterministic.
72362dbb4fSPeter Maydell
73362dbb4fSPeter MaydellThis file documents ``QemuLockCnt``, an abstraction for using reference
74362dbb4fSPeter Maydellcounting in code that has to be both thread-safe and reentrant.
75362dbb4fSPeter Maydell
76362dbb4fSPeter Maydell
77362dbb4fSPeter Maydell``QemuLockCnt`` concepts
78362dbb4fSPeter Maydell------------------------
79362dbb4fSPeter Maydell
80362dbb4fSPeter MaydellA ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
81362dbb4fSPeter Maydellto increment and decrement the counter, and to take and release the
82362dbb4fSPeter Maydellmutex.  The counter notes how many visits to the data structures are
83362dbb4fSPeter Maydelltaking place (the visits could be from different threads, or there could
84362dbb4fSPeter Maydellbe multiple reentrant visits from the same thread).  The basic rules
85362dbb4fSPeter Maydellgoverning the counter/mutex pair then are the following:
86362dbb4fSPeter Maydell
87362dbb4fSPeter Maydell- Data protected by the QemuLockCnt must not be freed unless the
88362dbb4fSPeter Maydell  counter is zero and the mutex is taken.
89362dbb4fSPeter Maydell
90362dbb4fSPeter Maydell- A new visit cannot be started while the counter is zero and the
91362dbb4fSPeter Maydell  mutex is taken.
92362dbb4fSPeter Maydell
93362dbb4fSPeter MaydellMost of the time, the mutex protects all writes to the data structure,
94362dbb4fSPeter Maydellnot just frees, though there could be cases where this is not necessary.
95362dbb4fSPeter Maydell
96362dbb4fSPeter MaydellReads, instead, can be done without taking the mutex, as long as the
97362dbb4fSPeter Maydellreaders and writers use the same macros that are used for RCU, for
98362dbb4fSPeter Maydellexample ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
99362dbb4fSPeter Maydelletc.  This is because the reads are done outside a lock and a set
100362dbb4fSPeter Maydellor ``QLIST_INSERT_HEAD``
101362dbb4fSPeter Maydellcan happen concurrently with the read.  The RCU API ensures that the
102362dbb4fSPeter Maydellprocessor and the compiler see all required memory barriers.
103362dbb4fSPeter Maydell
104362dbb4fSPeter MaydellThis could be implemented simply by protecting the counter with the
105362dbb4fSPeter Maydellmutex, for example::
106362dbb4fSPeter Maydell
107362dbb4fSPeter Maydell    // (1)
108362dbb4fSPeter Maydell    qemu_mutex_lock(&walking_handlers_mutex);
109362dbb4fSPeter Maydell    walking_handlers++;
110362dbb4fSPeter Maydell    qemu_mutex_unlock(&walking_handlers_mutex);
111362dbb4fSPeter Maydell
112362dbb4fSPeter Maydell    ...
113362dbb4fSPeter Maydell
114362dbb4fSPeter Maydell    // (2)
115362dbb4fSPeter Maydell    qemu_mutex_lock(&walking_handlers_mutex);
116362dbb4fSPeter Maydell    if (--walking_handlers == 0) {
117362dbb4fSPeter Maydell        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
118362dbb4fSPeter Maydell            if (ioh->deleted) {
119362dbb4fSPeter Maydell                QLIST_REMOVE(ioh, next);
120362dbb4fSPeter Maydell                g_free(ioh);
121362dbb4fSPeter Maydell            }
122362dbb4fSPeter Maydell        }
123362dbb4fSPeter Maydell    }
124362dbb4fSPeter Maydell    qemu_mutex_unlock(&walking_handlers_mutex);
125362dbb4fSPeter Maydell
126362dbb4fSPeter MaydellHere, no frees can happen in the code represented by the ellipsis.
127362dbb4fSPeter MaydellIf another thread is executing critical section (2), that part of
128362dbb4fSPeter Maydellthe code cannot be entered, because the thread will not be able
129362dbb4fSPeter Maydellto increment the ``walking_handlers`` variable.  And of course
130362dbb4fSPeter Maydellduring the visit any other thread will see a nonzero value for
131362dbb4fSPeter Maydell``walking_handlers``, as in the single-threaded code.
132362dbb4fSPeter Maydell
133362dbb4fSPeter MaydellNote that it is possible for multiple concurrent accesses to delay
134362dbb4fSPeter Maydellthe cleanup arbitrarily; in other words, for the ``walking_handlers``
135362dbb4fSPeter Maydellcounter to never become zero.  For this reason, this technique is
136362dbb4fSPeter Maydellmore easily applicable if concurrent access to the structure is rare.
137362dbb4fSPeter Maydell
138362dbb4fSPeter MaydellHowever, critical sections are easy to forget since you have to do
139362dbb4fSPeter Maydellthem for each modification of the counter.  ``QemuLockCnt`` ensures that
140362dbb4fSPeter Maydellall modifications of the counter take the lock appropriately, and it
141362dbb4fSPeter Maydellcan also be more efficient in two ways:
142362dbb4fSPeter Maydell
143362dbb4fSPeter Maydell- it avoids taking the lock for many operations (for example
144362dbb4fSPeter Maydell  incrementing the counter while it is non-zero);
145362dbb4fSPeter Maydell
146362dbb4fSPeter Maydell- on some platforms, one can implement ``QemuLockCnt`` to hold the lock
147362dbb4fSPeter Maydell  and the mutex in a single word, making the fast path no more expensive
148362dbb4fSPeter Maydell  than simply managing a counter using atomic operations (see
149362dbb4fSPeter Maydell  :doc:`atomics`).  This can be very helpful if concurrent access to
150362dbb4fSPeter Maydell  the data structure is expected to be rare.
151362dbb4fSPeter Maydell
152362dbb4fSPeter Maydell
153362dbb4fSPeter MaydellUsing the same mutex for frees and writes can still incur some small
154362dbb4fSPeter Maydellinefficiencies; for example, a visit can never start if the counter is
155362dbb4fSPeter Maydellzero and the mutex is taken -- even if the mutex is taken by a write,
156362dbb4fSPeter Maydellwhich in principle need not block a visit of the data structure.
157362dbb4fSPeter MaydellHowever, these are usually not a problem if any of the following
158362dbb4fSPeter Maydellassumptions are valid:
159362dbb4fSPeter Maydell
160362dbb4fSPeter Maydell- concurrent access is possible but rare
161362dbb4fSPeter Maydell
162362dbb4fSPeter Maydell- writes are rare
163362dbb4fSPeter Maydell
164362dbb4fSPeter Maydell- writes are frequent, but this kind of write (e.g. appending to a
165362dbb4fSPeter Maydell  list) has a very small critical section.
166362dbb4fSPeter Maydell
167362dbb4fSPeter MaydellFor example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
168362dbb4fSPeter Maydellbottom halves and file descriptor handlers.  Modifications to the list
169362dbb4fSPeter Maydellof file descriptor handlers are rare.  Creation of a new bottom half is
170362dbb4fSPeter Maydellfrequent and can happen on a fast path; however: 1) it is almost never
171362dbb4fSPeter Maydellconcurrent with a visit to the list of bottom halves; 2) it only has
172362dbb4fSPeter Maydellthree instructions in the critical path, two assignments and a ``smp_wmb()``.
173362dbb4fSPeter Maydell
174362dbb4fSPeter Maydell
175362dbb4fSPeter Maydell``QemuLockCnt`` API
176362dbb4fSPeter Maydell-------------------
177362dbb4fSPeter Maydell
178*0ae50e8eSPeter Maydell.. kernel-doc:: include/qemu/lockcnt.h
179362dbb4fSPeter Maydell
180362dbb4fSPeter Maydell
181362dbb4fSPeter Maydell``QemuLockCnt`` usage
182362dbb4fSPeter Maydell---------------------
183362dbb4fSPeter Maydell
184362dbb4fSPeter MaydellThis section explains the typical usage patterns for ``QemuLockCnt`` functions.
185362dbb4fSPeter Maydell
186362dbb4fSPeter MaydellSetting a variable to a non-NULL value can be done between
187362dbb4fSPeter Maydell``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::
188362dbb4fSPeter Maydell
189362dbb4fSPeter Maydell    qemu_lockcnt_lock(&xyz_lockcnt);
190362dbb4fSPeter Maydell    if (!xyz) {
191362dbb4fSPeter Maydell        new_xyz = g_new(XYZ, 1);
192362dbb4fSPeter Maydell        ...
193362dbb4fSPeter Maydell        qatomic_rcu_set(&xyz, new_xyz);
194362dbb4fSPeter Maydell    }
195362dbb4fSPeter Maydell    qemu_lockcnt_unlock(&xyz_lockcnt);
196362dbb4fSPeter Maydell
197362dbb4fSPeter MaydellAccessing the value can be done between ``qemu_lockcnt_inc`` and
198362dbb4fSPeter Maydell``qemu_lockcnt_dec``::
199362dbb4fSPeter Maydell
200362dbb4fSPeter Maydell    qemu_lockcnt_inc(&xyz_lockcnt);
201362dbb4fSPeter Maydell    if (xyz) {
202362dbb4fSPeter Maydell        XYZ *p = qatomic_rcu_read(&xyz);
203362dbb4fSPeter Maydell        ...
204362dbb4fSPeter Maydell        /* Accesses can now be done through "p".  */
205362dbb4fSPeter Maydell    }
206362dbb4fSPeter Maydell    qemu_lockcnt_dec(&xyz_lockcnt);
207362dbb4fSPeter Maydell
208362dbb4fSPeter MaydellFreeing the object can similarly use ``qemu_lockcnt_lock`` and
209362dbb4fSPeter Maydell``qemu_lockcnt_unlock``, but you also need to ensure that the count
210362dbb4fSPeter Maydellis zero (i.e. there is no concurrent visit).  Because ``qemu_lockcnt_inc``
211362dbb4fSPeter Maydelltakes the ``QemuLockCnt``'s lock, the count cannot become non-zero while
212362dbb4fSPeter Maydellthe object is being freed.  Freeing an object looks like this::
213362dbb4fSPeter Maydell
214362dbb4fSPeter Maydell    qemu_lockcnt_lock(&xyz_lockcnt);
215362dbb4fSPeter Maydell    if (!qemu_lockcnt_count(&xyz_lockcnt)) {
216362dbb4fSPeter Maydell        g_free(xyz);
217362dbb4fSPeter Maydell        xyz = NULL;
218362dbb4fSPeter Maydell    }
219362dbb4fSPeter Maydell    qemu_lockcnt_unlock(&xyz_lockcnt);
220362dbb4fSPeter Maydell
221362dbb4fSPeter MaydellIf an object has to be freed right after a visit, you can combine
222362dbb4fSPeter Maydellthe decrement, the locking and the check on count as follows::
223362dbb4fSPeter Maydell
224362dbb4fSPeter Maydell    qemu_lockcnt_inc(&xyz_lockcnt);
225362dbb4fSPeter Maydell    if (xyz) {
226362dbb4fSPeter Maydell        XYZ *p = qatomic_rcu_read(&xyz);
227362dbb4fSPeter Maydell        ...
228362dbb4fSPeter Maydell        /* Accesses can now be done through "p".  */
229362dbb4fSPeter Maydell    }
230362dbb4fSPeter Maydell    if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
231362dbb4fSPeter Maydell        g_free(xyz);
232362dbb4fSPeter Maydell        xyz = NULL;
233362dbb4fSPeter Maydell        qemu_lockcnt_unlock(&xyz_lockcnt);
234362dbb4fSPeter Maydell    }
235362dbb4fSPeter Maydell
236362dbb4fSPeter Maydell``QemuLockCnt`` can also be used to access a list as follows::
237362dbb4fSPeter Maydell
238362dbb4fSPeter Maydell    qemu_lockcnt_inc(&io_handlers_lockcnt);
239362dbb4fSPeter Maydell    QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) {
240362dbb4fSPeter Maydell        if (ioh->revents & G_IO_OUT) {
241362dbb4fSPeter Maydell            ioh->fd_write(ioh->opaque);
242362dbb4fSPeter Maydell        }
243362dbb4fSPeter Maydell    }
244362dbb4fSPeter Maydell
245362dbb4fSPeter Maydell    if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
246362dbb4fSPeter Maydell        QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
247362dbb4fSPeter Maydell            if (ioh->deleted) {
248362dbb4fSPeter Maydell                QLIST_REMOVE(ioh, next);
249362dbb4fSPeter Maydell                g_free(ioh);
250362dbb4fSPeter Maydell            }
251362dbb4fSPeter Maydell        }
252362dbb4fSPeter Maydell        qemu_lockcnt_unlock(&io_handlers_lockcnt);
253362dbb4fSPeter Maydell    }
254362dbb4fSPeter Maydell
255362dbb4fSPeter MaydellAgain, the RCU primitives are used because new items can be added to the
256362dbb4fSPeter Maydelllist during the walk.  ``QLIST_FOREACH_RCU`` ensures that the processor and
257362dbb4fSPeter Maydellthe compiler see the appropriate memory barriers.
258362dbb4fSPeter Maydell
259362dbb4fSPeter MaydellAn alternative pattern uses ``qemu_lockcnt_dec_if_lock``::
260362dbb4fSPeter Maydell
261362dbb4fSPeter Maydell    qemu_lockcnt_inc(&io_handlers_lockcnt);
262362dbb4fSPeter Maydell    QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
263362dbb4fSPeter Maydell        if (ioh->deleted) {
264362dbb4fSPeter Maydell            if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
265362dbb4fSPeter Maydell                QLIST_REMOVE(ioh, next);
266362dbb4fSPeter Maydell                g_free(ioh);
267362dbb4fSPeter Maydell                qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
268362dbb4fSPeter Maydell            }
269362dbb4fSPeter Maydell        } else {
270362dbb4fSPeter Maydell            if (ioh->revents & G_IO_OUT) {
271362dbb4fSPeter Maydell                ioh->fd_write(ioh->opaque);
272362dbb4fSPeter Maydell            }
273362dbb4fSPeter Maydell        }
274362dbb4fSPeter Maydell    }
275362dbb4fSPeter Maydell    qemu_lockcnt_dec(&io_handlers_lockcnt);
276362dbb4fSPeter Maydell
277362dbb4fSPeter MaydellHere you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
278362dbb4fSPeter Maydellbecause there is no special task to do if the count goes from 1 to 0.
279