xref: /openbmc/qemu/docs/devel/rcu.txt (revision e0cbcf1e)
Using RCU (Read-Copy-Update) for synchronization
================================================

Read-copy update (RCU) is a synchronization mechanism that is used to
protect read-mostly data structures.  RCU is very efficient and scalable
on the read side (it is wait-free), and thus can make the read paths
extremely fast.

RCU supports concurrency between a single writer and multiple readers,
thus it is not used alone.  Typically, the write-side will use a lock to
serialize multiple updates, but other approaches are possible (e.g.,
restricting updates to a single task).  In QEMU, when a lock is used,
this will often be the "iothread mutex", also known as the "big QEMU
lock" (BQL).  Also, restricting updates to a single task is done in
QEMU using the "bottom half" API.

RCU is fundamentally a "wait-to-finish" mechanism.  The read side marks
sections of code as "critical sections", and the update side will wait
for the execution of all *currently running* critical sections before
proceeding, or before asynchronously executing a callback.

The key point here is that only the currently running critical sections
are waited for; critical sections that are started _after_ the beginning
of the wait do not extend the wait, despite running concurrently with
the updater.  This is the reason why RCU is more scalable than,
for example, reader-writer locks.  It is so much more scalable that
the system will have a single instance of the RCU mechanism; a single
mechanism can be used for an arbitrary number of "things", without
having to worry about things such as contention or deadlocks.

How is this possible?  The basic idea is to split updates into two
phases, "removal" and "reclamation".  During removal, we ensure that
subsequent readers will not be able to get a reference to the old data.
After removal has completed, a critical section will not be able to
access the old data.  Therefore, critical sections that begin after
removal do not matter; as soon as all previous critical sections have
finished, there cannot be any readers who hold references to the data
structure, and it can now be safely reclaimed (e.g., freed or unref'ed).

Here is a picture:

        thread 1                  thread 2                  thread 3
    -------------------    ------------------------    -------------------
    enter RCU crit.sec.
           |                finish removal phase
           |                begin wait
           |                      |                    enter RCU crit.sec.
    exit RCU crit.sec             |                           |
                            complete wait                     |
                            begin reclamation phase           |
                                                       exit RCU crit.sec.


Note how thread 3 is still executing its critical section when thread 2
starts reclaiming data.  This is possible, because the old version of the
data structure was not accessible at the time thread 3 began executing
that critical section.
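The two phases can also be sketched in self-contained C.  The following
is a toy model, not QEMU's implementation: the publication step is stood
in for by a C11 release store, and synchronize_rcu() by pthread_join(),
which happens to wait for the single pre-existing reader.  All names are
illustrative.

```c
/* Toy model of the removal/reclamation split.  NOT QEMU's
 * implementation: publishing the new version is modeled by an atomic
 * release store, and synchronize_rcu() by pthread_join(), which also
 * waits for the one pre-existing reader.  All names are illustrative. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int val; };

static _Atomic(struct foo *) global_foo;

static void *reader(void *ret)
{
    /* Read side: fetch whatever version is current and use it. */
    struct foo *p = atomic_load_explicit(&global_foo, memory_order_acquire);
    *(int *)ret = p->val;           /* sees the old or the new version */
    return NULL;
}

int demo_remove_reclaim(void)
{
    int seen = 0;
    struct foo *old_foo = malloc(sizeof(*old_foo));
    old_foo->val = 1;
    atomic_store_explicit(&global_foo, old_foo, memory_order_release);

    pthread_t t;
    pthread_create(&t, NULL, reader, &seen);

    /* Removal phase: readers starting from here only see the new copy. */
    struct foo *new_foo = malloc(sizeof(*new_foo));
    new_foo->val = 2;
    atomic_store_explicit(&global_foo, new_foo, memory_order_release);

    /* Stand-in for synchronize_rcu(): wait for pre-existing readers. */
    pthread_join(t, NULL);

    /* Reclamation phase: no reader can still hold a reference to
     * old_foo, so freeing it is safe. */
    free(old_foo);
    free(new_foo);
    return (seen == 1 || seen == 2) ? 0 : -1;
}
```

The reader observes either version but never freed memory, because the
wait separates removal from reclamation.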


RCU API
=======

The core RCU API is small:

     void rcu_read_lock(void);

        Used by a reader to inform the reclaimer that the reader is
        entering an RCU read-side critical section.

     void rcu_read_unlock(void);

        Used by a reader to inform the reclaimer that the reader is
        exiting an RCU read-side critical section.  Note that RCU
        read-side critical sections may be nested and/or overlapping.
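As a hedged illustration of why nesting is harmless, here is a toy model
that tracks a per-thread nesting depth; QEMU's real implementation also
keeps per-thread state, but the code below is purely illustrative:

```c
/* Toy model of nestable read-side critical sections.  Only the
 * transition between depth 0 and 1 matters to the reclaimer; inner
 * lock/unlock pairs just adjust the counter.  Illustrative only. */
#include <assert.h>

static _Thread_local int rcu_depth;

static void toy_rcu_read_lock(void)
{
    if (rcu_depth++ == 0) {
        /* Outermost entry: a real implementation would publish
         * "this thread is now in a critical section" here. */
    }
}

static void toy_rcu_read_unlock(void)
{
    assert(rcu_depth > 0);
    if (--rcu_depth == 0) {
        /* Outermost exit: the thread becomes quiescent again. */
    }
}

int demo_nesting(void)
{
    toy_rcu_read_lock();
    toy_rcu_read_lock();        /* nesting is allowed */
    toy_rcu_read_unlock();
    toy_rcu_read_unlock();
    return rcu_depth;           /* 0: fully quiescent again */
}
```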

     void synchronize_rcu(void);

        Blocks until all pre-existing RCU read-side critical sections
        on all threads have completed.  This marks the end of the removal
        phase and the beginning of the reclamation phase.

        Note that it would be valid for another update to come while
        synchronize_rcu is running.  Because of this, it is better that
        the updater releases any locks it may hold before calling
        synchronize_rcu.  If this is not possible (for example, because
        the updater is protected by the BQL), you can use call_rcu.

     void call_rcu1(struct rcu_head *head,
                    void (*func)(struct rcu_head *head));

        This function invokes func(head) after all pre-existing RCU
        read-side critical sections on all threads have completed.  This
        marks the end of the removal phase, with func taking care
        asynchronously of the reclamation phase.

        The foo struct needs to have an rcu_head structure added,
        perhaps as follows:

            struct foo {
                struct rcu_head rcu;
                int a;
                char b;
                long c;
            };

        so that the reclaimer function can fetch the struct foo address
        and free it:

            call_rcu1(&foo.rcu, foo_reclaim);

            void foo_reclaim(struct rcu_head *rp)
            {
                struct foo *fp = container_of(rp, struct foo, rcu);
                g_free(fp);
            }

        For the common case where the rcu_head member is the first of the
        struct, you can use the following macros.

     void call_rcu(T *p,
                   void (*func)(T *p),
                   field-name);
     void g_free_rcu(T *p,
                     field-name);

        call_rcu1 is typically used through these macros, in the common
        case where the "struct rcu_head" is the first field in the struct.
        If the callback function is g_free, in particular, g_free_rcu can
        be used.  In the above case, one could have written simply:

            g_free_rcu(&foo, rcu);

     typeof(*p) qatomic_rcu_read(p);

        qatomic_rcu_read() is similar to qatomic_load_acquire(), but it makes
        some assumptions on the code that calls it.  This allows a more
        optimized implementation.

        qatomic_rcu_read assumes that whenever a single RCU critical
        section reads multiple shared data items, these reads are either
        data-dependent or need no ordering.  This is almost always the
        case when using RCU, because read-side critical sections typically
        navigate one or more pointers (the pointers that are changed on
        every update) until reaching a data structure of interest,
        and then read from there.

        RCU read-side critical sections must use qatomic_rcu_read() to
        read data, unless concurrent writes are prevented by another
        synchronization mechanism.

        Furthermore, RCU read-side critical sections should traverse the
        data structure in a single direction, opposite to the direction
        in which the updater initializes it.
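For example, a read side that navigates a pointer chain needs the RCU
read primitive only for the load that can race with an updater; the
data-dependent loads that follow need no extra ordering.  The sketch
below models qatomic_rcu_read() with a C11 acquire load (the real
primitive can be cheaper on most architectures); all names are
illustrative.

```c
/* Illustrative traversal: one acquire load for the racing pointer,
 * plain data-dependent loads afterwards.  qatomic_rcu_read() is
 * modeled here by atomic_load_explicit(..., memory_order_acquire);
 * the real primitive can be cheaper.  All names are illustrative. */
#include <stdatomic.h>

struct leaf { int value; };
struct node { struct leaf *leaf; };

static struct leaf the_leaf = { 42 };
static struct node the_node = { &the_leaf };
static _Atomic(struct node *) root = &the_node;

int demo_traverse(void)
{
    /* The only racing load: the pointer the updater replaces. */
    struct node *n = atomic_load_explicit(&root, memory_order_acquire);

    /* Data-dependent loads: ordered by the dependency itself. */
    return n->leaf->value;
}
```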

     void qatomic_rcu_set(p, typeof(*p) v);

        qatomic_rcu_set() is similar to qatomic_store_release(), though it also
        makes assumptions on the code that calls it in order to allow a more
        optimized implementation.

        In particular, qatomic_rcu_set() suffices for synchronization
        with readers, if the updater never mutates a field within a
        data item that is already accessible to readers.  This is the
        case when initializing a new copy of the RCU-protected data
        structure; just ensure that initialization of *p is carried out
        before qatomic_rcu_set() makes the data item visible to readers.
        If this rule is observed, writes will happen in the opposite
        order to reads in the RCU read-side critical sections (or there
        is just one update), and there will be no need for another
        synchronization mechanism to coordinate the accesses.
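The "initialize fully, then publish" rule can be sketched as follows,
again modeling qatomic_rcu_set() with a C11 release store; the struct
layout matches the foo example above, and the function name is
illustrative.

```c
/* Illustrative "initialize fully, then publish" update.
 * qatomic_rcu_set() is modeled by a release store: every plain store
 * to *f that precedes it is guaranteed visible to any reader that
 * observes the new pointer.  All names are illustrative. */
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a; char b; long c; };

static _Atomic(struct foo *) foo_ptr;

struct foo *publish_foo(int a, char b, long c)
{
    struct foo *f = malloc(sizeof(*f));

    /* Initialization: *f is not yet reachable by readers, so plain
     * stores are fine. */
    f->a = a;
    f->b = b;
    f->c = c;

    /* Publication: after this, readers may see f; do not mutate its
     * fields anymore. */
    atomic_store_explicit(&foo_ptr, f, memory_order_release);
    return f;
}
```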

The following APIs must be used before RCU is used in a thread:

     void rcu_register_thread(void);

        Mark a thread as taking part in the RCU mechanism.  Such a thread
        will have to report quiescent points regularly, either manually
        or through the QemuCond/QemuSemaphore/QemuEvent APIs.

     void rcu_unregister_thread(void);

        Mark a thread as no longer taking part in the RCU mechanism.
        It is not a problem if such a thread reports quiescent points,
        either manually or by using the QemuCond/QemuSemaphore/QemuEvent
        APIs.

Note that these APIs are relatively heavyweight, and should _not_ be
nested.

Convenience macros
==================

Two macros are provided that automatically release the read lock at the
end of the scope.

      RCU_READ_LOCK_GUARD()

         Takes the lock and will release it at the end of the block it's
         used in.

      WITH_RCU_READ_LOCK_GUARD()  { code }

         Is used at the head of a block to protect the code within the block.

Note that 'goto'ing out of the guarded block will also drop the lock.

DIFFERENCES WITH LINUX
======================

- Waiting on a mutex is possible, though discouraged, within an RCU critical
  section.  This is because spinlocks are rarely (if ever) used in userspace
  programming; not allowing this would prevent upgrading an RCU read-side
  critical section to become an updater.

- qatomic_rcu_read and qatomic_rcu_set replace rcu_dereference and
  rcu_assign_pointer.  They take a _pointer_ to the variable being accessed.

- call_rcu is a macro that has an extra argument (the name of the first
  field in the struct, which must be a struct rcu_head), and expects the
  type of the callback's argument to be the type of the first argument.
  call_rcu1 is the same as Linux's call_rcu.


RCU PATTERNS
============

Many patterns using reader-writer locks translate directly to RCU, with
the advantages of higher scalability and deadlock immunity.

In general, RCU can be used whenever it is possible to create a new
"version" of a data structure every time the updater runs.  This may
sound like a very strict restriction, however:

- the updater does not mean "everything that writes to a data structure",
  but rather "everything that involves a reclamation step".  See the
  array example below;

- in some cases, creating a new version of a data structure may actually
  be very cheap.  For example, modifying the "next" pointer of a singly
  linked list is effectively creating a new version of the list.
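The linked-list point can be made concrete: prepending a node writes a
single pointer, and that one store is the entire "new version" of the
list.  The sketch below models the RCU publish and read with C11
release/acquire operations, and assumes (as elsewhere in this document)
that the write side runs under a lock; all names are illustrative.

```c
/* Illustrative "cheap new version": prepending to a singly linked list
 * creates a new version of the whole list with one pointer store.
 * The RCU publish is modeled by a release store, the RCU read by an
 * acquire load.  Write side assumed serialized by an external lock. */
#include <stdatomic.h>
#include <stdlib.h>

struct elt {
    int value;
    struct elt *next;
};

static _Atomic(struct elt *) list_head;

void list_prepend(int value)
{
    struct elt *e = malloc(sizeof(*e));
    e->value = value;

    /* Initialize the new node, including its link to the old head,
     * before publishing it (relaxed is fine: writers are serialized). */
    e->next = atomic_load_explicit(&list_head, memory_order_relaxed);

    /* One release store = the new version of the whole list. */
    atomic_store_explicit(&list_head, e, memory_order_release);
}

int list_sum(void)
{
    int sum = 0;
    /* One racing load for the head, data-dependent loads afterwards. */
    for (struct elt *e = atomic_load_explicit(&list_head,
                                              memory_order_acquire);
         e != NULL; e = e->next) {
        sum += e->value;
    }
    return sum;
}
```

A concurrent reader sees either the old list or the new one, never a
half-initialized node.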

Here are some frequently-used RCU idioms that are worth noting.


RCU list processing
-------------------

TBD (not yet used in QEMU)


RCU reference counting
----------------------

Because grace periods are not allowed to complete while there is an RCU
read-side critical section in progress, the RCU read-side primitives
may be used as a restricted reference-counting mechanism.  For example,
consider the following code fragment:

    rcu_read_lock();
    p = qatomic_rcu_read(&foo);
    /* do something with p. */
    rcu_read_unlock();

The RCU read-side critical section ensures that the value of "p" remains
valid until after the rcu_read_unlock().  In some sense, it is acquiring
a reference to p that is later released when the critical section ends.
The write side looks simply like this (with appropriate locking):

    qemu_mutex_lock(&foo_mutex);
    old = foo;
    qatomic_rcu_set(&foo, new);
    qemu_mutex_unlock(&foo_mutex);
    synchronize_rcu();
    free(old);

If the processing cannot be done purely within the critical section, it
is possible to combine this idiom with a "real" reference count:

    rcu_read_lock();
    p = qatomic_rcu_read(&foo);
    foo_ref(p);
    rcu_read_unlock();
    /* do something with p. */
    foo_unref(p);

The write side can be like this:

    qemu_mutex_lock(&foo_mutex);
    old = foo;
    qatomic_rcu_set(&foo, new);
    qemu_mutex_unlock(&foo_mutex);
    synchronize_rcu();
    foo_unref(old);

or with call_rcu:

    qemu_mutex_lock(&foo_mutex);
    old = foo;
    qatomic_rcu_set(&foo, new);
    qemu_mutex_unlock(&foo_mutex);
    call_rcu(old, foo_unref, rcu);

In both cases, the write side only performs removal.  Reclamation
happens when the last reference to a "foo" object is dropped.
Using synchronize_rcu() is undesirably expensive, because the
last reference may be dropped on the read side.  Hence you can
use call_rcu() instead:

    void foo_unref(struct foo *p) {
        if (qatomic_fetch_dec(&p->refcount) == 1) {
            call_rcu(p, foo_destroy, rcu);
        }
    }


Note that the same idioms would be possible with reader/writer
locks:

    read_lock(&foo_rwlock);         write_mutex_lock(&foo_rwlock);
    p = foo;                        p = foo;
    /* do something with p. */      foo = new;
    read_unlock(&foo_rwlock);       write_mutex_unlock(&foo_rwlock);
                                    free(p);

    ------------------------------------------------------------------

    read_lock(&foo_rwlock);         write_mutex_lock(&foo_rwlock);
    p = foo;                        old = foo;
    foo_ref(p);                     foo = new;
    read_unlock(&foo_rwlock);       foo_unref(old);
    /* do something with p. */      write_mutex_unlock(&foo_rwlock);
    read_lock(&foo_rwlock);
    foo_unref(p);
    read_unlock(&foo_rwlock);

foo_unref could use a mechanism such as bottom halves to move deallocation
out of the write-side critical section.


RCU resizable arrays
--------------------

Resizable arrays can be used with RCU.  The expensive RCU synchronization
(or call_rcu) only needs to take place when the array is resized.
The two items to take care of are:

- ensuring that the old version of the array is available between removal
  and reclamation;

- avoiding mismatches in the read side between the array data and the
  array size.

The first problem is avoided simply by not using realloc.  Instead,
each resize will allocate a new array and copy the old data into it.
The second problem would arise if the size and the data pointers were
two members of a larger struct:

    struct mystuff {
        ...
        int data_size;
        int data_alloc;
        T   *data;
        ...
    };

Instead, we store the size of the array with the array itself:

    struct arr {
        int size;
        int alloc;
        T   data[];
    };
    struct arr *global_array;

    read side:
        rcu_read_lock();
        struct arr *array = qatomic_rcu_read(&global_array);
        x = i < array->size ? array->data[i] : -1;
        rcu_read_unlock();
        return x;

    write side (running under a lock):
        if (global_array->size == global_array->alloc) {
            /* Creating a new version.  */
            new_array = g_malloc(sizeof(struct arr) +
                                 global_array->alloc * 2 * sizeof(T));
            new_array->size = global_array->size;
            new_array->alloc = global_array->alloc * 2;
            memcpy(new_array->data, global_array->data,
                   global_array->alloc * sizeof(T));

            /* Removal phase.  */
            old_array = global_array;
            qatomic_rcu_set(&global_array, new_array);
            synchronize_rcu();

            /* Reclamation phase.  */
            free(old_array);
        }


SOURCES
=======

* Documentation/RCU/ from the Linux kernel