Using Multiple ``IOThread``\ s
==============================

..
   Copyright (c) 2014-2017 Red Hat Inc.

   This work is licensed under the terms of the GNU GPL, version 2 or later.  See
   the COPYING file in the top-level directory.


This document explains the ``IOThread`` feature and how to write code that runs
outside the Big QEMU Lock (BQL).

The main loop and ``IOThread``\ s
---------------------------------
QEMU is an event-driven program that can do several things at once using an
event loop.  The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.
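
The dispatch pattern described above (watch file descriptors, invoke a
callback when one becomes readable) can be illustrated with a small standalone
sketch.  It is not QEMU code: it assumes a POSIX host and plain ``poll()``,
two pipes stand in for the VNC and QMP sockets, and the names ``FdWatch``,
``FdReadHandler``, ``toy_loop_iterate()``, ``count_byte()`` and
``toy_loop_demo()`` are hypothetical, chosen only for this illustration.

.. code-block:: c

    #include <poll.h>
    #include <stddef.h>
    #include <unistd.h>

    #define TOY_MAX_WATCHES 8

    /* Hypothetical callback type: invoked when fd becomes readable */
    typedef void FdReadHandler(int fd, void *opaque);

    typedef struct {
        int fd;
        FdReadHandler *on_readable;
        void *opaque;
    } FdWatch;

    /*
     * Run one event loop iteration: block until at least one watched fd is
     * readable, then invoke the handler of each readable fd.
     * Returns the number of handlers invoked, or -1 on error.
     */
    static int toy_loop_iterate(FdWatch *watches, size_t n)
    {
        struct pollfd pfds[TOY_MAX_WATCHES];
        int dispatched = 0;
        size_t i;

        if (n > TOY_MAX_WATCHES) {
            return -1;
        }
        for (i = 0; i < n; i++) {
            pfds[i].fd = watches[i].fd;
            pfds[i].events = POLLIN;
            pfds[i].revents = 0;
        }
        if (poll(pfds, n, -1) < 0) {
            return -1;
        }
        for (i = 0; i < n; i++) {
            if (pfds[i].revents & POLLIN) {
                watches[i].on_readable(watches[i].fd, watches[i].opaque);
                dispatched++;
            }
        }
        return dispatched;
    }

    /* Example handler: consume one byte and count the event */
    static void count_byte(int fd, void *opaque)
    {
        int *hits = opaque;
        char c;

        if (read(fd, &c, 1) == 1) {
            (*hits)++;
        }
    }

    /* Returns 0 if only the handler of the readable pipe was invoked */
    int toy_loop_demo(void)
    {
        int a[2], b[2];
        int hits_a = 0, hits_b = 0;
        int n = -1;

        if (pipe(a) < 0) {
            return -1;
        }
        if (pipe(b) < 0) {
            close(a[0]);
            close(a[1]);
            return -1;
        }

        FdWatch watches[] = {
            { a[0], count_byte, &hits_a },
            { b[0], count_byte, &hits_b },
        };

        /* Make only the second pipe readable, then run one iteration */
        if (write(b[1], "x", 1) == 1) {
            n = toy_loop_iterate(watches, 2);
        }

        close(a[0]);
        close(a[1]);
        close(b[0]);
        close(b[1]);
        return (n == 1 && hits_a == 0 && hits_b == 1) ? 0 : -1;
    }

Calling ``toy_loop_demo()`` from a small test program is expected to return 0:
one iteration dispatches exactly one handler, the one whose pipe has data.
The sketch covers file descriptor monitoring only; it is meant to convey the
idea of an event loop, not the structure of QEMU's own loops.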

The default event loop is called the main loop (see ``main-loop.c``).  It is
possible to create additional event loop threads using
``-object iothread,id=my-iothread``.

Side note: The main loop and ``IOThread`` are both event loops but their code is
not shared completely.  Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why ``IOThread``\ s are useful
------------------------------
``IOThread``\ s allow the user to control the placement of work.  The main loop is a
scalability bottleneck on hosts with many CPUs.  Work can be spread across
several ``IOThread``\ s instead of just one main loop.  When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the BQL, which is a
scalability bottleneck in itself.  vCPU threads and the main loop use the BQL
to serialize execution of QEMU code.  This mutex is necessary because a lot of
QEMU's code historically was not thread-safe.

All I/O processing is done in a single main loop, and the BQL is contended by
all vCPU threads and the main loop.  Together, these two facts explain why it
is desirable to place work into ``IOThread``\ s.

The experimental ``virtio-blk`` data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

.. _how-to-program:

How to program for ``IOThread``\ s
----------------------------------
The main difference between legacy code and new code that can run in an
``IOThread`` is dealing explicitly with the event loop object, ``AioContext``
(see ``include/block/aio.h``).  Code that only works in the main loop
implicitly uses the main loop's ``AioContext``.  Code that supports running
in ``IOThread``\ s must be aware of its ``AioContext``.

``AioContext`` supports the following services:
 * File descriptor monitoring (read/write/error on POSIX hosts)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop ``AioContext``:
 * LEGACY ``qemu_aio_set_fd_handler()`` - monitor a file descriptor
 * LEGACY ``qemu_aio_set_event_notifier()`` - monitor an event notifier
 * LEGACY ``timer_new_ms()`` - create a timer
 * LEGACY ``qemu_bh_new()`` - create a BH
 * LEGACY ``qemu_bh_new_guarded()`` - create a BH with a device re-entrancy guard
 * LEGACY ``qemu_aio_wait()`` - run an event loop iteration

Since they implicitly work on the main loop, they cannot be used in code that
runs in an ``IOThread``.  They might cause a crash or deadlock if called from an
``IOThread`` since the BQL is not held.

Instead, use the ``AioContext`` functions directly (see ``include/block/aio.h``):
 * ``aio_set_fd_handler()`` - monitor a file descriptor
 * ``aio_set_event_notifier()`` - monitor an event notifier
 * ``aio_timer_new()`` - create a timer
 * ``aio_bh_new()`` - create a BH
 * ``aio_bh_new_guarded()`` - create a BH with a device re-entrancy guard
 * ``aio_poll()`` - run an event loop iteration

The ``qemu_bh_new_guarded()``/``aio_bh_new_guarded()`` APIs accept a
``MemReentrancyGuard`` argument, which is used to check for and prevent
re-entrancy problems.  For BHs associated with devices, the re-entrancy guard
is contained in the corresponding ``DeviceState`` and named
``mem_reentrancy_guard``.

The ``AioContext`` can be obtained from the ``IOThread`` using
``iothread_get_aio_context()`` or for the main loop using
``qemu_get_aio_context()``.  Code that takes an ``AioContext`` argument
works in both ``IOThread``\ s and the main loop, depending on which ``AioContext``
instance the caller passes in.

How to synchronize with an ``IOThread``
---------------------------------------
Variables that can be accessed by multiple threads require some form of
synchronization such as ``qemu_mutex_lock()``, ``rcu_read_lock()``, etc.

``AioContext`` functions like ``aio_set_fd_handler()``,
``aio_set_event_notifier()``, ``aio_bh_new()``, and ``aio_timer_new()``
are thread-safe.  They can be used to trigger activity in an ``IOThread``.

Side note: the best way to schedule a function call across threads is to call
``aio_bh_schedule_oneshot()``.
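
To make the idea of scheduling a function call across threads concrete, here
is a standalone model of the general pattern: queue a callback under a mutex,
then wake the target thread's loop so that it runs the callback in its own
thread.  This is a sketch only and does not show how
``aio_bh_schedule_oneshot()`` is implemented.  It assumes POSIX threads and a
pipe for wake-ups, and all names (``ToyLoop``, ``ToyBH``,
``toy_schedule_oneshot()``, ``toy_loop_thread()``, ``toy_oneshot_demo()``) are
hypothetical.

.. code-block:: c

    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    typedef void ToyBHFunc(void *opaque);

    typedef struct ToyBH {
        ToyBHFunc *cb;
        void *opaque;
        struct ToyBH *next;
    } ToyBH;

    typedef struct {
        pthread_mutex_t lock;   /* protects pending */
        ToyBH *pending;         /* callbacks waiting to run in the loop thread */
        int wake_fd[2];         /* pipe used to wake the loop thread */
    } ToyLoop;

    /* Thread-safe: queue cb(opaque) to run once in the loop thread */
    static int toy_schedule_oneshot(ToyLoop *loop, ToyBHFunc *cb, void *opaque)
    {
        ToyBH *bh = malloc(sizeof(*bh));
        char byte = 0;

        if (!bh) {
            return -1;
        }
        bh->cb = cb;
        bh->opaque = opaque;

        pthread_mutex_lock(&loop->lock);
        bh->next = loop->pending;
        loop->pending = bh;
        pthread_mutex_unlock(&loop->lock);

        /* Wake the loop thread so that it notices the new entry */
        return write(loop->wake_fd[1], &byte, 1) == 1 ? 0 : -1;
    }

    /* Loop thread: sleep until woken, run pending callbacks, exit on EOF */
    static void *toy_loop_thread(void *arg)
    {
        ToyLoop *loop = arg;
        char byte;

        while (read(loop->wake_fd[0], &byte, 1) == 1) {
            ToyBH *list;

            pthread_mutex_lock(&loop->lock);
            list = loop->pending;
            loop->pending = NULL;
            pthread_mutex_unlock(&loop->lock);

            while (list) {
                ToyBH *bh = list;

                list = bh->next;
                bh->cb(bh->opaque);     /* runs in the loop thread */
                free(bh);
            }
        }
        return NULL;
    }

    static void store_answer(void *opaque)
    {
        *(int *)opaque = 42;
    }

    /* Returns the value written by the callback, or -1 on setup failure */
    int toy_oneshot_demo(void)
    {
        ToyLoop loop = { .pending = NULL };
        pthread_t thread;
        int value = 0;

        if (pthread_mutex_init(&loop.lock, NULL) != 0 ||
            pipe(loop.wake_fd) < 0) {
            return -1;
        }
        if (pthread_create(&thread, NULL, toy_loop_thread, &loop) != 0) {
            return -1;
        }

        toy_schedule_oneshot(&loop, store_answer, &value);

        /* Closing the write end makes the loop thread see EOF and exit */
        close(loop.wake_fd[1]);
        pthread_join(thread, NULL);     /* also orders the callback's store */

        close(loop.wake_fd[0]);
        pthread_mutex_destroy(&loop.lock);
        return value;
    }

In this model ``toy_oneshot_demo()`` is expected to return 42, the value
stored by the callback.  The caller never touches ``value`` while the callback
may be running; it reads the result only after ``pthread_join()``.  The point
of the pattern is that data owned by the loop thread is only modified from
that thread, so the callback itself needs no extra locking.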

The main loop thread can wait synchronously for a condition using
``AIO_WAIT_WHILE()``.

``AioContext`` and the block layer
----------------------------------
The ``AioContext`` originates from the QEMU block layer, even though nowadays
``AioContext`` is a generic event loop that can be used by any QEMU subsystem.

The block layer has support for ``AioContext`` integrated.  Each
``BlockDriverState`` is associated with an ``AioContext``:
``bdrv_get_aio_context()`` returns it and ``bdrv_try_change_aio_context()``
changes it.  This allows block layer code to process I/O inside the
right ``AioContext``.  Other subsystems may wish to follow a similar approach.

Block layer code must therefore expect to run in an ``IOThread`` and avoid using
old APIs that implicitly use the main loop.  See
`How to program for IOThreads`_ for information on how to do that.

Code running in the monitor typically needs to ensure that past
requests from the guest are completed.  When a block device is running
in an ``IOThread``, the ``IOThread`` can also process requests from the guest
(via ioeventfd), so new requests may arrive while the monitor code runs.  To
ensure both that past requests have completed and that no new ones are
processed in the meantime, wrap the code between ``bdrv_drained_begin()`` and
``bdrv_drained_end()``, thus creating a "drained section".

Long-running jobs (usually in the form of coroutines) are often scheduled in
the ``BlockDriverState``'s ``AioContext``.  The functions
``bdrv_add_aio_context_notifier()`` and ``bdrv_remove_aio_context_notifier()``,
or alternatively ``blk_add_aio_context_notifier()`` and
``blk_remove_aio_context_notifier()`` if you use ``BlockBackend``\ s,
can be used to get a notification whenever ``bdrv_try_change_aio_context()``
moves a ``BlockDriverState`` to a different ``AioContext``.