Copyright (c) 2014-2017 Red Hat Inc.

This work is licensed under the terms of the GNU GPL, version 2 or later. See
the COPYING file in the top-level directory.


This document explains the IOThread feature and how to write code that runs
outside the BQL (Big QEMU Lock).

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop. The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.

The default event loop is called the main loop (see main-loop.c). It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThreads are both event loops, but their code is
not shared completely. It is sometimes useful to remember that, although they
are conceptually similar, they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work. The main loop is a
scalability bottleneck on hosts with many CPUs. Work can be spread across
several IOThreads instead of just one main loop. When set up correctly, this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also closely tied to the BQL, which is a scalability
bottleneck in itself. vCPU threads and the main loop use the BQL to serialize
execution of QEMU code. This mutex is necessary because much of QEMU's code
was historically not thread-safe.

Because all I/O processing is done in a single main loop, and because the BQL
is contended by all vCPU threads and the main loop, it is desirable to place
work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h). Code that only works in the main loop
implicitly uses the main loop's AioContext. Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
 * File descriptor monitoring (read/write/error on POSIX hosts)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
 * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
 * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
 * LEGACY timer_new_ms() - create a timer
 * LEGACY qemu_bh_new() - create a BH
 * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
 * LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop, they cannot be used in code that
runs in an IOThread. They might cause a crash or deadlock if called from an
IOThread, since the BQL is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
 * aio_set_fd_handler() - monitor a file descriptor
 * aio_set_event_notifier() - monitor an event notifier
 * aio_timer_new() - create a timer
 * aio_bh_new() - create a BH
 * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
 * aio_poll() - run an event loop iteration

The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
argument, which is used to check for and prevent re-entrancy problems.
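To make the re-entrancy idea concrete, here is a toy model in plain C. All
names (ToyReentrancyGuard, ToyDevice, ToyBH, toy_bh_call() and so on) are
hypothetical and this is not QEMU's code. It only assumes the general scheme
of a per-device flag that is set while the device is handling an access, and
that a guarded BH also sets while its callback runs, so that a callback which
loops back into its own device is refused instead of recursing.

```c
/*
 * Toy model of a device re-entrancy guard.  Hypothetical names only;
 * this is not QEMU's implementation of MemReentrancyGuard.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool engaged_in_io;          /* set while the device is busy */
} ToyReentrancyGuard;

typedef struct {
    ToyReentrancyGuard guard;    /* plays the role of a per-device guard */
    int writes_handled;          /* accesses that were allowed through */
    int writes_blocked;          /* re-entrant accesses that were refused */
} ToyDevice;

/* Device register write handler, e.g. reached via guest MMIO or DMA. */
static void toy_device_write(ToyDevice *dev)
{
    if (dev->guard.engaged_in_io) {
        dev->writes_blocked++;   /* re-entrant access: refuse it */
        return;
    }
    dev->guard.engaged_in_io = true;
    dev->writes_handled++;
    dev->guard.engaged_in_io = false;
}

typedef void ToyBHFunc(void *opaque);

typedef struct {
    ToyBHFunc *cb;
    void *opaque;
    ToyReentrancyGuard *guard;   /* NULL for an unguarded BH */
} ToyBH;

/* Run a BH callback, marking its guard (if any) as engaged meanwhile. */
static void toy_bh_call(ToyBH *bh)
{
    /* Copy the guard pointer first, in case the callback frees the BH. */
    ToyReentrancyGuard *guard = bh->guard;
    bool was_engaged = false;

    assert(bh->cb != NULL);
    if (guard) {
        was_engaged = guard->engaged_in_io;
        guard->engaged_in_io = true;
    }
    bh->cb(bh->opaque);
    if (guard) {
        guard->engaged_in_io = was_engaged;  /* restore the previous state */
    }
}

/* A BH callback that (buggily) writes back into its own device. */
static void loopback_bh_cb(void *opaque)
{
    toy_device_write(opaque);
}
```

A caller would wrap loopback_bh_cb() in a ToyBH whose guard points at the
device's own guard and run it with toy_bh_call(); the write that loops back
into the device then increments writes_blocked rather than writes_handled,
while the same callback in an unguarded ToyBH goes through.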
For BHs associated with devices, the re-entrancy guard is contained in the
corresponding DeviceState and named "mem_reentrancy_guard".

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works both in IOThreads and in the main
loop, depending on which AioContext instance the caller passes in.

How to synchronize with an IOThread
-----------------------------------
Variables that can be accessed by multiple threads require some form of
synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.

AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
activity in an IOThread.

Side note: the best way to schedule a function call across threads is to call
aio_bh_schedule_oneshot() on the target thread's AioContext.

The main loop thread can wait synchronously for a condition using
AIO_WAIT_WHILE().

AioContext and the block layer
------------------------------
The AioContext originates from the QEMU block layer, even though nowadays
AioContext is a generic event loop that can be used by any QEMU subsystem.

The block layer has integrated support for AioContext. Each BlockDriverState
is associated with an AioContext, which can be changed with
bdrv_try_change_aio_context() and queried with bdrv_get_aio_context(). This
allows block layer code to process I/O inside the right AioContext. Other
subsystems may wish to follow a similar approach.

Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop. See the "How to program for
IOThreads" section above for information on how to do that.

Code running in the monitor typically needs to ensure that past
requests from the guest are completed.
When a block device is running in an IOThread, the IOThread can also process
requests from the guest (via ioeventfd). To both wait for past requests to
complete and keep the IOThread from processing new ones, wrap the code between
bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
section".

Long-running jobs (usually in the form of coroutines) are often scheduled in
the BlockDriverState's AioContext. The functions
bdrv_add_aio_context_notifier()/bdrv_remove_aio_context_notifier(), or
alternatively blk_add_aio_context_notifier()/blk_remove_aio_context_notifier()
if you use BlockBackends, can be used to get a notification whenever
bdrv_try_change_aio_context() moves a BlockDriverState to a different
AioContext.
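The notifier pattern can be illustrated with a toy model in plain C.
Everything here (ToyContext, ToyNode, toy_change_context() and the job
callbacks) is hypothetical and not QEMU's API. The sketch assumes, as the
attach/detach naming of these notifiers suggests, that detach callbacks run
before the node changes context and attach callbacks run afterwards,
receiving the new context.

```c
/*
 * Toy model of AioContext-change notifiers.  Hypothetical names only;
 * ToyContext stands in for an AioContext, ToyNode for a BlockDriverState.
 */
#include <assert.h>
#include <stddef.h>

typedef struct { const char *name; } ToyContext;

typedef struct {
    void (*attached)(ToyContext *new_ctx, void *opaque);
    void (*detach)(void *opaque);
    void *opaque;
} ToyNotifier;

#define TOY_MAX_NOTIFIERS 4

typedef struct {
    ToyContext *ctx;
    ToyNotifier notifiers[TOY_MAX_NOTIFIERS];
    size_t n_notifiers;
} ToyNode;

static void toy_add_notifier(ToyNode *node,
                             void (*attached)(ToyContext *, void *),
                             void (*detach)(void *), void *opaque)
{
    assert(node->n_notifiers < TOY_MAX_NOTIFIERS);
    node->notifiers[node->n_notifiers++] =
        (ToyNotifier){ attached, detach, opaque };
}

/*
 * Move the node to new_ctx: detach callbacks run while the old context is
 * still set, attached callbacks receive the new one.
 */
static void toy_change_context(ToyNode *node, ToyContext *new_ctx)
{
    for (size_t i = 0; i < node->n_notifiers; i++) {
        node->notifiers[i].detach(node->notifiers[i].opaque);
    }
    node->ctx = new_ctx;
    for (size_t i = 0; i < node->n_notifiers; i++) {
        node->notifiers[i].attached(new_ctx, node->notifiers[i].opaque);
    }
}

/* A long-running job that must follow its node between contexts. */
typedef struct {
    ToyContext *ctx;     /* context the job's work is scheduled in, or NULL */
    int moves;           /* how many times the job was re-attached */
} ToyJob;

static void job_detach(void *opaque)
{
    ToyJob *job = opaque;
    job->ctx = NULL;     /* stop scheduling work in the old context */
}

static void job_attached(ToyContext *new_ctx, void *opaque)
{
    ToyJob *job = opaque;
    job->ctx = new_ctx;  /* resume work in the new context */
    job->moves++;
}
```

A job registers its callbacks once with toy_add_notifier(); when
toy_change_context() later moves the node, job_detach() stops scheduling work
in the old context and job_attached() re-targets the job at the new one.
Splitting the notification into two phases means the job is never scheduled
in both contexts at once.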