| d45b2c65 | 10-Nov-2025 |
Hanna Czenczek <hreitz@redhat.com> |
block: Note in which AioContext AIO CBs are called
This doesn’t seem to be specified anywhere, but it is something we probably want to make clear. I believe it is reasonable to implicitly assume that callbacks are run in the current thread (unless explicitly noted otherwise), so codify that assumption.
Some implementations don’t actually fulfill this contract yet. The next patches should rectify that.
Note: I don’t know of any user-visible bugs produced by not running AIO callbacks in the original context. AIO functionality is generally mapped to coroutines through the use of bdrv_co_io_em_complete(), which can run in any AioContext, and will always wake the yielding coroutine in its original context. The only benefit here is that running bdrv_co_io_em_complete() in the original context will make that aio_co_wake() most likely a simpler qemu_coroutine_enter() instead of scheduling the wakeup through AioContext.co_schedule_bh.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-17-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
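A minimal sketch of the contract this commit codifies: a completion noticed by some other context does not call the callback directly, but queues it on the context that submitted the request, so it runs there. All names below (ToyContext, ToyAIOCB, toy_submit(), toy_complete(), toy_run(), current_ctx) are hypothetical and greatly simplified; they are not QEMU's AioContext or BlockAIOCB API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: a queue of deferred
 * completion callbacks. Not QEMU's real type. */
typedef struct ToyContext ToyContext;
typedef struct ToyAIOCB ToyAIOCB;
typedef void ToyCompletionFunc(ToyAIOCB *acb, int ret);

struct ToyAIOCB {
    ToyContext *origin;      /* context that submitted the request */
    ToyCompletionFunc *cb;
    int ret;
    ToyAIOCB *next;          /* link in origin's pending queue */
};

struct ToyContext {
    ToyAIOCB *head, *tail;
};

/* Models "the current thread's context" (hypothetical). */
static ToyContext *current_ctx;

static void toy_submit(ToyAIOCB *acb, ToyCompletionFunc *cb)
{
    acb->origin = current_ctx;   /* remember where the caller runs */
    acb->cb = cb;
    acb->next = NULL;
}

/* Called by whichever context observed the completion: instead of
 * invoking acb->cb here, queue it on the submitting context. */
static void toy_complete(ToyAIOCB *acb, int ret)
{
    ToyContext *ctx = acb->origin;
    acb->ret = ret;
    acb->next = NULL;
    if (ctx->tail) {
        ctx->tail->next = acb;
    } else {
        ctx->head = acb;
    }
    ctx->tail = acb;
}

/* One iteration of ctx's event loop: run the callbacks queued on ctx. */
static void toy_run(ToyContext *ctx)
{
    ToyContext *saved = current_ctx;
    current_ctx = ctx;
    while (ctx->head) {
        ToyAIOCB *acb = ctx->head;
        ctx->head = acb->next;
        if (!ctx->head) {
            ctx->tail = NULL;
        }
        acb->cb(acb, acb->ret);
    }
    current_ctx = saved;
}

/* Example callback that checks the contract and records what it saw. */
static ToyContext *ran_in;
static int seen_ret;
static void record_cb(ToyAIOCB *acb, int ret)
{
    assert(current_ctx == acb->origin);  /* runs in the submitting context */
    ran_in = current_ctx;
    seen_ret = ret;
}
```

Submitting from context `a` and completing while `b` is current leaves the callback pending until `a`'s loop runs. In the real code the payoff described above is that the coroutine wakeup can then usually enter the coroutine directly.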
|
| 9730b997 | 07-Oct-2025 |
Yeqi Fu <fufuyqqqqqq@gmail.com> |
block: replace TABs with space
Bring the block files in line with the QEMU coding style, with spaces for indentation. This patch partially resolves issue 371.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/371
Signed-off-by: Yeqi Fu <fufuyqqqqqq@gmail.com>
Message-ID: <20230325085224.23842-1-fufuyqqqqqq@gmail.com>
[thuth: Rebased the patch to the current master branch]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20251007163511.334178-1-thuth@redhat.com>
[kwolf: Fixed up vertical alignment]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
| 047dabef | 03-Nov-2025 |
Stefan Hajnoczi <stefanha@redhat.com> |
block/io_uring: use aio_add_sqe()
AioContext has its own io_uring instance for file descriptor monitoring. The disk I/O io_uring code was developed separately. Originally I thought the characteristics of file descriptor monitoring and disk I/O were too different, requiring separate io_uring instances.
Now it has become clear to me that it's feasible to share a single io_uring instance for file descriptor monitoring and disk I/O. We're not using io_uring's IOPOLL feature or anything else that would require a separate instance.
Unify block/io_uring.c and util/fdmon-io_uring.c using the new aio_add_sqe() API that allows user-defined io_uring sqe submission. Now block/io_uring.c just needs to submit readv/writev/fsync and most of the io_uring-specific logic is handled by fdmon-io_uring.c.
There are two immediate advantages:
1. Fewer system calls. There is no need to monitor the disk I/O io_uring ring fd from the file descriptor monitoring io_uring instance. Disk I/O completions are now picked up directly. Also, sqes are accumulated in the sq ring until the end of the event loop iteration and there are fewer io_uring_enter(2) syscalls.
2. Less code duplication.
Note that error_setg() messages are not supposed to end with punctuation, so I removed a '.' for the non-io_uring build error message.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-15-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
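To illustrate why accumulating sqes until the end of the event loop iteration means fewer io_uring_enter(2) calls, here is a toy model that only counts simulated submissions. It makes no liburing or kernel calls; ToyRing, toy_add_sqe(), toy_flush() and toy_loop_iteration_end() are hypothetical names, and the opcodes are plain integers chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SQ_ENTRIES 8   /* hypothetical ring size */

typedef struct {
    int opcodes[TOY_SQ_ENTRIES]; /* queued but not yet submitted */
    size_t pending;
    unsigned enter_calls;        /* simulated io_uring_enter(2) calls */
    size_t submitted;
} ToyRing;

/* Submit everything queued so far with a single (simulated) syscall. */
static void toy_flush(ToyRing *r)
{
    if (r->pending == 0) {
        return;                  /* nothing queued: no syscall at all */
    }
    r->enter_calls++;            /* one syscall for the whole batch */
    r->submitted += r->pending;
    r->pending = 0;
}

/* Queue an sqe; only flush early if the ring is already full. */
static void toy_add_sqe(ToyRing *r, int opcode)
{
    if (r->pending == TOY_SQ_ENTRIES) {
        toy_flush(r);
    }
    assert(r->pending < TOY_SQ_ENTRIES);
    r->opcodes[r->pending++] = opcode;
}

/* End of one event loop iteration: flush the batch once. */
static void toy_loop_iteration_end(ToyRing *r)
{
    toy_flush(r);
}
```

Three requests queued during one iteration cost one simulated syscall instead of three, and an idle iteration costs none.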
|
| 1eebdab3 | 03-Nov-2025 |
Stefan Hajnoczi <stefanha@redhat.com> |
aio-posix: add aio_add_sqe() API for user-defined io_uring requests
Introduce the aio_add_sqe() API for submitting io_uring requests in the current AioContext. This allows other components in QEMU, like the block layer, to take advantage of io_uring features without creating their own io_uring context.
This API supports nested event loops just like file descriptor monitoring and BHs do. This comes at a complexity cost: CQE callbacks must be placed on a list so that nested event loops can invoke pending CQE callbacks from parent event loops. If you're wondering why CqeHandler exists instead of just a callback function pointer, this is why.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-14-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
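The nested-event-loop argument for a handler object rather than a bare function pointer can be sketched as follows. Reaping a CQE only stores the result in the handler and appends it to a per-context pending list; every loop level, including one nested inside a callback, drains that same list. ToyCqeHandler, toy_reap_cqe() and toy_dispatch() are hypothetical simplifications, not the actual CqeHandler definition in QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler: the result lives in the handler so the callback
 * can run later, from whichever loop level drains the list. */
typedef struct ToyCqeHandler ToyCqeHandler;
struct ToyCqeHandler {
    void (*cb)(ToyCqeHandler *h);
    int res;                    /* copied from the completed CQE */
    ToyCqeHandler *next;
};

/* Pending handlers shared by all (nested) loop levels of one context. */
static ToyCqeHandler *pending_head, *pending_tail;

/* Reaping a CQE records the result and enqueues; no callback yet. */
static void toy_reap_cqe(ToyCqeHandler *h, int res)
{
    assert(h->cb != NULL);
    h->res = res;
    h->next = NULL;
    if (pending_tail) {
        pending_tail->next = h;
    } else {
        pending_head = h;
    }
    pending_tail = h;
}

/* Drain the shared list; a callback may start a nested toy_dispatch(). */
static int toy_dispatch(void)
{
    int n = 0;
    while (pending_head) {
        ToyCqeHandler *h = pending_head;
        pending_head = h->next;
        if (!pending_head) {
            pending_tail = NULL;
        }
        h->cb(h);
        n++;
    }
    return n;
}

static int order[4], norder;
static void cb_b(ToyCqeHandler *h) { order[norder++] = h->res; }
static void cb_a(ToyCqeHandler *h)
{
    order[norder++] = h->res;
    toy_dispatch();   /* nested loop picks up B, reaped by the parent */
}
```

Here handler B was reaped by the outer loop but is run by the nested loop started inside A's callback, which is only possible because pending work lives on a list rather than in a local variable of the outer loop.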
|
| 87e7a0f4 | 03-Nov-2025 |
Stefan Hajnoczi <stefanha@redhat.com> |
aio-posix: add fdmon_ops->dispatch()
The ppoll and epoll file descriptor monitoring implementations rely on the event loop's generic file descriptor, timer, and BH dispatch code to invoke user callbacks.
The io_uring file descriptor monitoring implementation will need io_uring-specific dispatch logic for CQE handlers for custom SQEs.
Introduce a new FDMonOps ->dispatch() callback that allows file descriptor monitoring implementations to invoke user callbacks. The next patch will use this new callback.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251104022933.618123-13-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
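An optional ->dispatch() member can be modeled as a nullable function pointer in an ops table: ppoll/epoll-like backends leave it NULL and rely on generic dispatch, while an io_uring-like backend fills it in. The types and functions below (ToyFDMonOps, ToyAioContext, toy_aio_poll()) are hypothetical and much smaller than QEMU's FDMonOps; the call order shown is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyAioContext ToyAioContext;

/* Hypothetical cut-down ops table. */
typedef struct {
    int  (*wait)(ToyAioContext *ctx);
    /* NULL for ppoll/epoll-like backends; set by backends that must
     * invoke their own user callbacks (e.g. CQE handlers). */
    bool (*dispatch)(ToyAioContext *ctx);
} ToyFDMonOps;

struct ToyAioContext {
    const ToyFDMonOps *fdmon_ops;
    int generic_dispatches;
    int backend_dispatches;
};

static bool toy_generic_dispatch(ToyAioContext *ctx)
{
    ctx->generic_dispatches++;   /* fd handlers, timers, BHs */
    return true;
}

static bool toy_aio_poll(ToyAioContext *ctx)
{
    bool progress = false;
    assert(ctx->fdmon_ops->wait != NULL);
    ctx->fdmon_ops->wait(ctx);
    if (ctx->fdmon_ops->dispatch) {   /* only if the backend has one */
        progress |= ctx->fdmon_ops->dispatch(ctx);
    }
    progress |= toy_generic_dispatch(ctx);
    return progress;
}

static int toy_wait(ToyAioContext *ctx) { (void)ctx; return 0; }

static bool toy_uring_dispatch(ToyAioContext *ctx)
{
    ctx->backend_dispatches++;
    return true;
}

static const ToyFDMonOps toy_poll_ops  = { toy_wait, NULL };
static const ToyFDMonOps toy_uring_ops = { toy_wait, toy_uring_dispatch };
```

Keeping the member optional means existing backends need no changes, which matches the commit's framing that only the io_uring implementation needs backend-specific dispatch.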
|
| 421dcc80 | 03-Nov-2025 |
Stefan Hajnoczi <stefanha@redhat.com> |
aio: add errp argument to aio_context_setup()
When aio_context_new() -> aio_context_setup() fails at startup it doesn't really matter whether errors are returned to the caller or the process terminates immediately.
However, it is not acceptable to terminate when hotplugging --object iothread at runtime. Refactor aio_context_setup() so that errors can be propagated. The next commit will set errp when fdmon_io_uring_setup() fails.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-10-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
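The refactoring follows the usual QEMU pattern of returning a success flag and reporting details through an `Error **errp` out-parameter instead of exiting. The sketch below imitates that pattern with a hypothetical ToyError type and helpers (toy_error_setg(), toy_context_setup(), toy_iothread_create()); it does not use the real qapi/error.h API, and the error message is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct { char msg[128]; } ToyError;

static void toy_error_setg(ToyError **errp, const char *msg)
{
    if (errp) {                       /* caller may pass NULL to ignore */
        *errp = calloc(1, sizeof(ToyError));
        assert(*errp != NULL);
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* Setup step: on failure, report through errp instead of exit(). */
static bool toy_context_setup(bool simulate_failure, ToyError **errp)
{
    if (simulate_failure) {
        toy_error_setg(errp, "simulated setup failure");
        return false;
    }
    return true;
}

/* Caller, e.g. an iothread object hotplugged at runtime: it simply
 * forwards errp, so the failure reaches the monitor and the process
 * keeps running. */
static bool toy_iothread_create(bool simulate_failure, ToyError **errp)
{
    if (!toy_context_setup(simulate_failure, errp)) {
        return false;
    }
    return true;
}
```

At startup a caller can still turn the error into a fatal exit, but the decision now belongs to the caller rather than to the setup code.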
|
| 3769b9ab | 03-Nov-2025 |
Stefan Hajnoczi <stefanha@redhat.com> |
aio: free AioContext when aio_context_new() fails
g_source_destroy() only removes the GSource from the GMainContext it's attached to, if any. It does not free it.
Use g_source_unref() instead so that the AioContext (which embeds a GSource) is freed. There is no need to call g_source_destroy() in aio_context_new() because the GSource isn't attached to a GMainContext yet.
aio_ctx_finalize() expects everything to be set up already, so introduce the new ctx->initialized boolean and do nothing when called with !initialized. This also requires moving aio_context_setup() down after event_notifier_init() since aio_ctx_finalize() won't release any resources that aio_context_setup() acquired.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251104022933.618123-9-stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
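The failure path described above (drop the last reference so the object is actually freed, and make the finalizer a no-op for a partially constructed object) can be modeled with a small refcounted struct. ToyCtx, toy_unref(), toy_finalize() and toy_ctx_new() are hypothetical; they imitate the shape of g_source_unref() and aio_ctx_finalize() without using GLib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a GSource-embedding
 * AioContext. */
typedef struct {
    int refcount;
    bool initialized;    /* set only once construction fully succeeded */
    int *notifier;       /* resource acquired during construction */
} ToyCtx;

static int finalize_released;   /* counts full finalizations */

static void toy_finalize(ToyCtx *ctx)
{
    if (!ctx->initialized) {
        return;          /* partially built: nothing for finalize to undo */
    }
    free(ctx->notifier);
    finalize_released++;
}

/* Dropping the last reference finalizes and frees the object. */
static void toy_unref(ToyCtx *ctx)
{
    assert(ctx->refcount > 0);
    if (--ctx->refcount == 0) {
        toy_finalize(ctx);
        free(ctx);
    }
}

static ToyCtx *toy_ctx_new(bool fail_setup)
{
    ToyCtx *ctx = calloc(1, sizeof(*ctx));
    assert(ctx != NULL);
    ctx->refcount = 1;
    ctx->notifier = malloc(sizeof(int));
    assert(ctx->notifier != NULL);
    if (fail_setup) {
        free(ctx->notifier);   /* the error path undoes its own work */
        toy_unref(ctx);        /* frees ctx; finalize is a no-op here */
        return NULL;
    }
    ctx->initialized = true;
    return ctx;
}
```

Merely detaching the object (the analogue of g_source_destroy()) would leave the allocation alive; dropping the reference is what frees it.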
|
| d1f42b60 | 03-Nov-2025 |
Stefan Hajnoczi <stefanha@redhat.com> |
aio: remove aio_context_use_g_source()
There is no need for aio_context_use_g_source() now that epoll(7) and io_uring(7) file descriptor monitoring works with the glib event loop. AioContext doesn't need to be notified that GSource is being used.
On hosts with io_uring support this now enables fdmon-io_uring.c by default, replacing fdmon-poll.c and fdmon-epoll.c. In other words, the event loop will use io_uring!
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251104022933.618123-8-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
| 5b4b3bfd | 24-Oct-2025 |
Kevin Wolf <kwolf@redhat.com> |
qemu-img info: Optionally show block limits
Add a new --limits option to 'qemu-img info' that displays the block limits for the image and all of its children, making the information more accessible for human users than in QMP. This option is not enabled by default because it can be a lot of output that isn't usually relevant if you're not specifically trying to diagnose some I/O problem.
This makes the same information automatically also available in HMP 'info block -v'.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251024123041.51254-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
| 46dd683d | 24-Oct-2025 |
Kevin Wolf <kwolf@redhat.com> |
block: Improve comments in BlockLimits
Patches to expose the limits in QAPI have made clear that the existing documentation of BlockLimits could be improved: The meaning of min_mem_alignment and opt_mem_alignment could be clearer, and talking about better alignment values isn't helpful when we only detect these values and never choose them.
Make the changes in the BlockLimits documentation now, so that the patches exposing the fields in QAPI can use descriptions consistent with it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20251024123041.51254-2-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
| cbadaf57 | 17-Sep-2025 |
Fiona Ebner <f.ebner@proxmox.com> |
block: implement 'resize' callback for child_of_bds class
If a filtered child is resized, the size of the parent node is now also refreshed (recursively for chains of filtered children).
For filter block drivers that do not implement .bdrv_co_getlength(), this commit does not change the current behavior, because bdrv_co_refresh_total_sectors() will use the current size via the passed-in hint. This is the case for block drivers for (some) block jobs, as well as copy-before-write.
Block jobs already set up a blocker preventing a QMP block_resize operation while the job is running. That does not directly cover an associated 'file' node of a 'raw' node, but resizing such a 'file' node is already prevented too (backup, commit, mirror and stream were checked).
The other case is copy-before-write. This commit does not change the fact that the copy-before-write node still has the same size after its filtered child is resized.
Block drivers that do implement .bdrv_co_getlength() and where .is_filter is true already returned the length of the file child, so nothing changes for them with this commit, with two exceptions:
1. preallocate can return an early data_end and otherwise queries the file child, but that special casing is not changed.
2. blkverify returns the length of the test file. This commit does not affect that behavior.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250917115509.401015-4-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
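The recursive refresh through chains of filters can be sketched with a toy node graph: when a filtered child is resized, its filter parent refreshes its size and then notifies its own parent. A filter whose driver has no getlength callback keeps its current size (the "hint" case, like copy-before-write in the text). ToyNode, toy_refresh_size() and toy_child_resized() are hypothetical; each node has a single parent here for simplicity, unlike the real block graph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node; field names only loosely mirror the commit text. */
typedef struct ToyNode ToyNode;
struct ToyNode {
    int64_t size;
    bool is_filter;
    bool has_getlength;   /* driver implements its own getlength */
    ToyNode *child;       /* filtered child, if any */
    ToyNode *parent;      /* single parent for simplicity */
};

/* Without getlength the current size serves as the hint and is kept;
 * a filter with getlength reports its child's length. */
static void toy_refresh_size(ToyNode *n)
{
    assert(n != NULL);
    if (n->has_getlength && n->is_filter && n->child) {
        n->size = n->child->size;
    }
}

/* Parent-side 'resize' callback: refresh, then recurse upwards so a
 * whole chain of filters is updated. */
static void toy_child_resized(ToyNode *child)
{
    ToyNode *p = child->parent;
    if (p && p->is_filter) {
        toy_refresh_size(p);
        toy_child_resized(p);
    }
}
```

Resizing the bottom node updates both filters that implement getlength, while the topmost filter without one keeps its old size, mirroring the copy-before-write case described above.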
|
| 08736e75 | 17-Sep-2025 |
Fiona Ebner <f.ebner@proxmox.com> |
block: make bdrv_co_parent_cb_resize() a proper IO API function
In preparation for calling it via the bdrv_child_cb_resize() callback that will be added by the next commit. Rename it to include the "_co_" part while at it.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20250917115509.401015-3-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
| 41203754 | 17-Sep-2025 |
Fiona Ebner <f.ebner@proxmox.com> |
include/block/block_int-common: document when resize callback is used
The 'resize' callback is only called by bdrv_parent_cb_resize() which is only called by bdrv_co_write_req_finish() to notify the parent(s) that the child was resized.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20250917115509.401015-2-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
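A simplified model of the call path named above: when a write request finishes past the current end of the node, the node's size grows and each parent's 'resize' callback is invoked. ToyBds, ToyParent, toy_write_req_finish() and toy_parent_cb_resize() are hypothetical stand-ins that track the size in bytes; they are not QEMU's BlockDriverState or BdrvChildClass.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parent link carrying the 'resize' callback. */
typedef struct {
    void (*resize)(void *opaque);
    void *opaque;
} ToyParent;

typedef struct {
    int64_t total_bytes;
    ToyParent *parents[4];
    int nb_parents;
} ToyBds;

/* Notify every parent that implements the callback. */
static void toy_parent_cb_resize(ToyBds *bs)
{
    for (int i = 0; i < bs->nb_parents; i++) {
        if (bs->parents[i]->resize) {
            bs->parents[i]->resize(bs->parents[i]->opaque);
        }
    }
}

/* End of a write request: only a write that extends the node past its
 * current end changes the size and notifies the parents. */
static void toy_write_req_finish(ToyBds *bs, int64_t offset, int64_t bytes)
{
    int64_t end = offset + bytes;
    assert(offset >= 0 && bytes >= 0);
    if (end > bs->total_bytes) {
        bs->total_bytes = end;
        toy_parent_cb_resize(bs);
    }
}

/* Example callback that counts notifications. */
static void count_resize(void *opaque) { (*(int *)opaque)++; }
```

A write inside the current size leaves the parents untouched; a write ending beyond it grows the node and fires the callback once per parent.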
|
| a256a427 | 30-May-2025 |
Fiona Ebner <f.ebner@proxmox.com> |
blockjob: mark block_job_remove_all_bdrv() as GRAPH_UNLOCKED
The function block_job_remove_all_bdrv() calls bdrv_graph_wrlock_drained(), which must be called with the graph unlocked.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-49-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
| 2cf92b15 | 30-May-2025 |
Fiona Ebner <f.ebner@proxmox.com> |
block: mark bdrv_open_child_common() and its callers GRAPH_UNLOCKED
The function bdrv_open_child_common() calls bdrv_graph_wrlock_drained(), which must be called with the graph unlocked. Mark it and its two callers bdrv_open_file_child() and bdrv_open_child() as GRAPH_UNLOCKED. This requires temporarily unlocking in vmdk_parse_extents() and making the locked section shorter in vmdk_open().
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-48-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
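The GRAPH_UNLOCKED rule, and the "temporarily unlocking" pattern mentioned for vmdk_parse_extents(), can be illustrated with a runtime model. QEMU checks these annotations statically with Clang's thread safety analysis; this sketch substitutes a reader-lock counter and assert(), and every name in it (toy_graph_rdlock(), toy_graph_wrlock_drained(), toy_open_child(), toy_parse_extents()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical runtime stand-in for the graph reader lock. */
static int graph_rdlock_depth;
static int wrlock_taken;

static void toy_graph_rdlock(void) { graph_rdlock_depth++; }

static void toy_graph_rdunlock(void)
{
    assert(graph_rdlock_depth > 0);
    graph_rdlock_depth--;
}

/* Stand-in for taking the writer lock: the caller must not hold the
 * reader lock, or the writer would wait on its own reader. */
static void toy_graph_wrlock_drained(void)
{
    assert(graph_rdlock_depth == 0);
    wrlock_taken++;
}

/* A "GRAPH_UNLOCKED" helper, analogous to opening a child node. */
static void toy_open_child(void)
{
    toy_graph_wrlock_drained();
}

/* A caller that otherwise runs under the reader lock drops it around
 * each GRAPH_UNLOCKED call and re-takes it afterwards. */
static void toy_parse_extents(int n)
{
    toy_graph_rdlock();
    for (int i = 0; i < n; i++) {
        toy_graph_rdunlock();
        toy_open_child();
        toy_graph_rdlock();
    }
    toy_graph_rdunlock();
}
```

The cost of this pattern is that graph state read before the unlock may have changed after the lock is re-taken, which is presumably why the commit also keeps the locked section in vmdk_open() short.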
|
| ede08593 | 30-May-2025 |
Fiona Ebner <f.ebner@proxmox.com> |
block: mark bdrv_close() as GRAPH_UNLOCKED
The functions blk_log_writes_close(), blkverify_close(), quorum_close(), vmdk_close() via vmdk_free_extents(), and other bdrv_close() implementations call bdrv_graph_wrlock_drained(), which must be called with the graph unlocked. They are reached via the BlockDriver's bdrv_close() callback and the bdrv_close() wrapper, which are also marked as GRAPH_UNLOCKED_PTR and GRAPH_UNLOCKED.
Furthermore, the function bdrv_close() also calls bdrv_drained_begin() and bdrv_graph_wrlock_drained(), so there are additional reasons for marking it GRAPH_UNLOCKED.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-47-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
| 6d7e3f8d | 30-May-2025 |
Fiona Ebner <f.ebner@proxmox.com> |
block: mark bdrv_close_all() as GRAPH_UNLOCKED
The function bdrv_close_all() calls bdrv_drain_all(), which must be called with the graph unlocked.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-46-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
| 94371745 | 30-May-2025 |
Fiona Ebner <f.ebner@proxmox.com> |
block: mark bdrv_drop_intermediate() as GRAPH_UNLOCKED
The function bdrv_drop_intermediate() calls bdrv_drained_begin(), which must be called with the graph unlocked.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20250530151125.955508-45-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|