a22bd366 | 29-Aug-2024 |
Kevin Wolf <kwolf@redhat.com> |
raw-format: Fix error message for invalid offset/size
s->offset and s->size are only set at the end of the function and still contain the old values when formatting the error message. Print the para
raw-format: Fix error message for invalid offset/size
s->offset and s->size are only set at the end of the function and still contain the old values when formatting the error message. Print the parameters with the new values that we actually checked instead.
Fixes: 500e2434207d ('raw-format: Split raw_read_options()') Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240829185527.47152-1-kwolf@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit 04bbc3ee52b32ac465547bb40c1f090a1b8f315a) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
show more ...
|
7eefbf8b | 12-Jul-2024 |
Fiona Ebner <f.ebner@proxmox.com> |
block/reqlist: allow adding overlapping requests
Allow overlapping request by removing the assert that made it impossible. There are only two callers:
1. block_copy_task_create()
It already assert
block/reqlist: allow adding overlapping requests
Allow overlapping request by removing the assert that made it impossible. There are only two callers:
1. block_copy_task_create()
It already asserts the very same condition before calling reqlist_init_req().
2. cbw_snapshot_read_lock()
There is no need to have read requests be non-overlapping in copy-before-write when used for snapshot-access. In fact, there was no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges and this could lead to an assertion failure [1].
In particular, with the reproducer script below [0], two cbw_co_snapshot_block_status() callers could race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request.
[0]:
> #!/bin/bash -e > dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 > ./qemu-img create /tmp/fleecing.raw -f raw 1G > ( > ./qemu-system-x86_64 --qmp stdio \ > --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ > --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ > <<EOF > {"execute": "qmp_capabilities"} > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } } > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } } > {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } } > {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}} > EOF > ) & > sleep 5 > while true; do > ./qemu-nbd -d /dev/nbd0 > ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r > nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket' > done
[1]:
> #5 0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 > #6 0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 > #7 0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 > #8 0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 > #9 0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 > #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 > #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 > #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 > #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 > #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 > #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374 > #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 > #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 > #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121 > #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175
Cc: qemu-stable@nongnu.org Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240712140716.517911-1-f.ebner@proxmox.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> (cherry picked from commit 6475155d519209c80fdda53e05130365aa769838) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
show more ...
|
547c4e50 | 08-Aug-2024 |
Stefano Garzarella <sgarzare@redhat.com> |
block/blkio: use FUA flag on write zeroes only if supported
libblkio supports BLKIO_REQ_FUA with write zeros requests only since version 1.4.0, so let's inform the block layer that the blkio driver
block/blkio: use FUA flag on write zeroes only if supported
libblkio supports BLKIO_REQ_FUA with write zeros requests only since version 1.4.0, so let's inform the block layer that the blkio driver supports it only in this case. Otherwise we can have runtime errors as reported in https://issues.redhat.com/browse/RHEL-32878
Fixes: fd66dbd424 ("blkio: add libblkio block driver") Cc: qemu-stable@nongnu.org Buglink: https://issues.redhat.com/browse/RHEL-32878 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240808080545.40744-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
show more ...
|
c8a76dbd | 06-Aug-2024 |
Eric Blake <eblake@redhat.com> |
nbd/server: CVE-2024-7409: Cap default max-connections to 100
Allowing an unlimited number of clients to any web service is a recipe for a rudimentary denial of service attack: the client merely nee
nbd/server: CVE-2024-7409: Cap default max-connections to 100
Allowing an unlimited number of clients to any web service is a recipe for a rudimentary denial of service attack: the client merely needs to open lots of sockets without closing them, until qemu no longer has any more fds available to allocate.
For qemu-nbd, we default to allowing only 1 connection unless more are explicitly asked for (-e or --shared); this was historically picked as a nice default (without an explicit -t, a non-persistent qemu-nbd goes away after a client disconnects, without needing any additional follow-up commands), and we are not going to change that interface now (besides, someday we want to point people towards qemu-storage-daemon instead of qemu-nbd).
But for qemu proper, and the newer qemu-storage-daemon, the QMP nbd-server-start command has historically had a default of unlimited number of connections, in part because unlike qemu-nbd it is inherently persistent until nbd-server-stop. Allowing multiple client sockets is particularly useful for clients that can take advantage of MULTI_CONN (creating parallel sockets to increase throughput), although known clients that do so (such as libnbd's nbdcopy) typically use only 8 or 16 connections (the benefits of scaling diminish once more sockets are competing for kernel attention). Picking a number large enough for typical use cases, but not unlimited, makes it slightly harder for a malicious client to perform a denial of service merely by opening lots of connections withot progressing through the handshake.
This change does not eliminate CVE-2024-7409 on its own, but reduces the chance for fd exhaustion or unlimited memory usage as an attack surface. On the other hand, by itself, it makes it more obvious that with a finite limit, we have the problem of an unauthenticated client holding 100 fds opened as a way to block out a legitimate client from being able to connect; thus, later patches will further add timeouts to reject clients that are not making progress.
This is an INTENTIONAL change in behavior, and will break any client of nbd-server-start that was not passing an explicit max-connections parameter, yet expects more than 100 simultaneous connections. We are not aware of any such client (as stated above, most clients aware of MULTI_CONN get by just fine on 8 or 16 connections, and probably cope with later connections failing by relying on the earlier connections; libvirt has not yet been passing max-connections, but generally creates NBD servers with the intent for a single client for the sake of live storage migration; meanwhile, the KubeSAN project anticipates a large cluster sharing multiple clients [up to 8 per node, and up to 100 nodes in a cluster], but it currently uses qemu-nbd with an explicit --shared=0 rather than qemu-storage-daemon with nbd-server-start).
We considered using a deprecation period (declare that omitting max-parameters is deprecated, and make it mandatory in 3 releases - then we don't need to pick an arbitrary default); that has zero risk of breaking any apps that accidentally depended on more than 100 connections, and where such breakage might not be noticed under unit testing but only under the larger loads of production usage. But it does not close the denial-of-service hole until far into the future, and requires all apps to change to add the parameter even if 100 was good enough. It also has a drawback that any app (like libvirt) that is accidentally relying on an unlimited default should seriously consider their own CVE now, at which point they are going to change to pass explicit max-connections sooner than waiting for 3 qemu releases. Finally, if our changed default breaks an app, that app can always pass in an explicit max-parameters with a larger value.
It is also intentional that the HMP interface to nbd-server-start is not changed to expose max-connections (any client needing to fine-tune things should be using QMP).
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20240807174943.771624-12-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [ericb: Expand commit message to summarize Dan's argument for why we break corner-case back-compat behavior without a deprecation period] Signed-off-by: Eric Blake <eblake@redhat.com>
show more ...
|
5eed3db3 | 20-Jul-2024 |
Amjad Alsharafi <amjadsharafi10@gmail.com> |
vvfat: Fix reading files with non-continuous clusters
When reading with `read_cluster` we get the `mapping` with `find_mapping_for_cluster` and then we call `open_file` for this mapping. The issue a
vvfat: Fix reading files with non-continuous clusters
When reading with `read_cluster` we get the `mapping` with `find_mapping_for_cluster` and then we call `open_file` for this mapping. The issue appear when its the same file, but a second cluster that is not immediately after it, imagine clusters `500 -> 503`, this will give us 2 mappings one has the range `500..501` and another `503..504`, both point to the same file, but different offsets.
When we don't open the file since the path is the same, we won't assign `s->current_mapping` and thus accessing way out of bound of the file.
From our example above, after `open_file` (that didn't open anything) we will get the offset into the file with `s->cluster_size*(cluster_num-s->current_mapping->begin)`, which will give us `0x2000 * (504-500)`, which is out of bound for this mapping and will produce some issues.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Message-ID: <1f3ea115779abab62ba32c788073cdc99f9ad5dd.1721470238.git.amjadsharafi10@gmail.com> [kwolf: Simplified the patch based on Amjad's analysis and input] Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
f60a6f7e | 20-Jul-2024 |
Amjad Alsharafi <amjadsharafi10@gmail.com> |
vvfat: Fix wrong checks for cluster mappings invariant
How this `abort` was intended to check for was: - if the `mapping->first_mapping_index` is not the same as `first_mapping_index`, which **sho
vvfat: Fix wrong checks for cluster mappings invariant
How this `abort` was intended to check for was: - if the `mapping->first_mapping_index` is not the same as `first_mapping_index`, which **should** happen only in one case, when we are handling the first mapping, in that case `mapping->first_mapping_index == -1`, in all other cases, the other mappings after the first should have the condition `true`. - From above, we know that this is the first mapping, so if the offset is not `0`, then abort, since this is an invalid state.
The issue was that `first_mapping_index` is not set if we are checking from the middle, the variable `first_mapping_index` is only set if we passed through the check `cluster_was_modified` with the first mapping, and in the same function call we checked the other mappings.
One approach is to go into the loop even if `cluster_was_modified` is not true so that we will be able to set `first_mapping_index` for the first mapping, but since `first_mapping_index` is only used here, another approach is to just check manually for the `mapping->first_mapping_index != -1` since we know that this is the value for the only entry where `offset == 0` (i.e. first mapping).
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <b0fbca3ee208c565885838f6a7deeaeb23f4f9c2.1721470238.git.amjadsharafi10@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
21b25a0e | 20-Jul-2024 |
Amjad Alsharafi <amjadsharafi10@gmail.com> |
vvfat: Fix usage of `info.file.offset`
The field is marked as "the offset in the file (in clusters)", but it was being used like this `cluster_size*(nums)+mapping->info.file.offset`, which is incorr
vvfat: Fix usage of `info.file.offset`
The field is marked as "the offset in the file (in clusters)", but it was being used like this `cluster_size*(nums)+mapping->info.file.offset`, which is incorrect.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <72f19a7903886dda1aa78bcae0e17702ee939262.1721470238.git.amjadsharafi10@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
b881cf00 | 20-Jul-2024 |
Amjad Alsharafi <amjadsharafi10@gmail.com> |
vvfat: Fix bug in writing to middle of file
Before this commit, the behavior when calling `commit_one_file` for example with `offset=0x2000` (second cluster), what will happen is that we won't fetch
vvfat: Fix bug in writing to middle of file
Before this commit, the behavior when calling `commit_one_file` for example with `offset=0x2000` (second cluster), what will happen is that we won't fetch the next cluster from the fat, and instead use the first cluster for the read operation.
This is due to off-by-one error here, where `i=0x2000 !< offset=0x2000`, thus not fetching the next cluster.
Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <b97c1e1f1bc2f776061ae914f95d799d124fcd73.1721470238.git.amjadsharafi10@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
d5f6cbb2 | 27-Jun-2024 |
Kevin Wolf <kwolf@redhat.com> |
block-copy: Fix missing graph lock
The graph lock needs to be held when calling bdrv_co_pdiscard(). Fix block_copy_task_entry() to take it for the call.
WITH_GRAPH_RDLOCK_GUARD() was implemented in
block-copy: Fix missing graph lock
The graph lock needs to be held when calling bdrv_co_pdiscard(). Fix block_copy_task_entry() to take it for the call.
WITH_GRAPH_RDLOCK_GUARD() was implemented in a weak way because of limitations in clang's Thread Safety Analysis at the time, so that it only asserts that the lock is held (which allows calling functions that require the lock), but we never deal with the unlocking (so even after the scope of the guard, the compiler assumes that the lock is still held). This is why the compiler didn't catch this locking error.
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240627181245.281403-2-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
2abc22e6 | 29-Jun-2024 |
Michael Tokarev <mjt@tls.msk.ru> |
block/curl: rewrite http header parsing function
Existing code was long, unclear and twisty.
This also relaxes the rules a tiny bit: allows to have whitespace before header name and colon and makes
block/curl: rewrite http header parsing function
Existing code was long, unclear and twisty.
This also relaxes the rules a tiny bit: allows to have whitespace before header name and colon and makes the header value match to be case-insensitive.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
show more ...
|
d05ae948 | 28-Jun-2024 |
Nir Soffer <nsoffer@redhat.com> |
Consider discard option when writing zeros
When opening an image with discard=off, we punch hole in the image when writing zeroes, making the image sparse. This breaks users that want to ensure that
Consider discard option when writing zeros
When opening an image with discard=off, we punch hole in the image when writing zeroes, making the image sparse. This breaks users that want to ensure that writes cannot fail with ENOSPACE by using fully allocated images[1].
bdrv_co_pwrite_zeroes() correctly disables BDRV_REQ_MAY_UNMAP if we opened the child without discard=unmap or discard=on. But we don't go through this function when accessing the top node. Move the check down to bdrv_co_do_pwrite_zeroes() which seems to be used in all code paths.
This change implements the documented behavior, punching holes only when opening the image with discard=on or discard=unmap. This may not be the best default but can improve it later.
The test depends on a file system supporting discard, deallocating the entire file when punching hole with the length of the entire file. Tested with xfs, ext4, and tmpfs.
[1] https://lists.nongnu.org/archive/html/qemu-discuss/2024-06/msg00003.html
Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20240628202058.1964986-3-nsoffer@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
show more ...
|
7b1070a7 | 24-May-2024 |
Akihiko Odaki <akihiko.odaki@daynix.com> |
Revert "meson: Propagate gnutls dependency"
This reverts commit 3eacf70bb5a83e4775ad8003cbca63a40f70c8c2.
It was only needed because of duplicate objects caused by declare_dependency(link_whole: ..
Revert "meson: Propagate gnutls dependency"
This reverts commit 3eacf70bb5a83e4775ad8003cbca63a40f70c8c2.
It was only needed because of duplicate objects caused by declare_dependency(link_whole: ...), and can be dropped now that meson.build specifies objects and dependencies separately for the internal dependencies.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-ID: <20240524-objects-v1-2-07cbbe96166b@daynix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
bd385a52 | 11-Apr-2024 |
Kevin Wolf <kwolf@redhat.com> |
qcow2: Don't open data_file with BDRV_O_NO_IO
One use case for 'qemu-img info' is verifying that untrusted images don't reference an unwanted external file, be it as a backing file or an external da
qcow2: Don't open data_file with BDRV_O_NO_IO
One use case for 'qemu-img info' is verifying that untrusted images don't reference an unwanted external file, be it as a backing file or an external data file. To make sure that calling 'qemu-img info' can't already have undesired side effects with a malicious image, just don't open the data file at all with BDRV_O_NO_IO. If nothing ever tries to do I/O, we don't need to have it open.
This changes the output of iotests case 061, which used 'qemu-img info' to show that opening an image with an invalid data file fails. After this patch, it succeeds. Replace this part of the test with a qemu-io call, but keep the final 'qemu-img info' to show that the invalid data file is correctly displayed in the output.
Fixes: CVE-2024-4467 Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
show more ...
|
e9c9d8dc | 29-Jun-2024 |
Akihiko Odaki <akihiko.odaki@daynix.com> |
block/file-posix: Drop ifdef for macOS versions older than 12.0
macOS versions older than 12.0 are no longer supported.
docs/about/build-platforms.rst says: > Support for the previous major version
block/file-posix: Drop ifdef for macOS versions older than 12.0
macOS versions older than 12.0 are no longer supported.
docs/about/build-platforms.rst says: > Support for the previous major version will be dropped 2 years after > the new major version is released or when the vendor itself drops > support, whichever comes first.
macOS 12.0 was released 2021: https://www.apple.com/newsroom/2021/10/macos-monterey-is-now-available/
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20240629-macos-v1-3-6e70a6b700a0@daynix.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
show more ...
|
d656aaa1 | 04-Sep-2023 |
Paolo Bonzini <pbonzini@redhat.com> |
block: rename former bdrv_file_open callbacks
Since there is no bdrv_file_open callback anymore, rename the implementations so that they end with "_open" instead of "_file_open". NFS is the excepti
block: rename former bdrv_file_open callbacks
Since there is no bdrv_file_open callback anymore, rename the implementations so that they end with "_open" instead of "_file_open". NFS is the exception because all the functions are named nfs_file_*.
Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
44b424dc | 24-Nov-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
block: remove separate bdrv_file_open callback
bdrv_file_open and bdrv_open are completely equivalent, they are never checked except to see which one to invoke. So merge them into a single one.
Si
block: remove separate bdrv_file_open callback
bdrv_file_open and bdrv_open are completely equivalent, they are never checked except to see which one to invoke. So merge them into a single one.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
3ab0f063 | 27-May-2024 |
Stefan Hajnoczi <stefanha@redhat.com> |
crypto/block: drop qcrypto_block_open() n_threads argument
The n_threads argument is no longer used since the previous commit. Remove it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Messag
crypto/block: drop qcrypto_block_open() n_threads argument
The n_threads argument is no longer used since the previous commit. Remove it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240527155851.892885-3-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
24687abf | 25-Apr-2024 |
Prasad Pandit <pjp@fedoraproject.org> |
linux-aio: add IO_CMD_FDSYNC command support
Libaio defines IO_CMD_FDSYNC command to sync all outstanding asynchronous I/O operations, by flushing out file data to the disk storage. Enable linux-aio
linux-aio: add IO_CMD_FDSYNC command support
Libaio defines IO_CMD_FDSYNC command to sync all outstanding asynchronous I/O operations, by flushing out file data to the disk storage. Enable linux-aio to submit such aio request.
When using aio=native without fdsync() support, QEMU creates pthreads, and destroying these pthreads results in TLB flushes. In a real-time guest environment, TLB flushes cause a latency spike. This patch helps to avoid such spikes.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Message-ID: <20240425070412.37248-1-ppandit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
10b1e09e | 29-Apr-2024 |
Fiona Ebner <f.ebner@proxmox.com> |
block/copy-before-write: use uint64_t for timeout in nanoseconds
rather than the uint32_t for which the maximum is slightly more than 4 seconds and larger values would overflow. The QAPI interface a
block/copy-before-write: use uint64_t for timeout in nanoseconds
rather than the uint32_t for which the maximum is slightly more than 4 seconds and larger values would overflow. The QAPI interface allows specifying the number of seconds, so only values 0 to 4 are safe right now, other values lead to a much lower timeout than a user expects.
The block_copy() call where this is used already takes a uint64_t for the timeout, so no change required there.
Fixes: 6db7fd1ca9 ("block/copy-before-write: implement cbw-timeout option") Reported-by: Friedrich Weber <f.weber@proxmox.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20240429141934.442154-1-f.ebner@proxmox.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
b67e3538 | 30-Apr-2024 |
Denis V. Lunev via <qemu-block@nongnu.org> |
block: drop force_dup parameter of raw_reconfigure_getfd()
Since commit 72373e40fbc, this parameter is always passed as 'false' from the caller.
Signed-off-by: Denis V. Lunev <den@openvz.org> CC: A
block: drop force_dup parameter of raw_reconfigure_getfd()
Since commit 72373e40fbc, this parameter is always passed as 'false' from the caller.
Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> Message-ID: <20240430170213.148558-1-den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
show more ...
|
0fd05c8d | 13-Mar-2024 |
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> |
qapi: blockdev-backup: add discard-source parameter
Add a parameter that enables discard-after-copy. That is mostly useful in "push backup with fleecing" scheme, when source is snapshot-access forma
qapi: blockdev-backup: add discard-source parameter
Add a parameter that enables discard-after-copy. That is mostly useful in "push backup with fleecing" scheme, when source is snapshot-access format driver node, based on copy-before-write filter snapshot-access API:
[guest] [snapshot-access] ~~ blockdev-backup ~~> [backup target] | | | root | file v v [copy-before-write] | | | file | target v v [active disk] [temp.img]
In this case discard-after-copy does two things:
- discard data in temp.img to save disk space - avoid further copy-before-write operation in discarded area
Note that we have to declare WRITE permission on source in copy-before-write filter, for discard to work. Still we can't take it unconditionally, as it will break normal backup from RO source. So, we have to add a parameter and pass it thorough bdrv_open flags.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240313152822.626493-5-vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
show more ...
|
006e845b | 13-Mar-2024 |
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> |
block/copy-before-write: create block_copy bitmap in filter node
Currently block_copy creates copy_bitmap in source node. But that is in bad relation with .independent_close=true of copy-before-writ
block/copy-before-write: create block_copy bitmap in filter node
Currently block_copy creates copy_bitmap in source node. But that is in bad relation with .independent_close=true of copy-before-write filter: source node may be detached and removed before .bdrv_close() handler called, which should call block_copy_state_free(), which in turn should remove copy_bitmap.
That's all not ideal: it would be better if internal bitmap of block-copy object is not attached to any node. But that is not possible now.
The simplest solution is just create copy_bitmap in filter node, where anyway two other bitmaps are created.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240313152822.626493-4-vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
show more ...
|
50717519 | 13-Mar-2024 |
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> |
block/copy-before-write: support unligned snapshot-discard
First thing that crashes on unligned access here is bdrv_reset_dirty_bitmap(). Correct way is to align-down the snapshot-discard request.
block/copy-before-write: support unligned snapshot-discard
First thing that crashes on unligned access here is bdrv_reset_dirty_bitmap(). Correct way is to align-down the snapshot-discard request.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240313152822.626493-3-vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
show more ...
|
137b4d4b | 13-Mar-2024 |
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> |
block/copy-before-write: fix permission
In case when source node does not have any parents, the condition still works as required: backup job do create the parent by
block_job_create -> block_job
block/copy-before-write: fix permission
In case when source node does not have any parents, the condition still works as required: backup job do create the parent by
block_job_create -> block_job_add_bdrv -> bdrv_root_attach_child
Still, in this case checking @perm variable doesn't work, as backup job creates the root blk with empty permissions (as it rely on CBW filter to require correct permissions and don't want to create extra conflicts).
So, we should not check @perm.
The hack may be dropped entirely when transactional insertion of filter (when we don't try to recalculate permissions in intermediate state, when filter does conflict with original parent of the source node) merged (old big series "[PATCH v5 00/45] Transactional block-graph modifying API"[1] and it's current in-flight part is "[PATCH v8 0/7] blockdev-replace"[2])
[1] https://patchew.org/QEMU/20220330212902.590099-1-vsementsov@openvz.org/ [2] https://patchew.org/QEMU/20231017184444.932733-1-vsementsov@yandex-team.ru/
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240313152822.626493-2-vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
show more ...
|
7d99ae59 | 04-Apr-2024 |
Alexander Ivanov <alexander.ivanov@virtuozzo.com> |
blockcommit: Reopen base image as RO after abort
If a blockcommit is aborted the base image remains in RW mode, that leads to a fail of subsequent live migration.
How to reproduce: $ virsh snapsh
blockcommit: Reopen base image as RO after abort
If a blockcommit is aborted the base image remains in RW mode, that leads to a fail of subsequent live migration.
How to reproduce: $ virsh snapshot-create-as vm snp1 --disk-only
*** write something to the disk inside the guest ***
$ virsh blockcommit vm vda --active --shallow && virsh blockjob vm vda --abort $ lsof /vzt/vm.qcow2 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME qemu-syst 433203 root 45u REG 253,0 1724776448 133 /vzt/vm.qcow2 $ cat /proc/433203/fdinfo/45 pos: 0 flags: 02140002 <==== The last 2 means RW mode
If the base image is in RW mode at the end of blockcommit and was in RO mode before blockcommit, reopen the base BDS in RO.
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20240404091136.129811-1-alexander.ivanov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
show more ...
|