Revision tags: v9.2.0, v9.1.2 |
|
#
cbad4551 |
| 04-Nov-2024 |
Peter Maydell <peter.maydell@linaro.org> |
Merge tag 'migration-20241030-pull-request' of https://gitlab.com/peterx/qemu into staging
Migration pull request for softfreeze
v2: - Patch "migration: Move cpu-throttle.c from system to migration", fix build on MacOS, and subject spelling
NOTE: checkpatch.pl could report a false positive on this branch:
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? #21: {include/sysemu => migration}/cpu-throttle.h | 0
That's covered by "F: migration/" entry.
Changelog:
- Peter's cleanup patch on migrate_fd_cleanup()
- Peter's cleanup patch to introduce thread name macros
- Hanna's error path fix for vmstate subsection save()s
- Hyman's auto converge enhancement on background dirty sync
- Peter's additional tracepoints for save state entries
- Thomas's build fix for OpenBSD in dirtyrate.c
- Peter's deprecation of query-migrationthreads command
- Peter's cleanup/fixes from the "export misc.h" series
- Maciej's two small patches from multifd+vfio series
# -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZyTbVRIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wan3wD+L4TVNDc34Hy4mvWu7u1lCOePX0GBdUEc # oEeBGblwbrcBAIR8d+5z9O5YcWH1coozG1aUC4qCtSHHk5TGbJk4/UUD # =XB5Q # -----END PGP SIGNATURE----- # gpg: Signature made Fri 01 Nov 2024 13:44:53 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal] # gpg: aka "Peter Xu <peterx@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-20241030-pull-request' of https://gitlab.com/peterx/qemu:
  migration/multifd: Zero p->flags before starting filling a packet
  migration/ram: Add load start trace event
  migration: Drop migration_is_idle()
  migration: Drop migration_is_setup_or_active()
  migration: Unexport ram_mig_init()
  migration: Unexport dirty_bitmap_mig_init()
  migration: Take migration object refcount earlier for threads
  migration: Deprecate query-migrationthreads command
  migration/dirtyrate: Silence warning about strcpy() on OpenBSD
  tests/migration: Add case for periodic ramblock dirty sync
  migration: Support periodic RAMBlock dirty bitmap sync
  migration: Remove "rs" parameter in migration_bitmap_sync_precopy
  migration: Move cpu-throttle.c from system to migration
  migration: Stop CPU throttling conditionally
  accel/tcg/icount-common: Remove the reference to the unused header file
  migration: Ensure vmstate_save() sets errp
  migration: Put thread names together with macros
  migration: Cleanup migrate_fd_cleanup() on accessing to_dst_file
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
#
b0350c51 |
| 29-Oct-2024 |
Maciej S. Szmigiero <maciej.szmigiero@oracle.com> |
migration/ram: Add load start trace event
There's a RAM load complete trace event but there wasn't its start equivalent.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/94ddfa7ecb83a78f73b82867dd30c8767592d257.1730203967.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
|
Revision tags: v9.1.1 |
|
#
52ac968a |
| 17-Oct-2024 |
Hyman Huang <yong.huang@smartx.com> |
migration: Support periodic RAMBlock dirty bitmap sync
When a VM is configured with a large amount of memory, the current throttle logic does not appear to scale: migration_trigger_throttle() is only called once per iteration, so it may not be invoked for a long time if a single iteration takes a long time.
The periodic dirty sync aims to fix this by synchronizing the RAMBlock dirty bitmaps from the remote dirty bitmap at regular intervals and, when necessary, triggering the CPU throttle multiple times during a long iteration.
This is a trade-off between synchronization overhead and CPU throttle impact.
Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/f61f1b3653f2acf026901103e1c73d157d38b08f.1729146786.git.yong.huang@smartx.com [peterx: make prev_cnt global, and reset for each migration] Signed-off-by: Peter Xu <peterx@redhat.com>
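A minimal sketch of the idea follows. It assumes a timer that fires independently of iteration boundaries. The struct, the function names and the threshold heuristic are hypothetical and only loosely modeled on the throttle-trigger-threshold concept; this is not QEMU's implementation. The actual bitmap sync is reduced to a `dirty_bytes` input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a periodic dirty-sync timer (names are illustrative). */
typedef struct {
    int64_t last_sync_ms;          /* time of the last bitmap sync */
    int64_t period_ms;             /* minimum interval between periodic syncs */
    int trigger_threshold_pct;     /* throttle if dirtied > this % of sent */
    int throttle_pct;              /* current CPU throttle percentage */
    int throttle_step;             /* increase applied per trigger */
    int throttle_max;              /* upper bound for throttle_pct */
} PeriodicSync;

/*
 * Called from a timer, independently of iteration boundaries.
 * Returns true if a sync happened on this tick. dirty_bytes and
 * sent_bytes stand in for what a real bitmap sync would report.
 */
static bool periodic_sync_tick(PeriodicSync *s, int64_t now_ms,
                               uint64_t dirty_bytes, uint64_t sent_bytes)
{
    assert(s->period_ms > 0);
    if (now_ms - s->last_sync_ms < s->period_ms) {
        return false;              /* too soon: bound the sync overhead */
    }
    s->last_sync_ms = now_ms;

    /* Throttle when bytes dirtied exceed the threshold share of bytes sent. */
    if (dirty_bytes * 100 > sent_bytes * (uint64_t)s->trigger_threshold_pct) {
        s->throttle_pct += s->throttle_step;
        if (s->throttle_pct > s->throttle_max) {
            s->throttle_pct = s->throttle_max;
        }
    }
    return true;
}
```

The `period_ms` check is where the trade-off described above appears. A shorter period reacts sooner to a high dirty rate but pays the sync cost more often.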
|
#
d481cec7 |
| 17-Oct-2024 |
Hyman Huang <yong.huang@smartx.com> |
migration: Move cpu-throttle.c from system to migration
Move cpu-throttle.c from system to migration since it is only used for migration. This avoids exporting its utility functions and variables through misc.h. Instead, they can be exported through migration.h when the periodic RAMBlock dirty sync feature is implemented in upcoming commits.
Since CPU throttle timers are only used in migration, move their registration to migration_object_init().
Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/c1b3efaa0cb49e03d422e9da97bdb65cc3d234d1.1729146786.git.yong.huang@smartx.com [peterx: Fix build on MacOS on cocoa.m, not move cpu-throttle.h yet] [peterx: Fix subject spelling, per pm215] Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
becd6944 |
| 06-Sep-2024 |
Peter Maydell <peter.maydell@linaro.org> |
Merge tag 'migration-20240904-pull-request' of https://gitlab.com/farosas/qemu into staging
Migration pull request
- Steve's cleanup of unused variable
- Peter Maydell's fixes for several leaks in migration-test
- Fabiano's flexibilization of multifd data structures for device state migration
- Arman Nabiev's fix for ppc e500 migration
- Thomas' fix for migration-test vs. --without-default-devices
# -----BEGIN PGP SIGNATURE----- # # iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmbYVXwQHGZhcm9zYXNA # c3VzZS5kZQAKCRDHmNx0G+wxnRucEAC1vo046UGdUmbb4PaF5vKAg97io6RB2nrH # HMz56Yc0AcAKRUGwe2Z80e2jY8B6zi8Ha8b9l7cVsej095eGCF+tINIL4wRX4lHm # alDY/LkhuqjE5g5c/DaeTztyBOFLvdWHPU5eJyDOC9r7kSlnUcL1gAslH23b8uL0 # xvhPVKaTWjGIzNL1q/XfBr1WgRGqfD6dYb32HJDTq85yOnUT5sEr55aoEEu0euKh # MYbXPmi5AMbrp8nP21kzUopX8iYERRdoKwhF0ZssciGi/qJVevH70tNdbDEQSxyp # +vtP54TnL3LrzD4uY5Snng9zT9h0QrZujY79OEcxu20U0s29OQaudWkIjp7yLLUv # UnPZHS+bIyaS53DdpV94GKGGBX1wrjGC/sn8eGYzmb2yMlMjLTBoE8L5r9cadshX # XTeF4MtKGqaS3xDM2fIgACHHFl6qr/l0nENspv0raFzpf9Jx/WbpekghvTuWN6/B # pZHnoOTNiAqXS/Rnyy829vsQ0Pw4hi6wx79Z73RP+35ubZTgTmOsQx9f2FjuEh6k # JS+q9k4VJ+nntUWsYn4GS1Jlt+FXJ2hfzNj1NNFN4xLT1oioc6pCHsQyV7SBArB1 # ml2zYyfKCTC3riIRhcv/ew6OcKbhHcPFOpd/v0y40LO3mx8S0LZnUWXkcrl3XIZS # Mj5CBdlFgA== # =SRN4 # -----END PGP SIGNATURE----- # gpg: Signature made Wed 04 Sep 2024 13:41:32 BST # gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D # gpg: issuer "farosas@suse.de" # gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown] # gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D
* tag 'migration-20240904-pull-request' of https://gitlab.com/farosas/qemu: (34 commits)
  tests/qtest/migration: Add a check for the availability of the "pc" machine
  target/ppc: Fix migration of CPUs with TLB_EMB TLB type
  migration/multifd: Add documentation for multifd methods
  migration/multifd: Add a couple of asserts for p->iov
  migration/multifd: Fix p->iov leak in multifd-uadk.c
  migration/multifd: Stop changing the packet on recv side
  migration/multifd: Make MultiFDMethods const
  migration/multifd: Move nocomp code into multifd-nocomp.c
  migration/multifd: Register nocomp ops dynamically
  migration/multifd: Standardize on multifd ops names
  migration/multifd: Allow multifd sync without flush
  migration/multifd: Replace multifd_send_state->pages with client data
  migration/multifd: Don't send ram data during SYNC
  migration/multifd: Isolate ram pages packet data
  migration/multifd: Remove total pages tracing
  migration/multifd: Move pages accounting into multifd_send_zero_page_detect()
  migration/multifd: Replace p->pages with an union pointer
  migration/multifd: Make MultiFDPages_t:offset a flexible array member
  migration/multifd: Introduce MultiFDSendData
  migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov
  ...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
Revision tags: v9.1.0 |
|
#
87bb9e95 |
| 27-Aug-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Isolate ram pages packet data
While we cannot yet disentangle the multifd packet from page data, we can make the code a bit cleaner by setting the page-related fields in a separate function.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
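The split can be illustrated with a simplified, hypothetical packet layout. The real MultiFDPacket_t has more fields and is byte-swapped on the wire, and the names and magic value below are placeholders. In this sketch, the generic header filling delegates the page data to its own helper:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PAGES 8                /* illustrative capacity */

/* Hypothetical, simplified packet: header fields followed by page fields. */
typedef struct {
    uint32_t magic;
    uint32_t flags;
    uint64_t packet_num;
    uint32_t normal_pages;         /* page-related fields start here */
    uint32_t zero_pages;
    uint64_t offset[MAX_PAGES];
} Packet;

/* Hypothetical batch: normal-page offsets first, then zero-page offsets. */
typedef struct {
    uint32_t num_normal;
    uint32_t num_zero;
    uint64_t offset[MAX_PAGES];
} PageBatch;

/* Page-specific part, isolated in its own helper. */
static void fill_ram_fields(Packet *pkt, const PageBatch *pages)
{
    uint32_t total = pages->num_normal + pages->num_zero;

    assert(total <= MAX_PAGES);
    pkt->normal_pages = pages->num_normal;
    pkt->zero_pages = pages->num_zero;
    memcpy(pkt->offset, pages->offset, total * sizeof(pkt->offset[0]));
}

/* Generic part: header fields, then optional page data. */
static void fill_packet(Packet *pkt, uint32_t flags, uint64_t packet_num,
                        const PageBatch *pages)
{
    memset(pkt, 0, sizeof(*pkt));
    pkt->magic = 0x11223344;       /* placeholder magic value */
    pkt->flags = flags;
    pkt->packet_num = packet_num;
    if (pages) {                   /* e.g. a sync-only packet carries no pages */
        fill_ram_fields(pkt, pages);
    }
}
```

With this split, a packet that carries no page data only needs the generic part.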
|
#
96d396bf |
| 27-Aug-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Remove total pages tracing
The total_normal_pages and total_zero_pages elements are used only for the end tracepoints of the multifd threads. These are not super useful since they record per-channel numbers and are just the sum of all the pages that are transmitted per-packet, for which we already have tracepoints. Remove the totals from the tracing.
Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
|
#
7e1c0047 |
| 22-May-2024 |
Richard Henderson <richard.henderson@linaro.org> |
Merge tag 'migration-20240522-pull-request' of https://gitlab.com/farosas/qemu into staging
Migration pull request
- Li Zhijian's COLO minor fixes
- Marc-André's virtio-gpu fix
- Fiona's virtio-net USO fix
- A couple of migration-test fixes from Thomas
# -----BEGIN PGP SIGNATURE----- # # iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmZObggQHGZhcm9zYXNA # c3VzZS5kZQAKCRDHmNx0G+wxnWE8D/49RGE+g29qyk9aKx3lU8mSq+ZzmX5GncBt # 5+Mx5qoHDsBCQTE+dQpEVIoeMJ2HIbgbOML4qsnp6Hw/4/TWkfwC/R6+ZmHBevRk # fVLkVh2JMHVg8Tq+0FO1X1QnMU03uJ7EAuWdDa8HqlJ5dQY/K3gDaku8oQBXk96X # 13pChSbMob76tdb+wiwbdEakabigH7XfrPdI6lzI8MCGTIcPKc/UKTFYuoj/OsNx # raqy+uBtvKtfHxiaYnIgHIPNAF/1f4tP3iAOcPoZWIMXWxFkE8+ANDJAbWo6xIcL # DGg/wEzZO/OnXLjOhjvLBUHK/fx4wQ5bsqA09BVxoRyBGblkXr+bcwBLYjgiEqzT # aniPiAx5W/Db+T7HqZPIWesFYj3cmcwvYUTrx/RPMdC0epG+ZczDMtescHdZbxvt # Pjs3nFeCLhyYcVhlTI72eXRCxdd/26+r6/OmrBC2+GaZrybM61TvNo+3XvO0Pfhi # UmwF2EN27XmSMelLvH/MnflUVgBHKDs3CCQzDlxreHq2jMVR0SL7LU5wMJJ58Iok # M3u74izQM25bwYxiASH+4iRn0puH1mOwgOx28W0uiQfZY/678/lCnwa1Tul15BRE # fIQZJhyIGzhSpwLqEXmdXdlLQs1isqIgpd/mzKgZ285nLr7kz+4gxCUqiXgVbrl7 # P45Dym1u4g== # =DDrh # -----END PGP SIGNATURE----- # gpg: Signature made Wed 22 May 2024 03:13:28 PM PDT # gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D # gpg: issuer "farosas@suse.de" # gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown] # gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D
* tag 'migration-20240522-pull-request' of https://gitlab.com/farosas/qemu:
  tests/qtest/migration-test: Fix the check for a successful run of analyze-migration.py
  tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too
  hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1
  virtio-gpu: fix v2 migration
  migration: fix a typo
  migration: add "exists" info to load-state-field trace
  migration/colo: Tidy up bql_unlock() around bdrv_activate_all()
  migration/colo: make colo_incoming_co() return void
  migration/colo: Minor fix for colo error message
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
#
3f879f2f |
| 16-May-2024 |
Marc-André Lureau <marcandre.lureau@redhat.com> |
migration: add "exists" info to load-state-field trace
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
|
#
a016dd50 |
| 09-May-2024 |
Richard Henderson <richard.henderson@linaro.org> |
Merge tag 'migration-20240508-pull-request' of https://gitlab.com/farosas/qemu into staging
Migration pull request
- Will's WITH_QEMU_LOCK_GUARD cleanup
- Vladimir's new exit-on-error parameter
- Fabiano's removals and deprecations series (block migration and non-multifd compression removed)
- Peter's documentation fix for HMP migrate command
v2: - updated Peter's documentation fix.
# -----BEGIN PGP SIGNATURE----- # # iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmY7934QHGZhcm9zYXNA # c3VzZS5kZQAKCRDHmNx0G+wxnXynEADHjRa7HqwuYPhft3wGgLiFbCyQNFpNrjM9 # prQSiLlYt9gRlE4c9ZavCxR28xtOrK2oFhCnLMXaIEkct6JuylfiwCPwPuxNQP9+ # EZirECf1yKkyt+RV/LfIx3R/prJgoH5XWhpna+WIBFFo2qSorHTAzjb5dKYZDjkB # EjfN8R9goVH6aCPd4SyiCUUNxuR6/0si9AxfhUgUvUXyLZmE1ztZEoWI02FCYzVj # kKDdVK2+Z1Rlv88tyY4/E6z4pwYLWx5EiXSFv0NXIpTdyO3dM+jeAHxcN7KmQ1+5 # GvX0n+mFYOzRIbRfAnhSZbkez/nuPcbJ76phzSYDs8f/7YtOpuOFKFw7yuGrl5N5 # ZqXo5MOOGliF2wozTjacsOrUhB+MbSb0iA71T7aAdBC2s4H9+XIWfoN/OZfsBhAW # r2i1gSytVLQqsip7A0CFF+DqeSse9QHHlH8vfb8NUn1Tp0o2QfsX+/7LHlvl/2eJ # EP/zmjD6c/8vjB3fTKZr52h2lEO/36xmX+OtZpep3EBvvl1BY1LP4nBNOW1vQM/b # fzcq+agaikwS5gI2QSOC9HJ3aX6q416+wZEm3rQ8XRGSPDFfLPKM/GPPfWdj6ngb # +e3EZPrs+3dOeH1kly5xVMGXGUof+VVBmVwdv4C+XNMM8fRZOxoqd0SD8dz/vOC7 # nSGztXUPqw== # =5T+K # -----END PGP SIGNATURE----- # gpg: Signature made Thu 09 May 2024 12:06:54 AM CEST # gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D # gpg: issuer "farosas@suse.de" # gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown] # gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D
* tag 'migration-20240508-pull-request' of https://gitlab.com/farosas/qemu:
  hmp/migration: Fix "migrate" command's documentation
  migration: Deprecate fd: for file migration
  migration: Remove non-multifd compression
  migration: Remove block migration
  migration: Remove 'blk/-b' option from migrate commands
  migration: Remove 'inc' option from migrate command
  migration: Remove 'skipped' field from MigrationStats
  qapi: introduce exit-on-error parameter for migrate-incoming
  migration: process_incoming_migration_co(): rework error reporting
  migration: process_incoming_migration_co(): fix reporting s->error
  migration: process_incoming_migration_co(): complete cleanup on failure
  migration: move trace-point from migrate_fd_error to migrate_set_error
  migration/ram.c: API Conversion qemu_mutex_lock(), and qemu_mutex_unlock() to WITH_QEMU_LOCK_GUARD macro
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
#
d4a17b8f |
| 30-Apr-2024 |
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> |
migration: move trace-point from migrate_fd_error to migrate_set_error
Cover more cases with the trace-point.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
|
#
8f3f329f |
| 12-Mar-2024 |
Peter Maydell <peter.maydell@linaro.org> |
Merge tag 'migration-20240311-pull-request' of https://gitlab.com/peterx/qemu into staging
Migration pull request
- Avihai's fix to allow vmstate iterators to not starve for VFIO
- Maksim's fix on additional check on precopy load error
- Fabiano's fix on fdatasync() hang in mapped-ram
- Jonathan's fix on vring cached access over MMIO regions
- Cedric's cleanup patches 1-4 out of his error report series
- Yu's fix for RDMA migration (which used to be broken even for 8.2)
- Anthony's small cleanup/fix on err message
- Steve's patches on privatize migration.h
- Xiang's patchset to enable zero page detections in multifd threads
# -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZe9+uBIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wamaQD/SvmpMEcuRndT9LPSxzXowAGDZTBpYUfv # 5XAbx80dS9IBAO8PJJgQJIBHBeacyLBjHP9CsdVtgw5/VW+wCsbfV4AB # =xavb # -----END PGP SIGNATURE----- # gpg: Signature made Mon 11 Mar 2024 21:59:20 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal] # gpg: aka "Peter Xu <peterx@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-20240311-pull-request' of https://gitlab.com/peterx/qemu: (34 commits)
  migration/multifd: Add new migration test cases for legacy zero page checking.
  migration/multifd: Enable multifd zero page checking by default.
  migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page.
  migration/multifd: Implement zero page transmission on the multifd thread.
  migration/multifd: Add new migration option zero-page-detection.
  migration/multifd: Allow clearing of the file_bmap from multifd
  migration/multifd: Allow zero pages in file migration
  migration: purge MigrationState from public interface
  migration: delete unused accessors
  migration: privatize colo interfaces
  migration: migration_file_set_error
  migration: migration_is_device
  migration: migration_thread_is_self
  migration: export vcpu_dirty_limit_period
  migration: export migration_is_running
  migration: export migration_is_active
  migration: export migration_is_setup_or_active
  migration: remove migration.h references
  migration: export fewer options
  migration: Fix format in error message
  ...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
#
303e6f54 |
| 11-Mar-2024 |
Hao Xiang <hao.xiang@bytedance.com> |
migration/multifd: Implement zero page transmission on the multifd thread.
1. Add a zero_pages field to MultiFDPacket_t.
2. Implement zero page detection and handling on the multifd threads for the non-compression, zlib and zstd compression backends.
3. Add a new value 'multifd' to the ZeroPageDetection enumeration.
4. Add zero page counters and update the multifd send/receive tracing format to track the newly added counters.
Signed-off-by: Hao Xiang <hao.xiang@bytedance.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240311180015.3359271-5-hao.xiang@linux.dev Signed-off-by: Peter Xu <peterx@redhat.com>
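A minimal sketch of the per-thread detection step follows. It assumes a fixed, hypothetical 4 KiB page size and uses a plain `memcmp()` against a zero page in place of QEMU's optimized zero-buffer check. Pages are partitioned in place so that normal pages come first and zero pages last. A sender could then send data only for the normal pages and just the offsets for the zero pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096      /* illustrative; real page size varies by target */

static bool page_is_zero(const uint8_t *page, size_t len)
{
    static const uint8_t zero_page[SKETCH_PAGE_SIZE];  /* zero-initialized */

    assert(len <= SKETCH_PAGE_SIZE);
    return memcmp(page, zero_page, len) == 0;
}

/*
 * Reorder offset[0..num) in place: non-zero ("normal") pages first,
 * zero pages last. Returns the number of normal pages; the entries
 * offset[ret..num) are the zero pages.
 */
static size_t partition_zero_pages(const uint8_t *ram, uint64_t *offset,
                                   size_t num)
{
    size_t i = 0, j = num;

    while (i < j) {
        if (!page_is_zero(ram + offset[i], SKETCH_PAGE_SIZE)) {
            i++;                   /* normal page stays in the front part */
            continue;
        }
        /* zero page: swap it with the last unclassified entry */
        j--;
        uint64_t tmp = offset[i];
        offset[i] = offset[j];
        offset[j] = tmp;
    }
    return i;
}
```

Swapping in place avoids allocating a second array per packet. The cost is that the original page order is not preserved.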
|
#
c90cfb52 |
| 05-Mar-2024 |
Peter Maydell <peter.maydell@linaro.org> |
Merge tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu into staging
Migration pull request for 20240304
- Bryan's fix on multifd compression level API
- Fabiano's mapped-ram series (base + multifd only)
- Steve's amend on cpr document in qapi/
# -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZeUjKhIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wbv5QD/ZexBUsmZA5qyxgGvZ2yvlUBEGNOvtmKY # kRdiYPU7khMA/0N43rn4LcqKCoq4+T+EAnYizGjIyhH/7BRUyn4DUxgO # =AeEn # -----END PGP SIGNATURE----- # gpg: Signature made Mon 04 Mar 2024 01:26:02 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal] # gpg: aka "Peter Xu <peterx@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu: (27 commits)
  migration/multifd: Document two places for mapped-ram
  tests/qtest/migration: Add a multifd + mapped-ram migration test
  migration/multifd: Add mapped-ram support to fd: URI
  migration/multifd: Support incoming mapped-ram stream format
  migration/multifd: Support outgoing mapped-ram stream format
  migration/multifd: Prepare multifd sync for mapped-ram migration
  migration/multifd: Add incoming QIOChannelFile support
  migration/multifd: Add outgoing QIOChannelFile support
  migration/multifd: Add a wrapper for channels_created
  migration/multifd: Allow receiving pages without packets
  migration/multifd: Allow multifd without packets
  migration/multifd: Decouple recv method from pages
  migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data
  tests/qtest/migration: Add tests for mapped-ram file-based migration
  migration/ram: Add incoming 'mapped-ram' migration
  migration/ram: Add outgoing 'mapped-ram' migration
  migration: Add mapped-ram URI compatibility check
  migration/ram: Introduce 'mapped-ram' migration capability
  migration/qemu-file: add utility methods for working with seekable channels
  io: fsync before closing a file channel
  ...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
#	migration/ram.c
|
#
4aac6b1e |
| 29-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Cleanup multifd_recv_sync_main
Some minor cleanups and documentation for multifd_recv_sync_main.
Use thread_count as done in other parts of the code. Remove p->id from the multifd_recv_state sync, since that is global and not tied to a channel. Add documentation for the sync steps.
Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-2-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
5d1fc614 |
| 09-Feb-2024 |
Peter Maydell <peter.maydell@linaro.org> |
Merge tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu into staging
Migration pull
- William's fix on hwpoison migration which used to crash QEMU
- Peter's multifd cleanup + bugfix + optimizations
- Avihai's fix on multifd crash over non-socket channels
- Fabiano's multifd thread-race fix
- Peter's CI fix series
# -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZcREtRIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wacrwEAl2aeQkh51h/e+OKX7MG4/4Y6Edf6Oz7o # IJLk/cyrUFQA/2exo2lOdv5zHNOJKwAYj8HYDraezrC/MK1eED4Wji0M # =k53l # -----END PGP SIGNATURE----- # gpg: Signature made Thu 08 Feb 2024 03:04:21 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal] # gpg: aka "Peter Xu <peterx@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu: (34 commits)
  ci: Update comment for migration-compat-aarch64
  ci: Remove tag dependency for build-previous-qemu
  tests/migration-test: Stick with gicv3 in aarch64 test
  migration/multifd: Add a synchronization point for channel creation
  migration/multifd: Unify multifd and TLS connection paths
  migration/multifd: Move multifd_send_setup into migration thread
  migration/multifd: Move multifd_send_setup error handling in to the function
  migration/multifd: Remove p->running
  migration/multifd: Join the TLS thread
  migration: Fix logic of channels and transport compatibility check
  migration/multifd: Optimize sender side to be lockless
  migration/multifd: Fix MultiFDSendParams.packet_num race
  migration/multifd: Stick with send/recv on function names
  migration/multifd: Cleanup multifd_load_cleanup()
  migration/multifd: Cleanup multifd_save_cleanup()
  migration/multifd: Rewrite multifd_queue_page()
  migration/multifd: Change retval of multifd_send_pages()
  migration/multifd: Change retval of multifd_queue_page()
  migration/multifd: Split multifd_send_terminate_threads()
  migration/multifd: Forbid spurious wakeups
  ...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
#
3ab4441d |
| 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Split multifd_send_terminate_threads()
Split multifd_send_terminate_threads() into two functions:
- multifd_send_set_error(): used when an error happens on the sender side; sets the error and the quit state only
- multifd_send_terminate_threads(): used only by the main thread to kick all multifd send threads out of sleep, for the final recycling.
Use multifd_send_set_error() in the three old call sites where only the error is set.
Use multifd_send_terminate_threads() in the remaining one, in multifd_save_cleanup(), where the main thread finally kicks the multifd threads.
Both helpers will need to set quitting=1.
Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
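The split can be sketched with C11 atomics. The names are hypothetical, and each channel's semaphore post is replaced with a counter so the example runs without threads. The error helper records the first error and sets the quit flag only. The terminate helper is meant for the main thread during final cleanup; it also sets quit and then wakes every channel.

```c
#include <assert.h>
#include <stdatomic.h>

#define NCHAN 4                    /* illustrative number of channels */

typedef struct {
    atomic_int exiting;            /* quit flag polled by every send thread */
    atomic_int first_error;        /* 0 means "no error recorded" */
    int kicks[NCHAN];              /* stand-in for posting each channel's semaphore */
} SendState;

static void send_state_init(SendState *s)
{
    atomic_init(&s->exiting, 0);
    atomic_init(&s->first_error, 0);
    for (int i = 0; i < NCHAN; i++) {
        s->kicks[i] = 0;
    }
}

/* Any thread, on error: record the first error and set quit; wake nobody. */
static void send_set_error(SendState *s, int err)
{
    int expected = 0;

    assert(err != 0);
    atomic_compare_exchange_strong(&s->first_error, &expected, err);
    atomic_store(&s->exiting, 1);
}

/* Main thread only, during final cleanup: set quit and kick every channel. */
static void send_terminate_threads(SendState *s)
{
    atomic_store(&s->exiting, 1);
    for (int i = 0; i < NCHAN; i++) {
        s->kicks[i]++;             /* real code would post the channel's semaphore */
    }
}
```

Both helpers set the quit flag, matching the note that both need quitting=1. Only the cleanup path wakes the threads.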
|
#
75b7b25d |
| 02-Nov-2023 |
Stefan Hajnoczi <stefanha@redhat.com> |
Merge tag 'migration-20231102-pull-request' of https://gitlab.com/juan.quintela/qemu into staging
Migration Pull request (20231102)
Hi
In this pull request:
- migration reboot mode (steve)
  * I disabled the test because our CI doesn't like programs using so much shared memory. Searching for a fix.
- test for postcopy recover (fabiano)
- MigrateAddress QAPI (het)
- better return path error handling (peter)
- traces for downtime (peter)
- vmstate_register() check for duplicates (juan)
  Thomas found better solutions for s390x and IPMI; now also works on s390x
Please, apply.
# -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmVDipMACgkQ9IfvGFhy # 1yNYnQ/9E5Cywsoqljqa/9FiKBSII2qMrmkfu6JLKqePnsh5pFZiukbudYRuJCCe # ZTDEmD0NmKRJbDx2xRU1qx/e6gKJy+gz37KP89Buuh/WwZHPboPYtxQpGvCSiH26 # J3i+1+TgaqmkLzcO35wa8tp6gneQclWeAwKgMvdb4cm2pJEhgWRKI62ccyLzxeve # UCzFQn60t55ETyVZGnRD4YwdTQvGKH+DPlyTuJOLR3DePuvZd8EdH+ypvB4RLAy7 # 3+CuQOxmF5LRXPbpJuAeOsudbmhhHzrO/yL7ZmsiKQTthsJv+SzC1bO94jhQrawZ # Q7GCii5KpGq0KnRTRKZRGk6XKwxcYRduXMX3R5tXuVmDmCZsjhXzziU8yEdftph8 # 5TJdk1o0Gb043EFu81mrsQYS+9yJqe6sy6m3PTJaec54cAty5ln+c17WOvpAOaSV # +1phe05ftuVPmQ3KWhbIR/tCmavNLwEZxpVIfyaKJx04bFbtQ9gRpRyURORX4KXc # s4WXvNirQEohxYBnP4TPvA09xBTW3V08pk/wRDwt0YDXnLiqCltOuxD8r05K8K4B # MkCLcWj0g7he2tBkF60oz1KSIE0oTB81um9AzLIv5F2YSYLaJM5BIcoC437MR2f4 # MOR7drR1fP5GsRu/SeU5BWvhVq3IvdOxR7G2MLNRJJvl7ZtGXDc= # =uaqL # -----END PGP SIGNATURE----- # gpg: Signature made Thu 02 Nov 2023 19:40:03 HKT # gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full] # gpg: aka "Juan Quintela <quintela@trasno.org>" [full] # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* tag 'migration-20231102-pull-request' of https://gitlab.com/juan.quintela/qemu: (40 commits)
  migration: modify test_multifd_tcp_none() to use new QAPI syntax.
  migration: Implement MigrateChannelList to hmp migration flow.
  migration: Implement MigrateChannelList to qmp migration flow.
  migration: modify migration_channels_and_uri_compatible() for new QAPI syntax
  migration: New migrate and migrate-incoming argument 'channels'
  migration: Convert the file backend to the new QAPI syntax
  migration: convert exec backend to accept MigrateAddress.
  migration: convert rdma backend to accept MigrateAddress
  migration: convert socket backend to accept MigrateAddress
  migration: convert migration 'uri' into 'MigrateAddress'
  migration: New QAPI type 'MigrateAddress'
  migration: Change ram_dirty_bitmap_reload() retval to bool
  tests/migration-test: Add a test for postcopy hangs during RECOVER
  migration: Allow network to fail even during recovery
  migration: Refactor error handling in source return path
  tests/qtest: migration: add reboot mode test
  cpr: reboot mode
  cpr: relax vhost migration blockers
  cpr: relax blockdev migration blockers
  migration: per-mode blockers
  ...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
#
7aa6070d |
| 17-Oct-2023 |
Peter Xu <peterx@redhat.com> |
migration: Refactor error handling in source return path
rp_state.error was a boolean used to show that an error happened in the return path thread. That not only duplicates error reporting (migrate_set_error), but is also insufficient: we only do error_report() and set it to true, so we can never keep a record of the exact error and show it in query-migrate.
To improve this, a few things are done:
- Use error_setg() rather than error_report() across the whole lifecycle of the return path thread, keeping the error in an Error*.
- With the above, mark_source_rp_bad() is no longer needed; remove it, along with rp_state.error itself.
- Use migrate_set_error() to apply the captured error to the global migration object when an error occurs in this thread.
- Do the same when a QEMUFile error is detected in the source return path.
We need to re-export qemu_file_get_error_obj() to do the last one.
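The move from a boolean flag to a stored Error* can be sketched in plain C. This is a minimal, self-contained illustration, not QEMU code: SketchError, SketchMigrationState and the sketch_* functions are hypothetical stand-ins for QEMU's Error, the migration state, error_setg() and migrate_set_error(), and the "first error wins" behaviour is assumed to mirror how migrate_set_error() keeps only the first error it receives.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error: it only carries a message. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Hypothetical stand-in for the global migration state. */
typedef struct SketchMigrationState {
    SketchError *error; /* first error recorded; NULL while healthy */
} SketchMigrationState;

/* Plays the role of error_setg(): create an error with a formatted message. */
static SketchError *sketch_error_new(const char *fmt, ...)
{
    SketchError *err = calloc(1, sizeof(*err));
    va_list ap;

    assert(err != NULL);
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    return err;
}

/* Plays the role of migrate_set_error(): copy the error into the global
 * state, but only if no earlier error was recorded. */
static void sketch_set_error(SketchMigrationState *s, const SketchError *err)
{
    if (s->error == NULL) {
        s->error = malloc(sizeof(*s->error));
        assert(s->error != NULL);
        memcpy(s->error, err, sizeof(*err));
    }
}

/* One return-path step: on failure, describe the problem in *errp
 * instead of flipping a boolean. */
static int sketch_rp_handle_msg(int msg_type, SketchError **errp)
{
    if (msg_type < 0) {
        *errp = sketch_error_new("Received invalid message %d", msg_type);
        return -1;
    }
    return 0;
}

/* Simplified return-path thread body: stop at the first failure and
 * publish the captured error to the global state. */
static void sketch_return_path(SketchMigrationState *s,
                               const int *msgs, size_t n)
{
    SketchError *err = NULL;

    for (size_t i = 0; i < n; i++) {
        if (sketch_rp_handle_msg(msgs[i], &err) < 0) {
            break;
        }
    }
    if (err != NULL) {
        sketch_set_error(s, err);
        free(err);
    }
}
```

Because the message survives in the global state, a query can report the exact reason (for example "Received invalid message -5") rather than just "something failed", and a later failure does not overwrite the original root cause.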
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231017202633.296756-2-peterx@redhat.com>
3e5f3bcd | 30-Oct-2023 | Peter Xu <peterx@redhat.com>
migration: Add tracepoints for downtime checkpoints
This patch is inspired by Joao Martins's patch here:
https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com
Add tracepoints for the major downtime checkpoints on both src and dst. They all share the same tracepoint, with a string indicating the stage.
Besides the checkpoints in the previous patch, this patch also adds destination checkpoints.
On src, we have these checkpoints added:
- src-downtime-start: right before vm stops on src
- src-vm-stopped: after vm is fully stopped
- src-iterable-saved: after all iterables saved (END sections)
- src-non-iterable-saved: after all non-iterable saved (FULL sections)
- src-downtime-stop: migration fully completed
On dst, we have these checkpoints added:
- dst-precopy-loadvm-completes: after loadvm all done for precopy
- dst-precopy-bh-*: record BH steps to resume VM for precopy
- dst-postcopy-bh-*: record BH steps to resume VM for postcopy
On the dst side, we don't have a good way to trace the total time consumed by iterable or non-iterable sections for now. We could mark it by the first time a FULL / END section is received, but instead let's just rely on the other tracepoints added for vmstates to back up that information.
With this patch, one can enable the "vmstate_downtime*" tracepoints to get all the tracepoints necessary for downtime measurements.
Drop the loadvm_postcopy_handle_run_bh() tracepoint alongside, because it served the same purpose and only covered postcopy. We then have a unified prefix for all downtime-relevant tracepoints.
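Because every checkpoint goes through the same tracepoint with a stage string, the time spent in a stage can be derived by subtracting the timestamps of two named checkpoints. The sketch below assumes the trace output has already been parsed into (name, timestamp) pairs with microsecond timestamps; the Checkpoint type and the helper functions are hypothetical, and only the checkpoint names come from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One parsed trace record: checkpoint name plus timestamp in microseconds.
 * The record layout is an assumption of this sketch. */
typedef struct Checkpoint {
    const char *name;
    uint64_t ts_us;
} Checkpoint;

/* Return the index of the named checkpoint, or -1 if it is absent. */
static int find_checkpoint(const Checkpoint *cps, size_t n, const char *name)
{
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cps[i].name, name) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Microseconds elapsed between two named checkpoints, or -1 if either is
 * missing or they appear out of order in time. */
static int64_t checkpoint_delta_us(const Checkpoint *cps, size_t n,
                                   const char *from, const char *to)
{
    int a = find_checkpoint(cps, n, from);
    int b = find_checkpoint(cps, n, to);

    if (a < 0 || b < 0 || cps[b].ts_us < cps[a].ts_us) {
        return -1;
    }
    return (int64_t)(cps[b].ts_us - cps[a].ts_us);
}
```

For instance, the delta from src-downtime-start to src-downtime-stop gives the total source-side downtime, while src-vm-stopped to src-iterable-saved isolates the time spent flushing the END sections.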
Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-6-peterx@redhat.com>
3c80f142 | 30-Oct-2023 | Peter Xu <peterx@redhat.com>
migration: Add per vmstate downtime tracepoints
We have a bunch of savevm_section* tracepoints. They're good for analyzing the migration stream, but not always suitable for someone who wants to analyze the migration downtime. Two major problems:
- savevm_section* tracepoints dump all sections, while we only care about the sections that contribute to the downtime
- They don't have an identifier showing the type of section, so there is no easy way to filter the downtime information either.
We could add the type to the tracepoints, but instead this patch keeps them untouched and adds a set of downtime-specific tracepoints, so one can enable the "vmstate_downtime*" tracepoints and get a full picture of how the downtime is distributed across iterative and non-iterative vmstate save/load.
Note that both save() and load() need to be traced here, because both of them may contribute to the downtime. The contribution is not simply "add them together", though: the src may be doing a save() of device1 while the dest is load()ing device2, so the two can happen concurrently.
Tracking both sides makes sense because device load() and save() can be imbalanced: one device can save() very fast but load() very slowly, or vice versa. We can't figure that out without tracing both.
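Why save() and load() times cannot simply be added can be shown with interval arithmetic: overlapping work on src and dst covers less wall-clock time than the sum of its parts. This is a hypothetical sketch, not part of the patch; it assumes the save() and load() intervals have already been placed on one common clock, which separate src and dst hosts do not automatically share.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* [start, end) interval for one device's save() or load(), assumed to be
 * expressed on a clock shared by src and dst. */
typedef struct Interval {
    uint64_t start;
    uint64_t end;
} Interval;

/* qsort comparator: order intervals by start time. */
static int cmp_start(const void *a, const void *b)
{
    const Interval *x = a, *y = b;

    return (x->start > y->start) - (x->start < y->start);
}

/* Naive total: just add every interval's length. */
static uint64_t sum_lengths(const Interval *iv, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        assert(iv[i].end >= iv[i].start);
        total += iv[i].end - iv[i].start;
    }
    return total;
}

/* Wall-clock time covered by at least one interval (sorts iv in place). */
static uint64_t union_length(Interval *iv, size_t n)
{
    uint64_t total = 0;

    if (n == 0) {
        return 0;
    }
    qsort(iv, n, sizeof(*iv), cmp_start);

    uint64_t cur_start = iv[0].start;
    uint64_t cur_end = iv[0].end;
    for (size_t i = 1; i < n; i++) {
        if (iv[i].start <= cur_end) {
            /* Overlaps the current run: extend it if needed. */
            if (iv[i].end > cur_end) {
                cur_end = iv[i].end;
            }
        } else {
            /* Gap: close the current run and start a new one. */
            total += cur_end - cur_start;
            cur_start = iv[i].start;
            cur_end = iv[i].end;
        }
    }
    total += cur_end - cur_start;
    return total;
}
```

With device1's save() over [0, 100) and device2's load() over [50, 150), the summed time is 200 while only 150 units of wall-clock time are covered, which is why both sides need to be traced and then combined with care.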
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-4-peterx@redhat.com>
ec6f9f13 | 17-Oct-2023 | Stefan Hajnoczi <stefanha@redhat.com>
Merge tag 'migration-20231017-pull-request' of https://gitlab.com/juan.quintela/qemu into staging
Migration Pull request (20231017)
Hi
Same as yesterday's one, except:
- rebased to latest (clean rebase)
- fixed 64-bit read on big-endian hosts
CI: https://gitlab.com/juan.quintela/qemu/-/pipelines/1039214198
Please apply.
# -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmUuReUACgkQ9IfvGFhy # 1yO+FQ/+Nx2botbrUVJb3vLeG6f+x5xeWJjB0boOqhk7227cKmAA33Oqwx5l4UtL # oLOHA6P4ThqacpaluGOMMp44BSr/jOMDC/HUDVJtSplTD+droPiklIIGUfYScLbA # oYx6lXfSB2jMpSuSU19STbjwBRvd4bjJix3zDGwEIgXYqYt0tY0FY/nnGTmImnM1 # KDjRerf1lg4Rt0vvwg7I0onIDvh3CKX26Sj5a3wSRaLoocUe3jpsuBNH7MMqroHs # WpocBIsLiBAf/CbeLZsQlhbVeOi1R+kSAR5hDPvvJCPWHIrd2wf8+3NXjcFepb7d # M4wE2jLjCvHhzwYwSc0ir4n74jwD22IirEPQs8ONHrjLCb5VoBKYV5bqsFUHF55N # SbFvcZIzJFiOm2anEWiiqiNTLtYAdQCKtUvbyJ7Mq4ck6icIInLdX9zrm4voofYJ # 02lX/IIGlT3C3dGSz09LBoJ6E82zmQWNHmov8A90+3RYvMF9uSpxi0z40lhj6jWC # 6Q2AHxrJJ040ZboeOfJQG78BtvZ/9PQ2ORhJ3ceRDND4kSTDtfe/TSNAZ3thM33y # Sv99o+F/HaqrKnxK8eTJrvIEWxojDu3lnqJERWAm2AOxTnQ+6mgGtsCfLEdrv5D1 # xVsY2QczB1quRjaU2ml/7Cxe4Q1urTtfl82IEXGded6UL+cmF/I= # =br93 # -----END PGP SIGNATURE----- # gpg: Signature made Tue 17 Oct 2023 04:29:25 EDT # gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full] # gpg: aka "Juan Quintela <quintela@trasno.org>" [full] # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* tag 'migration-20231017-pull-request' of https://gitlab.com/juan.quintela/qemu: (38 commits)
  migration/multifd: Clarify Error usage in multifd_channel_connect
  migration/multifd: Unify multifd_send_thread error paths
  migration/multifd: Remove direct "socket" references
  migration/ram: Merge save_zero_page functions
  migration/ram: Move xbzrle zero page handling into save_zero_page
  migration/ram: Stop passing QEMUFile around in save_zero_page
  migration/ram: Remove RAMState from xbzrle_cache_zero_page
  migration/ram: Refactor precopy ram loading code
  multifd: reset next_packet_len after sending pages
  multifd: fix counters in multifd_send_thread
  migration: check for rate_limit_max for RATE_LIMIT_DISABLED
  migration: Improve json and formatting
  migration/rdma: Remove all "ret" variables that are used only once
  migration/rdma: Declare for index variables local
  migration/rdma: Use i as for index instead of idx
  migration/rdma: Check sooner if we are in postcopy for save_page()
  migration/rdma: Remove qemu_ prefix from exported functions
  migration/rdma: Move rdma constants from qemu-file.h to rdma.h
  qemu-file: Remove QEMUFileHooks
  migration/rdma: Create rdma_control_save_page()
  ...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
967e3889 | 12-Oct-2023 | Fabiano Rosas <farosas@suse.de>
migration/multifd: Clarify Error usage in multifd_channel_connect
The function is currently called from two sites, one always gives it a NULL Error and the other always gives it a non-NULL Error.
In the non-NULL case, all it does is trace the error and return. One of the callers already has tracing, so add a tracepoint to the other and stop passing the error into the function.
Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231012134343.23757-4-farosas@suse.de>
b1b38387 | 11-Oct-2023 | Juan Quintela <quintela@redhat.com>
migration/rdma: Remove qemu_ prefix from exported functions
Functions are long enough even without this.
Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231011203527.9061-10-quintela@redhat.com>
8b239597 | 10-Oct-2023 | Peter Xu <peterx@redhat.com>
migration: Allow user to specify available switchover bandwidth
Migration bandwidth is a very important value for live migration, because it is one of the major factors in deciding when to switch over to the destination in a precopy process.
This value is currently estimated by QEMU throughout the live migration process by monitoring how fast we send data. In an ideal world, where we always feed unlimited data to the migration channel, this would be the most accurate bandwidth, since it would then be limited only by the bandwidth actually available.
However, in reality it can be very different. For example, over a 10Gbps network, query-migrate may show a migration bandwidth of only a few tens of MB/s simply because the migration thread is busy doing plenty of other things: it can be scanning zero pages, or fetching dirty bitmaps from other external dirty sources (like vhost or KVM). This means we may not be pushing as much data as possible to the migration channel, so a bandwidth estimated from "how much data we sent in the channel" can sometimes be dramatically inaccurate.
This affects the switchover decision: QEMU may assume that it cannot switch over at all with such a low bandwidth, when in reality it can.
With such a wrong bandwidth estimate, the migration may never converge within the specified downtime, iterating forever.
The issue is that QEMU itself may not be able to avoid those uncertainties when measuring the real "available migration bandwidth", at least not in any way I can think of so far.
One way to fix this: when the user is fully aware of the available bandwidth, allow the user to help by providing an accurate value.
For example, if the user has a dedicated 10Gbps channel for migrating this specific VM, the user can specify this bandwidth so QEMU always does the calculation based on that fact, trusting the user whenever the value is specified. It may not be the exact bandwidth at switchover (when QEMU will push migration data as fast as possible), but it is much better than QEMU guessing wildly, especially when the guess is very wrong.
A new parameter "avail-switchover-bandwidth" is introduced for this. When the user specifies this parameter, instead of trusting QEMU's own estimate (based on the QEMUFile send speed), QEMU trusts the user and uses this value to decide when to switch over, assuming that this much bandwidth will be available then.
Note that specifying this value does not throttle the bandwidth for switchover yet, so QEMU will always use the full bandwidth possible for sending switchover data, assuming that this is the most important use of the network at that time.
This can resolve issues like "unconvergent migration" caused by an absurdly low "migration bandwidth" being detected for whatever reason.
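A simplified model of the switchover decision, assuming it compares the remaining dirty data with what the chosen bandwidth can move within the downtime limit. The function names are hypothetical; only the "avail-switchover-bandwidth" parameter comes from this patch, and treating it as bytes per second with 0 meaning "not set" is an assumption of the sketch. QEMU's real logic takes more inputs than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick the bandwidth (bytes per second) used for the switchover decision:
 * the user-provided value if set (non-zero), else QEMU's own estimate. */
static double switchover_bandwidth(double estimated_bps,
                                   uint64_t avail_switchover_bps)
{
    return avail_switchover_bps ? (double)avail_switchover_bps
                                : estimated_bps;
}

/* Simplified convergence test: switch over once the remaining dirty data
 * could be sent within the downtime limit at the chosen bandwidth. */
static bool should_switchover(uint64_t pending_bytes, double bandwidth_bps,
                              uint64_t downtime_limit_ms)
{
    assert(bandwidth_bps >= 0);
    double threshold_bytes = bandwidth_bps * (double)downtime_limit_ms / 1000.0;

    return (double)pending_bytes <= threshold_bytes;
}
```

For example, with a 300 ms downtime limit, a measured 30 MB/s allows only about 9 MB of pending data at switchover, while a user-specified 10 Gbit/s (1.25 GB/s) allows about 375 MB. So 100 MB of remaining dirty pages would keep iterating under the low estimate but switch over with the user's value.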
Reported-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231010221922.40638-1-peterx@redhat.com>
|