44d0d456 | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Centralize BH creation and dispatch
Now that the migration state reference counting is correct, further wrap the bottom half dispatch process to avoid future issues.
Move BH creation and scheduling together and wrap the dispatch with an intermediary function that will ensure we always keep the ref/unref balanced.
Also move the responsibility of deleting the BH into the wrapper and remove the now unnecessary pointers.
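As an illustration only, the wrapper pattern described above might look like the sketch below. It uses a toy refcounted object and a toy bottom half. Every name in it (ToyState, ToyBH, toy_bh_schedule, toy_bh_dispatch) is hypothetical and not the QEMU API, and "scheduling" is simulated by calling the dispatcher by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; none of these names are QEMU APIs. */
typedef struct {
    int refcount;
} ToyState;

typedef void ToyBHFunc(void *opaque);

typedef struct {
    ToyState *state;   /* object kept alive while the BH is pending */
    ToyBHFunc *cb;     /* the real bottom-half body */
    void *opaque;      /* argument passed to the body */
} ToyBH;

static void toy_state_ref(ToyState *s)
{
    s->refcount++;
}

static void toy_state_unref(ToyState *s)
{
    assert(s->refcount > 0);
    s->refcount--;
}

/* Creation and scheduling happen together, and the reference is taken here. */
static ToyBH *toy_bh_schedule(ToyState *s, ToyBHFunc *cb, void *opaque)
{
    ToyBH *bh = malloc(sizeof(*bh));

    if (!bh) {
        return NULL;
    }
    bh->state = s;
    bh->cb = cb;
    bh->opaque = opaque;
    toy_state_ref(s);
    return bh;
}

/* Dispatch wrapper: run the body, drop the reference, delete the BH. */
static void toy_bh_dispatch(ToyBH *bh)
{
    ToyState *s = bh->state;

    bh->cb(bh->opaque);
    toy_state_unref(s);
    free(bh);
}

/* Example body: counts how often it ran. */
static void toy_count_calls(void *opaque)
{
    (*(int *)opaque)++;
}
```

Because the reference is taken and dropped inside the two wrapper functions, an individual bottom-half body cannot forget either half of the pair, which is the property the commit message is after.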
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
699d9476 | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Add a wrapper to qemu_bh_schedule
Wrap qemu_bh_schedule() to ensure we always hold a reference to the current_migration object.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
9cf26896 | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Reference migration state around loadvm_postcopy_handle_run_bh
We need to hold a reference to the current_migration object around async calls to avoid it being freed while still in use. Even on this load-side function, we might still use the MigrationState, e.g. to check for capabilities.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
59094cfa | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Take reference to migration state around bg_migration_vm_start_bh
We need to hold a reference to the current_migration object around async calls to avoid it being freed while still in use.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
27eb8499 | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Fix use-after-free of migration state object
We're currently allowing the process_incoming_migration_bh bottom-half to run without holding a reference to the 'current_migration' object, which leads to a segmentation fault if the BH is still live after migration_shutdown() has dropped the last reference to current_migration.
In my system the bug manifests as migrate_multifd() returning true when it shouldn't and multifd_load_shutdown() calling multifd_recv_terminate_threads() which crashes due to an uninitialized multifd_recv_state.
Fix the issue by holding a reference to the object when scheduling the BH and dropping it before returning from the BH. The same is already done for the cleanup_bh at migrate_fd_cleanup_schedule().
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1969
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
0a5d1108 | 11-Sep-2023 |
Fabiano Rosas <farosas@suse.de> |
migration/yank: Use channel features
Stop using outside knowledge about the io channels when registering yank functions. Query for features instead.
The yank method for all channels used with migration code currently is to call the qio_channel_shutdown() function, so query for QIO_CHANNEL_FEATURE_SHUTDOWN. We could add a separate feature in the future for indicating whether a channel supports yanking, but that seems overkill at the moment.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20230911171320.24372-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
b0504edd | 17-Jan-2024 |
Peter Xu <peterx@redhat.com> |
migration: Drop unnecessary check in ram's pending_exact()
When the migration framework fetches the exact pending sizes, it means this check:
    remaining_size < s->threshold_size
must already have been done, in fact at migration_iteration_run():
    if (must_precopy <= s->threshold_size) {
        qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
That should be after one round of ram_state_pending_estimate(). It makes the second check meaningless, so it can be dropped.
To say it in another way, when reaching ->state_pending_exact(), we unconditionally sync dirty bits for precopy.
Then we can drop migrate_get_current() there too.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
a8629e0c | 17-Jan-2024 |
Peter Xu <peterx@redhat.com> |
migration: Make threshold_size an uint64_t
It's always used to compare against another uint64_t. Make it clear that it is never negative.
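To illustrate why a signed threshold is misleading here, the sketch below compares a uint64_t against both a signed and an unsigned threshold. The helper names are hypothetical and not taken from the migration code. It assumes a common platform where int64_t and uint64_t have the same conversion rank, so the signed operand is converted to unsigned before the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper with a signed threshold, a type that could go negative. */
static bool toy_below_signed(uint64_t pending, int64_t threshold)
{
    /* The cast spells out the conversion C applies implicitly when an
     * int64_t is compared against a uint64_t. */
    return pending < (uint64_t)threshold;
}

/* Same comparison with an unsigned threshold: no conversion involved. */
static bool toy_below_unsigned(uint64_t pending, uint64_t threshold)
{
    return pending < threshold;
}

static void toy_threshold_demo(void)
{
    /* -1 converts to UINT64_MAX, so any smaller value counts as "below" it. */
    assert(toy_below_signed(100, -1));
    assert(!toy_below_unsigned(100, 50));
    assert(toy_below_unsigned(10, 50));
}
```

With the unsigned type, the declaration itself documents that a negative threshold cannot occur, and the comparison involves no hidden conversion.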
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
918f620d | 17-Jan-2024 |
Markus Armbruster <armbru@redhat.com> |
migration: Plug memory leak on HMP migrate error path
hmp_migrate() leaks @caps when qmp_migrate() fails. Plug the leak with g_autoptr().
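GLib's g_autoptr() builds on the compiler's cleanup variable attribute (GCC and Clang). The following sketch uses that attribute directly to show how an early error return stops leaking. It does not use GLib, and all names (toy_alloc, toy_autofree, TOY_AUTOFREE, toy_do_migrate) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

static int toy_live_allocs;   /* number of outstanding allocations */

static void *toy_alloc(size_t n)
{
    void *p = malloc(n);

    if (p) {
        toy_live_allocs++;
    }
    return p;
}

/* Cleanup handler: receives a pointer to the annotated variable. */
static void toy_autofree(void *pp)
{
    void *p = *(void **)pp;

    if (p) {
        assert(toy_live_allocs > 0);
        free(p);
        toy_live_allocs--;
    }
}

/* Requires GCC or Clang; runs toy_autofree() when the variable leaves scope. */
#define TOY_AUTOFREE __attribute__((cleanup(toy_autofree)))

/* Returns 0 on success, -1 on (simulated) failure. */
static int toy_do_migrate(int fail)
{
    TOY_AUTOFREE char *caps = toy_alloc(16);

    if (!caps) {
        return -1;
    }
    if (fail) {
        return -1;   /* early error return: caps is still freed */
    }
    return 0;
}
```

The error path no longer needs its own free call, so a return added later cannot reintroduce the leak.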
Fixes: 967f2de5c9ec (migration: Implement MigrateChannelList to hmp migration flow.) v8.2.0-rc0
Fixes: CID 1533125
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20240117140722.3979657-1-armbru@redhat.com
[peterx: fix CID number as reported by Peter Maydell]
Signed-off-by: Peter Xu <peterx@redhat.com>
|
73b49878 | 17-Jan-2024 |
Paolo Bonzini <pbonzini@redhat.com> |
userfaultfd: use 1ULL to build ioctl masks
There is no need to use the Linux-internal __u64 type, 1ULL is guaranteed to be wide enough.
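A minimal sketch of the idea, with hypothetical helper names and bit numbers rather than the real userfaultfd constants: unsigned long long is guaranteed by the C standard to be at least 64 bits wide, so shifting 1ULL builds a 64-bit mask without any kernel-specific type.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 64-bit mask with bit nr set. */
static uint64_t toy_ioctl_bit(unsigned nr)
{
    assert(nr < 64);   /* shifting by 64 or more would be undefined */
    return 1ULL << nr;
}

/* True when every bit in wanted is also set in supported. */
static int toy_has_ioctls(uint64_t supported, uint64_t wanted)
{
    return (supported & wanted) == wanted;
}
```

A plain `1 << nr` would be an int shift and is undefined for bit numbers at or above the width of int, which is the mistake the wider constant avoids.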
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20240117160313.175609-1-pbonzini@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
44ce1b5d | 11-Jan-2024 |
Nick Briggs <nicholas.h.briggs@gmail.com> |
migration/rdma: define htonll/ntohll only if not predefined
Solaris has #defines for htonll and ntohll which cause syntax errors when compiling code that attempts to (re)define these functions.
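The guard can be sketched as follows. This is an illustration under assumptions: the fallback body is a byte-wise variant written for this example, not the implementation in the RDMA code, and toy_htonll_demo is a hypothetical self-check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Only provide a fallback when the platform has not already defined
 * htonll as a macro (as the commit message says Solaris does). */
#if !defined(htonll)
static uint64_t htonll(uint64_t v)
{
    unsigned char bytes[8];
    uint64_t out;

    /* Lay the value out most significant byte first (network order). */
    for (int i = 0; i < 8; i++) {
        bytes[i] = (unsigned char)(v >> (56 - 8 * i));
    }
    memcpy(&out, bytes, sizeof(out));
    return out;
}
#endif

/* Checks the in-memory byte order of the converted value. */
static void toy_htonll_demo(void)
{
    uint64_t wire = htonll(0x0102030405060708ULL);
    unsigned char bytes[8];

    memcpy(bytes, &wire, sizeof(bytes));
    assert(bytes[0] == 0x01 && bytes[7] == 0x08);
}
```

When the platform macro exists, the preprocessor skips the function definition entirely, so the macro name is never expanded inside a declaration and the syntax error disappears.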
Signed-off-by: Nick Briggs <nicholas.h.briggs@gmail.com>
Link: https://lore.kernel.org/r/65a04a7d.497ab3.3e7bef1f@gateway.sonic.net
Signed-off-by: Peter Xu <peterx@redhat.com>
|
e3b8ad5c | 04-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Report error in incoming migration
We're not currently reporting the errors set with migrate_set_error() when incoming migration fails.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
6074f816 | 04-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Change multifd_pages_init argument
The 'size' argument is actually the number of pages that fit in a multifd packet. Change it to uint32_t and rename.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
9346fa18 | 04-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Remove QEMUFile from where it is not needed
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
dca1bc7f | 04-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Remove MultiFDPages_t::packet_num
This was introduced by commit 34c55a94b1 ("migration: Create multipage support") and never used.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
0770ad43 | 05-Dec-2023 |
Het Gala <het.gala@nutanix.com> |
migration: Simplify initial conditionals in migration for better readability
The initial conditional statements in the QMP migration functions are harder to understand than necessary. It is better to get all errors out of the way at the beginning, for better readability and error handling.
Signed-off-by: Het Gala <het.gala@nutanix.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231205080039.197615-1-het.gala@nutanix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
a4a411fb | 02-Jan-2024 |
Stefan Hajnoczi <stefanha@redhat.com> |
Replace "iothread lock" with "BQL" in comments
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL) in their names. Update the code comments to use "BQL" instead of "iothread lock".
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-id: 20240102153529.486531-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
195801d7 | 02-Jan-2024 |
Stefan Hajnoczi <stefanha@redhat.com> |
system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
The Big QEMU Lock (BQL) has many names and they are confusing. The actual QemuMutex variable is called qemu_global_mutex but it's commonly referred to as the BQL in discussions and some code comments. The locking APIs, however, are called qemu_mutex_lock_iothread() and qemu_mutex_unlock_iothread().
The "iothread" name is historic and comes from when the main thread was split into KVM vcpu threads and the "iothread" (now called the main loop thread). I have contributed to the confusion myself by introducing a separate --object iothread, a separate concept unrelated to the BQL.
The "iothread" name is no longer appropriate for the BQL. Rename the locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)
There are more APIs with "iothread" in their names. Subsequent patches will rename them. There are also comments and documentation that will be updated in later patches.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
c8193acc | 05-Jan-2024 |
Peter Maydell <peter.maydell@linaro.org> |
Merge tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu into staging
migration 1st pull for 9.0
- We lost Juan and Leo in the maintainers file
- Steven's suspend state fix
- Steven's fix for coverity on migrate_mode
- Avihai's migration cleanup series
# -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+ # Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D # =sfuM # -----END PGP SIGNATURE----- # gpg: Signature made Thu 04 Jan 2024 04:30:07 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [unknown] # gpg: aka "Peter Xu <peterx@redhat.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706
* tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu: (26 commits)
  migration: fix coverity migrate_mode finding
  migration/multifd: Remove unnecessary usage of local Error
  migration: Remove unnecessary usage of local Error
  migration: Fix migration_channel_read_peek() error path
  migration/multifd: Remove error_setg() in migration_ioc_process_incoming()
  migration/multifd: Fix leaking of Error in TLS error flow
  migration/multifd: Simplify multifd_channel_connect() if else statement
  migration/multifd: Fix error message in multifd_recv_initial_packet()
  migration: Remove errp parameter in migration_fd_process_incoming()
  migration: Refactor migration_incoming_setup()
  migration: Remove nulling of hostname in migrate_init()
  migration: Remove migrate_max_downtime() declaration
  tests/qtest: postcopy migration with suspend
  tests/qtest: precopy migration with suspend
  tests/qtest: option to suspend during migration
  tests/qtest: migration events
  migration: preserve suspended for bg_migration
  migration: preserve suspended for snapshot
  migration: preserve suspended runstate
  migration: propagate suspended runstate
  ...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
b12635ff | 13-Nov-2023 |
Steve Sistare <steven.sistare@oracle.com> |
migration: fix coverity migrate_mode finding
Coverity diagnoses a possible out-of-range array index here ...
    static GSList *migration_blockers[MIG_MODE__MAX];
    fill_source_migration_info() {
        GSList *cur_blocker = migration_blockers[migrate_mode()];
... because it does not know that MIG_MODE__MAX will never be returned as a migration mode. To fix, assert so in migrate_mode().
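The shape of such a fix can be sketched with a toy enum and array. The names below (ToyMode, toy_mode, toy_blocker_counts) are hypothetical and are not the QEMU definitions. The assertion states the range invariant at the single place that produces the index, so a static analyzer and human readers can both rely on it.

```c
#include <assert.h>

/* Toy mode enum; the last enumerator is only a count, never a valid mode. */
typedef enum {
    TOY_MODE_NORMAL,
    TOY_MODE_REBOOT,
    TOY_MODE__MAX
} ToyMode;

static int toy_blocker_counts[TOY_MODE__MAX] = { 3, 7 };
static ToyMode toy_current_mode = TOY_MODE_NORMAL;

/* Accessor that asserts the value is usable as an array index. */
static ToyMode toy_mode(void)
{
    ToyMode mode = toy_current_mode;

    assert((int)mode >= 0 && mode < TOY_MODE__MAX);
    return mode;
}

static int toy_blockers_for_current_mode(void)
{
    return toy_blocker_counts[toy_mode()];
}
```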
Fixes: fa3673e497a1 ("migration: per-mode blockers")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1699907025-215450-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
3fc58efa | 31-Dec-2023 |
Avihai Horon <avihaih@nvidia.com> |
migration/multifd: Remove unnecessary usage of local Error
According to Error API, usage of ERRP_GUARD() or a local Error instead of errp is needed if errp is passed to void functions, where it is later dereferenced to see if an error occurred.
There are several places in multifd.c that use local Error although it is not needed. Change these places to use errp directly.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-12-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
b6f4c0c7 | 31-Dec-2023 |
Avihai Horon <avihaih@nvidia.com> |
migration: Remove unnecessary usage of local Error
According to Error API, usage of ERRP_GUARD() or a local Error instead of errp is needed if errp is passed to void functions, where it is later dereferenced to see if an error occurred.
There are several places in migration.c that use local Error although it is not needed. Change these places to use errp directly.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-11-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
4f8cf323 | 31-Dec-2023 |
Avihai Horon <avihaih@nvidia.com> |
migration: Fix migration_channel_read_peek() error path
migration_channel_read_peek() calls qio_channel_readv_full() and handles both cases of return value == 0 and return value < 0 the same way, by calling error_setg() with errp. However, if return value < 0, errp is already set, so calling error_setg() with errp will lead to an assert.
Fix it by handling these cases separately, calling error_setg() with errp only in return value == 0 case.
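The difference between the two cases can be sketched with a toy error object that, like the behaviour described above, asserts when an error is set twice. This is not the QEMU Error API; ToyError, toy_error_set, toy_read and toy_read_peek are hypothetical names, and the read is simulated by passing in its return value.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *msg;   /* NULL means "no error set" */
} ToyError;

/* Setting an error that is already set is treated as a programming bug. */
static void toy_error_set(ToyError *err, const char *msg)
{
    assert(err->msg == NULL);
    err->msg = msg;
}

/* Simulated read: <0 is a failure (error set), 0 is EOF (error untouched). */
static int toy_read(int simulated_ret, ToyError *err)
{
    if (simulated_ret < 0) {
        toy_error_set(err, "read failed");
    }
    return simulated_ret;
}

/* Returns 0 on success, -1 on failure with err set exactly once. */
static int toy_read_peek(int simulated_ret, ToyError *err)
{
    int len = toy_read(simulated_ret, err);

    if (len < 0) {
        return -1;   /* callee already set err; setting it again would assert */
    }
    if (len == 0) {
        toy_error_set(err, "unexpected end of file");
        return -1;
    }
    return 0;
}
```

Merging the two branches into a single `len <= 0` check that always sets the error would trip the assertion on the failure path, which is the bug the commit describes.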
Fixes: 6720c2b32725 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-10-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
1d3886f8 | 31-Dec-2023 |
Avihai Horon <avihaih@nvidia.com> |
migration/multifd: Remove error_setg() in migration_ioc_process_incoming()
If multifd_load_setup() fails in migration_ioc_process_incoming(), error_setg() is called with errp. This will lead to an assert because in that case errp already contains an error.
Fix it by removing the redundant error_setg().
Fixes: 6720c2b32725 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-9-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
6ae208ce | 31-Dec-2023 |
Avihai Horon <avihaih@nvidia.com> |
migration/multifd: Fix leaking of Error in TLS error flow
If there is an error in multifd TLS handshake task, multifd_tls_outgoing_handshake() retrieves the error with qio_task_propagate_error() but never frees it.
Fix it by freeing the obtained Error.
In addition, the error is not reported at all, so report it with migrate_set_error().
Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-8-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|