63f64d77 | 26-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Fix qmp_query_migrate mbps value
The QMP command query_migrate might see incorrect throughput numbers if it runs after we've set the migration completion status but before migration_calcu
migration: Fix qmp_query_migrate mbps value
The QMP command query_migrate might see incorrect throughput numbers if it runs after we've set the migration completion status but before migration_calculate_complete() has updated s->total_time and s->mbps.
The migration status would show COMPLETED, but the throughput value would be the one from the last iteration and not the one from the whole migration. This will usually be a larger value due to the time period being smaller (one iteration).
Move migration_calculate_complete() earlier so that the status MIGRATION_STATUS_COMPLETED is only emitted after the final counters update. Keep everything under the BQL so the QMP thread sees the updates as atomic.
Rename migration_calculate_complete to migration_completion_end to reflect its new purpose of also updating s->state.
Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240226143335.14282-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
cbdafc1b | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: options incompatible with cpr
Fail the migration request if options are set that are incompatible with cpr.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu
migration: options incompatible with cpr
Fail the migration request if options are set that are incompatible with cpr.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-15-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
9867d4dd | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: stop vm for cpr
When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the possibility of ram and device state being
migration: stop vm for cpr
When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the possibility of ram and device state being out of sync, and guarantees that a guest in the suspended state remains suspended, because qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
4af667f8 | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: notifier error checking
Check the status returned by migration notifiers for event type MIG_EVENT_PRECOPY_SETUP, and report errors. None of the notifiers return an error status at this t
migration: notifier error checking
Check the status returned by migration notifiers for event type MIG_EVENT_PRECOPY_SETUP, and report errors. None of the notifiers return an error status at this time.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-10-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
bf78a046 | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: refactor migrate_fd_connect failures
Move common code for the error path in migrate_fd_connect to a shared fail label. No functional change.
Signed-off-by: Steve Sistare <steven.sistare
migration: refactor migrate_fd_connect failures
Move common code for the error path in migrate_fd_connect to a shared fail label. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-9-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
6835f5a1 | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: per-mode notifiers
Keep a separate list of migration notifiers for each migration mode.
Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com
migration: per-mode notifiers
Keep a separate list of migration notifiers for each migration mode.
Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
5663dd3f | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: MigrationNotifyFunc
Define MigrationNotifyFunc to improve type safety and simplify migration notifiers.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <pe
migration: MigrationNotifyFunc
Define MigrationNotifyFunc to improve type safety and simplify migration notifiers.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-7-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
c763a23e | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: remove postcopy_after_devices
postcopy_after_devices and migration_in_postcopy_after_devices are no longer used, so delete them.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
migration: remove postcopy_after_devices
postcopy_after_devices and migration_in_postcopy_after_devices are no longer used, so delete them.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-6-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
9d9babf7 | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: MigrationEvent for notifiers
Passing MigrationState to notifiers is unsound because they could access unstable migration state internals or even modify the state. Instead, pass the minim
migration: MigrationEvent for notifiers
Passing MigrationState to notifiers is unsound because they could access unstable migration state internals or even modify the state. Instead, pass the minimal info needed in a new MigrationEvent struct, which could be extended in the future if needed.
Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-5-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
3e775730 | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: convert to NotifierWithReturn
Change all migration notifiers to type NotifierWithReturn, so notifiers can return an error status in a future patch. For now, pass NULL for the notifier er
migration: convert to NotifierWithReturn
Change all migration notifiers to type NotifierWithReturn, so notifiers can return an error status in a future patch. For now, pass NULL for the notifier error parameter, and do not check the return value.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-4-git-send-email-steven.sistare@oracle.com [peterx: dropped unexpected update to roms/seabios-hppa] Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
d91f33c7 | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: remove error from notifier data
Remove the error object from opaque data passed to notifiers. Use the new error parameter passed to the notifier instead.
Signed-off-by: Steve Sistare <st
migration: remove error from notifier data
Remove the error object from opaque data passed to notifiers. Use the new error parameter passed to the notifier instead.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
be19d836 | 22-Feb-2024 |
Steve Sistare <steven.sistare@oracle.com> |
notify: pass error to notifier with return
Pass an error object as the third parameter to "notifier with return" notifiers, so clients no longer need to bundle an error object in the opaque data. T
notify: pass error to notifier with return
Pass an error object as the third parameter to "notifier with return" notifiers, so clients no longer need to bundle an error object in the opaque data. The new parameter is used in a later patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
c9a7e83c | 22-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Drop unnecessary helper to destroy IOC
Both socket_send_channel_destroy() and multifd_send_channel_destroy() are unnecessary wrappers to destroy an IOC, as the only thing to do is
migration/multifd: Drop unnecessary helper to destroy IOC
Both socket_send_channel_destroy() and multifd_send_channel_destroy() are unnecessary wrappers to destroy an IOC, as the only thing to do is to release the final IOC reference. We have plenty of code that destroys an IOC using direct unref() already; keep that style.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
72b90b96 | 22-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Cleanup outgoing_args in state destroy
outgoing_args is a global cache of socket address to be reused in multifd. Freeing the cache in per-channel destructor is more or less a hac
migration/multifd: Cleanup outgoing_args in state destroy
outgoing_args is a global cache of socket address to be reused in multifd. Freeing the cache in per-channel destructor is more or less a hack. Move it to multifd_send_cleanup_state() so it only get checked once. Use a small helper to do so because it's internal of socket.c.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
770de49c | 22-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Make multifd_channel_connect() return void
It never fails, drop the retval and also the Error**.
Suggested-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Fabiano Rosas <faros
migration/multifd: Make multifd_channel_connect() return void
It never fails, drop the retval and also the Error**.
Suggested-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
0518b5d8 | 22-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Drop registered_yank
With a clear definition of p->c protocol, where we only set it up if the channel is fully established (TLS or non-TLS), registered_yank boolean will have equa
migration/multifd: Drop registered_yank
With a clear definition of p->c protocol, where we only set it up if the channel is fully established (TLS or non-TLS), registered_yank boolean will have equal meaning of "p->c != NULL".
Drop registered_yank by checking p->c instead.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
9221e3c6 | 22-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Cleanup TLS iochannel referencing
Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") introduced a thread for TLS channels, which will r
migration/multifd: Cleanup TLS iochannel referencing
Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") introduced a thread for TLS channels, which will resolve the issue on blocking the main thread. However in the same commit p->c is slightly abused just to be able to pass over the pointer "p" into the thread.
That's the major reason we'll need to conditionally free the io channel in the fault paths.
To clean it up, using a separate structure to pass over both "p" and "tioc" in the tls handshake thread. Then we can make it a rule that p->c will never be set until the channel is completely setup. With that, we can drop the tricky conditional unref of the io channel in the error path.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
d13f0026 | 20-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Release recv sem_sync earlier
Now that multifd_recv_terminate_threads() is called only once, release the recv side sem_sync earlier like we do for the send side.
Signed-off-by: F
migration/multifd: Release recv sem_sync earlier
Now that multifd_recv_terminate_threads() is called only once, release the recv side sem_sync earlier like we do for the send side.
Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240220224138.24759-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
11dd7be5 | 20-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Remove p->quit from recv side
Like we did on the sending side, replace the p->quit per-channel flag with a global atomic 'exiting' flag.
Signed-off-by: Fabiano Rosas <farosas@sus
migration/multifd: Remove p->quit from recv side
Like we did on the sending side, replace the p->quit per-channel flag with a global atomic 'exiting' flag.
Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240220224138.24759-5-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
93fa9dc2 | 06-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Add a synchronization point for channel creation
It is possible that one of the multifd channels fails to be created at multifd_new_send_channel_async() while the rest of the chan
migration/multifd: Add a synchronization point for channel creation
It is possible that one of the multifd channels fails to be created at multifd_new_send_channel_async() while the rest of the channel creation tasks are still in flight.
This could lead to multifd_save_cleanup() executing the qemu_thread_join() loop too early and not waiting for the threads which haven't been created yet, leading to the freeing of resources that the newly created threads will try to access and crash.
Add a synchronization point after which there will be no attempts at thread creation and therefore calling multifd_save_cleanup() past that point will ensure it properly waits for the threads.
A note about performance: Prior to this patch, if a channel took too long to be established, other channels could finish connecting first and already start taking load. Now we're bounded by the slowest-connecting channel.
Reported-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
2576ae48 | 06-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Unify multifd and TLS connection paths
During multifd channel creation (multifd_send_new_channel_async) when TLS is enabled, the multifd_channel_connect function is called twice,
migration/multifd: Unify multifd and TLS connection paths
During multifd channel creation (multifd_send_new_channel_async) when TLS is enabled, the multifd_channel_connect function is called twice, once to create the TLS handshake thread and another time after the asynchrounous TLS handshake has finished.
This creates a slightly confusing call stack where multifd_channel_connect() is called more times than the number of channels. It also splits error handling between the two callers of multifd_channel_connect() causing some code duplication. Lastly, it gets in the way of having a single point to determine whether all channel creation tasks have been initiated.
Refactor the code to move the reentrancy one level up at the multifd_new_send_channel_async() level, de-duplicating the error handling and allowing for the next patch to introduce a synchronization point common to all the multifd channel creation, regardless of TLS.
Note that the previous code would never fail once p->c had been set. This patch changes this assumption, which affects refcounting, so add comments around object_unref to explain the situation.
Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
dd904bc1 | 06-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Move multifd_send_setup into migration thread
We currently have an unfavorable situation around multifd channels creation and the migration thread execution.
We create the multif
migration/multifd: Move multifd_send_setup into migration thread
We currently have an unfavorable situation around multifd channels creation and the migration thread execution.
We create the multifd channels with qio_channel_socket_connect_async -> qio_task_run_in_thread, but only connect them at the multifd_new_send_channel_async callback, called from qio_task_complete, which is registered as a glib event.
So at multifd_send_setup() we create the channels, but they will only be actually usable after the whole multifd_send_setup() calling stack returns back to the main loop. Which means that the migration thread is already up and running without any possibility for the multifd channels to be ready on time.
We currently rely on the channels-ready semaphore blocking multifd_send_sync_main() until channels start to come up and release it. However there have been bugs recently found when a channel's creation fails and multifd_send_cleanup() is allowed to run while other channels are still being created.
Let's start to organize this situation by moving the multifd_send_setup() call into the migration thread. That way we unblock the main-loop to dispatch the completion callbacks and actually have a chance of getting the multifd channels ready for when the migration thread needs them.
The next patches will deal with the synchronization aspects.
Note that this takes multifd_send_setup() out of the BQL.
Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
bd8b0a8f | 06-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Move multifd_send_setup error handling in to the function
Hide the error handling inside multifd_send_setup to make it cleaner for the next patch to move the function around.
Rev
migration/multifd: Move multifd_send_setup error handling in to the function
Hide the error handling inside multifd_send_setup to make it cleaner for the next patch to move the function around.
Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
a2a63c4a | 06-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Remove p->running
We currently only need p->running to avoid calling qemu_thread_join() on a non existent thread if the thread has never been created.
However, there are at least
migration/multifd: Remove p->running
We currently only need p->running to avoid calling qemu_thread_join() on a non existent thread if the thread has never been created.
However, there are at least two bugs in this logic:
1) On the sending side, p->running is set too early and qemu_thread_create() can be skipped due to an error during TLS handshake, leaving the flag set and leading to a crash when multifd_send_cleanup() calls qemu_thread_join().
2) During exit, the multifd thread clears the flag while holding the channel lock. The counterpart at multifd_send_cleanup() reads the flag outside of the lock and might free the mutex while the multifd thread still has it locked.
Fix the first issue by setting the flag right before creating the thread. Rename it from p->running to p->thread_created to clarify its usage.
Fix the second issue by not clearing the flag at the multifd thread exit. We don't have any use for that.
Note that these bugs are straight-forward logic issues and not race conditions. There is still a gap for races to affect this code due to multifd_send_cleanup() being allowed to run concurrently with the thread creation loop. This issue is solved in the next patches.
Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake") Reported-by: Avihai Horon <avihaih@nvidia.com> Reported-by: chenyuhui5@huawei.com Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|
e1921f10 | 06-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/multifd: Join the TLS thread
We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether.
Fixes: a1af605bd5 ("migration
migration/multifd: Join the TLS thread
We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether.
Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") Cc: qemu-stable <qemu-stable@nongnu.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
show more ...
|