3ab4441d | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Split multifd_send_terminate_threads()
Split multifd_send_terminate_threads() into two functions:
- multifd_send_set_error(): used when an error happened on the sender side, set error and quit state only
- multifd_send_terminate_threads(): used only by the main thread, to kick all multifd send threads out of sleep for the final round of cleanup.
Use multifd_send_set_error() in the three old call sites where only the error will be set.
Use multifd_send_terminate_threads() in the remaining one, where the main thread kicks the multifd threads at the end in multifd_save_cleanup().
Both helpers will need to set quitting=1.
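For illustration, the split could look roughly like this (a sketch based on the description above; the surrounding field and helper names follow the existing multifd code, not the literal patch):

    /* Error path: record the error and set the quit state (sketch) */
    static void multifd_send_set_error(Error *err)
    {
        if (err) {
            migrate_set_error(migrate_get_current(), err);
        }
        qatomic_set(&multifd_send_state->exiting, 1);   /* quitting=1 */
    }

    /* Main thread only: kick every sender thread out of sleep (sketch) */
    static void multifd_send_terminate_threads(void)
    {
        int i;

        qatomic_set(&multifd_send_state->exiting, 1);   /* quitting=1 */

        for (i = 0; i < migrate_multifd_channels(); i++) {
            qemu_sem_post(&multifd_send_state->params[i].sem);
        }
    }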
Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
859ebaf3 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Forbid spurious wakeups
Multifd's logic is now designed to have no spurious wakeups. I still remember a talk with Juan and he seemed to agree we should drop it now; if my memory serves, it was only there because multifd used to hit spurious wakeups while still being debugged.
Let's drop it and see what can explode; as long as it's not reaching soft-freeze.
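For illustration, the kind of change implied (a sketch, not the literal diff): once a wakeup must mean real work, the tolerant re-check in the sender thread can become an assertion:

    qemu_sem_wait(&p->sem);
    /* No spurious wakeups: a post on p->sem must mean there is either
     * work queued or we are quitting (the exact condition here is an
     * assumption for illustration) */
    assert(qatomic_read(&p->pending_job) || qatomic_read(&p->pending_sync) ||
           qatomic_read(&multifd_send_state->exiting));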
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
25a1f878 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Move header prepare/fill into send_prepare()
This patch redefines the interface of ->send_prepare(). It further simplifies multifd_send_thread(), especially on the zero-copy path.
Now with the new interface, we require the hook to do all the work for preparing the IOVs to send. After it's completed, the IOVs should be ready to be dumped into the specific multifd QIOChannel later.
So now the API looks like:
p->pages -----------> send_prepare() -------------> IOVs
This also prepares for the case where the input can be extended to something other than p->pages. But that's for later.
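As a sketch of the resulting contract, a no-compression hook could look roughly like this (illustrative only, derived from the description above):

    static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
    {
        MultiFDPages_t *pages = p->pages;
        int i;

        /* The hook now owns the whole IOV setup, header included */
        multifd_send_prepare_header(p);

        for (i = 0; i < pages->num; i++) {
            p->iov[p->iovs_num].iov_base =
                pages->block->host + pages->offset[i];
            p->iov[p->iovs_num].iov_len = qemu_target_page_size();
            p->iovs_num++;
        }

        p->next_packet_size = pages->num * qemu_target_page_size();
        p->flags |= MULTIFD_FLAG_NOCOMP;
        return 0;
    }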
This patch achieves a similar goal to what Fabiano proposed earlier here:
https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de
However the send() interface may not be necessary. I'm boldly attaching a "Co-developed-by" for Fabiano.
Co-developed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
452b2057 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: multifd_send_prepare_header()
Introduce a helper multifd_send_prepare_header() to set up the header packet for the multifd sender.
It's fine to set up IOV[0] _before_ send_prepare(), because the packet buffer is already allocated even if its content is still to be filled in.
With this helper, we can already slightly clean up the zero copy path.
Note that I explicitly put it into multifd.h, because I want it inlined directly into multifd*.c where necessary later.
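The helper is small enough to quote in full as a sketch (close to what the commit describes):

    static inline void multifd_send_prepare_header(MultiFDSendParams *p)
    {
        /* IOV[0] always carries the packet header */
        p->iov[0].iov_len = p->packet_len;
        p->iov[0].iov_base = p->packet;
        p->iovs_num++;
    }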
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
8a9ef173 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Move trace_multifd_send|recv()
Move them into the fill/unfill of packets. With that, we can further clean up the send/recv thread procedures and remove one more temp var.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
db7e1cc5 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Move total_normal_pages accounting
Just like the previous patch, move the accounting for total_normal_pages on both src/dst sides into the packet fill/unfill procedures.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
05b7ec18 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Rename p->num_packets and clean it up
This field, no matter whether on src or dest, is only used for debugging purposes.
They could even be removed already, except that they still more or less provide some accounting on "how many packets are sent/recved for this thread". The other, more important one is called packet_num, which is embedded in the multifd packet headers (MultiFDPacket_t).
So let's keep them for now, but make them much easier to understand, by doing the below:
- Rename both of them to packets_sent / packets_recved; the old name (num_packets) is way too confusing when we already have MultiFDPacket_t.packet_num.
- Avoid worrying about the "initial packet": we know we will send it, and that's good enough. The accounting won't matter a great deal whether it starts at 0 or at 1.
- Move them to where we send/recv the packets:
  - multifd_send_fill_packet() for senders.
  - multifd_recv_unfill_packet() for receivers.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
83c560fb | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Drop pages->num check in sender thread
Now with a split SYNC handler, we always have pages->num set for pending_job==true. Assert it instead.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
e3cce9af | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Simplify locking in sender thread
The sender thread yields p->mutex before IO starts, trying not to block the requester thread. This may be an unnecessary lock optimization: the requester can already read pending_job safely even without the lock, because it is currently the only one who can assign a task.
Drop that lock complication on both sides:
(1) in the sender thread, always take the mutex until the job is done
(2) in the requester thread, check pending_job being cleared locklessly
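In other words (a simplified sketch of both sides; error handling and the actual IO elided):

    /* Requester (migration thread): lockless check, sketch */
    if (qatomic_read(&p->pending_job)) {
        return false;   /* channel still busy; only we can assign work */
    }

    /* Sender thread: hold the lock until the job is done, sketch */
    qemu_mutex_lock(&p->mutex);
    if (p->pending_job) {
        multifd_send_fill_packet(p);
        /* ... prepare IOVs and send the packet ... */
        qatomic_set(&p->pending_job, false);
    }
    qemu_mutex_unlock(&p->mutex);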
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
f5f48a78 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Separate SYNC request with normal jobs
Multifd provides a threaded model for processing jobs. On the sender side, there can be two kinds of job: (1) a list of pages to send, or (2) a sync request.
The sync request is a very special kind of job. It never contains a page array, but only a multifd packet telling the dest side to synchronize with sent pages.
Before this patch, both requests use the pending_job field: no matter what the request is, it boosts pending_job, while the multifd sender thread decrements it after it finishes one job.
However, this can be racy, because SYNC is special in that it needs to set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request. Consider a sequence of operations where:
- migration thread enqueue a job to send some pages, pending_job++ (0->1)
- [...before the selected multifd sender thread wakes up...]
- migration thread enqueue another job to sync, pending_job++ (1->2), setup p->flags=MULTIFD_FLAG_SYNC
- multifd sender thread wakes up, finds pending_job==2
  - sends the 1st packet with MULTIFD_FLAG_SYNC and the list of pages
  - sends the 2nd packet with flags==0 and no pages
This is not expected, because MULTIFD_FLAG_SYNC should hopefully only be handled after all the pages are received. Meanwhile, the 2nd packet is completely useless, containing zero information.
I didn't verify the above, but I think this issue is still benign in that, at least on the recv side, we always receive pages before handling MULTIFD_FLAG_SYNC. However, that's not always guaranteed, and it is just tricky.
One other reason I want to separate it is that using p->flags to communicate between the two threads is also not clearly defined; it's very hard to read and understand why accessing p->flags is always safe. See the current impl of multifd_send_thread(), where we tried to cache only p->flags; it doesn't need to be that complicated.
This patch introduces pending_sync, a separate flag just to show that the requester needs a sync. Alongside, we remove the tricky caching of p->flags now because after this patch p->flags should only be used by multifd sender thread now, which will be crystal clear. So it is always thread safe to access p->flags.
With that, we can also safely convert the pending_job into a boolean, because we don't support >1 pending jobs anyway.
Always use atomic ops to access both flags to make sure there are no caching effects. While at it, drop the initial setting of "pending_job = 0", because the struct is always allocated using g_new0().
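A sketch of the resulting scheme (field accesses per the description above; illustrative only):

    /* Requester side: a sync is now its own kind of request */
    qatomic_set(&p->pending_sync, true);
    qemu_sem_post(&p->sem);

    /* Sender thread: pages and sync can no longer be conflated */
    if (qatomic_read(&p->pending_job)) {
        /* ... send one packet with the queued pages, flags == 0 ... */
    } else if (qatomic_read(&p->pending_sync)) {
        /* p->flags is now touched by the sender thread only */
        p->flags = MULTIFD_FLAG_SYNC;
        /* ... send one empty sync packet ... */
        qatomic_set(&p->pending_sync, false);
    }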
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
efd8c543 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Drop MultiFDSendParams.normal[] array
This array is redundant when p->pages exists. Now that we have extended the lifetime of p->pages to the whole period during which pending_job is set, it should be safe to always use p->pages->offset[] rather than p->normal[]. Drop the array.
Alongside, normal_num is also redundant: it is the same as p->pages->num.
This doesn't apply to recv side, because there's no extra buffering on recv side, so p->normal[] array is still needed.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
836eca47 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Postpone reset of MultiFDPages_t
Now we reset the MultiFDPages_t object in the multifd sender thread in the middle of the sending job. That's not necessary, because the "*pages" struct will not be reused anyway until pending_job is cleared.
Move that to the end, after the job is completed, and provide a helper to reset a "*pages" object. Use that same helper when freeing the object too.
This prepares us to keep using p->pages in the follow up patches, where we may drop p->normal[].
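The helper could be as simple as (sketch):

    static void multifd_pages_reset(MultiFDPages_t *pages)
    {
        /*
         * No need to touch offset[]: it will be overwritten
         * before the next use anyway.
         */
        pages->num = 0;
        pages->block = NULL;
    }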
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
15f3f21d | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
Multifd send side has two fields to indicate error quits:
- MultiFDSendParams.quit
- &multifd_send_state->exiting
Merge them into the global one. The replacement is done by changing all p->quit checks into checks of the global var. The global check doesn't need any lock.
A few more things done on top of this altogether:
- multifd_send_terminate_threads()
Move the xchg() of &multifd_send_state->exiting earlier, so as to cover the tracepoint, migrate_set_error() and migrate_set_state().
- multifd_send_sync_main()
In the 2nd loop, add one more check of the global var to make sure we don't keep looping if QEMU has already decided to quit.
- multifd_tls_outgoing_handshake()
Use multifd_send_terminate_threads() to set the error state. That has the benefit of updating MigrationState.error with that error too, so we can persist the first error we hit in that specific channel.
- multifd_new_send_channel_async()
Take a similar approach as above, and drop the migrate_set_error() because multifd_send_terminate_threads() already covers that. Unwrap the helper multifd_new_send_channel_cleanup() along the way; it is not really needed.
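The shape of the replacement, as a sketch:

    /* before: per-channel flag, needed p->mutex */
    qemu_mutex_lock(&p->mutex);
    if (p->quit) {
        qemu_mutex_unlock(&p->mutex);
        break;
    }
    qemu_mutex_unlock(&p->mutex);

    /* after: one global flag, no lock needed */
    if (qatomic_read(&multifd_send_state->exiting)) {
        break;
    }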
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
48c0f5d5 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: multifd_send_kick_main()
When a multifd sender thread hits errors, it always needs to kick the main thread by posting all the semaphores that it may be waiting upon.
Provide a helper for it and deduplicate the code.
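Sketched, the helper posts the semaphores the main thread may sleep on (names per the existing multifd code):

    static void multifd_send_kick_main(MultiFDSendParams *p)
    {
        qemu_sem_post(&p->sem_sync);
        qemu_sem_post(&multifd_send_state->channels_ready);
    }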
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
8888a552 | 02-Feb-2024 |
Peter Xu <peterx@redhat.com> |
migration/multifd: Drop stale comment for multifd zero copy
We've already done that with multifd_flush_after_each_section, for multifd in general. Drop the stale "TODO-like" comment.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
06152b89 | 30-Jan-2024 |
William Roche <william.roche@oracle.com> |
migration: prevent migration when VM has poisoned memory
A memory page poisoned from the hypervisor level is no longer readable. Migrating a VM will crash QEMU when it tries to read the memory address space and stumbles on the poisoned page, with a stack trace similar to:
Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start
To avoid this VM crash during the migration, prevent the migration when a known hardware poison exists on the VM.
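The check boils down to something like the following in the migration setup path (a sketch; the helper name kvm_hwpoisoned_mem() is assumed from this series):

    if (kvm_hwpoisoned_mem()) {
        error_setg(errp, "Can't migrate this VM because it has "
                   "hardware-poisoned memory pages");
        return false;
    }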
Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
44d0d456 | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Centralize BH creation and dispatch
Now that the migration state reference counting is correct, further wrap the bottom half dispatch process to avoid future issues.
Move BH creation and scheduling together and wrap the dispatch with an intermediary function that will ensure we always keep the ref/unref balanced.
Also move the responsibility of deleting the BH into the wrapper and remove the now unnecessary pointers.
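The intermediary dispatch function, sketched (the MigrationBH wrapper struct holding cb/opaque/bh is part of the sketch):

    static void migration_bh_dispatch_bh(void *opaque)
    {
        MigrationState *s = migrate_get_current();
        MigrationBH *migbh = opaque;

        /* The ref has been held since the BH was scheduled */
        migbh->cb(migbh->opaque);
        qemu_bh_delete(migbh->bh);
        g_free(migbh);
        object_unref(OBJECT(s));
    }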
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
699d9476 | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Add a wrapper to qemu_bh_schedule
Wrap qemu_bh_schedule() to ensure we always hold a reference to the current_migration object.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
9cf26896 | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Reference migration state around loadvm_postcopy_handle_run_bh
We need to hold a reference to the current_migration object around async calls to avoid it being freed while still in use. Even in this load-side function, we might still use the MigrationState, e.g. to check for capabilities.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
59094cfa | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Take reference to migration state around bg_migration_vm_start_bh
We need to hold a reference to the current_migration object around async calls to avoid it being freed while still in use.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
27eb8499 | 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Fix use-after-free of migration state object
We're currently allowing the process_incoming_migration_bh bottom-half to run without holding a reference to the 'current_migration' object, which leads to a segmentation fault if the BH is still live after migration_shutdown() has dropped the last reference to current_migration.
On my system the bug manifests as migrate_multifd() returning true when it shouldn't, and multifd_load_shutdown() calling multifd_recv_terminate_threads(), which crashes due to an uninitialized multifd_recv_state.
Fix the issue by holding a reference to the object when scheduling the BH and dropping it before returning from the BH. The same is already done for the cleanup_bh at migrate_fd_cleanup_schedule().
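The pattern, as a sketch:

    /* when scheduling the BH */
    object_ref(OBJECT(migrate_get_current()));
    qemu_bh_schedule(mis->bh);

    /* at the end of process_incoming_migration_bh() */
    object_unref(OBJECT(migrate_get_current()));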
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1969
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
0a5d1108 | 11-Sep-2023 |
Fabiano Rosas <farosas@suse.de> |
migration/yank: Use channel features
Stop using outside knowledge about the io channels when registering yank functions. Query for features instead.
The yank method for all channels used with migration code currently is to call the qio_channel_shutdown() function, so query for QIO_CHANNEL_FEATURE_SHUTDOWN. We could add a separate feature in the future for indicating whether a channel supports yanking, but that seems overkill at the moment.
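The registration then becomes feature-driven rather than type-driven (sketch):

    if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
        yank_register_function(MIGRATION_YANK_INSTANCE,
                               migration_yank_iochannel,
                               QIO_CHANNEL(ioc));
    }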
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20230911171320.24372-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
|
b0504edd | 17-Jan-2024 |
Peter Xu <peterx@redhat.com> |
migration: Drop unnecessary check in ram's pending_exact()
When the migration framework fetches the exact pending sizes, it means this check:
remaining_size < s->threshold_size
Must have been done already, actually at migration_iteration_run():
if (must_precopy <= s->threshold_size) {
    qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
That happens after one round of ram_state_pending_estimate(), which makes the 2nd check meaningless, so it can be dropped.
To say it in another way, when reaching ->state_pending_exact(), we unconditionally sync dirty bits for precopy.
Then we can drop migrate_get_current() there too.
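Conceptually, inside ram's ->state_pending_exact() (a sketch with details elided; helper names are assumptions from memory of the surrounding code):

    if (!migration_in_postcopy()) {
        /* the old "remaining_size < s->threshold_size" guard is gone:
         * reaching here means precopy wants an unconditional sync */
        migration_bitmap_sync_precopy(rs, false);
        remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
    }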
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
a8629e0c | 17-Jan-2024 |
Peter Xu <peterx@redhat.com> |
migration: Make threshold_size an uint64_t
It's always used to compare against another uint64_t. Make it always clear that it's never negative.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
918f620d | 17-Jan-2024 |
Markus Armbruster <armbru@redhat.com> |
migration: Plug memory leak on HMP migrate error path
hmp_migrate() leaks @caps when qmp_migrate() fails. Plug the leak with g_autoptr().
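The fix pattern, sketched (QAPI generates the cleanup function g_autoptr() needs for list types; the exact list type of @caps is assumed here):

    g_autoptr(MigrationChannelList) caps = NULL;   /* was a plain pointer */

    /* ... build caps, call qmp_migrate() ... */

    /* no explicit qapi_free_...() call needed anymore: the cleanup now
     * runs on every return path, including the early error returns */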
Fixes: 967f2de5c9ec (migration: Implement MigrateChannelList to hmp migration flow.)
Fixes: CID 1533125
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20240117140722.3979657-1-armbru@redhat.com
[peterx: fix CID number as reported by Peter Maydell]
Signed-off-by: Peter Xu <peterx@redhat.com>
|