Revision tags: v9.1.0 |
|
#
97d2b66d |
| 13-Aug-2024 |
Nicholas Piggin <npiggin@gmail.com> |
savevm: Fix load_snapshot error path crash
An error path missed setting *errp, which can cause a NULL deref.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20240813050638.446172-11-npiggin@gmail.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20240813202329.1237572-19-alex.bennee@linaro.org>
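To illustrate the class of bug, here is a minimal sketch of the `Error **errp` convention. The `Error` struct, `set_error()`, `load_snapshot_sketch()` and `try_load()` are simplified stand-ins invented for this example, not QEMU's real types or functions. The point is that every failing return path must set `*errp`, because a caller that sees a failure will dereference it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object (assumption, not the real type). */
typedef struct Error {
    const char *msg;
} Error;

/* Hypothetical helper: store an error in *errp if the caller asked for one. */
static void set_error(Error **errp, Error *err)
{
    if (errp) {
        *errp = err;
    }
}

static Error err_no_snapshot = { "snapshot not found" };
static Error err_bad_state   = { "device state could not be loaded" };

/*
 * Hypothetical load routine with two failure paths. Both set *errp before
 * returning false; forgetting one of them is the kind of bug described above.
 */
static bool load_snapshot_sketch(bool have_snapshot, bool state_ok, Error **errp)
{
    if (!have_snapshot) {
        set_error(errp, &err_no_snapshot);
        return false;
    }
    if (!state_ok) {
        set_error(errp, &err_bad_state);   /* the easily forgotten path */
        return false;
    }
    return true;
}

/* Caller that trusts the contract: on failure, err is non-NULL. */
static const char *try_load(bool have_snapshot, bool state_ok)
{
    Error *err = NULL;

    if (!load_snapshot_sketch(have_snapshot, state_ok, &err)) {
        return err->msg;   /* would crash if the callee left err NULL */
    }
    return "ok";
}
```

Calling `try_load(true, false)` exercises the second failure path; if that path skipped `set_error()`, the `err->msg` access would be a NULL dereference.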
|
#
c80e2251 |
| 27-Jun-2024 |
Akihiko Odaki <akihiko.odaki@daynix.com> |
migration: Free removed SaveStateEntry
This fixes LeakSanitizer warnings.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
|
#
4146b77e |
| 19-Jun-2024 |
Peter Xu <peterx@redhat.com> |
migration/postcopy: Add postcopy-recover-setup phase
This patch adds a migration state on src called "postcopy-recover-setup". The new state will describe the intermediate step starting from when the src QEMU received a postcopy recovery request, until the migration channels are properly established, but before the recovery process takes place.
The request came from Libvirt, which currently relies on the migration state events to detect migration state changes. That works for most of the migration process, except for postcopy recovery failures at the beginning.
Currently postcopy recovery only has two major states:
- postcopy-paused: this is the state that both sides of QEMU will be in for a long time as long as the migration channel was interrupted.
- postcopy-recover: this is the state where both sides of QEMU handshake with each other, preparing for a continuation of postcopy which used to be interrupted.
The issue here is that when the recovery port is invalid, the src QEMU will take the URI/channels, notice the ports are not valid, and silently stay in the postcopy-paused state, with no event sent to Libvirt. In this case, the only thing Libvirt can do is to poll the migration status at a proper interval, which is less than optimal.
Considering that this is the only case where Libvirt won't get a notification from QEMU on such events, let's add postcopy-recover-setup state to mimic what we have with the "setup" state of a newly initialized migration, describing the phase of connection establishment.
With that, postcopy recovery will have two paths to go now, and either path will guarantee an event generated. Now the events will look like this during a recovery process on src QEMU:
- Initially when the recovery is initiated on src, QEMU will go from "postcopy-paused" -> "postcopy-recover-setup". Old QEMUs don't have this event.
- Depending on whether the channel re-establishment succeeds:
  - On success, src QEMU will move from "postcopy-recover-setup" to "postcopy-recover". Old QEMUs also have this event.
  - On failure, src QEMU will move from "postcopy-recover-setup" to "postcopy-paused" again. Old QEMUs don't have this event.
This guarantees that Libvirt will always receive a notification for the recovery process.
One thing to mention is that this new status is only needed on src QEMU, not on both sides. On dest QEMU, the state machine doesn't change. Hence the events don't change either. It's done like so because dest QEMU may not have an explicit point of setup start. E.g., it can happen that dest QEMU doesn't use the migrate-recover command to set up a new URI/channel, and the old URI/channels are reused in recovery, in which case the old ports simply work again after the network routes are fixed up.
Add a new helper postcopy_is_paused() detecting whether postcopy is still paused, taking RECOVER_SETUP into account too. When using it on both src/dst, a slight change is done altogether to always wait for the semaphore before checking the status, because for both sides a sem_post() will be required for a recovery.
Cc: Jiri Denemark <jdenemar@redhat.com> Cc: Prasad Pandit <ppandit@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Buglink: https://issues.redhat.com/browse/RHEL-38485 Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
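The source-side transitions described above can be sketched as a tiny state machine. This is only an illustration under stated assumptions: the enum names and `recover_attempt()` are hypothetical, and the body of `postcopy_is_paused()` is inferred from the commit message (it treats the new setup phase as still paused) rather than copied from QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of source-side states; the names are assumptions. */
typedef enum {
    ST_POSTCOPY_PAUSED,
    ST_POSTCOPY_RECOVER_SETUP,
    ST_POSTCOPY_RECOVER,
} SrcState;

/* "Paused" for recovery purposes also covers the new setup phase. */
static bool postcopy_is_paused(SrcState s)
{
    return s == ST_POSTCOPY_PAUSED || s == ST_POSTCOPY_RECOVER_SETUP;
}

/*
 * Hypothetical driver for one recovery attempt on the source. It records
 * every state change in events[] and returns how many were emitted.
 */
static int recover_attempt(bool channel_ok, SrcState *state, SrcState events[2])
{
    int n = 0;

    if (*state != ST_POSTCOPY_PAUSED) {
        return 0;   /* recovery can only start from the paused state */
    }
    *state = ST_POSTCOPY_RECOVER_SETUP;
    events[n++] = *state;

    /* Either outcome produces a second event, so a listener never has to poll. */
    *state = channel_ok ? ST_POSTCOPY_RECOVER : ST_POSTCOPY_PAUSED;
    events[n++] = *state;
    return n;
}
```

In this sketch both the success and the failure path emit exactly two events, which mirrors the guarantee the commit message gives to Libvirt.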
|
#
60ce4767 |
| 19-Jun-2024 |
Peter Xu <peterx@redhat.com> |
migration: Rename thread debug names
The postcopy thread names on dest QEMU are slightly confusing, partly I'll need to blame myself on 36f62f11e4 ("migration: Postcopy preemption preparation on channel creation"). E.g., "fault-fast" reads like a fast version of "fault-default", but it's actually the fast version of "postcopy/listen".
Taking this chance, rename all the migration threads with proper rules. Considering we only have 15 chars usable, prefix all threads with "mig/", meanwhile identify src/dst threads properly this time. So now most thread names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except the bg-snapshot thread which doesn't have a direction.
For multifd threads, name them "mig/{src|dst}/{send|recv}_%d".
We used to have "live_migration" thread for a very long time, now it's called "mig/src/main". We may hope to have "mig/dst/main" soon but not yet.
Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
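As a rough sketch of how tight the 15-character budget is for the multifd naming scheme, the helper below builds such a name and reports whether it fits. `multifd_thread_name()` and `THREAD_NAME_MAX` are hypothetical names made up for this example; only the format string and the 15-character limit come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define THREAD_NAME_MAX 15   /* usable characters, as stated in the commit */

/*
 * Hypothetical helper: build "mig/{src|dst}/{send|recv}_<id>" into buf,
 * which must hold THREAD_NAME_MAX + 1 bytes. Returns false if the full
 * name would not fit (buf then holds a truncated name).
 */
static bool multifd_thread_name(char *buf, bool is_src, int id)
{
    int len = snprintf(buf, THREAD_NAME_MAX + 1, "mig/%s/%s_%d",
                       is_src ? "src" : "dst",
                       is_src ? "send" : "recv", id);

    return len >= 0 && len <= THREAD_NAME_MAX;
}
```

With this format, the prefix "mig/src/send_" already uses 13 characters, so channel ids of up to two digits fit and a three-digit id would be truncated.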
|
#
fdac62db |
| 13-May-2024 |
Markus Armbruster <armbru@redhat.com> |
migration: Rephrase message on failure to save / load Xen device state
Functions that use an Error **errp parameter to return errors should not also report them to the user, because reporting is the caller's job. When the caller does, the error is reported twice. When it doesn't (because it recovered from the error), there is no error to report, i.e. the report is bogus.
qmp_xen_save_devices_state() and qmp_xen_load_devices_state() violate this principle: they call qemu_save_device_state() and qemu_loadvm_state(), which call error_report_err().
I wish I could clean this up now, but migration's error reporting is too complicated (confused?) for me to mess with it.
Instead, I'm merely improving the error reported by qmp_xen_save_devices_state() and qmp_xen_load_devices_state() to the QMP core from
    An IO error has occurred
to
    saving Xen device state failed
and
    loading Xen device state failed
respectively.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240513141703.549874-6-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Fabiano Rosas <farosas@suse.de> Acked-by: Peter Xu <peterx@redhat.com>
|
#
eef0bae3 |
| 30-Apr-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Remove block migration
The block migration has been considered obsolete since QEMU 8.2 in favor of the more flexible storage migration provided by the blockdev-mirror driver. Two releases have passed so now it's time to remove it.
Deprecation commit 66db46ca83 ("migration: Deprecate block migration").
Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
|
#
00580786 |
| 12-Mar-2024 |
Philippe Mathieu-Daudé <philmd@linaro.org> |
qapi: Inline and remove QERR_MIGRATION_ACTIVE definition
Address the comment added in commit 4629ed1e98 ("qerror: Finally unused, clean up"), from 2015:
    /*
     * These macros will go away, please don't use
     * in new code, and do not add new ones!
     */
Mechanical transformation using sed, manually removing the definition in include/qapi/qmp/qerror.h.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240312141343.3168265-10-armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> [Straightforward conflict with commit aeaafb1e59f (migration: export migration_is_running) resolved]
|
#
e4fa064d |
| 20-Mar-2024 |
Cédric Le Goater <clg@redhat.com> |
migration: Add Error** argument to .load_setup() handler
This will be useful to report errors at a higher level, mostly in VFIO today.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-9-clg@redhat.com [peterx: drop comment for ERRP_GUARD, per Markus] Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
01c3ac68 |
| 20-Mar-2024 |
Cédric Le Goater <clg@redhat.com> |
migration: Add Error** argument to .save_setup() handler
The purpose is to record a potential error in the migration stream if qemu_savevm_state_setup() fails. Most of the current .save_setup() handlers can be modified to use the Error argument instead of managing their own and calling locally error_report().
Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Harsh Prateek Bora <harshpb@linux.ibm.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Cc: John Snow <jsnow@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-8-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
057a2009 |
| 20-Mar-2024 |
Cédric Le Goater <clg@redhat.com> |
migration: Add Error** argument to qemu_savevm_state_setup()
This prepares ground for the changes coming next which add an Error** argument to the .save_setup() handler. Callers of qemu_savevm_state_setup() now handle the error and fail earlier, setting the migration state from MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED.
In qemu_savevm_state(), move the cleanup to preserve the error reported by .save_setup() handlers.
Since the previous behavior was to ignore errors at this step of migration, this change should be examined closely to check that cleanups are still correctly done.
Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-7-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
6138d43a |
| 20-Mar-2024 |
Cédric Le Goater <clg@redhat.com> |
migration: Add Error** argument to vmstate_save()
This will prepare ground for future changes adding an Error** argument to qemu_savevm_state_setup().
Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-6-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
7afbdada |
| 04-Apr-2024 |
Wei Wang <wei.w.wang@intel.com> |
migration/postcopy: ensure preempt channel is ready before loading states
Before loading the guest states, ensure that the preempt channel is ready to use, as some of the states (e.g. via virtio_load) might trigger page faults that will be handled through the preempt channel. So yield to the main thread in the case that the channel create event hasn't been dispatched.
Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread") Originally-by: Lei Wang <lei4.wang@intel.com> Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel.com/T/ Signed-off-by: Lei Wang <lei4.wang@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Link: https://lore.kernel.org/r/20240405034056.23933-1-wei.w.wang@intel.com [peterx: add a todo section, add Fixes and copy stable for 8.0+] Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
aeaafb1e |
| 11-Mar-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: export migration_is_running
Delete the MigrationState parameter from migration_is_running and move it to the public API in misc.h.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1710179338-294359-5-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
e6e08e83 |
| 04-Mar-2024 |
Cédric Le Goater <clg@redhat.com> |
migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
When commit bd2270608fa0 ("migration/ram.c: add a notifier chain for precopy") added PRECOPY_NOTIFY_SETUP notifiers at the end of qemu_savevm_state_setup(), it didn't take into account a possible error in the loop calling vmstate_save() or .save_setup() handlers.
Check ret value before calling the notifiers.
Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240304122844.1888308-10-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
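The control flow this fix describes can be sketched roughly as follows: stop at the first failing setup handler and run the setup notifiers only when everything succeeded. All names here (`SaveSetupFn`, `savevm_setup_sketch()`, `start_migration_sketch()`, the `Status` enum, the error message) are hypothetical stand-ins for illustration; the real function signatures in QEMU differ.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STATUS_SETUP, STATUS_FAILED, STATUS_ACTIVE } Status;

/* Hypothetical per-device setup handler: 0 on success, negative on error. */
typedef int (*SaveSetupFn)(const char **errmsg);

static int notifier_calls;   /* counts setup notifications in this sketch */

static int setup_ok(const char **errmsg)
{
    (void)errmsg;
    return 0;
}

static int setup_bad(const char **errmsg)
{
    *errmsg = "device setup failed";
    return -1;
}

/* Stop at the first failing handler; notify only if every handler succeeded. */
static int savevm_setup_sketch(SaveSetupFn *handlers, size_t n,
                               const char **errmsg)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        ret = handlers[i](errmsg);
        if (ret < 0) {
            break;
        }
    }
    if (ret < 0) {
        return ret;          /* check ret before calling the notifiers */
    }
    notifier_calls++;
    return 0;
}

/* Caller: a setup failure takes the migration out of SETUP into FAILED. */
static Status start_migration_sketch(SaveSetupFn *handlers, size_t n,
                                     const char **errmsg)
{
    Status status = STATUS_SETUP;

    status = savevm_setup_sketch(handlers, n, errmsg) < 0
             ? STATUS_FAILED : STATUS_ACTIVE;
    return status;
}
```

The early return is the whole fix in miniature: without it, the notifier counter would also be bumped for a setup that had already failed.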
|
#
4e1871c4 |
| 04-Mar-2024 |
Avihai Horon <avihaih@nvidia.com> |
migration: Don't serialize devices in qemu_savevm_state_iterate()
Commit 90697be8896c ("live migration: Serialize vmstate saving in stage 2") introduced device serialization in qemu_savevm_state_iterate(). The rationale behind it was to first complete migration of slower changing block devices and only then migrate the RAM, to avoid sending fast changing RAM pages over and over.
This commit was added a long time ago, and while it was useful back then, it is not the case anymore:
1. Block migration is deprecated, see commit 66db46ca83b8 ("migration: Deprecate block migration").
2. Today there are other iterative devices besides RAM and block, such as VFIO, which are registered for migration after RAM. With current serialization behavior, a fast changing device can block other devices from sending their data, which may prevent migration from converging in some cases.
The issue described in item 2 was observed in several VFIO migration scenarios with switchover-ack capability enabled, where some workload on the VM prevented RAM from ever reaching a hard zero, thus blocking VFIO initial pre-copy data from being sent. Hence, destination could not ack switchover and migration could not converge.
Fix that by not serializing iterative devices in qemu_savevm_state_iterate().
Note that this still doesn't fully prevent device starvation. As correctly pointed out by Peter [1], a fast changing device might constantly consume all allocated bandwidth and block the following devices. However, this scenario is more likely to happen only if max-bandwidth is low.
[1] https://lore.kernel.org/qemu-devel/Zd6iw9dBhW6wKNxx@x1n/
Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240304105339.20713-2-avihaih@nvidia.com Signed-off-by: Peter Xu <peterx@redhat.com>
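A toy model may help show why serialization can starve later devices. In the sketch below, `IterDev`, `dev_iterate()`, `iterate_serialized()` and `iterate_all()` are hypothetical; they only mimic the before/after loop shapes described in the commit message and are not the real qemu_savevm_state_iterate().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterative device: pending chunks, plus chunks re-dirtied per round. */
typedef struct {
    int pending;
    int dirty_rate;
    int sent;        /* chunks sent so far */
} IterDev;

/* Send one chunk; returns true when the device has nothing left right now. */
static bool dev_iterate(IterDev *d)
{
    if (d->pending > 0) {
        d->pending--;
        d->sent++;
    }
    d->pending += d->dirty_rate;   /* the guest keeps dirtying data */
    return d->pending == 0;
}

/* Old behaviour: do not move to the next device until this one reports done. */
static bool iterate_serialized(IterDev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!dev_iterate(&devs[i])) {
            return false;
        }
    }
    return true;
}

/* New behaviour: give every device a turn, then report whether all are done. */
static bool iterate_all(IterDev *devs, size_t n)
{
    bool all_finished = true;

    for (size_t i = 0; i < n; i++) {
        if (!dev_iterate(&devs[i])) {
            all_finished = false;
        }
    }
    return all_finished;
}
```

If the first device is re-dirtied as fast as it is sent, `iterate_serialized()` never reaches the second device, while `iterate_all()` lets the second device drain its data regardless.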
|
#
4ed49feb |
| 29-Feb-2024 |
Fabiano Rosas <farosas@suse.de> |
migration/ram: Introduce 'mapped-ram' migration capability
Add a new migration capability 'mapped-ram'.
The core of the feature is to ensure that RAM pages are mapped directly to offsets in the resulting migration file instead of being streamed at arbitrary points.
The reasons why we'd want such behavior are:
- The resulting file will have a bounded size, since pages which are dirtied multiple times will always go to a fixed location in the file, rather than constantly being added to a sequential stream. This eliminates cases where a VM with, say, 1G of RAM can result in a migration file that's 10s of GBs, provided that the workload constantly redirties memory.
- It paves the way to implement O_DIRECT-enabled save/restore of the migration stream as the pages are ensured to be written at aligned offsets.
- It allows the usage of multifd so we can write RAM pages to the migration file in parallel.
For now, enabling the capability has no effect. The next couple of patches implement the core functionality.
Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-8-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
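The bounded-size argument can be made concrete with a little arithmetic. The sketch below assumes a 4 KiB page and a single region of pages starting at `pages_base`; the function names, the constant and the layout are illustrative assumptions, not the on-disk format QEMU actually uses.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* Fixed-location layout: page N always lands at the same file offset. */
static uint64_t mapped_offset(uint64_t pages_base, uint64_t page_index)
{
    return pages_base + page_index * SKETCH_PAGE_SIZE;
}

/* File size is bounded by the RAM size, however often pages are rewritten. */
static uint64_t mapped_file_size(uint64_t pages_base, uint64_t num_pages)
{
    return pages_base + num_pages * SKETCH_PAGE_SIZE;
}

/* Streamed layout for comparison: every page write appends to the file. */
static uint64_t streamed_file_size(uint64_t header, uint64_t page_writes)
{
    return header + page_writes * SKETCH_PAGE_SIZE;
}
```

With a page-aligned `pages_base`, every offset returned by `mapped_offset()` is also page-aligned, which is the property the O_DIRECT point relies on; and rewriting a page ten times grows only the streamed size, not the mapped one.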
|
#
44d0d456 |
| 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Centralize BH creation and dispatch
Now that the migration state reference counting is correct, further wrap the bottom half dispatch process to avoid future issues.
Move BH creation and scheduling together and wrap the dispatch with an intermediary function that will ensure we always keep the ref/unref balanced.
Also move the responsibility of deleting the BH into the wrapper and remove the now unnecessary pointers.
Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
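The wrapper idea can be sketched as below: take the reference when the bottom half is created and scheduled, and drop it (and free the bottom half) in one dispatch function, so callers cannot unbalance the two. `State`, `BH`, `bh_schedule_sketch()` and `bh_dispatch_sketch()` are hypothetical stand-ins; QEMU's real bottom-half API and MigrationState refcounting are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for the migration state. */
typedef struct {
    int refcount;
    int work_done;
} State;

typedef void (*BHFunc)(State *s);

/* Hypothetical bottom half: the callback plus the state it pins. */
typedef struct {
    BHFunc cb;
    State *state;
} BH;

static void state_ref(State *s)   { s->refcount++; }
static void state_unref(State *s) { s->refcount--; }

/* Create and "schedule" in one step, taking the reference up front. */
static BH *bh_schedule_sketch(State *s, BHFunc cb)
{
    BH *bh = malloc(sizeof(*bh));

    if (!bh) {
        return NULL;
    }
    state_ref(s);
    bh->cb = cb;
    bh->state = s;
    return bh;
}

/* Dispatch wrapper: run the callback, then drop the reference and free the BH. */
static void bh_dispatch_sketch(BH *bh)
{
    bh->cb(bh->state);
    state_unref(bh->state);
    free(bh);
}

/* Example callback for the usage below. */
static void do_work(State *s) { s->work_done++; }
```

Because the callback itself never touches the refcount or the BH pointer, individual callbacks no longer need their own cleanup code, which matches the "remove the now unnecessary pointers" remark.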
|
#
9cf26896 |
| 19-Jan-2024 |
Fabiano Rosas <farosas@suse.de> |
migration: Reference migration state around loadvm_postcopy_handle_run_bh
We need to hold a reference to the current_migration object around async calls to avoid it being freed while still in use. Even on this load-side function, we might still use the MigrationState, e.g. to check for capabilities.
Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240119233922.32588-4-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
58b10570 |
| 03-Jan-2024 |
Steve Sistare <steven.sistare@oracle.com> |
migration: preserve suspended for snapshot
Restoring a snapshot can break a suspended guest. Snapshots suffer from the same suspended-state issues that affect live migration, plus they must handle an additional problematic scenario, which is that a running vm must remain running if it loads a suspended snapshot.
To save, the existing vm_stop call now completely stops the suspended state. Finish with vm_resume to leave the vm in the state it had prior to the save, correctly restoring the suspended state.
To load, if the snapshot is not suspended, then vm_stop + vm_resume correctly handles all states, and leaves the vm in the state it had prior to the load. However, if the snapshot is suspended, restoration is trickier. First, call vm_resume to restore the state to suspended so the current state matches the saved state. Then, if the pre-load state is running, call wakeup to resume running.
Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and RUN_STATE_RESTORE_VM did not change runstate if the current state was suspended, but now it does, so allow these transitions.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1704312341-66640-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
|
#
a77ffe95 |
| 20-Dec-2023 |
Richard Henderson <richard.henderson@linaro.org> |
migration: Constify VMState
Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231221031652.119827-67-richard.henderson@linaro.org>
|
#
20270019 |
| 20-Dec-2023 |
Richard Henderson <richard.henderson@linaro.org> |
migration: Make VMStateDescription.subsections const
Allow the array of pointers to itself be const. Propagate this through the copies of this field.
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231221031652.119827-2-richard.henderson@linaro.org>
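What "the array of pointers itself being const" means in C can be sketched with a reduced structure. `VMSD` below is a cut-down stand-in with only the fields needed for the example, and the device and subsection names are invented; the real VMStateDescription has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for VMStateDescription (assumption, not the real struct). */
typedef struct VMSD {
    const char *name;
    /* pointer to a const array of pointers to const descriptions */
    const struct VMSD * const *subsections;
} VMSD;

static const VMSD sub_a = { "dev/sub-a", NULL };
static const VMSD sub_b = { "dev/sub-b", NULL };

/* The array itself is const too, so it can be placed in read-only data. */
static const VMSD * const dev_subsections[] = { &sub_a, &sub_b, NULL };

static const VMSD dev = { "dev", dev_subsections };

/* Walk the NULL-terminated subsection list and count its entries. */
static size_t count_subsections(const VMSD *vmsd)
{
    size_t n = 0;

    if (vmsd->subsections) {
        while (vmsd->subsections[n]) {
            n++;
        }
    }
    return n;
}
```

With the second `const` in place, an assignment such as `dev.subsections[0] = &sub_b;` is rejected at compile time, whereas a plain `const VMSD **` field would have allowed it.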
|
#
b49f4755 |
| 05-Dec-2023 |
Stefan Hajnoczi <stefanha@redhat.com> |
block: remove AioContext locking
This is the big patch that removes aio_context_acquire()/aio_context_release() from the block layer and affected block layer users.
There isn't a clean way to split this patch and the reviewers are likely the same group of people, so I decided to do it in one patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-ID: <20231205182011.1976568-7-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
#
721da039 |
| 26-Oct-2023 |
Cédric Le Goater <clg@redhat.com> |
util/uuid: Add UUID_STR_LEN definition
qemu_uuid_unparse() includes a trailing NUL when writing the uuid string and the buffer size should be UUID_FMT_LEN + 1 bytes. Add a define for this size and use it where required.
Cc: Fam Zheng <fam@euphon.net> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: "Denis V. Lunev" <den@openvz.org> Signed-off-by: Cédric Le Goater <clg@redhat.com>
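The buffer-size point is easy to get wrong by one, so here is a small sketch. The two macro names follow the commit message; the value 36 is the length of the canonical textual UUID form, and `uuid_unparse_sketch()` is a hypothetical formatter standing in for qemu_uuid_unparse(), whose real signature is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UUID_FMT_LEN 36                  /* characters in the textual form */
#define UUID_STR_LEN (UUID_FMT_LEN + 1)  /* plus the trailing NUL */

/* Hypothetical formatter: writes 36 characters and a NUL into out. */
static void uuid_unparse_sketch(const unsigned char u[16],
                                char out[UUID_STR_LEN])
{
    snprintf(out, UUID_STR_LEN,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
             u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

A caller would declare `char buf[UUID_STR_LEN];`. Declaring it with `UUID_FMT_LEN` instead would leave no room for the terminator, which is the mistake the new define is meant to prevent.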
|
#
3e5f3bcd |
| 30-Oct-2023 |
Peter Xu <peterx@redhat.com> |
migration: Add tracepoints for downtime checkpoints
This patch is inspired by Joao Martins's patch here:
https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com
Add tracepoints for major downtime checkpoints on both src and dst. They share the same tracepoint with a string showing its stage.
Besides the checkpoints in the previous patch, this patch also added destination checkpoints.
On src, we have these checkpoints added:
- src-downtime-start: right before vm stops on src
- src-vm-stopped: after vm is fully stopped
- src-iterable-saved: after all iterables saved (END sections)
- src-non-iterable-saved: after all non-iterable saved (FULL sections)
- src-downtime-stop: migration fully completed
On dst, we have these checkpoints added:
- dst-precopy-loadvm-completes: after loadvm all done for precopy
- dst-precopy-bh-*: record BH steps to resume VM for precopy
- dst-postcopy-bh-*: record BH steps to resume VM for postcopy
On dst side, we don't have a good way to trace total time consumed by iterable or non-iterable for now. We can mark it by 1st time receiving a FULL / END section, but rather than that let's just rely on the other tracepoints added for vmstates to back up the information.
With this patch, one can enable "vmstate_downtime*" tracepoints and it'll enable all tracepoints for downtime measurements necessary.
Drop the loadvm_postcopy_handle_run_bh() tracepoint alongside, because it serves the same purpose, which was only for postcopy. We then have a unified prefix for all downtime-relevant tracepoints.
Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-6-peterx@redhat.com>
|
#
3c80f142 |
| 30-Oct-2023 |
Peter Xu <peterx@redhat.com> |
migration: Add per vmstate downtime tracepoints
We have a bunch of savevm_section* tracepoints, they're good to analyze migration stream, but not always suitable if someone would like to analyze the migration downtime. Two major problems:
- savevm_section* tracepoints are dumping all sections, we only care about the sections that contribute to the downtime
- They don't have an identifier to show the type of sections, so there is no easy way to filter downtime information either.
We could add the type into the tracepoints, but this patch keeps them untouched and instead adds a bunch of downtime-specific tracepoints, so one can enable "vmstate_downtime*" tracepoints and get a full picture of how the downtime is distributed across iterative and non-iterative vmstate save/load.
Note that here both save() and load() need to be traced, because both of them may contribute to the downtime. The contribution is not a simple "add them together", though: consider when the src is doing a save() of device1 while the dest can be load()ing for device2, so they can happen concurrently.
Tracking both sides makes sense because device load() and save() can be imbalanced: one device can save() super fast but load() super slow, or vice versa. We can't figure that out without tracing both.
Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-4-peterx@redhat.com>
|