| 9d7bd082 | 19-Nov-2019 |
Michael Roth <mdroth@linux.vnet.ibm.com> |
virtio-pci: disable vring processing when bus-mastering is disabled
Currently the SLOF firmware for pseries guests will disable/re-enable a PCI device multiple times via IO/MEM/MASTER bits of PCI_CO
virtio-pci: disable vring processing when bus-mastering is disabled
Currently the SLOF firmware for pseries guests will disable/re-enable a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND register after the initial probe/feature negotiation, as it tends to work with a single device at a time at various stages like probing and running block/network bootloaders without doing a full reset in-between.
In QEMU, when PCI_COMMAND_MASTER is disabled we disable the corresponding IOMMU memory region, so DMA accesses (including to vring fields like idx/flags) will no longer undergo the necessary translation. Normally we wouldn't expect this to happen since it would be misbehavior on the driver side to continue driving DMA requests.
However, in the case of pseries, with iommu_platform=on, we trigger the following sequence when tearing down the virtio-blk dataplane ioeventfd in response to the guest unsetting PCI_COMMAND_MASTER:
#2 0x0000555555922651 in virtqueue_map_desc (vdev=vdev@entry=0x555556dbcfb0, p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, is_write=is_write@entry=false, pa=0, sz=0) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757 #3 0x0000555555922a89 in virtqueue_pop (vq=vq@entry=0x555556dc8660, sz=sz@entry=184) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950 #4 0x00005555558d3eca in virtio_blk_get_request (vq=0x555556dc8660, s=0x555556dbcfb0) at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255 #5 0x00005555558d3eca in virtio_blk_handle_vq (s=0x555556dbcfb0, vq=0x555556dc8660) at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776 #6 0x000055555591dd66 in virtio_queue_notify_aio_vq (vq=vq@entry=0x555556dc8660) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550 #7 0x000055555591ecef in virtio_queue_notify_aio_vq (vq=0x555556dc8660) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546 #8 0x000055555591ecef in virtio_queue_host_notifier_aio_poll (opaque=0x555556dc86c8) at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527 #9 0x0000555555d02164 in run_poll_handlers_once (ctx=ctx@entry=0x55555688bfc0, timeout=timeout@entry=0x7fffe65844a8) at /home/mdroth/w/qemu.git/util/aio-posix.c:520 #10 0x0000555555d02d1b in try_poll_mode (timeout=0x7fffe65844a8, ctx=0x55555688bfc0) at /home/mdroth/w/qemu.git/util/aio-posix.c:607 #11 0x0000555555d02d1b in aio_poll (ctx=ctx@entry=0x55555688bfc0, blocking=blocking@entry=true) at /home/mdroth/w/qemu.git/util/aio-posix.c:639 #12 0x0000555555d0004d in aio_wait_bh_oneshot (ctx=0x55555688bfc0, cb=cb@entry=0x5555558d5130 <virtio_blk_data_plane_stop_bh>, opaque=opaque@entry=0x555556de86f0) at /home/mdroth/w/qemu.git/util/aio-wait.c:71 #13 0x00005555558d59bf in virtio_blk_data_plane_stop (vdev=<optimized out>) at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288 #14 0x0000555555b906a1 in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38) at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245 #15 0x0000555555b90dbb in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38) at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237 #16 0x0000555555b92a8e in virtio_pci_stop_ioeventfd (proxy=0x555556db4e40) at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292 #17 0x0000555555b92a8e in virtio_write_config (pci_dev=0x555556db4e40, address=<optimized out>, val=1048832, len=<optimized out>) at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613
I.e. the calling code is only scheduling a one-shot BH for virtio_blk_data_plane_stop_bh, but somehow we end up trying to process an additional virtqueue entry before we get there. This is likely due to the following check in virtio_queue_host_notifier_aio_poll:
static bool virtio_queue_host_notifier_aio_poll(void *opaque) { EventNotifier *n = opaque; VirtQueue *vq = container_of(n, VirtQueue, host_notifier); bool progress;
if (!vq->vring.desc || virtio_queue_empty(vq)) { return false; }
progress = virtio_queue_notify_aio_vq(vq);
namely the call to virtio_queue_empty(). In this case, since no new requests have actually been issued, shadow_avail_idx == last_avail_idx, so we actually try to access the vring via vring_avail_idx() to get the latest non-shadowed idx:
int virtio_queue_empty(VirtQueue *vq) { bool empty; ...
if (vq->shadow_avail_idx != vq->last_avail_idx) { return 0; }
rcu_read_lock(); empty = vring_avail_idx(vq) == vq->last_avail_idx; rcu_read_unlock(); return empty;
but since the IOMMU region has been disabled we get a bogus value (0 usually), which causes virtio_queue_empty() to falsely report that there are entries to be processed, which causes errors such as:
"virtio: zero sized buffers are not allowed"
or
"virtio-blk missing headers"
and puts the device in an error state.
This patch works around the issue by introducing virtio_set_disabled(), which sets a 'disabled' flag to bypass checks like virtio_queue_empty() when bus-mastering is disabled. Since we'd check this flag at all the same sites as vdev->broken, we replace those checks with an inline function which checks for either vdev->broken or vdev->disabled.
The 'disabled' flag is only migrated when set, which should be fairly rare, but to maintain migration compatibility we disable it's use for older machine types. Users requiring the use of the flag in conjunction with older machine types can set it explicitly as a virtio-device option.
NOTES:
- This leaves some other oddities in play, like the fact that DRIVER_OK also gets unset in response to bus-mastering being disabled, but not restored (however the device seems to continue working) - Similarly, we disable the host notifier via virtio_bus_stop_ioeventfd(), which seems to move the handling out of virtio-blk dataplane and back into the main IO thread, and it ends up staying there till a reset (but otherwise continues working normally)
Cc: David Gibson <david@gibson.dropbear.id.au>, Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Message-Id: <20191120005003.27035-1-mdroth@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
show more ...
|
| fcd3f2cc | 12-Dec-2019 |
Igor Mammedov <imammedo@redhat.com> |
numa: properly check if numa is supported
Commit aa57020774b, by mistake used MachineClass::numa_mem_supported to check if NUMA is supported by machine and also as unrelated change set it to true fo
numa: properly check if numa is supported
Commit aa57020774b, by mistake used MachineClass::numa_mem_supported to check if NUMA is supported by machine and also as unrelated change set it to true for sbsa-ref board.
Luckily change didn't break machines that support NUMA, as the field is set to true for them.
But the field is not intended for checking if NUMA is supported and will be flipped to false within this release for new machine types.
Fix it: - by using previously used condition !mc->cpu_index_to_instance_props || !mc->get_default_cpu_node_id the first time and then use MachineState::numa_state down the road to check if NUMA is supported - dropping stray sbsa-ref chunk
Fixes: aa57020774b690a22be72453b8e91c9b5a68c516 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1576154936-178362-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
show more ...
|
| 40f03bd5 | 05-Dec-2019 |
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
hw/core/qdev: cleanup Error ** variables
Rename Error ** parameter in check_only_migratable to common errp.
In device_set_realized:
- Move "if (local_err != NULL)" closer to error setters.
- Dr
hw/core/qdev: cleanup Error ** variables
Rename Error ** parameter in check_only_migratable to common errp.
In device_set_realized:
- Move "if (local_err != NULL)" closer to error setters.
- Drop 'Error **local_errp': it doesn't save any LoCs, but it's very unusual.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20191205174635.18758-10-vsementsov@virtuozzo.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
show more ...
|
| 46472d82 | 13-Nov-2019 |
Paolo Bonzini <pbonzini@redhat.com> |
xen: convert "-machine igd-passthru" to an accelerator property
The first machine property to fall is Xen's Intel integrated graphics passthrough. The "-machine igd-passthru" option does not set an
xen: convert "-machine igd-passthru" to an accelerator property
The first machine property to fall is Xen's Intel integrated graphics passthrough. The "-machine igd-passthru" option does not set anymore a property on the machine object, but desugars to a GlobalProperty on accelerator objects.
The setter is very simple, since the value ends up in a global variable, so this patch also provides an example before the more complicated cases that follow it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
| 88ed5db1 | 06-Nov-2019 |
Greg Kurz <groug@kaod.org> |
numa: Add missing \n to error message
If memory allocation fails when using -mem-path, QEMU is supposed to print out a message to indicate that fallback to anonymous RAM is deprecated. This is done
numa: Add missing \n to error message
If memory allocation fails when using -mem-path, QEMU is supposed to print out a message to indicate that fallback to anonymous RAM is deprecated. This is done with error_printf() which does output buffering. As a consequence, the message is only printed at the next flush, eg. when quiting QEMU, and it also lacks a trailing newline:
qemu-system-ppc64: unable to map backing store for guest RAM: Cannot allocate memory qemu-system-ppc64: warning: falling back to regular RAM allocation QEMU 4.1.50 monitor - type 'help' for more information (qemu) q This is deprecated. Make sure that -mem-path specified path has sufficient resources to allocate -m specified RAM amountgreg@boss02:~/Work/qemu/qemu-spapr$
Add the missing \n to fix both issues.
Fixes: cb79224b7e4b "deprecate -mem-path fallback to anonymous RAM" Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <157304440026.351774.14607704217028190097.stgit@bahia.lan> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
show more ...
|
| a1190ab6 | 29-Oct-2019 |
Jens Freimann <jfreimann@redhat.com> |
migration: allow unplug during migration for failover devices
In "b06424de62 migration: Disable hotplug/unplug during migration" we added a check to disable unplug for all devices until we have figu
migration: allow unplug during migration for failover devices
In "b06424de62 migration: Disable hotplug/unplug during migration" we added a check to disable unplug for all devices until we have figured out what works. For failover primary devices qdev_unplug() is called from the migration handler, i.e. during migration.
This patch adds a flag to DeviceState which is set to false for all devices and makes an exception for PCI devices that are also primary devices in a failover pair.
Signed-off-by: Jens Freimann <jfreimann@redhat.com> Message-Id: <20191029114905.6856-8-jfreimann@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
show more ...
|
| 78b6eaa6 | 08-Oct-2019 |
Peter Maydell <peter.maydell@linaro.org> |
ptimer: Provide new transaction-based API
Provide the new transaction-based API. If a ptimer is created using ptimer_init() rather than ptimer_init_with_bh(), then instead of providing a QEMUBH, it
ptimer: Provide new transaction-based API
Provide the new transaction-based API. If a ptimer is created using ptimer_init() rather than ptimer_init_with_bh(), then instead of providing a QEMUBH, it provides a pointer to the callback function directly, and has opted into the transaction API. All calls to functions which modify ptimer state: - ptimer_set_period() - ptimer_set_freq() - ptimer_set_limit() - ptimer_set_count() - ptimer_run() - ptimer_stop() must be between matched calls to ptimer_transaction_begin() and ptimer_transaction_commit(). When ptimer_transaction_commit() is called it will evaluate the state of the timer after all the changes in the transaction, and call the callback if necessary.
In the old API the individual update functions generally would call ptimer_trigger() immediately, which would schedule the QEMUBH. In the new API the update functions will instead defer the "set s->next_event and call ptimer_reload()" work to ptimer_transaction_commit().
Because ptimer_trigger() can now immediately call into the device code which may then call other ptimer functions that update ptimer_state fields, we must be more careful in ptimer_reload() not to cache fields from ptimer_state across the ptimer_trigger() call. (This was harmless with the QEMUBH mechanism as the BH would not be invoked until much later.)
We use assertions to check that: * the functions modifying ptimer state are not called outside a transaction block * ptimer_transaction_begin() and _commit() calls are paired * the transaction API is not used with a QEMUBH ptimer
There is some slight repetition of code: * most of the set functions have similar looking "if s->bh call ptimer_reload, otherwise set s->need_reload" code * ptimer_init() and ptimer_init_with_bh() have similar code We deliberately don't try to avoid this repetition, because it will all be deleted when the QEMUBH version of the API is removed.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-3-peter.maydell@linaro.org
show more ...
|