ebdede64 | 18-Jun-2024 |
Stefano Garzarella <sgarzare@redhat.com> |
libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
libvhost-user will panic when receiving a VHOST_USER_GET_INFLIGHT_FD message if MFD_ALLOW_SEALING is not defined, since it is not able to create a memfd.
VHOST_USER_GET_INFLIGHT_FD is used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask that feature if the backend is not able to properly handle these messages.
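A minimal sketch of the masking, assuming MFD_ALLOW_SEALING availability is the deciding factor (the helper name is illustrative, not the actual patch):

    #include <stdint.h>
    #include <sys/mman.h>   /* defines MFD_ALLOW_SEALING when the platform has it */

    /* Bit 12 is VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD in the vhost-user spec. */
    #define INFLIGHT_SHMFD_BIT 12

    /* Advertise the inflight-shmfd feature only if we can create a sealable
     * memfd; otherwise the frontend must never send
     * VHOST_USER_GET_INFLIGHT_FD, which we could not service. */
    static uint64_t mask_unsupported_protocol_features(uint64_t features)
    {
    #ifndef MFD_ALLOW_SEALING
        features &= ~(1ULL << INFLIGHT_SHMFD_BIT);
    #endif
        return features;
    }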
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240618100043.144657-5-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
92b58bc7 | 18-Jun-2024 |
Stefano Garzarella <sgarzare@redhat.com> |
libvhost-user: fail vu_message_write() if sendmsg() is failing
In vu_message_write() we use sendmsg() to send the message header, then a write() to send the payload.
If sendmsg() fails we should avoid sending the payload, since we were unable to send the header.
This was discovered before fixing the issue addressed by the previous patch, where sendmsg() failed on macOS due to wrong parameters, but the frontend still sent the payload, which the backend then misinterpreted as a malformed header.
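A rough sketch of the control flow (simplified signature, not the real vu_message_write()):

    #include <errno.h>
    #include <stdbool.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Send the header via sendmsg(); only attempt the payload write() if the
     * header actually went out, otherwise the peer would parse the payload
     * bytes as a (bogus) header. */
    static bool send_header_then_payload(int sock, struct msghdr *hdr,
                                         const void *payload, size_t len)
    {
        ssize_t rc;

        do {
            rc = sendmsg(sock, hdr, 0);
        } while (rc < 0 && (errno == EINTR || errno == EAGAIN));

        if (rc <= 0) {
            return false;               /* bail out before the payload */
        }

        if (len > 0) {
            do {
                rc = write(sock, payload, len);
            } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
            if (rc <= 0) {
                return false;
            }
        }
        return true;
    }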
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240618100043.144657-4-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
52767e10 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP
We already use MADV_NORESERVE to deal with sparse memory regions. Let's also set madvise(MADV_DONTDUMP), otherwise a crash of the process can result in us allocating all memory in the mmap'ed region for dumping purposes.
This change implies that the mmap'ed rings won't be included in a coredump. If ever required for debugging purposes, we could mark only the mapped rings MADV_DODUMP.
Ignore errors during madvise() for now.
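For illustration, the mapping flow described above boils down to something like the following sketch (simplified, not the actual vu_add_mem_reg() code):

    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/types.h>

    /* Reserve address space without committing memory (MAP_NORESERVE) and
     * keep the region out of coredumps (MADV_DONTDUMP). madvise() errors are
     * deliberately ignored, as in the commit. */
    static void *map_region(int fd, size_t size, off_t fd_offset)
    {
        void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_NORESERVE, fd, fd_offset);
        if (addr != MAP_FAILED) {
            (void)madvise(addr, size, MADV_DONTDUMP);
        }
        return addr;
    }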
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-15-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
67f4f663 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Dynamically remap rings after (temporarily?) removing memory regions
Currently, we try to remap all rings whenever we add a single new memory region. That doesn't quite make sense, because we already map rings when setting the ring address, and panic if that goes wrong. Likely, that handling was simply copied from set_mem_table code, where we actually have to remap all rings.
Remapping all rings might require us to walk quite a lot of memory regions to perform the address translations. Ideally, we'd simply remove that remapping.
However, let's be a bit careful. There might be weird corner cases where a single memory region is temporarily removed (e.g., to resize it) that would have kept working so far. Further, a ring might be located on hotplugged memory: as the VM reboots, we might unplug that memory and hotplug new memory before the ring addresses are reset.
So let's unmap affected rings as we remove a memory region, and try dynamically mapping the ring again when required.
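A sketch of that idea with simplified types (not the real VuDev/VuVirtq layout):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct vq {
        uint64_t desc_gpa, avail_gpa, used_gpa; /* from SET_VRING_ADDR */
        void *desc, *avail, *used;              /* NULL while unmapped */
    };

    /* When a memory region goes away, drop the mappings of any ring that
     * lived inside it. */
    static void unmap_rings_in_range(struct vq *vqs, size_t nvqs,
                                     uint64_t gpa_start, uint64_t gpa_end)
    {
        for (size_t i = 0; i < nvqs; i++) {
            if (vqs[i].desc_gpa >= gpa_start && vqs[i].desc_gpa < gpa_end) {
                vqs[i].desc = vqs[i].avail = vqs[i].used = NULL;
            }
        }
    }

    /* Called from the usability check: translate the stored guest addresses
     * again the next time the ring is actually needed. */
    static bool vq_map_if_needed(struct vq *vq, void *(*gpa_to_va)(uint64_t))
    {
        if (!vq->desc) {
            vq->desc  = gpa_to_va(vq->desc_gpa);
            vq->avail = gpa_to_va(vq->avail_gpa);
            vq->used  = gpa_to_va(vq->used_gpa);
        }
        return vq->desc && vq->avail && vq->used;
    }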
Acked-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-14-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
2a290227 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Factor out vq usability check
Let's factor it out to prepare for further changes.
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-13-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
b2b63008 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Use most of mmap_offset as fd_offset
In the past, QEMU would create memory regions that could partially cover hugetlb pages, making mmap() fail if we would use the mmap_offset as an fd_offset. For that reason, we never used the mmap_offset as an offset into the fd and instead always mapped the fd from the very start.
However, that can easily result in us mmap'ing a lot of unnecessary parts of an fd, possibly repeatedly.
QEMU nowadays does not create memory regions that partially cover huge pages -- it never really worked with postcopy. QEMU handles merging of regions that partially cover huge pages (due to holes in boot memory) since 2018 in c1ece84e7c93 ("vhost: Huge page align and merge").
Let's be a bit careful and not unconditionally convert the mmap_offset into an fd_offset. Instead, let's simply detect the hugetlb size and pass as much as we can as fd_offset, making sure that we call mmap() with a properly aligned offset.
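A sketch of that offset split (the hugetlb-size detection itself is elided; names are illustrative):

    #include <stdint.h>
    #include <sys/types.h>

    /* Pass as much of mmap_offset as possible directly to mmap() as the fd
     * offset, keeping it aligned to the detected huge page size (assumed to
     * be a power of two); only the small unaligned remainder stays as an
     * offset into the mapping. */
    static void split_mmap_offset(uint64_t mmap_offset, uint64_t hugepagesize,
                                  off_t *fd_offset, uint64_t *new_mmap_offset)
    {
        uint64_t aligned = mmap_offset & ~(hugepagesize - 1);

        *fd_offset = (off_t)aligned;
        *new_mmap_offset = mmap_offset - aligned;
    }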
With QEMU and a virtio-mem device that is fully plugged (50 GiB using 50 memslots), the qemu-storage-daemon process consumes 1281 GiB of VA space before this change and 58 GiB after this change.
================ Vhost user message ================
Request: VHOST_USER_ADD_MEM_REG (37)
Flags:   0x9
Size:    40
Fds: 59
Adding region 4
    guest_phys_addr: 0x0000000200000000
    memory_size:     0x0000000040000000
    userspace_addr:  0x00007fb73bffe000
    old mmap_offset: 0x0000000080000000
    fd_offset:       0x0000000080000000
    new mmap_offset: 0x0000000000000000
    mmap_addr:       0x00007f02f1bdc000
Successfully added new region
================ Vhost user message ================
Request: VHOST_USER_ADD_MEM_REG (37)
Flags:   0x9
Size:    40
Fds: 59
Adding region 5
    guest_phys_addr: 0x0000000240000000
    memory_size:     0x0000000040000000
    userspace_addr:  0x00007fb77bffe000
    old mmap_offset: 0x00000000c0000000
    fd_offset:       0x00000000c0000000
    new mmap_offset: 0x0000000000000000
    mmap_addr:       0x00007f0284000000
Successfully added new region
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-12-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
a3c0118c | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Speedup gpa_to_mem_region() and vu_gpa_to_va()
Let's speed up GPA to memory region / virtual address lookup. Store the memory regions ordered by guest physical addresses, and use binary search for address translation, as well as when adding/removing memory regions.
Most importantly, this will speed up GPA->VA address translation when we have many memslots.
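An illustrative version of the lookup with a simplified region struct (the real code operates on VuDevRegion):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint64_t gpa;   /* guest_phys_addr; the array is kept sorted by this key */
        uint64_t size;
        void *va;       /* corresponding host mapping */
    } Region;

    /* Binary search for the region containing gpa; regions never overlap. */
    static Region *gpa_to_region(Region *regions, size_t nregions, uint64_t gpa)
    {
        size_t lo = 0, hi = nregions;   /* search window [lo, hi) */

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            Region *r = &regions[mid];

            if (gpa < r->gpa) {
                hi = mid;
            } else if (gpa >= r->gpa + r->size) {
                lo = mid + 1;
            } else {
                return r;               /* gpa falls inside this region */
            }
        }
        return NULL;
    }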
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-11-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
60ccdca4 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Factor out search for memory region by GPA and simplify
Memory regions cannot overlap, and if we ever hit that case something would be really flawed.
For example, when vhost code in QEMU decides to increase the size of memory regions to cover full huge pages, it makes sure to never create overlaps, and if there would be overlaps, it would bail out.
QEMU commits 48d7c9757749 ("vhost: Merge sections added to temporary list"), c1ece84e7c93 ("vhost: Huge page align and merge") and e7b94a84b6cb ("vhost: Allow adjoining regions") added and clarified that handling and how overlaps are impossible.
Consequently, each GPA can belong to at most one memory region, and everything else doesn't make sense. Let's factor out our search to prepare for further changes.
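The factored-out helper amounts to a single containment scan, roughly as sketched below (illustrative; the commit listed above later swaps the linear scan for a binary search):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint64_t gpa, size; } MemRegion; /* minimal stand-in */

    /* Regions never overlap, so the first region containing the address is
     * the only one that can. */
    static MemRegion *find_region(MemRegion *regions, size_t n, uint64_t gpa)
    {
        for (size_t i = 0; i < n; i++) {
            if (gpa >= regions[i].gpa &&
                gpa < regions[i].gpa + regions[i].size) {
                return &regions[i];
            }
        }
        return NULL;
    }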
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-10-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
9c254cb4 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Don't search for duplicates when removing memory regions
We cannot have duplicate memory regions; otherwise something would be deeply flawed elsewhere. Let's just stop the search once we have found an entry.
We'll add more sanity checks when adding memory regions later.
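Sketch of the removal loop with the early exit (simplified; the real code works on dev->regions):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct { uint64_t gpa, size; } MemRegion; /* minimal stand-in */

    static void remove_region(MemRegion *regions, size_t *nregions, uint64_t gpa)
    {
        for (size_t i = 0; i < *nregions; i++) {
            if (regions[i].gpa == gpa) {
                memmove(&regions[i], &regions[i + 1],
                        (*nregions - i - 1) * sizeof(regions[0]));
                (*nregions)--;
                break;  /* duplicates cannot exist, stop searching */
            }
        }
    }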
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-9-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
c6f90b78 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Don't zero out memory for memory regions
dev->nregions always covers only valid entries. Stop zeroing out other array elements that are unused.
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-8-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
4f865c3b | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: No need to check for NULL when unmapping
We never add a memory region if mmap() failed. Therefore, no need to check for NULL.
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-7-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
93fec23d | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Factor out adding a memory region
Let's factor it out, reducing quite a bit of code duplication and preparing for further changes.
If we fail to mmap a region and panic, we now simply don't add that (broken) region.
Note that we now increment dev->nregions as we are successfully adding memory regions, and don't increment dev->nregions if anything went wrong.
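The shape of the factored-out helper, per the note above (illustrative names and fields, not the actual patch):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    typedef struct { uint64_t gpa, size; void *mmap_addr; } MemRegion;

    /* The region counter is only bumped after mmap() succeeded, so a failed
     * mapping never leaves a broken entry behind. */
    static bool add_mem_region(MemRegion *regions, size_t *nregions,
                               uint64_t gpa, uint64_t size,
                               int fd, off_t fd_offset)
    {
        void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_NORESERVE, fd, fd_offset);
        if (addr == MAP_FAILED) {
            return false;                   /* don't add a broken region */
        }

        regions[*nregions] = (MemRegion){ .gpa = gpa, .size = size,
                                          .mmap_addr = addr };
        (*nregions)++;                      /* incremented only on success */
        return true;
    }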
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-6-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
05a58ce4 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Merge vu_set_mem_table_exec_postcopy() into vu_set_mem_table_exec()
Let's reduce some code duplication and prepare for further changes.
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-5-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
bec58209 | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Factor out removing all mem regions
Let's factor it out. Note that the check for MAP_FAILED was wrong as we never set mmap_addr if mmap() failed. We'll remove the NULL check separately.
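A sketch of the factored-out teardown (simplified fields; not the actual helper):

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    typedef struct { void *mmap_addr; uint64_t size; } MemRegion;

    /* Every entry in the array was mapped successfully when it was added, so
     * it can be unmapped unconditionally. */
    static void remove_all_mem_regions(MemRegion *regions, size_t *nregions)
    {
        for (size_t i = 0; i < *nregions; i++) {
            munmap(regions[i].mmap_addr, regions[i].size);
        }
        *nregions = 0;
    }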
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-4-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
0fa6344c | 14-Feb-2024 |
David Hildenbrand <david@redhat.com> |
libvhost-user: Bump up VHOST_USER_MAX_RAM_SLOTS to 509
Let's support up to 509 mem slots, just like vhost in the kernel usually does and like the Rust vhost-user implementation recently started doing [1]. This is required to properly support memory hotplug, either using multiple DIMMs (ACPI supports up to 256) or using virtio-mem.
The number 509 used to be the KVM limit: it supported 512 slots, but 3 were reserved for internal purposes. Currently, KVM supports more than 512 slots, but usually no more than ~260 are used (i.e., 256 DIMMs + boot memory), except when other memory devices such as PCI devices with BARs are involved. So, 509 seems to work well for vhost in the kernel.
Details can be found in the QEMU change that made virtio-mem consume up to 256 mem slots across all virtio-mem devices. [2]
509 mem slots implies 509 VMAs/mappings in the worst case (even though, in practice with virtio-mem we won't be seeing more than ~260 in most setups).
With max_map_count under Linux defaulting to 64k, 509 mem slots still correspond to less than 1% of the maximum number of mappings. There are plenty left for the application to consume.
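For example, with the usual Linux default of vm.max_map_count = 65530 (~64k), 509 mappings amount to 509 / 65530 ≈ 0.8 % of the limit, leaving roughly 65000 mappings for the rest of the application.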
[1] https://github.com/rust-vmm/vhost/pull/224
[2] https://lore.kernel.org/all/20230926185738.277351-1-david@redhat.com/
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20240214151701.29906-3-david@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
61499d87 | 06-Oct-2023 |
Thomas Huth <thuth@redhat.com> |
libvhost-user: Fix compiler warning with -Wshadow=local
Rename shadowing variables to make this code compilable with -Wshadow=local.
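An illustrative (made-up) example of the kind of rename involved:

    /* Before: the inner 'ret' shadows the outer one and trips -Wshadow=local. */
    int process(int fd);

    int run_before(int fd)
    {
        int ret = process(fd);
        if (ret == 0) {
            int ret = process(fd);      /* warning: shadows outer 'ret' */
            return ret;
        }
        return ret;
    }

    /* After: the inner variable simply gets its own name. */
    int run_after(int fd)
    {
        int ret = process(fd);
        if (ret == 0) {
            int ret2 = process(fd);
            return ret2;
        }
        return ret;
    }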
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-ID: <20231006121129.487251-1-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
|