Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11 |
|
#
67c48662 |
| 08-Feb-2023 |
Thomas Huth <thuth@redhat.com> |
KVM: PPC: Standardize on "int" return types in the powerpc KVM code
Most functions that are related to kvm_arch_vm_ioctl() already use "int" as return type to pass error values back to the caller. Some outlier functions use "long" instead for no good reason (they do not really require long values here). Let's standardize on "int" here to avoid casting the values back and forth between the two types.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-2-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
Revision tags: v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53 |
|
#
6fa1efea |
| 03-Jul-2022 |
Deming Wang <wangdeming@inspur.com> |
KVM: PPC: Book3s: Use arg->size directly in kvm_vm_ioctl_create_spapr_tce()
The size variable is just a copy of args->size; neither size nor args is modified, so use args->size directly.
Signed-off-by: Deming Wang <wangdeming@inspur.com> [mpe: Reword change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220703172932.11329-1-wangdeming@inspur.com
|
Revision tags: v5.15.52, v5.15.51 |
|
#
4dee21e0 |
| 28-Jun-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Do not warn when userspace asked for too big TCE table
KVM manages emulated TCE tables for guest LIOBNs by a two level table which maps up to 128TiB with 16MB IOMMU pages (enabled in QEMU by default) and MAX_ORDER=11 (the kernel's default). Note that the last level of the table is allocated when actual TCE is updated.
However these tables are created via ioctl() on kvmfd and the userspace can trigger WARN_ON_ONCE_GFP(order >= MAX_ORDER, gfp) in mm/page_alloc.c and flood dmesg.
This adds __GFP_NOWARN.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220628080228.1508847-1-aik@ozlabs.ru
|
Revision tags: v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38 |
|
#
cad32d9d |
| 06-May-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
LoPAPR defines a guest visible IOMMU and the hypercalls used to drive it (H_PUT_TCE etc.). These were first implemented on POWER7, where hypercalls trapped into KVM in real mode (with the MMU off). The problem with real mode is that some memory is not available and some API usage crashed the host, but enabling the MMU was an expensive operation.
The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.
If the small DMA window is used, the real mode handlers bring better performance. However since POWER8, there has always been a bigger DMA window which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE), whose virtual mode handler is even closer to its real mode version.
On POWER9 hypercalls trap straight to the virtual mode so the real mode handlers never execute on POWER9 and later CPUs.
So with the current use of the DMA windows and MMU improvements in POWER9 and later, there is no point in duplicating the code. The 32bit passed through devices may slow down but we do not have many of these in practice. For example, with this applied, a 1Gbit ethernet adapter still demonstrates above 800Mbit/s of actual throughput.
This removes the real mode handlers from KVM and related code from the powernv platform.
This updates the list of implemented hcalls in KVM-HV as the realmode handlers are removed.
This changes ABI - kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is static now.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
|
Revision tags: v5.15.37, v5.15.36, v5.15.35 |
|
#
26a62b75 |
| 20-Apr-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Fix TCE handling for VFIO
The LoPAPR spec defines a guest visible IOMMU with a variable page size. Currently QEMU advertises 4K, 64K, 2M, 16MB pages, a Linux VM picks the biggest (16MB). In the case of a passed through PCI device, there is a hardware IOMMU which does not support all page sizes from the above - P8 cannot do 2MB and P9 cannot do 16MB. So for each emulated 16MB IOMMU page we may create several smaller mappings ("TCEs") in the hardware IOMMU.
The code wrongly uses the emulated TCE index instead of hardware TCE index in error handling. The problem is easier to see on POWER8 with multi-level TCE tables (when only the first level is preallocated) as hash mode uses real mode TCE hypercalls handlers. The kernel starts using indirect tables when VMs get bigger than 128GB (depends on the max page order). The very first real mode hcall is going to fail with H_TOO_HARD as in the real mode we cannot allocate memory for TCEs (we can in the virtual mode) but on the way out the code attempts to clear hardware TCEs using emulated TCE indexes which corrupts random kernel memory because it_offset==1<<59 is subtracted from those indexes and the resulting index is out of the TCE table bounds.
This fixes kvmppc_clear_tce() to use the correct TCE indexes.
While at it, this fixes TCE cache invalidation which uses emulated TCE indexes instead of the hardware ones. This went unnoticed as 64bit DMA is used these days and VMs map all RAM in one go and only then do DMA and this is when the TCE cache gets populated.
Potentially this could slow down mapping, however normally 16MB emulated pages are backed by 64K hardware pages so it is one write to the "TCE Kill" per 256 updates which is not that bad considering the size of the cache (1024 TCEs or so).
Fixes: ca1fc489cfa0 ("KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220420050840.328223-1-aik@ozlabs.ru
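The difference between emulated and hardware TCE indexes described in this commit message can be illustrated with a small calculation. This is a minimal userspace sketch, not the kernel code: the function names are hypothetical, and the page shifts used in the note below (24 for 16MB, 16 for 64K) are simply the log2 of the sizes mentioned in the message.

```c
#include <stdint.h>

/*
 * Number of hardware TCEs needed to back one emulated (guest) TCE.
 * Both arguments are log2 of the page size; assumes guest_shift >= hw_shift.
 * Hypothetical helper for illustration only.
 */
static uint64_t hw_tces_per_guest_tce(unsigned guest_shift, unsigned hw_shift)
{
	return UINT64_C(1) << (guest_shift - hw_shift);
}

/*
 * First hardware TCE index backing a given emulated TCE index.
 * Hypothetical helper for illustration only.
 */
static uint64_t guest_to_hw_index(uint64_t guest_index,
				  unsigned guest_shift, unsigned hw_shift)
{
	return guest_index << (guest_shift - hw_shift);
}
```

With 16MB emulated pages backed by 64K hardware pages, each emulated TCE covers 256 hardware TCEs, so emulated index 3 starts at hardware index 768. Passing the emulated index where a hardware index is expected addresses a different entry, which is the class of mistake the commit describes.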
|
Revision tags: v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7 |
|
#
5dfbbb66 |
| 18-Sep-2021 |
Len Baker <len.baker@gmx.com> |
KVM: PPC: Replace zero-length array with flexible array member
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members" [1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2].
Also, make use of the struct_size() helper in kzalloc().
[1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Signed-off-by: Len Baker <len.baker@gmx.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
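As an illustration of the pattern this commit adopts, here is a minimal userspace sketch of a structure with a flexible array member and an overflow-checked size computation. The structure and function names are hypothetical; page_table_size() is only a stand-in written for this example and is not the kernel's struct_size() macro, although it mimics the idea of refusing to return a wrapped-around size.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure whose last field is a flexible array member. */
struct page_table {
	size_t npages;
	void *pages[];	/* flexible array member: no length given */
};

/*
 * Userspace stand-in for a struct_size()-style helper: returns
 * sizeof(header) + n * sizeof(element), or SIZE_MAX if that would overflow.
 */
static size_t page_table_size(size_t n)
{
	if (n > (SIZE_MAX - sizeof(struct page_table)) / sizeof(void *))
		return SIZE_MAX;
	return sizeof(struct page_table) + n * sizeof(void *);
}

/* Allocate a zeroed table with room for n trailing pointers. */
static struct page_table *page_table_alloc(size_t n)
{
	size_t bytes = page_table_size(n);
	struct page_table *t;

	if (bytes == SIZE_MAX)
		return NULL;	/* size computation overflowed */
	t = calloc(1, bytes);
	if (t)
		t->npages = n;
	return t;
}
```

Compared with a zero-length or one-element trailing array, the flexible array member lets the compiler know the array has no fixed size, and keeping the size arithmetic in one checked helper avoids open-coded multiplications at each allocation site.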
|
#
9dc46d2e |
| 20-Apr-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Fix TCE handling for VFIO
[ Upstream commit 26a62b750a4e6364b0393562f66759b1494c3a01 ]
The LoPAPR spec defines a guest visible IOMMU with a variable page size. Currently QEMU advertises 4K, 64K, 2M, 16MB pages, a Linux VM picks the biggest (16MB). In the case of a passed through PCI device, there is a hardware IOMMU which does not support all page sizes from the above - P8 cannot do 2MB and P9 cannot do 16MB. So for each emulated 16MB IOMMU page we may create several smaller mappings ("TCEs") in the hardware IOMMU.
The code wrongly uses the emulated TCE index instead of hardware TCE index in error handling. The problem is easier to see on POWER8 with multi-level TCE tables (when only the first level is preallocated) as hash mode uses real mode TCE hypercalls handlers. The kernel starts using indirect tables when VMs get bigger than 128GB (depends on the max page order). The very first real mode hcall is going to fail with H_TOO_HARD as in the real mode we cannot allocate memory for TCEs (we can in the virtual mode) but on the way out the code attempts to clear hardware TCEs using emulated TCE indexes which corrupts random kernel memory because it_offset==1<<59 is subtracted from those indexes and the resulting index is out of the TCE table bounds.
This fixes kvmppc_clear_tce() to use the correct TCE indexes.
While at it, this fixes TCE cache invalidation which uses emulated TCE indexes instead of the hardware ones. This went unnoticed as 64bit DMA is used these days and VMs map all RAM in one go and only then do DMA and this is when the TCE cache gets populated.
Potentially this could slow down mapping, however normally 16MB emulated pages are backed by 64K hardware pages so it is one write to the "TCE Kill" per 256 updates which is not that bad considering the size of the cache (1024 TCEs or so).
Fixes: ca1fc489cfa0 ("KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220420050840.328223-1-aik@ozlabs.ru Signed-off-by: Sasha Levin <sashal@kernel.org>
|
Revision tags: v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60 |
|
#
0f22af94 |
| 04-Aug-2021 |
David Matlack <dmatlack@google.com> |
KVM: Move last_used_slot logic out of search_memslots
Make search_memslots unconditionally search all memslots and move the last_used_slot logic up one level to __gfn_to_memslot. This is in preparation for introducing a per-vCPU last_used_slot.
As part of this change convert existing callers of search_memslots to __gfn_to_memslot to avoid making any functional changes.
Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
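The split described in this commit, a search routine that always scans every slot plus a caller that owns the last-used cache, can be sketched in userspace as follows. All names and types here are hypothetical, and the search is a plain linear scan chosen for brevity; it is not a copy of the kernel's lookup.

```c
#include <stddef.h>
#include <stdint.h>

struct slot {
	uint64_t base_gfn;	/* first guest frame number covered */
	uint64_t npages;	/* number of frames covered */
};

struct slots {
	const struct slot *array;
	size_t count;
	size_t last_used;	/* index of the most recent hit */
};

static int slot_contains(const struct slot *s, uint64_t gfn)
{
	return gfn >= s->base_gfn && gfn - s->base_gfn < s->npages;
}

/* Unconditionally search every slot; knows nothing about the cache. */
static const struct slot *search_slots(const struct slots *ss, uint64_t gfn,
				       size_t *index)
{
	for (size_t i = 0; i < ss->count; i++) {
		if (slot_contains(&ss->array[i], gfn)) {
			*index = i;
			return &ss->array[i];
		}
	}
	return NULL;
}

/* Caller-level lookup: try the cached slot first, then do a full search. */
static const struct slot *gfn_to_slot(struct slots *ss, uint64_t gfn)
{
	const struct slot *s;
	size_t i;

	if (ss->count && slot_contains(&ss->array[ss->last_used], gfn))
		return &ss->array[ss->last_used];

	s = search_slots(ss, gfn, &i);
	if (s)
		ss->last_used = i;	/* remember the hit for next time */
	return s;
}
```

Keeping the cache in the caller means a different caller could later keep its own cache (for example one per vCPU) without touching the search routine.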
|
Revision tags: v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11 |
|
#
cf59eb13 |
| 21-Sep-2020 |
Wang Wensheng <wangwensheng4@huawei.com> |
KVM: PPC: Book3S: Fix symbol undeclared warnings
Build the kernel with `C=2`:
arch/powerpc/kvm/book3s_hv_nested.c:572:25: warning: symbol 'kvmhv_alloc_nested' was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_mmu_radix.c:350:6: warning: symbol 'kvmppc_radix_set_pte_at' was not declared. Should it be static?
arch/powerpc/kvm/book3s_hv.c:3568:5: warning: symbol 'kvmhv_p9_guest_entry' was not declared. Should it be static?
arch/powerpc/kvm/book3s_hv_rm_xics.c:767:15: warning: symbol 'eoi_rc' was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_vio_hv.c:240:13: warning: symbol 'iommu_tce_kill_rm' was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_vio.c:492:6: warning: symbol 'kvmppc_tce_iommu_do_map' was not declared. Should it be static?
arch/powerpc/kvm/book3s_pr.c:572:6: warning: symbol 'kvmppc_set_pvr_pr' was not declared. Should it be static?
Those symbols are used only in the files that define them so make them static to fix the warnings.
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
eb173559 |
| 19-Sep-2020 |
Jing Xiangfeng <jingxiangfeng@huawei.com> |
KVM: PPC: Book3S: Remove redundant initialization of variable ret
The variable ret is initialized with '-ENOMEM', which is redundant, so remove the initialization.
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
Revision tags: v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40 |
|
#
ab8b65be |
| 10-May-2020 |
Qian Cai <cai@lca.pw> |
KVM: PPC: Book3S: Fix some RCU-list locks
It is unsafe to traverse kvm->arch.spapr_tce_tables and stt->iommu_tables without the RCU read lock held. Also, add cond_resched_rcu() in places with the RCU read lock held that could take a while to finish.
arch/powerpc/kvm/book3s_64_vio.c:76 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 no locks held by qemu-kvm/4265.
stack backtrace:
CPU: 96 PID: 4265 Comm: qemu-kvm Not tainted 5.7.0-rc4-next-20200508+ #2
Call Trace:
[c000201a8690f720] [c000000000715948] dump_stack+0xfc/0x174 (unreliable)
[c000201a8690f770] [c0000000001d9470] lockdep_rcu_suspicious+0x140/0x164
[c000201a8690f7f0] [c008000010b9fb48] kvm_spapr_tce_release_iommu_group+0x1f0/0x220 [kvm]
[c000201a8690f870] [c008000010b8462c] kvm_spapr_tce_release_vfio_group+0x54/0xb0 [kvm]
[c000201a8690f8a0] [c008000010b84710] kvm_vfio_destroy+0x88/0x140 [kvm]
[c000201a8690f8f0] [c008000010b7d488] kvm_put_kvm+0x370/0x600 [kvm]
[c000201a8690f990] [c008000010b7e3c0] kvm_vm_release+0x38/0x60 [kvm]
[c000201a8690f9c0] [c0000000005223f4] __fput+0x124/0x330
[c000201a8690fa20] [c000000000151cd8] task_work_run+0xb8/0x130
[c000201a8690fa70] [c0000000001197e8] do_exit+0x4e8/0xfa0
[c000201a8690fb70] [c00000000011a374] do_group_exit+0x64/0xd0
[c000201a8690fbb0] [c000000000132c90] get_signal+0x1f0/0x1200
[c000201a8690fcc0] [c000000000020690] do_notify_resume+0x130/0x3c0
[c000201a8690fda0] [c000000000038d64] syscall_exit_prepare+0x1a4/0x280
[c000201a8690fe20] [c00000000000c8f8] system_call_common+0xf8/0x278
==== arch/powerpc/kvm/book3s_64_vio.c:368 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by qemu-kvm/4264:
#0: c000201ae2d000d8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm]
#1: c000200c9ed0c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm]
==== arch/powerpc/kvm/book3s_64_vio.c:108 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-kvm/4257: #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]
==== arch/powerpc/kvm/book3s_64_vio.c:146 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-kvm/4257: #0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]
Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
Revision tags: v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24 |
|
#
4d395762 |
| 28-Feb-2020 |
Peter Xu <peterx@redhat.com> |
KVM: Remove unnecessary asm/kvm_host.h includes
Remove includes of asm/kvm_host.h from files that already include linux/kvm_host.h to make it more obvious that there is no ordering issue between the two headers. linux/kvm_host.h includes asm/kvm_host.h to pick up architecture specific settings, and this will never change, i.e. including asm/kvm_host.h after linux/kvm_host.h may seem problematic, but in practice is simply redundant.
Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
Revision tags: v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14 |
|
#
8a9c8925 |
| 26-Nov-2019 |
Leonardo Bras <leonardo@linux.ibm.com> |
KVM: PPC: Book3S: Replace current->mm by kvm->mm
Given that in kvm_create_vm() there is: kvm->mm = current->mm;
And that on every kvm_*_ioctl we have: if (kvm->mm != current->mm) return -EIO;
I see no reason to keep using current->mm instead of kvm->mm.
By doing so, we would reduce the use of 'global' variables in the code, relying more on the contents of the kvm struct.
Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
Revision tags: v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8 |
|
#
149487bd |
| 21-Oct-2019 |
Sean Christopherson <sean.j.christopherson@intel.com> |
KVM: Add separate helper for putting borrowed reference to kvm
Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motivation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken.
[*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes file descriptor.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
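The borrowed-reference idea in this commit can be modelled with a plain counter. The sketch below is a single-threaded userspace illustration with hypothetical names (struct vm, vm_get(), vm_put(), vm_put_no_destroy()); it uses a non-atomic int and assert() purely for brevity and does not reproduce the kernel's refcount API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the VM. */
struct vm {
	int users;		/* reference count (not atomic in this sketch) */
	bool destroyed;		/* set when the last reference is dropped */
};

static void vm_get(struct vm *vm)
{
	vm->users++;
}

/* Drop a reference; tear the object down when the last one goes away. */
static void vm_put(struct vm *vm)
{
	if (--vm->users == 0)
		vm->destroyed = true;	/* stand-in for real teardown */
}

/*
 * Drop a borrowed reference. The caller still holds its own reference,
 * so the count must not reach zero here; the object stays valid.
 */
static void vm_put_no_destroy(struct vm *vm)
{
	vm->users--;
	assert(vm->users > 0);
}
```

A typical sequence would be: take an extra reference before exposing a new object, and if exposing it fails, give that extra reference back with vm_put_no_destroy() while continuing to use the object under the caller's own reference.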
|
Revision tags: v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12 |
|
#
01b7d128 |
| 29-Aug-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3S: Invalidate multiple TCEs at once
Invalidating a TCE cache entry for each updated TCE is quite expensive. This makes use of the new iommu_table_ops::xchg_no_kill()/tce_kill() callbacks to bring down the time spent in mapping a huge guest DMA window; roughly 20s to 10s for each guest's 100GB of DMA space.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-3-aik@ozlabs.ru
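The saving described in this commit comes from issuing one cache invalidation for a whole range instead of one per updated entry. The following userspace sketch counts invalidation requests under both strategies. The function names echo the callbacks mentioned above, but the signatures, the struct, and the counter are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table model: entries plus a count of invalidation requests. */
struct tce_table {
	uint64_t *entries;
	size_t kills;	/* number of cache invalidation requests issued */
};

/* Update one entry without invalidating any cached copy. */
static void tce_xchg_no_kill(struct tce_table *t, size_t idx, uint64_t val)
{
	t->entries[idx] = val;
}

/* Invalidate cached copies of a range of entries with a single request. */
static void tce_kill(struct tce_table *t, size_t first, size_t count)
{
	(void)first;
	(void)count;
	t->kills++;
}

/* Per-entry strategy: one invalidation per update. */
static void map_range_per_entry(struct tce_table *t, size_t first,
				size_t count, uint64_t val)
{
	for (size_t i = 0; i < count; i++) {
		tce_xchg_no_kill(t, first + i, val);
		tce_kill(t, first + i, 1);
	}
}

/* Batched strategy: update everything, then invalidate the range once. */
static void map_range_batched(struct tce_table *t, size_t first,
			      size_t count, uint64_t val)
{
	for (size_t i = 0; i < count; i++)
		tce_xchg_no_kill(t, first + i, val);
	tce_kill(t, first, count);
}
```

For a range of 256 entries the per-entry variant issues 256 invalidation requests and the batched variant issues one.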
|
Revision tags: v5.2.11 |
|
#
4f916593 |
| 25-Aug-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling
H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from a guest. Although we verify correctness of TCEs before we do anything with the existing tables, there is a small window when a check in kvmppc_tce_validate might pass and right after that the guest alters the page with TCEs which can cause early exit from the handler and leave srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap) (real mode) locked.
This fixes the bug by jumping to the common exit code with an appropriate unlock.
Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190826045520.92153-1-aik@ozlabs.ru
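The fix described here follows the common C pattern of routing every failure through a single exit label that releases the lock. The sketch below shows that pattern in userspace. The lock is modelled as a boolean, and the names, return codes, and the translate() failure condition are all hypothetical; none of them are the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RET_SUCCESS	0
#define RET_TOO_HARD	(-1)	/* hypothetical error code for this sketch */

/* Hypothetical context whose "lock" is just a flag we can inspect. */
struct ctx {
	bool locked;
};

static void ctx_lock(struct ctx *c)   { c->locked = true; }
static void ctx_unlock(struct ctx *c) { c->locked = false; }

/* Hypothetical per-entry translation: fails for a zero entry. */
static int translate(uint64_t tce)
{
	return tce ? 0 : -1;
}

static int put_entries(struct ctx *c, const uint64_t *tces, size_t n)
{
	int ret = RET_SUCCESS;

	ctx_lock(c);
	for (size_t i = 0; i < n; i++) {
		if (translate(tces[i])) {
			ret = RET_TOO_HARD;
			/* Not a bare "return": the lock is still held. */
			goto unlock_exit;
		}
	}

unlock_exit:
	ctx_unlock(c);
	return ret;
}
```

Whether the loop completes or bails out early, the function leaves with the lock released, which is the property the original early return violated.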
|
#
ddfd151f |
| 25-Aug-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling
H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from a guest. Although we verify correctness of TCEs before we do anything with the existing tables, there is a small window when a check in kvmppc_tce_validate might pass and right after that the guest alters the page of TCEs, causing an early exit from the handler and leaving srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap) (real mode) locked.
This fixes the bug by jumping to the common exit code with an appropriate unlock.
Cc: stable@vger.kernel.org # v4.11+ Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
Revision tags: v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2 |
|
#
79eb597c |
| 16-Jul-2019 |
Daniel Jordan <daniel.m.jordan@oracle.com> |
mm: add account_locked_vm utility function
locked_vm accounting is done roughly the same way in five places, so unify them in a helper.
Include the helper's caller in the debug print to distinguish between callsites.
Error codes stay the same, so user-visible behavior does too. The one exception is that the -EPERM case in tce_account_locked_vm is removed because Alexey has never seen it triggered.
[daniel.m.jordan@oracle.com: v3] Link: http://lkml.kernel.org/r/20190529205019.20927-1-daniel.m.jordan@oracle.com [sfr@canb.auug.org.au: fix mm/util.c] Link: http://lkml.kernel.org/r/20190524175045.26897-1-daniel.m.jordan@oracle.com Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Cc: Alan Tull <atull@kernel.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Moritz Fischer <mdf@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Steve Sistare <steven.sistare@oracle.com> Cc: Wu Hao <hao.wu@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
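The kind of accounting that this commit unifies can be modelled as a counter checked against a limit. The following is a simplified userspace sketch under stated assumptions: the struct, the function name account_locked_pages(), and the clamping on decrement are hypothetical, and details of the real helper such as capability checks and debug printing are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of per-process locked-memory accounting. */
struct mm_model {
	unsigned long locked_vm;	/* pages currently accounted as locked */
	unsigned long limit;		/* maximum pages allowed */
};

/*
 * Add (inc == true) or remove (inc == false) pages from the locked count.
 * Returns 0 on success or -ENOMEM if the increment would exceed the limit.
 */
static int account_locked_pages(struct mm_model *mm, unsigned long pages,
				bool inc)
{
	if (inc) {
		/* Written to avoid overflow in locked_vm + pages. */
		if (pages > mm->limit || mm->locked_vm > mm->limit - pages)
			return -ENOMEM;
		mm->locked_vm += pages;
	} else {
		/* Sketch-only choice: clamp instead of underflowing. */
		if (pages > mm->locked_vm)
			pages = mm->locked_vm;
		mm->locked_vm -= pages;
	}
	return 0;
}
```

Having one helper means every caller applies the same limit check and returns the same error code, which is the consistency argument made in the commit message.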
|
Revision tags: v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6 |
|
#
d94d71cb |
| 29-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 266
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation 51 franklin street fifth floor boston ma 02110 1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 67 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141333.953658117@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
Revision tags: v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6 |
|
#
e1a1ef84 |
| 29-Mar-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3S: Allocate guest TCEs on demand too
We already allocate hardware TCE tables in multiple levels and skip intermediate levels when we can; now it is the turn of the KVM TCE tables. Thankfully these are already allocated in 2 levels.
This moves the table's last level allocation from the creating helper to kvmppc_tce_put() and kvm_spapr_tce_fault(). Since such allocation cannot be done in real mode, this creates a virtual mode version of kvmppc_tce_put() which handles allocations.
This adds kvmppc_rm_ioba_validate() to do an additional test if the subsequent kvmppc_tce_put() needs a page which has not been allocated; if this is the case, we bail out to virtual mode handlers.
The allocations are protected by a new mutex as kvm->lock is not suitable for the task because the fault handler is called with the mmap_sem held but kvmhv_setup_mmu() locks kvm->lock and mmap_sem in the reverse order.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
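On-demand allocation of the last table level, as described in this commit, can be sketched as a two-level table whose leaf pages are created on the first non-zero write. This is a single-threaded userspace illustration: the names, the TCES_PER_PAGE value, and the return codes are hypothetical, and the mutex that the commit introduces to protect allocations is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TCES_PER_PAGE 512	/* arbitrary leaf size for this sketch */

/* Hypothetical two-level table: first level holds pointers to leaf pages. */
struct lazy_table {
	size_t npages;
	uint64_t **pages;	/* NULL until the leaf is first needed */
};

/*
 * Store one entry, allocating its leaf page on demand.
 * Returns 0 on success, -1 if idx is out of range or allocation fails.
 */
static int tce_put(struct lazy_table *t, size_t idx, uint64_t tce)
{
	size_t pg = idx / TCES_PER_PAGE;

	if (pg >= t->npages)
		return -1;

	if (!t->pages[pg]) {
		if (!tce)
			return 0;	/* clearing an unallocated leaf: nothing to do */
		t->pages[pg] = calloc(TCES_PER_PAGE, sizeof(uint64_t));
		if (!t->pages[pg])
			return -1;
	}

	t->pages[pg][idx % TCES_PER_PAGE] = tce;
	return 0;
}
```

A table that is created large but only sparsely populated then costs one pointer per leaf up front, and a leaf page only for the regions that are actually written.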
|
#
2001825e |
| 29-Mar-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3S HV: Avoid lockdep debugging in TCE realmode handlers
The kvmppc_tce_to_ua() helper is called from real and virtual modes and it works fine as long as CONFIG_DEBUG_LOCKDEP is not enabled. However, if lockdep debugging is on, lockdep will most likely break in kvm_memslots() because of srcu_dereference_check(), so we need to use the PPC-specific kvm_memslots_raw(), which uses the realmode-safe rcu_dereference_raw_notrace().
This creates a realmode copy of kvmppc_tce_to_ua() which replaces kvm_memslots() with kvm_memslots_raw().
Since kvmppc_rm_tce_to_ua() becomes static and can only be used inside HV KVM, this moves it earlier under CONFIG_KVM_BOOK3S_HV_POSSIBLE.
This moves truly virtual-mode kvmppc_tce_to_ua() to where it belongs and drops the prmap parameter which was never used in the virtual mode.
Fixes: d3695aa4f452 ("KVM: PPC: Add support for multiple-TCE hcalls", 2016-02-15) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
#
345077c8 |
| 29-Mar-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3S: Protect memslots while validating user address
Guest physical to user address translation uses KVM memslots, and reading these requires holding the kvm->srcu lock. However, the recently introduced kvmppc_tce_validate() broke the rule (see the lockdep warning below).
This moves srcu_read_lock(&vcpu->kvm->srcu) earlier to protect kvmppc_tce_validate() as well.
=============================
WARNING: suspicious RCU usage
5.1.0-rc2-le_nv2_aikATfstn1-p1 #380 Not tainted
-----------------------------
include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by qemu-system-ppc/8020:
 #0: 0000000094972fe9 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0xdc/0x850 [kvm]

stack backtrace:
CPU: 44 PID: 8020 Comm: qemu-system-ppc Not tainted 5.1.0-rc2-le_nv2_aikATfstn1-p1 #380
Call Trace:
[c000003fece8f740] [c000000000bcc134] dump_stack+0xe8/0x164 (unreliable)
[c000003fece8f790] [c000000000181be0] lockdep_rcu_suspicious+0x130/0x170
[c000003fece8f810] [c0000000000d5f50] kvmppc_tce_to_ua+0x280/0x290
[c000003fece8f870] [c00800001a7e2c78] kvmppc_tce_validate+0x80/0x1b0 [kvm]
[c000003fece8f8e0] [c00800001a7e3fac] kvmppc_h_put_tce+0x94/0x3e4 [kvm]
[c000003fece8f9a0] [c00800001a8baac4] kvmppc_pseries_do_hcall+0x30c/0xce0 [kvm_hv]
[c000003fece8fa10] [c00800001a8bd89c] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
[c000003fece8fae0] [c00800001a7d95dc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[c000003fece8fb00] [c00800001a7d56bc] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
[c000003fece8fb90] [c00800001a7c3618] kvm_vcpu_ioctl+0x460/0x850 [kvm]
[c000003fece8fd00] [c00000000041c4f4] do_vfs_ioctl+0xe4/0x930
[c000003fece8fdb0] [c00000000041ce04] ksys_ioctl+0xc4/0x110
[c000003fece8fe00] [c00000000041ce78] sys_ioctl+0x28/0x80
[c000003fece8fe20] [c00000000000b5a4] system_call+0x5c/0x70
Fixes: 42de7b9e2167 ("KVM: PPC: Validate TCEs against preregistered memory page sizes", 2018-09-10) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
Revision tags: v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25 |
|
#
716cb116 |
| 20-Feb-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Book3S: Improve KVM reference counting
The anon fd's ops release the KVM reference in the release hook. However, we reference the KVM object after we create the fd, so there is a small window in which the release function can be called and drop a reference to the KVM object, which may potentially free it.
It is not a problem at the moment as the file is created and KVM is referenced under the KVM lock and the release function obtains the same lock before dereferencing the KVM (although the lock is not held when calling kvm_put_kvm()) but it is potentially fragile against future changes.
This references the KVM object before creating a file.
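The ordering described above can be modelled in a few lines of userspace C. This is a minimal sketch, not the kernel code: struct vm, vm_get(), vm_put(), create_fd() and create_table_fd() are hypothetical stand-ins, and the "file" is just an integer.

```c
/* Hypothetical refcounted object standing in for the KVM instance. */
struct vm {
	int refs;
};

void vm_get(struct vm *vm) { vm->refs++; }
void vm_put(struct vm *vm) { vm->refs--; }

/* Hypothetical file creation that can be made to fail for testing. */
int create_fd(int should_fail)
{
	return should_fail ? -1 : 3;
}

/*
 * Take the reference before the file exists, so a release hook that runs
 * as soon as the file is visible always has a reference to drop. If the
 * file cannot be created, the reference is dropped again here.
 */
int create_table_fd(struct vm *vm, int should_fail)
{
	int fd;

	vm_get(vm);
	fd = create_fd(should_fail);
	if (fd < 0)
		vm_put(vm);
	return fd;
}
```

With the opposite order (create the file, then take the reference), an early release would drop a reference that had not been taken yet.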
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
Revision tags: v4.19.24, v4.19.23, v4.19.22, v4.19.21 |
|
#
a67614cc |
| 11-Feb-2019 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Release all hardware TCE tables attached to a group
The SPAPR TCE KVM device references all hardware IOMMU tables assigned to some IOMMU group to ensure that in-kernel KVM acceleration of H_PUT_TCE can work. The tables are referenced when an IOMMU group gets registered with the VFIO KVM device by the KVM_DEV_VFIO_GROUP_ADD ioctl; KVM_DEV_VFIO_GROUP_DEL calls into the dereferencing code in kvm_spapr_tce_release_iommu_group(), which walks through the list of LIOBNs, finds a matching IOMMU table and calls kref_put() when found.
However, that code stops after the very first successful dereferencing, leaving other tables referenced until the SPAPR TCE KVM device is destroyed, which normally happens on guest reboot or termination. So if we do hotplug and unplug in a loop, we are leaking IOMMU tables here.
This removes a premature return to let kvm_spapr_tce_release_iommu_group() find and dereference all attached tables.
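The effect of the premature return can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not the kernel code: struct hw_table, its plain integer refcount and both function names are hypothetical, and the real code walks linked lists and uses kref_put().

```c
#include <stddef.h>

/* Hypothetical stand-in for a hardware table attached to an IOMMU group. */
struct hw_table {
	int group_id;	/* group the table belongs to */
	int refs;	/* simplified reference count */
};

/* Buggy shape: returns after the first match, so later tables stay referenced. */
int release_group_first_only(struct hw_table *t, size_t n, int group_id)
{
	for (size_t i = 0; i < n; i++) {
		if (t[i].group_id == group_id && t[i].refs > 0) {
			t[i].refs--;
			return 1;	/* premature return */
		}
	}
	return 0;
}

/* Fixed shape: keeps walking and drops every reference held for the group. */
int release_group_all(struct hw_table *t, size_t n, int group_id)
{
	int released = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].group_id == group_id && t[i].refs > 0) {
			t[i].refs--;
			released++;
		}
	}
	return released;
}
```

In this model, a group with two attached tables leaves one reference behind on every call to the first variant, which is the leak that repeated hotplug and unplug would accumulate.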
Fixes: 121f80ba68f ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
Revision tags: v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11 |
|
#
c10c21ef |
| 19-Dec-2018 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
powerpc/vfio/iommu/kvm: Do not pin device memory
This new memory does not have page structs as it is not plugged to the host so gup() will fail anyway.
This adds 2 helpers:
- mm_iommu_newdev() to preregister the "memory device" memory so the rest of the API can still be used;
- mm_iommu_is_devmem() to know if the physical address is one of these new regions, which we must avoid unpinning.
This adds @mm to tce_page_is_contained() and iommu_tce_xchg() to test if the memory is device memory to avoid pfn_to_page().
This adds a check for device memory in mm_iommu_ua_mark_dirty_rm() which does delayed pages dirtying.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|