Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1 |
|
#
53f408ca |
| 07-Nov-2023 |
Thomas Gleixner <tglx@linutronix.de> |
hrtimers: Push pending hrtimers away from outgoing CPU earlier
[ Upstream commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 ]
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug") sol
hrtimers: Push pending hrtimers away from outgoing CPU earlier
[ Upstream commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 ]
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug") solved the straight forward CPU hotplug deadlock vs. the scheduler bandwidth timer. Yu discovered a more involved variant where a task which has a bandwidth timer started on the outgoing CPU holds a lock and then gets throttled. If the lock required by one of the CPU hotplug callbacks the hotplug operation deadlocks because the unthrottling timer event is not handled on the dying CPU and can only be recovered once the control CPU reaches the hotplug state which pulls the pending hrtimers from the dead CPU.
Solve this by pushing the hrtimers away from the dying CPU in the dying callbacks. Nothing can queue a hrtimer on the dying CPU at that point because all other CPUs spin in stop_machine() with interrupts disabled and once the operation is finished the CPU is marked offline.
Reported-by: Yu Liao <liaoyu15@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liu Tie <liutie4@huawei.com> Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.10, v6.6, v6.5.9, v6.5.8 |
|
#
2b13e0d5 |
| 16-Oct-2023 |
Mark Rutland <mark.rutland@arm.com> |
arm64/arm: xen: enlighten: Fix KPTI checks
[ Upstream commit 20f3b8eafe0ba5d3c69d5011a9b07739e9645132 ]
When KPTI is in use, we cannot register a runstate region as XEN requires that this is always
arm64/arm: xen: enlighten: Fix KPTI checks
[ Upstream commit 20f3b8eafe0ba5d3c69d5011a9b07739e9645132 ]
When KPTI is in use, we cannot register a runstate region as XEN requires that this is always a valid VA, which we cannot guarantee. Due to this, xen_starting_cpu() must avoid registering each CPU's runstate region, and xen_guest_init() must avoid setting up features that depend upon it.
We tried to ensure that in commit:
f88af7229f6f22ce (" xen/arm: do not setup the runstate info page if kpti is enabled")
... where we added checks for xen_kernel_unmapped_at_usr(), which wraps arm64_kernel_unmapped_at_el0() on arm64 and is always false on 32-bit arm.
Unfortunately, as xen_guest_init() is an early_initcall, this happens before secondary CPUs are booted and arm64 has finalized the ARM64_UNMAP_KERNEL_AT_EL0 cpucap which backs arm64_kernel_unmapped_at_el0(), and so this can subsequently be set as secondary CPUs are onlined. On a big.LITTLE system where the boot CPU does not require KPTI but some secondary CPUs do, this will result in xen_guest_init() intializing features that depend on the runstate region, and xen_starting_cpu() registering the runstate region on some CPUs before KPTI is subsequent enabled, resulting the the problems the aforementioned commit tried to avoid.
Handle this more robsutly by deferring the initialization of the runstate region until secondary CPUs have been initialized and the ARM64_UNMAP_KERNEL_AT_EL0 cpucap has been finalized. The per-cpu work is moved into a new hotplug starting function which is registered later when we're certain that KPTI will not be used.
Fixes: f88af7229f6f ("xen/arm: do not setup the runstate info page if kpti is enabled") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Bertrand Marquis <bertrand.marquis@arm.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3 |
|
#
ef7d9593 |
| 11-Sep-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: remove CPU hotplug infrastructure
There are no users of the cpu hotplug hooks in xfs now, so remove it. This reverts f1653c2e2831e ("xfs: introduce CPU hotplug infrastructure").
Signed-off-by:
xfs: remove CPU hotplug infrastructure
There are no users of the cpu hotplug hooks in xfs now, so remove it. This reverts f1653c2e2831e ("xfs: introduce CPU hotplug infrastructure").
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
show more ...
|
Revision tags: v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29 |
|
#
e0a99a83 |
| 15-May-2023 |
Anna-Maria Behnsen <anna-maria@linutronix.de> |
Documentation: core-api/cpuhotplug: Fix state names
Dynamic allocated hotplug states in documentation and the comment above cpuhp_state enum do not match the code. To not get confused by wrong docum
Documentation: core-api/cpuhotplug: Fix state names
Dynamic allocated hotplug states in documentation and the comment above cpuhp_state enum do not match the code. To not get confused by wrong documentation, change to proper state names.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230515162038.62703-1-anna-maria@linutronix.de
show more ...
|
#
9636be85 |
| 23-May-2023 |
Michael Kelley <mikelley@microsoft.com> |
x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
These commits
a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg") 2c6ba4216844 ("PC
x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
These commits
a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg") 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")
update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg because that memory will be correctly marked as decrypted or encrypted for all VM types (CoCo or normal). But problems ensue when CPUs in the VM go online or offline after virtual PCI devices have been configured.
When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN. But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which may call the virtual PCI driver and fault trying to use the as yet uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo VM if the MMIO read and write hypercalls are used from state CPUHP_AP_IRQ_AFFINITY_ONLINE.
When a CPU is taken offline, IRQs may be reassigned in state CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to use the hyperv_pcpu_input_arg that has already been freed by a higher state.
Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE) and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for Hyper-V initialization so that hyperv_pcpu_input_arg is allocated early enough.
Fix the offlining problem by not freeing hyperv_pcpu_input_arg when a CPU goes offline. Retain the allocated memory, and reuse it if the CPU comes back online later.
Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1684862062-51576-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
show more ...
|
#
18415f33 |
| 12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
There is often significant latency in the early stages of CPU bringup, and time is wasted by waking each CPU (e.g. with SIPI/INIT/I
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
There is often significant latency in the early stages of CPU bringup, and time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and then waiting for it to respond before moving on to the next.
Allow a platform to enable parallel setup which brings all to be onlined CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the control CPU (BP) is single-threaded the important part is the last state CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.
This allows the CPUs to run up to the first sychronization point cpuhp_ap_sync_alive() where they wait for the control CPU to release them one by one for the full onlining procedure.
This parallelism depends on the CPU hotplug core sync mechanism which ensures that the parallel brought up CPUs wait for release before touching any state which would make the CPU visible to anything outside the hotplug control mechanism.
To handle the SMT constraints of X86 correctly the bringup happens in two iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up the primary SMT threads of each core first, which can load the microcode without the need to rendevouz with the thread siblings. Once that's completed it brings up the secondary SMT threads.
Co-developed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.240231377@linutronix.de
show more ...
|
#
a631be92 |
| 12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
The bring up logic of a to be onlined CPU consists of several parts, which are considered to be a single hotplug state:
1) Control CPU issu
cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
The bring up logic of a to be onlined CPU consists of several parts, which are considered to be a single hotplug state:
1) Control CPU issues the wake-up
2) To be onlined CPU starts up, does the minimal initialization, reports to be alive and waits for release into the complete bring-up.
3) Control CPU waits for the alive report and releases the upcoming CPU for the complete bring-up.
Allow to split this into two states:
1) Control CPU issues the wake-up
After that the to be onlined CPU starts up, does the minimal initialization, reports to be alive and waits for release into the full bring-up. As this can run after the control CPU dropped the hotplug locks the code which is executed on the AP before it reports alive has to be carefully audited to not violate any of the hotplug constraints, especially not modifying any of the various cpumasks.
This is really only meant to avoid waiting for the AP to react on the wake-up. Of course an architecture can move strict CPU related setup functionality, e.g. microcode loading, with care before the synchronization point to save further pointless waiting time.
2) Control CPU waits for the alive report and releases the upcoming CPU for the complete bring-up.
This allows that the two states can be split up to run all to be onlined CPUs up to state #1 on the control CPU and then at a later point run state #2. This spares some of the latencies of the full serialized per CPU bringup by avoiding the per CPU wakeup/wait serialization. The assumption is that the first AP already waits when the last AP has been woken up. This obvioulsy depends on the hardware latencies and depending on the timings this might still not completely eliminate all wait scenarios.
This split is just a preparatory step for enabling the parallel bringup later. The boot time bringup is still fully serialized. It has a separate config switch so that architectures which want to support parallel bringup can test the split of the CPUHP_BRINGUG step separately.
To enable this the architecture must support the CPU hotplug core sync mechanism and has to be audited that there are no implicit hotplug state dependencies which require a fully serialized bringup.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205257.080801387@linutronix.de
show more ...
|
#
6f062123 |
| 12-May-2023 |
Thomas Gleixner <tglx@linutronix.de> |
cpu/hotplug: Add CPU state tracking and synchronization
The CPU state tracking and synchronization mechanism in smpboot.c is completely independent of the hotplug code and all logic around it is imp
cpu/hotplug: Add CPU state tracking and synchronization
The CPU state tracking and synchronization mechanism in smpboot.c is completely independent of the hotplug code and all logic around it is implemented in architecture specific code.
Except for the state reporting of the AP there is absolutely nothing architecture specific and the sychronization and decision functions can be moved into the generic hotplug core code.
Provide an integrated variant and add the core synchronization and decision points. This comes in two flavours:
1) DEAD state synchronization
Updated by the architecture code once the AP reaches the point where it is ready to be torn down by the control CPU, e.g. by removing power or clocks or tear down via the hypervisor.
The control CPU waits for this state to be reached with a timeout. If the state is reached an architecture specific cleanup function is invoked.
2) Full state synchronization
This extends #1 with AP alive synchronization. This is new functionality, which allows to replace architecture specific wait mechanims, e.g. cpumasks, completely.
It also prevents that an AP which is in a limbo state can be brought up again. This can happen when an AP failed to report dead state during a previous off-line operation.
The dead synchronization is what most architectures use. Only x86 makes a bringup decision based on that state at the moment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Helge Deller <deller@gmx.de> # parisc Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://lore.kernel.org/r/20230512205256.476305035@linutronix.de
show more ...
|
Revision tags: v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22 |
|
#
16812c96 |
| 29-Mar-2023 |
Kan Liang <kan.liang@linux.intel.com> |
iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
A warning can be triggered when hotplug CPU 0. $ echo 0 > /sys/devices/system/cpu/cpu0/online
------------[ cut here ]------------ Volunt
iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug
A warning can be triggered when hotplug CPU 0. $ echo 0 > /sys/devices/system/cpu/cpu0/online
------------[ cut here ]------------ Voluntary context switch within RCU read-side critical section! WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4f4/0x580 RIP: 0010:rcu_note_context_switch+0x4f4/0x580 Call Trace: <TASK> ? perf_event_update_userpage+0x104/0x150 __schedule+0x8d/0x960 ? perf_event_set_state.part.82+0x11/0x50 schedule+0x44/0xb0 schedule_timeout+0x226/0x310 ? __perf_event_disable+0x64/0x1a0 ? _raw_spin_unlock+0x14/0x30 wait_for_completion+0x94/0x130 __wait_rcu_gp+0x108/0x130 synchronize_rcu+0x67/0x70 ? invoke_rcu_core+0xb0/0xb0 ? __bpf_trace_rcu_stall_warning+0x10/0x10 perf_pmu_migrate_context+0x121/0x370 iommu_pmu_cpu_offline+0x6a/0xa0 ? iommu_pmu_del+0x1e0/0x1e0 cpuhp_invoke_callback+0x129/0x510 cpuhp_thread_fun+0x94/0x150 smpboot_thread_fn+0x183/0x220 ? sort_range+0x20/0x20 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> ---[ end trace 0000000000000000 ]---
The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(), when migrating a PMU to a new CPU. However, the current for_each_iommu() is within RCU read-side critical section.
Two methods were considered to fix the issue. - Use the dmar_global_lock to replace the RCU read lock when going through the drhd list. But it triggers a lockdep warning. - Use the cpuhp_setup_state_multi() to set up a dedicated state for each IOMMU PMU. The lock can be avoided.
The latter method is implemented in this patch. Since each IOMMU PMU has a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track the state. The state can be dynamically allocated now. Remove the CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE.
Fixes: 46284c6ceb5e ("iommu/vt-d: Support cpumask for IOMMU perfmon") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
show more ...
|
Revision tags: v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2 |
|
#
d2c48b23 |
| 16-Feb-2023 |
Pierre Gondois <pierre.gondois@arm.com> |
firmware: arm_sdei: Fix sleep from invalid context BUG
Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra triggers:
BUG: sleeping function called from invalid context at kernel/l
firmware: arm_sdei: Fix sleep from invalid context BUG
Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra triggers:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0 preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by cpuhp/0/24: #0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130 irq event stamp: 36 hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0 hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248 softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...] Hardware name: WIWYNN Mt.Jade Server [...] Call trace: dump_backtrace+0x114/0x120 show_stack+0x20/0x70 dump_stack_lvl+0x9c/0xd8 dump_stack+0x18/0x34 __might_resched+0x188/0x228 rt_spin_lock+0x70/0x120 sdei_cpuhp_up+0x3c/0x130 cpuhp_invoke_callback+0x250/0xf08 cpuhp_thread_fun+0x120/0x248 smpboot_thread_fn+0x280/0x320 kthread+0x130/0x140 ret_from_fork+0x10/0x20
sdei_cpuhp_up() is called in the STARTING hotplug section, which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry instead to execute the cpuhp cb later, with preemption enabled.
SDEI originally got its own cpuhp slot to allow interacting with perf. It got superseded by pNMI and this early slot is not relevant anymore. [1]
Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the calling CPU. It is checked that preemption is disabled for them. _ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'. Preemption is enabled in those threads, but their cpumask is limited to 1 CPU. Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb don't trigger them.
Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call which acts on the calling CPU.
[1]: https://lore.kernel.org/all/5813b8c5-ae3e-87fd-fccc-94c9cd08816d@arm.com/
Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20230216084920.144064-1-pierre.gondois@arm.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
Revision tags: v6.1.12, v6.1.11, v6.1.10, v6.1.9 |
|
#
46284c6c |
| 31-Jan-2023 |
Kan Liang <kan.liang@linux.intel.com> |
iommu/vt-d: Support cpumask for IOMMU perfmon
The perf subsystem assumes that all counters are by default per-CPU. So the user space tool reads a counter from each CPU. However, the IOMMU counters a
iommu/vt-d: Support cpumask for IOMMU perfmon
The perf subsystem assumes that all counters are by default per-CPU. So the user space tool reads a counter from each CPU. However, the IOMMU counters are system-wide and can be read from any CPU. Here we use a CPU mask to restrict counting to one CPU to handle the issue. (with CPU hotplug notifier to choose a different CPU if the chosen one is taken off-line).
The CPU is exposed to /sys/bus/event_source/devices/dmar*/cpumask for the user space perf tool.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-6-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
show more ...
|
Revision tags: v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11 |
|
#
aaf12a7b |
| 30-Nov-2022 |
Chao Gao <chao.gao@intel.com> |
KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
The CPU STARTING section doesn't allow callbacks to fail. Move KVM's hotplug callback to ONLINE section so that it can abort onlining a C
KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
The CPU STARTING section doesn't allow callbacks to fail. Move KVM's hotplug callback to ONLINE section so that it can abort onlining a CPU in certain cases to avoid potentially breaking VMs running on existing CPUs. For example, when KVM fails to enable hardware virtualization on the hotplugged CPU.
Place KVM's hotplug state before CPUHP_AP_SCHED_WAIT_EMPTY as it ensures when offlining a CPU, all user tasks and non-pinned kernel tasks have left the CPU, i.e. there cannot be a vCPU task around. So, it is safe for KVM's CPU offline callback to disable hardware virtualization at that point. Likewise, KVM's online callback can enable hardware virtualization before any vCPU task gets a chance to run on hotplugged CPUs.
Drop kvm_x86_check_processor_compatibility()'s WARN that IRQs are disabled, as the ONLINE section runs with IRQs disabled. The WARN wasn't intended to be a requirement, e.g. disabling preemption is sufficient, the IRQ thing was purely an aggressive sanity check since the helper was only ever invoked via SMP function call.
Rename KVM's CPU hotplug callbacks accordingly.
Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> [sean: drop WARN that IRQs are disabled] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-42-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
#
466d27e4 |
| 30-Nov-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Simplify the CPUHP logic
For a number of historical reasons, the KVM/arm64 hotplug setup is pretty complicated, and we have two extra CPUHP notifiers for vGIC and timers.
It looks prett
KVM: arm64: Simplify the CPUHP logic
For a number of historical reasons, the KVM/arm64 hotplug setup is pretty complicated, and we have two extra CPUHP notifiers for vGIC and timers.
It looks pretty pointless, and gets in the way of further changes. So let's just expose some helpers that can be called from the core CPUHP callback, and get rid of everything else.
This gives us the opportunity to drop a useless notifier entry, as well as tidy-up the timer enable/disable, which was a bit odd.
Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
show more ...
|
Revision tags: v6.0.10, v5.15.80, v6.0.9, v5.15.79 |
|
#
92125c3a |
| 10-Nov-2022 |
Nick Child <nnac123@linux.ibm.com> |
ibmvnic: Add hotpluggable CPU callbacks to reassign affinity hints
When CPU's are added and removed, ibmvnic devices will reassign hint values. Introduce a new cpu hotplug state CPUHP_IBMVNIC_DEAD t
ibmvnic: Add hotpluggable CPU callbacks to reassign affinity hints
When CPU's are added and removed, ibmvnic devices will reassign hint values. Introduce a new cpu hotplug state CPUHP_IBMVNIC_DEAD to signal to ibmvnic devices that the CPU has been removed and it is time to reset affinity hint assignments. On the other hand, when CPU's are being added, add a state instance to CPUHP_AP_ONLINE_DYN which will trigger a reassignment of affinity hints once the new CPU's are online. This implementation is based on the virtio_net driver.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v6.0.8, v5.15.78, v6.0.7, v5.15.77 |
|
#
30f89e52 |
| 02-Nov-2022 |
Juergen Gross <jgross@suse.com> |
x86/cacheinfo: Switch cache_ap_init() to hotplug callback
Instead of explicitly calling cache_ap_init() in identify_secondary_cpu() use a CPU hotplug callback instead. By registering the callback on
x86/cacheinfo: Switch cache_ap_init() to hotplug callback
Instead of explicitly calling cache_ap_init() in identify_secondary_cpu() use a CPU hotplug callback instead. By registering the callback only after having started the non-boot CPUs and initializing cache_aps_delayed_init with "true", calling set_cache_aps_delayed_init() at boot time can be dropped.
It should be noted that this change results in cache_ap_init() being called a little bit later when hotplugging CPUs. By using a new hotplug slot right at the start of the low level bringup this is not problematic, as no operations requiring a specific caching mode are performed that early in CPU initialization.
Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221102074713.21493-15-jgross@suse.com Signed-off-by: Borislav Petkov <bp@suse.de>
show more ...
|
Revision tags: v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56 |
|
#
71610ab1 |
| 20-Jul-2022 |
Bibo Mao <maobibo@loongson.cn> |
LoongArch: Remove clock setting during cpu hotplug stage
On physical machine we can save power by disabling clock of hot removed cpu. However as different platforms require different methods to conf
LoongArch: Remove clock setting during cpu hotplug stage
On physical machine we can save power by disabling clock of hot removed cpu. However as different platforms require different methods to configure clocks, the code is platform-specific, and probably belongs to firmware/pmu or cpu regulator, rather than generic arch/loongarch code.
Also, there is no such register on QEMU virt machine since the clock/frequency regulation is not emulated.
This patch removes the hard-coded clock register accesses in generic LoongArch cpu hotplug flow.
Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
show more ...
|
#
dd281e1a |
| 20-Jul-2022 |
Huacai Chen <chenhuacai@loongson.cn> |
irqchip: Add Loongson Extended I/O interrupt controller support
EIOINTC stands for "Extended I/O Interrupts" that described in Section 11.2 of "Loongson 3A5000 Processor Reference Manual". For more
irqchip: Add Loongson Extended I/O interrupt controller support
EIOINTC stands for "Extended I/O Interrupts" that described in Section 11.2 of "Loongson 3A5000 Processor Reference Manual". For more information please refer Documentation/loongarch/irq-chip-model.rst.
Loongson-3A5000 has 4 cores per NUMA node, and each NUMA node has an EIOINTC; while Loongson-3C5000 has 16 cores per NUMA node, and each NUMA node has 4 EIOINTCs. In other words, 16 cores of one NUMA node in Loongson-3C5000 are organized in 4 groups, each group connects to an EIOINTC. We call the "group" here as an EIOINTC node, so each EIOINTC node always includes 4 cores (both in Loongson-3A5000 and Loongson- 3C5000).
Co-developed-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1658314292-35346-12-git-send-email-lvjianmin@loongson.cn
show more ...
|
Revision tags: v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51 |
|
#
66637ab1 |
| 28-Jun-2022 |
Guangbin Huang <huangguangbin2@huawei.com> |
drivers/perf: hisi: add driver for HNS3 PMU
HNS3(HiSilicon Network System 3) PMU is RCiEP device in HiSilicon SoC NIC, supports collection of performance statistics such as bandwidth, latency, packe
drivers/perf: hisi: add driver for HNS3 PMU
HNS3(HiSilicon Network System 3) PMU is RCiEP device in HiSilicon SoC NIC, supports collection of performance statistics such as bandwidth, latency, packet rate and interrupt rate.
NIC of each SICL has one PMU device for it. Driver registers each PMU device to perf, and exports information of supported events, filter mode of each event, bdf range, hardware clock frequency, identifier and so on via sysfs.
Each PMU device has its own registers of control, counters and interrupt, and it supports 8 hardware events, each hardward event has its own registers for configuration, counters and interrupt.
Filter options contains: config - select event port - select physical port of nic tc - select tc(must be used with port) func - select PF/VF queue - select queue of PF/VF(must be used with func) intr - select interrupt number(must be used with func) global - select all functions of IO DIE
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/20220628063419.38514-3-huangguangbin2@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
Revision tags: v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45 |
|
#
46859ac8 |
| 31-May-2022 |
Huacai Chen <chenhuacai@loongson.cn> |
LoongArch: Add multi-processor (SMP) support
LoongArch-based procesors have 4, 8 or 16 cores per package. This patch adds multi-processor (SMP) support for LoongArch.
Reviewed-by: WANG Xuerui <git@
LoongArch: Add multi-processor (SMP) support
LoongArch-based procesors have 4, 8 or 16 cores per package. This patch adds multi-processor (SMP) support for LoongArch.
Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
show more ...
|
Revision tags: v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35 |
|
#
6b79738b |
| 15-Apr-2022 |
Qi Liu <liuqi115@huawei.com> |
drivers/perf: hisi: Add Support for CPA PMU
On HiSilicon Hip09 platform, there is a CPA (Coherency Protocol Agent) on each SICL (Super IO Cluster) which implements packet format translation, route p
drivers/perf: hisi: Add Support for CPA PMU
On HiSilicon Hip09 platform, there is a CPA (Coherency Protocol Agent) on each SICL (Super IO Cluster) which implements packet format translation, route parsing and traffic statistics.
CPA PMU has 8 PMU counters and interrupt is supported to handle counter overflow. Let's support its driver under the framework of HiSilicon PMU driver.
Signed-off-by: Qi Liu <liuqi115@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/20220415102352.6665-3-liuqi115@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
Revision tags: v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25 |
|
#
e9991434 |
| 18-Feb-2022 |
Atish Patra <atish.patra@wdc.com> |
RISC-V: Add perf platform driver based on SBI PMU extension
RISC-V SBI specification added a PMU extension that allows to configure start/stop any pmu counter. The RISC-V perf can use most of the ge
RISC-V: Add perf platform driver based on SBI PMU extension
RISC-V SBI specification added a PMU extension that allows to configure start/stop any pmu counter. The RISC-V perf can use most of the generic perf features except interrupt overflow and event filtering based on privilege mode which will be added in future.
It also allows to monitor a handful of firmware counters that can provide insights into firmware activity during a performance analysis.
Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
show more ...
|
Revision tags: v5.15.24, v5.15.23 |
|
#
68fa55f0 |
| 10-Feb-2022 |
Bharat Bhushan <bbhushan2@marvell.com> |
perf/marvell: cn10k DDR perf event core ownership
As DDR perf event counters are not per core, so they should be accessed only by one core at a time. Select new core when previously owning core is g
perf/marvell: cn10k DDR perf event core ownership
As DDR perf event counters are not per core, so they should be accessed only by one core at a time. Select new core when previously owning core is going offline.
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com> Reviewed-by: Bhaskara Budiredla <bbudiredla@marvell.com> Link: https://lore.kernel.org/r/20220211045346.17894-5-bbhushan2@marvell.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
#
3191dd5a |
| 13-Feb-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
random: clear fast pool, crng, and batches in cpuhp bring up
For the irq randomness fast pool, rather than having to use expensive atomics, which were visibly the most expensive thing in the entire
random: clear fast pool, crng, and batches in cpuhp bring up
For the irq randomness fast pool, rather than having to use expensive atomics, which were visibly the most expensive thing in the entire irq handler, simply take care of the extreme edge case of resetting count to zero in the cpuhp online handler, just after workqueues have been reenabled. This simplifies the code a bit and lets us use vanilla variables rather than atomics, and performance should be improved.
As well, very early on when the CPU comes up, while interrupts are still disabled, we clear out the per-cpu crng and its batches, so that it always starts with fresh randomness.
Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
show more ...
|
Revision tags: v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7 |
|
#
8404b0fb |
| 02-Dec-2021 |
Qi Liu <liuqi115@huawei.com> |
drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported to sample bandwidth, latency, buffer occupation etc.
Each PMU RCiEP devic
drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported to sample bandwidth, latency, buffer occupation etc.
Each PMU RCiEP device monitors multiple Root Ports, and each RCiEP is registered as a PMU in /sys/bus/event_source/devices, so users can select target PMU, and use filter to do further sets.
Filtering options contains: event - select the event. port - select target Root Ports. Information of Root Ports are shown under sysfs. bdf - select requester_id of target EP device. trig_len - set trigger condition for starting event statistics. trig_mode - set trigger mode. 0 means starting to statistic when bigger than trigger condition, and 1 means smaller. thr_len - set threshold for statistics. thr_mode - set threshold mode. 0 means count when bigger than threshold, and 1 means smaller.
Acked-by: Krzysztof Wilczyński <kw@linux.com> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Qi Liu <liuqi115@huawei.com> Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/20211202080633.2919-3-liuqi115@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
show more ...
|
Revision tags: v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24 |
|
#
4570ddda |
| 12-Mar-2021 |
Daniel Lezcano <daniel.lezcano@linaro.org> |
powercap/drivers/dtpm: Encapsulate even more the code
In order to increase the self-encapsulation of the dtpm generic code, the following changes are adding a power update ops to the dtpm ops. That
powercap/drivers/dtpm: Encapsulate even more the code
In order to increase the self-encapsulation of the dtpm generic code, the following changes are adding a power update ops to the dtpm ops. That allows the generic code to call directly the dtpm backend function to update the power values.
The power update function does compute the power characteristics when the function is invoked. In the case of the CPUs, the power consumption depends on the number of online CPUs. The online CPUs mask is not up to date at CPUHP_AP_ONLINE_DYN state in the tear down callback. That is the reason why the online / offline are at separate state. As there is already an existing state for DTPM, this one is only moved to the DEAD state, so there is no addition of new state with these changes. The dtpm node is not removed when the cpu is unplugged.
That simplifies the code for the next changes and results in a more self-encapsulated code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://lore.kernel.org/r/20210312130411.29833-1-daniel.lezcano@linaro.org
show more ...
|