#
0eaf812a |
| 19-Aug-2024 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86/intel: Limit the period on Haswell
commit 25dfc9e357af8aed1ca79b318a73f2c59c1f0b2b upstream.
Running the ltp test cve-2015-3290 concurrently reports the following warnings.
perfevents: irq loop stuck!
WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174 intel_pmu_handle_irq+0x285/0x370
Call Trace:
 <NMI>
 ? __warn+0xa4/0x220
 ? intel_pmu_handle_irq+0x285/0x370
 ? __report_bug+0x123/0x130
 ? intel_pmu_handle_irq+0x285/0x370
 ? __report_bug+0x123/0x130
 ? intel_pmu_handle_irq+0x285/0x370
 ? report_bug+0x3e/0xa0
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x18/0x50
 ? asm_exc_invalid_op+0x1a/0x20
 ? irq_work_claim+0x1e/0x40
 ? intel_pmu_handle_irq+0x285/0x370
 perf_event_nmi_handler+0x3d/0x60
 nmi_handle+0x104/0x330
Thomas Gleixner's analysis shows that the issue is caused by the low initial period (1) of the frequency estimation algorithm, which triggers defects in the HW, specifically errata HSW11 and HSW143. (For details, please refer to https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/)
HSW11 requires a period larger than 100 for the INST_RETIRED.ALL event, but the initial period in freq mode is 1. The erratum is the same as BDM11, which the kernel already handles. A minimum period of 128 is enforced on HSW as well.
HSW143 states that fixed counter 1 may overcount by 32 when Hyper-Threading is enabled. However, testing shows that the hardware has more issues than the erratum describes. Besides fixed counter 1, the message 'interrupt took too long' can be observed on any counter that was armed with a period < 32 when two events expired in the same NMI. A minimum period of 32 is therefore enforced for the rest of the events. The recommended workaround code for HSW143 is not implemented, because it only addresses the issue for the fixed counter and brings extra overhead through additional MSR writes. No related overcounting issue has been reported so far.
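The two minimum periods described above can be illustrated with a minimal standalone sketch. This is not the kernel code: the function name `hsw_min_period_sketch` and the `is_inst_retired_all` flag are hypothetical stand-ins, and only the thresholds 128 and 32 come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a requested sampling period to the minimum
 * described in the commit message. 128 applies to INST_RETIRED.ALL
 * (erratum HSW11), 32 applies to every other event.
 */
static int64_t hsw_min_period_sketch(int64_t left, bool is_inst_retired_all)
{
	int64_t min = is_inst_retired_all ? 128 : 32;

	/* Periods at or above the minimum pass through unchanged. */
	return left < min ? min : left;
}
```

With this shape, the initial freq-mode period of 1 would be raised to 128 or 32, while larger user-chosen periods are left alone.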
Fixes: 3a632cb229bf ("perf/x86/intel: Add simple Haswell PMU support") Reported-by: Li Huafei <lihuafei1@huawei.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
30912a7f |
| 04-Jan-2024 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
commit 971079464001c6856186ca137778e534d983174a upstream.
When commit c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") switched the initialization of cpuc->guest_switch_msrs to use compound literals, it screwed up the boolean logic:
+	u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
...
-	arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-	arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
+	.guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
Before the patch, the value of arr[0].guest would have been intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask. The intent is to always treat PEBS events as host-only because, while the guest runs, there is no way to tell the processor about the virtual address where to put PEBS records intended for the host.
Unfortunately, the new expression can be expanded to
(intel_ctrl & ~cpuc->intel_ctrl_host_mask) | (intel_ctrl & ~pebs_mask)
which makes no sense; it includes any bit that isn't *both* marked as exclude_guest and using PEBS. So, reinstate the old logic. Another way to write it could be "intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask)", which was presumably the intention of the author of the faulty commit. However, I personally find the repeated application of A AND NOT B to be a bit more readable.
This shows up as guest failures when running concurrent long-running perf workloads on the host, and was reported to happen with rcutorture. All guests on a given host would die simultaneously with something like an instruction fault or a segmentation violation.
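The difference between the faulty and the reinstated expressions can be checked with a small standalone sketch. The function names are hypothetical and the masks are passed as plain integers; the two expressions themselves are taken from the diff quoted above.

```c
#include <stdint.h>

/* Expression introduced by the faulty commit, as quoted in the diff. */
static uint64_t guest_mask_faulty(uint64_t intel_ctrl, uint64_t host_mask,
				  uint64_t pebs_mask)
{
	return intel_ctrl & (~host_mask | ~pebs_mask);
}

/* Reinstated logic: repeated application of "A AND NOT B". */
static uint64_t guest_mask_fixed(uint64_t intel_ctrl, uint64_t host_mask,
				 uint64_t pebs_mask)
{
	uint64_t guest = intel_ctrl & ~host_mask;

	guest &= ~pebs_mask;
	return guest;
}
```

For example, with four enabled counters (`intel_ctrl = 0xF`), counter 0 marked host-only (`host_mask = 0x1`) and counter 1 using PEBS (`pebs_mask = 0x2`), the faulty form keeps all four bits for the guest, while the fixed form keeps only counters 2 and 3.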
Reported-by: Paul E. McKenney <paulmck@kernel.org> Analyzed-by: Sean Christopherson <seanjc@google.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Cc: stable@vger.kernel.org Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
a430021f |
| 22-May-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86/intel: Add Crestmont PMU
Grand Ridge and Sierra Forest are successors to Snow Ridge. Both have Crestmont cores. From the core PMU's perspective, they are similar to the e-core of MTL. The only difference is the LBR event logging feature, which will be implemented in the following patches.
Create a non-hybrid PMU setup for Grand Ridge and Sierra Forest.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/20230522113040.2329924-1-kan.liang@linux.intel.com
|
#
882cdb06 |
| 07-Aug-2023 |
Peter Zijlstra <peterz@infradead.org> |
x86/cpu: Fix Gracemont uarch
Alderlake N is an E-core only product using the Gracemont micro-architecture. It fits the pre-existing naming scheme perfectly well, so adhere to it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org
|
#
27c68c21 |
| 04-Jul-2023 |
Namhyung Kim <namhyung@kernel.org> |
perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
On SPR, the load latency event needs an auxiliary event in the same group to work properly. There's a check in intel_pmu_hw_config() for this to iterate sibling events and find a mem-loads-aux event.
for_each_sibling_event() has a lockdep assert to make sure that either hardirqs are disabled or leader->ctx->mutex is held. This works well if the given event has a separate leader event, since perf_try_init_event() grabs leader->ctx->mutex to protect the sibling list. But it can cause a problem when the event itself is a leader, since the event is not initialized yet and there's no ctx for the event.
Actually I got a lockdep warning when I ran the command below on SPR, but I guess it could also be a NULL pointer dereference.
$ perf record -d -e cpu/mem-loads/uP true
The code path to the warning is:
sys_perf_event_open()
  perf_event_alloc()
    perf_init_event()
      perf_try_init_event()
        x86_pmu_event_init()
          hsw_hw_config()
            intel_pmu_hw_config()
              for_each_sibling_event()
                lockdep_assert_event_ctx()
We don't need for_each_sibling_event() when it's a standalone event. Let's return the error code directly.
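The early return for a standalone event can be sketched roughly as follows. This is a simplified userspace model, not the kernel code: `struct sketch_event` and `check_mem_loads_aux_sketch()` are hypothetical, and the choice of `-ENODATA` as the error code is an assumption, since the commit message does not name it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct perf_event. */
struct sketch_event {
	struct sketch_event *group_leader;	/* points to itself for a leader */
	struct sketch_event *next_sibling;	/* singly linked sibling list */
	bool is_mem_loads_aux;
};

/*
 * Return 0 if a mem-loads-aux event exists in the group, or a negative
 * error code otherwise (-ENODATA is an assumed value). A standalone event
 * is its own leader and has no siblings yet, so the sibling list is never
 * walked for it.
 */
static int check_mem_loads_aux_sketch(const struct sketch_event *event)
{
	const struct sketch_event *leader = event->group_leader;
	const struct sketch_event *sibling;

	if (leader == event)
		return -ENODATA;

	if (leader->is_mem_loads_aux)
		return 0;

	for (sibling = leader->next_sibling; sibling; sibling = sibling->next_sibling) {
		if (sibling->is_mem_loads_aux)
			return 0;
	}

	return -ENODATA;
}
```

The point of the ordering is that the leader comparison happens before any list traversal, so the code that would need the context lock is never reached for an uninitialized leader.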
Fixes: f3c0eba28704 ("perf: Add a few assertions") Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20230704181516.3293665-1-namhyung@kernel.org
|
#
a6742cb9 |
| 15-Jun-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL
When counting a FRONTEND event, the MSR_PEBS_FRONTEND is not correctly set on GNR and MTL p-core.
The umask value for the FRONTEND events is changed on GNR and MTL. The new umask is missing in the extra_regs[] table.
Add a dedicated intel_gnr_extra_regs[] for GNR and MTL p-core.
Fixes: bc4000fdb009 ("perf/x86/intel: Add Granite Rapids") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20230615173242.3726364-1-kan.liang@linux.intel.com
|
#
3c845304 |
| 17-May-2023 |
Like Xu <likexu@tencent.com> |
perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
After commit b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG"), cpuc->pebs_data_cfg may hold some bits that are not supported by real hardware, such as PEBS_UPDATE_DS_SW. This would cause the VMX hardware MSR switching mechanism to save/restore invalid values for the PEBS_DATA_CFG MSR, thus crashing the host when PEBS is used for a guest. Fix it by using the active host value from cpuc->active_pebs_data_cfg.
Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230517133808.67885-1-likexu@tencent.com
|
#
10d95a31 |
| 04-May-2023 |
Dapeng Mi <dapeng1.mi@linux.intel.com> |
perf/x86/intel: Define bit macros for FixCntrCtl MSR
Define bit macros for the FixCntrCtl MSR and replace the hardcoded bits with these macros. This makes the code more readable.
Perf commands 'perf stat -e "instructions,cycles,ref-cycles"' and 'perf record -e "instructions,cycles,ref-cycles"' pass.
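As an illustration of what such bit macros might look like, here is a minimal sketch. The `SKETCH_*` names and the helper are hypothetical and are not claimed to match the macros added by the patch; the layout (four control bits per fixed counter: ring 0, ring > 0, AnyThread, PMI) is assumed from the Intel SDM description of the fixed counter control register.

```c
#include <stdint.h>

/* Hypothetical names; four control bits per fixed counter are assumed. */
#define SKETCH_FIXED_BITS_STRIDE	4
#define SKETCH_FIXED_BITS_MASK		0xFULL
#define SKETCH_FIXED_KERNEL		(1ULL << 0)	/* count in ring 0 */
#define SKETCH_FIXED_USER		(1ULL << 1)	/* count in ring > 0 */
#define SKETCH_FIXED_ANYTHREAD		(1ULL << 2)	/* count on any thread */
#define SKETCH_FIXED_ENABLE_PMI		(1ULL << 3)	/* interrupt on overflow */

/* Shift a counter's control bits into its slot in the control register. */
static uint64_t fixed_ctrl_field(unsigned int idx, uint64_t bits)
{
	return (bits & SKETCH_FIXED_BITS_MASK) << (idx * SKETCH_FIXED_BITS_STRIDE);
}
```

Compared with literals such as `0x8ULL << (idx * 4)`, a call like `fixed_ctrl_field(idx, SKETCH_FIXED_ENABLE_PMI)` states the intent directly.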
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230504072128.3653470-1-dapeng1.mi@linux.intel.com
|
#
bc4000fd |
| 14-Mar-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86/intel: Add Granite Rapids
From core PMU's perspective, Granite Rapids is similar to the Sapphire Rapids. The key differences include:
- Doesn't need the AUX event workaround for the mem load event. (Implemented in this patch.)
- Supports Retire Latency. (Already implemented in commit c87a31093c70 ("perf/x86: Support Retire Latency").)
- The event list, which will be supported in the perf tool later.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230314170041.2967712-1-kan.liang@linux.intel.com
|
#
13738a36 |
| 09-Nov-2022 |
Like Xu <likexu@tencent.com> |
perf/x86/intel: Expose EPT-friendly PEBS for SPR and future models
According to Intel SDM, the EPT-friendly PEBS is supported by all the platforms after ICX, ADL and the future platforms with PEBS format 5.
Currently the only in-kernel user of this capability is KVM, which has very limited support for hybrid core pmu, so ADL and its successors do not currently expose this capability. When both hybrid core and PEBS format 5 are present, KVM will decide on its own merits.
Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-perf-users@vger.kernel.org Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221109082802.27543-4-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
6795e558 |
| 06-Jan-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86/intel: Add Emerald Rapids
From core PMU's perspective, Emerald Rapids is the same as the Sapphire Rapids. The only difference is the event list, which will be supported in the perf tool later.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230106160449.3566477-1-kan.liang@linux.intel.com
|
#
eb55b455 |
| 18-Jan-2023 |
Namhyung Kim <namhyung@kernel.org> |
perf/core: Add perf_sample_save_brstack() helper
When we save the branch stack to the perf sample data, we need to update the sample flags and the dynamic size. To make sure this is done consistently, add the perf_sample_save_brstack() helper and convert all call sites.
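A helper of this kind could look roughly like the following userspace sketch. All types, the flag value, and the function name `sketch_save_brstack()` are hypothetical simplifications; the real helper's signature and size accounting may differ. The sketch only shows the idea of updating the pointer, the dynamic size, and the flag in one place.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SAMPLE_BRANCH_STACK	(1ULL << 0)	/* hypothetical flag bit */

/* Hypothetical, simplified stand-ins for the perf structures. */
struct sketch_branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

struct sketch_branch_stack {
	uint64_t nr;				/* number of valid entries */
	struct sketch_branch_entry *entries;
};

struct sketch_sample_data {
	uint64_t sample_flags;			/* which fields are filled in */
	uint64_t dyn_size;			/* bytes of variable-size data */
	const struct sketch_branch_stack *br_stack;
};

/* Record the branch stack and keep flags and size consistent with it. */
static void sketch_save_brstack(struct sketch_sample_data *data,
				const struct sketch_branch_stack *brs)
{
	size_t size = sizeof(uint64_t);		/* the entry count itself */

	size += brs->nr * sizeof(struct sketch_branch_entry);

	data->br_stack = brs;
	data->dyn_size += size;
	data->sample_flags |= SKETCH_SAMPLE_BRANCH_STACK;
}
```

Routing every call site through one function like this makes it hard to set the pointer while forgetting the flag or the size.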
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230118060559.615653-5-namhyung@kernel.org
|
#
eb467aaa |
| 04-Jan-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86/intel: Support Architectural PerfMon Extension leaf
The new CPUID leaf 0x23 reports the "true view" of PMU resources.
The sub-leaf 1 reports the available general-purpose counters and fixed counters. Update the number of counters and fixed counters when the sub-leaf is detected.
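The counter update can be sketched as below. This assumes, beyond what the commit message states, that sub-leaf 1 returns a bitmap of general-purpose counters in EAX and a bitmap of fixed counters in EBX, and that the counter count is taken as the position of the highest set bit. The struct and function names are hypothetical, and the register values are passed in as plain integers rather than read with CPUID.

```c
#include <stdint.h>

/* Position of the highest set bit, 1-based; 0 for an empty bitmap. */
static unsigned int highest_bit_1based(uint32_t bitmap)
{
	unsigned int n = 0;

	while (bitmap) {
		n++;
		bitmap >>= 1;
	}
	return n;
}

/* Hypothetical container for the derived counter counts. */
struct sketch_pmu_counts {
	unsigned int num_gp;
	unsigned int num_fixed;
};

/* Derive counter counts from the assumed EAX/EBX bitmaps of sub-leaf 1. */
static struct sketch_pmu_counts counts_from_subleaf1(uint32_t eax, uint32_t ebx)
{
	struct sketch_pmu_counts c = {
		.num_gp = highest_bit_1based(eax),
		.num_fixed = highest_bit_1based(ebx),
	};

	return c;
}
```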
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230104201349.1451191-5-kan.liang@linux.intel.com
|
#
c87a3109 |
| 04-Jan-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86: Support Retire Latency
Retire Latency reports the number of elapsed core clocks between the retirement of the instruction indicated by the Instruction Pointer field of the PEBS record and the retirement of the prior instruction. It's enumerated by the IA32_PERF_CAPABILITIES.PEBS_TIMING_INFO[17].
Add flag PMU_FL_RETIRE_LATENCY to indicate the availability of the feature.
The Retire Latency is not supported by the fixed counter 0 on p-core of MTL.
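The enumeration step amounts to testing bit 17 of the capabilities MSR value. A minimal sketch, with a hypothetical macro and function name and the MSR value passed in as a plain integer:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position taken from the commit message: PEBS_TIMING_INFO[17]. */
#define SKETCH_PEBS_TIMING_INFO_BIT	17

/* Report whether the capabilities value advertises Retire Latency. */
static bool has_retire_latency(uint64_t perf_capabilities)
{
	return (perf_capabilities >> SKETCH_PEBS_TIMING_INFO_BIT) & 1;
}
```

A driver would then set a flag such as PMU_FL_RETIRE_LATENCY when this check succeeds; how the real code reads the MSR is outside the scope of this sketch.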
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230104201349.1451191-3-kan.liang@linux.intel.com
|
#
38aaf921 |
| 04-Jan-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86: Add Meteor Lake support
From PMU's perspective, Meteor Lake is similar to Alder Lake. Both are hybrid platforms, with e-core and p-core.
The key differences include:
- The e-core supports 2 PDIST GP counters (GP0 & GP1)
- New MSRs for the Module Snoop Response Events on the e-core.
- New Data Source fields are introduced for the e-core.
- There are 8 GP counters for the e-core.
- The load latency AUX event is not required for the p-core anymore.
- Retire Latency (Support in a separate patch) for both cores.
Since most of the code in the intel_pmu_init() should be the same as Alder Lake, to avoid code duplication, share the path with Alder Lake.
Add new specific functions of extra_regs, and get_event_constraints to support the OCR events, Module Snoop Response Events and 2 PDIST GP counters on e-core.
Add new MTL specific mem_attrs which drops the load latency AUX event.
The Data Source field is extended to 4:0, which can contain at most 32 sources.
The Retire Latency is implemented with a separate patch.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230104201349.1451191-2-kan.liang@linux.intel.com
|
#
6f8faf47 |
| 31-Oct-2022 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
The intel_pebs_isolation quirk checks both model number and stepping. Cooper Lake has a different stepping (11) than the other Skylake Xeon. It cannot benefit from the optimization in commit 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering").
Add the stepping of Cooper Lake into the isolation_ucodes[] table.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154550.571663-1-kan.liang@linux.intel.com
|
#
bd275681 |
| 08-Oct-2022 |
Peter Zijlstra <peterz@infradead.org> |
perf: Rewrite core context handling
There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged).
Notably:
- HW breakpoint PMU
- ARM big.little PMU / Intel ADL PMU
- Intel Branch Monitoring PMU
- AMD IBS PMU
- S390 cpum_cf PMU
- PowerPC trace_imc PMU
*Current design:*
Currently we have a per task and per cpu perf_event_contexts:
task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
     ^                                 |    ^     |           ^
     `---------------------------------'    |     `--> pmu ---'
                                            v           ^
                                       perf_event ------'
Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task.
Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU.
The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc.
Each perf_event is then associated with its PMU and one perf_event_context.
*Proposed design:*
The new design proposed by this patch reduces this to a single task context and a single CPU context, but adds some intermediate data structures:
task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
     ^                           |   ^ ^
     `---------------------------'   | |
                                     | |    perf_cpu_pmu_context <--.
                                     | `----.    ^                  |
                                     |      |    |                  |
                                     |      v    v                  |
                                     | ,--> perf_event_pmu_context  |
                                     | |                            |
                                     | |                            |
                                     v v                            |
                                perf_event ---> pmu ----------------'
With the new design, perf_event_context will hold all events for all pmus in the (respective pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key:
{cpu, pmu, cgroup, group_index}
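A comparison over such a composite key can be sketched as a plain lexicographic compare. The struct, its field types, and the function names are hypothetical simplifications (the PMU is reduced to an opaque value and the cgroup to an id), so this only illustrates the ordering idea, not the kernel's comparator.

```c
#include <stdint.h>

/* Hypothetical flattened form of the {cpu, pmu, cgroup, group_index} key. */
struct sketch_group_key {
	int cpu;
	uintptr_t pmu;		/* identity of the PMU, compared as an opaque value */
	uint64_t cgroup_id;
	uint64_t group_index;
};

/* Three-way compare that avoids overflow from subtraction. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Lexicographic order: cpu first, then pmu, then cgroup, then group_index. */
static int sketch_group_key_cmp(const struct sketch_group_key *a,
				const struct sketch_group_key *b)
{
	int r;

	if (a->cpu != b->cpu)
		return a->cpu < b->cpu ? -1 : 1;

	r = cmp_u64(a->pmu, b->pmu);
	if (r)
		return r;

	r = cmp_u64(a->cgroup_id, b->cgroup_id);
	if (r)
		return r;

	return cmp_u64(a->group_index, b->group_index);
}
```

Because pmu sits directly after cpu in the key, all events of one PMU on one CPU end up adjacent in the tree, which is what allows a single context to hold events for several PMUs.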
Each perf_event_context carries a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For example, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc.
Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information.
Each perf_event is associated with its pmu, perf_event_context and perf_event_pmu_context.
Further optimizations to current implementation are possible. For example, ctx_resched() can be optimized to reschedule only single pmu events.
Much thanks to Ravi for picking this up and pushing it towards completion.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
|
#
50b0c97b |
| 28-Sep-2022 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86: Add new Raptor Lake S support
From the PMU's perspective, the new Raptor Lake S is the same as the other hybrid {ALDER,RAPTOR}LAKE models.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220928153331.3757388-1-kan.liang@linux.intel.com
|
#
b4e12b2d |
| 08-Sep-2022 |
Namhyung Kim <namhyung@kernel.org> |
perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLY
There's no in-tree user anymore. Let's get rid of it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220908214104.3851807-3-namhyung@kernel.org
|
#
fae9ebde |
| 04-Aug-2022 |
Kan Liang <kan.liang@linux.intel.com> |
perf/x86/intel: Optimize FIXED_CTR_CTRL access
All the fixed counters share a fixed control register. The current perf reads and re-writes the fixed control register for each fixed counter disable/enable, which is unnecessary.
When changing the fixed control register, the entire PMU must be disabled via the global control register. The change cannot take effect until the entire PMU is re-enabled, so updating the fixed control register once, right before the PMU is re-enabled, is enough.
The read of the fixed control register is not necessary either. The value can be cached in the per CPU cpu_hw_events.
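The caching scheme can be modelled with a small userspace sketch. Everything here is hypothetical: `struct sketch_cpu_state` stands in for the per-CPU state, `hw_fixed_ctrl` simulates the MSR, and `msr_writes` counts simulated writes. Four control bits per fixed counter are assumed.

```c
#include <stdint.h>

/* Hypothetical per-CPU state; hw_fixed_ctrl simulates the real MSR. */
struct sketch_cpu_state {
	uint64_t fixed_ctrl_val;	/* cached, desired control value */
	uint64_t hw_fixed_ctrl;		/* value last "written" to hardware */
	unsigned int msr_writes;	/* number of simulated MSR writes */
};

/* Enable a fixed counter by updating only the cached value. */
static void sketch_enable_fixed(struct sketch_cpu_state *s, unsigned int idx,
				uint64_t bits)
{
	uint64_t mask = 0xFULL << (idx * 4);

	s->fixed_ctrl_val = (s->fixed_ctrl_val & ~mask) |
			    ((bits & 0xFULL) << (idx * 4));
}

/* Disable a fixed counter, again touching only the cached value. */
static void sketch_disable_fixed(struct sketch_cpu_state *s, unsigned int idx)
{
	s->fixed_ctrl_val &= ~(0xFULL << (idx * 4));
}

/* Called once, right before the whole PMU is re-enabled. */
static void sketch_pmu_enable_all(struct sketch_cpu_state *s)
{
	if (s->hw_fixed_ctrl != s->fixed_ctrl_val) {
		s->hw_fixed_ctrl = s->fixed_ctrl_val;
		s->msr_writes++;
	}
}
```

In this model, enabling three fixed counters costs one simulated register write instead of three read-modify-write cycles, which is the saving the commit message describes.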
Test results:
Counting all the fixed counters with the perf bench sched pipe as below on a SPR machine.
$ perf stat -e cycles,instructions,ref-cycles,slots --no-inherit -- taskset -c 1 perf bench sched pipe
The total elapsed time drops from 5.36s (without the patch) to 4.99s (with the patch), a ~6.9% improvement.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220804140729.2951259-1-kan.liang@linux.intel.com
|
#
1acab2e0 |
| 11-May-2022 |
Peter Zijlstra <peterz@infradead.org> |
perf/x86/intel: Remove x86_pmu::update_topdown_event
Now that it is all internal to the intel driver, remove x86_pmu::update_topdown_event.
Assumes that is_topdown_count(event) can only be true when the hardware has topdown stuff and the function is set.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.771635301@infradead.org
|
#
23685167 |
| 11-May-2022 |
Peter Zijlstra <peterz@infradead.org> |
perf/x86/intel: Remove x86_pmu::set_topdown_event_period
Now that it is all internal to the intel driver, remove x86_pmu::set_topdown_event_period.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.706354189@infradead.org
|
#
28f0f3c4 |
| 10-May-2022 |
Peter Zijlstra <peterz@infradead.org> |
perf/x86: Change x86_pmu::limit_period signature
In preparation for making it a static_call, change the signature.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.573713839@infradead.org
|
#
e577bb17 |
| 10-May-2022 |
Peter Zijlstra <peterz@infradead.org> |
perf/x86/intel: Move the topdown stuff into the intel driver
Use the new x86_pmu::{set_period,update}() methods to push the topdown stuff into the Intel driver, where it belongs.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.505933457@infradead.org
|
#
a9a931e2 |
| 01-Sep-2022 |
Kan Liang <kan.liang@linux.intel.com> |
perf: Use sample_flags for branch stack
Use the new sample_flags to indicate whether the branch stack is filled by the PMU driver.
Remove the br_stack from the perf_sample_data_init() to minimize the number of cache lines touched.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-4-kan.liang@linux.intel.com
|