
Searched hist:fae9ebde9696385fa2e993e752cf68d9781f3ea0 (Results 1 – 2 of 2) sorted by relevance

/openbmc/linux/arch/x86/events/
perf_event.h  diff fae9ebde9696385fa2e993e752cf68d9781f3ea0  Thu Aug 04 09:07:29 CDT 2022  Kan Liang <kan.liang@linux.intel.com>  perf/x86/intel: Optimize FIXED_CTR_CTRL access

All the fixed counters share one fixed control register. Currently,
perf reads and rewrites the fixed control register for each fixed
counter disable/enable, which is unnecessary.

When changing the fixed control register, the entire PMU must be
disabled via the global control register, and the change cannot take
effect until the entire PMU is re-enabled. It is therefore enough to
update the fixed control register once, right before the PMU is
re-enabled.

The read of the fixed control register is not necessary either; the
value can be cached in the per-CPU cpu_hw_events.
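The idea can be illustrated with a small self-contained C model of the
lazy-update pattern. This is a sketch only, not the patch's actual code:
sim_wrmsr, struct cpu_state, fixed_ctrl_val and fixed_ctrl_changed are
hypothetical stand-ins for the kernel's wrmsrl() and the state the patch
keeps in struct cpu_hw_events; the 4-bit control field per fixed counter
follows the IA32_FIXED_CTR_CTRL layout documented in the Intel SDM.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define MSR_FIXED_CTR_CTRL 0x38d        /* IA32_FIXED_CTR_CTRL address */

    static int msr_writes;                  /* counts simulated wrmsr calls */

    /* Hypothetical stand-in for the kernel's wrmsrl(). */
    static void sim_wrmsr(uint32_t msr, uint64_t val)
    {
            msr_writes++;
            printf("wrmsr 0x%x <- 0x%llx\n", msr, (unsigned long long)val);
    }

    /* Models the cached state kept in the per-CPU cpu_hw_events. */
    struct cpu_state {
            uint64_t fixed_ctrl_val;        /* cached register contents */
            bool     fixed_ctrl_changed;    /* is a write still pending? */
    };

    /* Enable fixed counter idx: update only the cached copy, no MSR access. */
    static void fixed_counter_enable(struct cpu_state *c, int idx, uint64_t bits)
    {
            uint64_t mask = 0xfULL << (idx * 4);    /* 4 control bits/counter */

            c->fixed_ctrl_val = (c->fixed_ctrl_val & ~mask) | (bits << (idx * 4));
            c->fixed_ctrl_changed = true;
    }

    /* Disable fixed counter idx: again, cached copy only. */
    static void fixed_counter_disable(struct cpu_state *c, int idx)
    {
            c->fixed_ctrl_val &= ~(0xfULL << (idx * 4));
            c->fixed_ctrl_changed = true;
    }

    /* Called once, right before the global control register re-enables
     * the PMU: flush the cached value with at most one MSR write. */
    static void pmu_enable(struct cpu_state *c)
    {
            if (c->fixed_ctrl_changed) {
                    c->fixed_ctrl_changed = false;
                    sim_wrmsr(MSR_FIXED_CTR_CTRL, c->fixed_ctrl_val);
            }
            /* ... re-enable the PMU via the global control register ... */
    }

    int main(void)
    {
            struct cpu_state cpu = { 0 };

            /* Reprogram three fixed counters while the PMU is disabled. */
            fixed_counter_enable(&cpu, 0, 0x3);     /* e.g. instructions, OS+USR */
            fixed_counter_enable(&cpu, 1, 0x3);     /* e.g. cycles, OS+USR */
            fixed_counter_disable(&cpu, 2);
            pmu_enable(&cpu);       /* one write, not three read+write pairs */

            printf("total MSR writes: %d\n", msr_writes);
            return 0;
    }

In this model, three counter reprogrammings cost a single MSR write: the
cached value absorbs every intermediate update, and the one flush happens
right before the PMU is re-enabled, which is exactly the saving the
commit describes.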

Test results:

Counting all the fixed counters with perf bench sched pipe, as below,
on an SPR (Sapphire Rapids) machine.

$ perf stat -e cycles,instructions,ref-cycles,slots --no-inherit -- taskset -c 1 perf bench sched pipe

The total elapsed time drops from 5.36s (without the patch) to 4.99s
(with the patch), a ~6.9% improvement.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220804140729.2951259-1-kan.liang@linux.intel.com
/openbmc/linux/arch/x86/events/intel/
core.c  diff fae9ebde9696385fa2e993e752cf68d9781f3ea0  Thu Aug 04 09:07:29 CDT 2022  Kan Liang <kan.liang@linux.intel.com>  perf/x86/intel: Optimize FIXED_CTR_CTRL access
