f989dc00 | 13-Sep-2023 |
James Clark <james.clark@arm.com> |
perf pmu: Move pmu__find_core_pmu() to pmus.c
[ Upstream commit 3d0f5f456a5786573ba6a3358178c8db580e4b85 ]
pmu__find_core_pmu() more logically belongs in pmus.c because it iterates over all PMUs, so move it to pmus.c
At the same time rename it to perf_pmus__find_core_pmu() to match the naming convention in this file.
list_prepare_entry() can't be used in perf_pmus__scan_core() anymore now that it's called from the same compilation unit. This is with -O2 (specifically -O1 -ftree-vrp -finline-functions -finline-small-functions), which allows the bounds of the array access to be determined at compile time. list_prepare_entry() subtracts the offset of the 'list' member in struct perf_pmu from &core_pmus, which isn't a struct perf_pmu. The compiler sees that pmu evaluates to &core_pmus - 8 and refuses to compile. At runtime this works because list_for_each_entry_continue() always adds the offset back again before dereferencing ->next, but it's technically undefined behavior. With -fsanitize=undefined an additional warning is generated.
Using list_first_entry_or_null() to get the first entry here avoids doing &core_pmus - 8 but has the same result and fixes both the compile warning and the undefined behavior warning. There are other uses of list_prepare_entry() in pmus.c, but the compiler doesn't seem to be able to see that they can also be called with &core_pmus, so I won't change any at this time.
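The difference between the two approaches can be illustrated with a minimal sketch. The list helpers below are simplified stand-ins for the kernel-style macros, and struct demo_pmu and demo_scan() are hypothetical names, not the actual perf code. The point is that the list head is never converted into an entry pointer: the first entry is fetched with list_first_entry_or_null() instead.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel-style list helpers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? container_of((head)->next, type, member) : NULL)

/* Hypothetical PMU record; 'list' sits at a non-zero offset. */
struct demo_pmu {
	const char *name;
	struct list_head list;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry && head);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Return the entry after 'pmu', or the first entry when 'pmu' is NULL.
 * container_of() is only ever applied to real entries, never to 'head'.
 */
static struct demo_pmu *demo_scan(struct list_head *head, struct demo_pmu *pmu)
{
	if (!pmu)
		return list_first_entry_or_null(head, struct demo_pmu, list);
	if (pmu->list.next == head)
		return NULL;
	return container_of(pmu->list.next, struct demo_pmu, list);
}
```

Calling demo_scan() repeatedly, starting with NULL, walks the list and ends with NULL once the last entry has been returned.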
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Haixin Yu <yuhaixin.yhx@linux.alibaba.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230913153355.138331-2-james.clark@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Stable-dep-of: d9c5f5f94c2d ("perf pmu: Count sys and cpuid JSON events separately")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
d3ced099 | 26-Mar-2024 |
James Clark <james.clark@arm.com> |
perf test shell arm_coresight: Increase buffer size for Coresight basic tests
[ Upstream commit 10b6ee3b597b1b1b4dc390aaf9d589664af31df9 ]
These tests record in a mode that includes kernel trace but look for samples of a userspace process. This makes them sensitive to any kernel compilation options that increase the amount of time spent in the kernel. If the trace buffer is completely filled before userspace is reached then the test will fail. Double the buffer size to fix this.
The other tests in the same file aren't sensitive to this for various reasons, for example the iterate devices test filters by userspace trace only. But in order to keep coverage of all the modes, increase the buffer size rather than filtering by userspace for the basic tests.
Fixes: d1efa4a0a696e487 ("perf cs-etm: Add separate decode paths for timeless and per-thread modes")
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20240326113749.257250-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
aa4158e3 | 03-Sep-2023 |
Yang Jihong <yangjihong1@huawei.com> |
perf record: Move setting tracking events before record__init_thread_masks()
[ Upstream commit 1285ab300d598ead593b190af65a16f4b0843c68 ]
User space tasks can migrate between CPUs, so when tracing selected CPUs, sideband for all CPUs is needed. In this case set the cpu map of the evsel to all online CPUs. This may modify the original cpu map of the evlist.
Therefore, whether this scenario applies needs to be checked before record__init_thread_masks() runs.
Dummy tracking is currently set up in record__open(); move it before record__init_thread_masks() and add a helper for unified processing.
The sys_perf_event_open invoked is as follows:
  # perf --debug verbose=3 record -e cpu-clock -D 100 true
  <SNIP>
  Opening: cpu-clock
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0 (PERF_COUNT_SW_CPU_CLOCK)
    { sample_period, sample_freq }   4000
    sample_type                      IP|TID|TIME|PERIOD|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    freq                             1
    sample_id_all                    1
    exclude_guest                    1
  ------------------------------------------------------------
  sys_perf_event_open: pid 10318  cpu 0  group_fd -1  flags 0x8 = 5
  sys_perf_event_open: pid 10318  cpu 1  group_fd -1  flags 0x8 = 6
  sys_perf_event_open: pid 10318  cpu 2  group_fd -1  flags 0x8 = 7
  sys_perf_event_open: pid 10318  cpu 3  group_fd -1  flags 0x8 = 9
  sys_perf_event_open: pid 10318  cpu 4  group_fd -1  flags 0x8 = 10
  sys_perf_event_open: pid 10318  cpu 5  group_fd -1  flags 0x8 = 11
  sys_perf_event_open: pid 10318  cpu 6  group_fd -1  flags 0x8 = 12
  sys_perf_event_open: pid 10318  cpu 7  group_fd -1  flags 0x8 = 13
  Opening: dummy:u
  ------------------------------------------------------------
  perf_event_attr:
    type                             1 (PERF_TYPE_SOFTWARE)
    size                             136
    config                           0x9 (PERF_COUNT_SW_DUMMY)
    { sample_period, sample_freq }   1
    sample_type                      IP|TID|TIME|IDENTIFIER
    read_format                      ID|LOST
    disabled                         1
    inherit                          1
    exclude_kernel                   1
    exclude_hv                       1
    mmap                             1
    comm                             1
    enable_on_exec                   1
    task                             1
    sample_id_all                    1
    exclude_guest                    1
    mmap2                            1
    comm_exec                        1
    ksymbol                          1
    bpf_event                        1
  ------------------------------------------------------------
  sys_perf_event_open: pid 10318  cpu 0  group_fd -1  flags 0x8 = 14
  sys_perf_event_open: pid 10318  cpu 1  group_fd -1  flags 0x8 = 15
  sys_perf_event_open: pid 10318  cpu 2  group_fd -1  flags 0x8 = 16
  sys_perf_event_open: pid 10318  cpu 3  group_fd -1  flags 0x8 = 17
  sys_perf_event_open: pid 10318  cpu 4  group_fd -1  flags 0x8 = 18
  sys_perf_event_open: pid 10318  cpu 5  group_fd -1  flags 0x8 = 19
  sys_perf_event_open: pid 10318  cpu 6  group_fd -1  flags 0x8 = 20
  sys_perf_event_open: pid 10318  cpu 7  group_fd -1  flags 0x8 = 21
  <SNIP>
'perf test' needs to update base-record & system-wide-dummy attr expected values for test-record-C0:
1. A dummy sideband event is added when sampling specified CPUs. When the evlist contains evsels with different sample_type values, evlist__config() changes the default PERF_SAMPLE_ID bit to the PERF_SAMPLE_IDENTIFIER bit, so the expected attr sample_type value of base-record and system-wide-dummy in test-record-C0 needs to be updated.
2. perf record now uses evlist__add_aux_dummy() instead of evlist__add_dummy() to add a dummy event, so the expected value of the system-wide-dummy attr needs to be updated.
The 'perf test' result is as follows:
  # ./perf test list 2>&1 | grep 'Setup struct perf_event_attr'
   17: Setup struct perf_event_attr
  # ./perf test 17
   17: Setup struct perf_event_attr                                    : Ok
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20230904023340.12707-4-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stable-dep-of: 792bc998baf9 ("perf record: Fix debug message placement for test consumption")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
28a50a15 | 10-Apr-2024 |
James Clark <james.clark@arm.com> |
perf tests: Apply attributes to all events in object code reading test
[ Upstream commit 2dade41a533f337447b945239b87ff31a8857890 ]
PERF_PMU_CAP_EXTENDED_HW_TYPE results in multiple events being opened on heterogeneous systems. Currently this test only sets its required attributes on the first event. Not disabling enable_on_exec on the other events causes the test to fail because the forked objdump processes are sampled. No tracking event is opened so Perf only knows about its own mappings causing the objdump samples to give the following error:
  $ perf test -vvv "object code reading"
  Reading object code for memory address: 0xffff9aaa55ec
  thread__find_map failed
  ---- end(-1) ----
  24: Object code reading                                             : FAILED!
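The shape of the fix, applying the required attribute to every opened event rather than only the first, might look roughly like the sketch below. struct demo_attr and demo_config_all() are hypothetical stand-ins for illustration; the real test operates on perf's evlist and evsel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the attribute involved. */
struct demo_attr {
	bool enable_on_exec;
};

/* Clear enable_on_exec on every event, not just on events[0]. */
static void demo_config_all(struct demo_attr *events, size_t nr)
{
	assert(events || nr == 0);
	for (size_t i = 0; i < nr; i++)
		events[i].enable_on_exec = false;
}
```

On a heterogeneous system with several core PMUs, nr would be greater than one, which is the case the old first-event-only code missed.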
Fixes: 251aa040244a3b17 ("perf parse-events: Wildcard most "numeric" events")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Spoorthy S <spoorts2@in.ibm.com>
Link: https://lore.kernel.org/r/20240410103458.813656-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
c7e8c0e6 | 01-Dec-2023 |
Veronika Molnarova <vmolnaro@redhat.com> |
perf test record user-regs: Fix mask for vg register
[ Upstream commit 28b01743ca752cea5ab182297d8b912b22f2a2d1 ]
The 'vg' register for arm64 shows up in --user_regs as available when masking the variable AT_HWCAP with 1 << 22 returns '1' as done in perf_regs.c.
However, in subtests for support of SVE, the check for the 'vg' register is done by masking the variable AT_HWCAP with the value 0x200000, which equals 1 << 21 instead of 1 << 22.
This results in inconsistencies on certain systems where the test expects that the 'vg' register is not operational when it is, and vice-versa.
During the testing on a machine that the test expected not to have the 'vg' register available, 'perf record' with the option --user-regs showed records for the 'vg' register together with all of the others, which means that the mask for the subtest of perf_event_attr is off by one.
Change the value of the mask from 0x200000 to 0x400000 to correct it.
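The off-by-one in the mask is easy to check with a little arithmetic. The sketch below uses hypothetical DEMO_* names and a hypothetical demo_has_vg() helper; only the numeric values come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit position the commit message says perf_regs.c checks. */
#define DEMO_SVE_BIT  22
#define DEMO_OLD_MASK 0x200000UL /* 1 << 21: the value the subtest used */
#define DEMO_NEW_MASK 0x400000UL /* 1 << 22: the corrected value */

/* True when the capability word has the bit selected by 'mask' set. */
static bool demo_has_vg(unsigned long hwcap, unsigned long mask)
{
	assert(mask != 0);
	return (hwcap & mask) != 0;
}
```

With a capability word that has only bit 22 set, the old mask reports the register as unavailable while the new mask reports it as available, which matches the inconsistency described above.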
Fixes: 9440ebdc333dd12e ("perf test arm64: Add attr tests for new VG register")
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20231201194617.13012-1-vmolnaro@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
f0005f17 | 30-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf metric: Add #num_cpus_online literal
Returns the number of CPUs online, unlike #num_cpus that returns the number present.
Add a test of the property.
This will be used in future Intel metrics.
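As a rough analogue of the distinction, and not how perf itself computes these literals, the C library exposes a configured count and an online count through sysconf(). The helper name demo_cpu_counts() is hypothetical, and the configured count is only an approximation of the "present" count that #num_cpus reports.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Fill in the configured and online CPU counts as reported by the
 * C library. Either value may be -1 if the query is unsupported.
 */
static void demo_cpu_counts(long *configured, long *online)
{
	assert(configured && online);
	*configured = sysconf(_SC_NPROCESSORS_CONF);
	*online = sysconf(_SC_NPROCESSORS_ONLN);
}
```

On a machine where some CPUs have been taken offline, the two values differ, which is the situation a metric using the online count needs to handle.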
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20230830073026.1829912-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
d2045f87 | 26-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf jevents: Use "default_core" for events with no Unit
The JSON Unit field encodes the name of the PMU to match the events to. When no name is given it has meant the "cpu" core PMU except for tests.
On ARM, Intel hybrid and s390 the core PMU is named differently which means that using "cpu" for this case causes the events not to get matched to the PMU.
Introduce a new "default_core" string for this case and in the pmu__name_match force all core PMUs to match this name.
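The matching rule can be sketched as follows. This is a simplified illustration with a hypothetical demo_pmu_name_match() helper: the real matching logic in perf handles more cases than the exact string comparison shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Simplified name match: any core PMU matches "default_core",
 * everything else falls back to an exact name comparison.
 */
static bool demo_pmu_name_match(const char *pmu_name, bool is_core,
				const char *to_match)
{
	assert(pmu_name && to_match);
	if (is_core && strcmp(to_match, "default_core") == 0)
		return true;
	return strcmp(pmu_name, to_match) == 0;
}
```

With this rule a core PMU that is not literally named "cpu" still picks up events whose JSON has no Unit, while non-core PMUs are unaffected.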
Fixes: 2e255b4f9f41f137 ("perf jevents: Group events by PMU")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20230826062203.1058041-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
a84260e3 | 25-Aug-2023 |
Namhyung Kim <namhyung@kernel.org> |
perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
It has a system-wide test and a cpu-list test, but the cpu-list test sometimes fails. It runs the sleep command on CPU1 and measures both the user.slice and system.slice cgroups by default (on systemd-based systems).
But if the system is idle enough, system.slice sometimes gets no count, which makes the test fail. That may be because it only looks at CPU1, so add CPU0 to increase the chance that it finds some tasks.
Fixes: 7901086014bbaa3a ("perf test: Add a new test for perf stat cgroup BPF counter")
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230825164152.165610-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
68ca249c | 25-Aug-2023 |
Namhyung Kim <namhyung@kernel.org> |
perf test shell stat_bpf_counters: Fix test on Intel
As of now, bpf counters (bperf) don't support event groups. But the default perf stat includes topdown metrics if supported (on recent Intel machines), which require groups. That makes perf stat exit.
  $ sudo perf stat --bpf-counter true
  bpf managed perf events do not yet support groups.
Actually the test explicitly uses the cycles event only, but it failed to pass that option when checking the availability of the command.
Fixes: 2c0cb9f56020d2ea ("perf test: Add a shell test for 'perf stat --bpf-counters' new option")
Reviewed-by: Song Liu <song@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230825164152.165610-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
b7823045 | 24-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf pmu: Make id const and add missing free
The struct pmu id is initialized from pmu_id that is read into allocated memory from a file, as such it needs free-ing in pmu__delete().
Make the id value const so that we can remove casts in tests.
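The ownership pattern described here, a const-qualified string member that still owns heap memory and must be freed on delete, can be sketched as below. struct demo_pmu, demo_pmu_new() and demo_pmu_delete() are hypothetical names for illustration and do not reproduce the perf code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical PMU with a read-only view of an owned, allocated id. */
struct demo_pmu {
	const char *id;
};

/* Copy 'id_text' into allocated memory, as if it had been read from a file. */
static struct demo_pmu *demo_pmu_new(const char *id_text)
{
	assert(id_text);
	struct demo_pmu *pmu = malloc(sizeof(*pmu));
	size_t len = strlen(id_text) + 1;
	char *copy = malloc(len);

	if (!pmu || !copy) {
		free(pmu);
		free(copy);
		return NULL;
	}
	memcpy(copy, id_text, len);
	pmu->id = copy;
	return pmu;
}

/* The id is const for readers, but the PMU owns it and must free it. */
static void demo_pmu_delete(struct demo_pmu *pmu)
{
	if (!pmu)
		return;
	free((char *)pmu->id); /* cast away const only at the point of release */
	free(pmu);
}
```

Callers such as tests can then compare pmu->id against string literals without any casts, since both sides are const char pointers.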
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Wei Li <liwei391@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230825024002.801955-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
970ef02e | 24-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf parse-events: Make term's config const
This avoids casts in tests. Use zfree in a few places to avoid warnings about freeing a const pointer.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Wei Li <liwei391@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230825024002.801955-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
eeb6b129 | 24-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf jevents: Don't append Unit to desc
Unit with the PMU name is appended to desc in jevents.py, but on hybrid platforms it causes the desc to differ from the regular non-hybrid system with a PMU of 'cpu'. Having differing descs means the events don't deduplicate. To make the perf list output not differ, append the Unit on again in the perf list printing code.
On x86 this reduces the binary size by 409,600 bytes or about 4%. Update pmu-events test expectations to match the differently generated pmu-events.c code.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230824183212.374787-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
8d4b6d37 | 23-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf pmu: Lazily load sysfs aliases
Don't load sysfs aliases for a PMU when the PMU is first created, defer until an alias needs to be found. For the pmu-scan benchmark, average core PMU scanning is reduced by 30.8%, and average PMU scanning by 12.6%.
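The deferral can be sketched as a load-on-first-lookup pattern. Everything below is a hypothetical illustration: struct demo_pmu, demo_load_aliases() and demo_has_alias() are invented names, and the hard-coded alias list stands in for reading sysfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ALIASES 4

struct demo_pmu {
	const char *aliases[DEMO_MAX_ALIASES];
	size_t nr_aliases;
	bool aliases_loaded;
	unsigned int loads; /* how many times the expensive load ran */
};

/* Stand-in for the expensive sysfs scan; runs at most once per PMU. */
static void demo_load_aliases(struct demo_pmu *pmu)
{
	static const char *const fake[] = { "cycles", "instructions" };

	for (size_t i = 0; i < sizeof(fake) / sizeof(fake[0]); i++)
		pmu->aliases[pmu->nr_aliases++] = fake[i];
	pmu->aliases_loaded = true;
	pmu->loads++;
}

/* Creating the PMU costs nothing; the first lookup triggers the load. */
static bool demo_has_alias(struct demo_pmu *pmu, const char *name)
{
	assert(pmu && name);
	if (!pmu->aliases_loaded)
		demo_load_aliases(pmu);
	for (size_t i = 0; i < pmu->nr_aliases; i++) {
		if (strcmp(pmu->aliases[i], name) == 0)
			return true;
	}
	return false;
}
```

A PMU that is created but never queried for an alias never pays for the load, which is where the scanning speedup would come from in a design like this.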
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230824041330.266337-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
e6ff1eed | 23-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf pmu: Lazily add JSON events
Rather than scanning all JSON events and adding them when a PMU is created, add the alias when the JSON event is needed.
Average core PMU scanning run time reduced by 60.2%. Average PMU scanning run time reduced by 15%. Page faults with no events reduced by 74 page faults, 4% of total.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230824041330.266337-14-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
7c52f10c | 23-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf pmu: Cache JSON events table
Cache the JSON events table so that finding it isn't done per event/alias.
Change the events table find so that when the PMU is given, if the PMU has no JSON events return null.
Update usage to always use the PMU variable.
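One way to picture the caching, including the case where a PMU has no table and NULL is the cached answer, is the sketch below. The names (struct demo_table, demo_find_table(), demo_pmu_table()) and the single-entry table list are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_table {
	const char *pmu_name;
};

struct demo_pmu {
	const char *name;
	const struct demo_table *events_table; /* NULL when the PMU has no JSON events */
	bool table_cached;
};

static unsigned int demo_lookups; /* counts the expensive searches */

/* Stand-in for the per-PMU table search; returns NULL when nothing matches. */
static const struct demo_table *demo_find_table(const char *pmu_name)
{
	static const struct demo_table tables[] = { { "cpu" } };

	demo_lookups++;
	for (size_t i = 0; i < sizeof(tables) / sizeof(tables[0]); i++) {
		if (strcmp(tables[i].pmu_name, pmu_name) == 0)
			return &tables[i];
	}
	return NULL;
}

/* Search once per PMU, then reuse the answer, even if the answer is NULL. */
static const struct demo_table *demo_pmu_table(struct demo_pmu *pmu)
{
	assert(pmu && pmu->name);
	if (!pmu->table_cached) {
		pmu->events_table = demo_find_table(pmu->name);
		pmu->table_cached = true;
	}
	return pmu->events_table;
}
```

The separate table_cached flag matters because a NULL table is a valid result that should not trigger another search on the next event or alias.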
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230824041330.266337-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
edb217ff | 23-Aug-2023 |
Ian Rogers <irogers@google.com> |
perf pmu: Parse sysfs events directly from a file
Rather than read a sysfs events file into a 256 byte char buffer, pass the FILE* directly to the lex/yacc parser.
This avoids there being a maximum events file size.
While changing the API, constify some arguments to remove unnecessary casts.
Allocating the read buffer decreases the performance of pmu-scan by around 3%.
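The benefit of consuming a FILE* directly, with no fixed-size line buffer that caps the input length, can be shown with a small streaming sketch. demo_count_terms() is a hypothetical helper that merely counts comma-separated terms; it is not the lex/yacc parser the commit refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Count comma- or newline-separated terms by streaming characters
 * from 'file', so the input length is not bounded by any buffer.
 */
static size_t demo_count_terms(FILE *file)
{
	size_t terms = 0;
	bool in_term = false;
	int c;

	assert(file);
	while ((c = fgetc(file)) != EOF) {
		if (c == ',' || c == '\n') {
			in_term = false;
		} else if (!in_term) {
			in_term = true;
			terms++;
		}
	}
	return terms;
}
```

An events file well over 256 bytes is handled the same way as a short one, which a fixed 256-byte char buffer could not guarantee.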
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230824041330.266337-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|