#
b424eba2 |
| 09-Nov-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Move threads to struct machine
The 'machine' abstraction was introduced with 'perf kvm' where we could have samples for the host and multiple guests, but at the time we ended up keepin
perf session: Move threads to struct machine
The 'machine' abstraction was introduced with 'perf kvm' where we could have samples for the host and multiple guests, but at the time we ended up keeping the list of all machines threads all in session->host_machine.
Move the threads rb_tree to struct machine to separate the namespaces.
Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-mdg7sm6j3va09vtgj49gbsrp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v3.2-rc1 |
|
#
88660563 |
| 29-Oct-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf report: Add progress bar when processing time ordered events
So that for large perf.data files the user can have visual feedback that activity is being performed.
Cc: David Ahern <dsahern@gmai
perf report: Add progress bar when processing time ordered events
So that for large perf.data files the user can have visual feedback that activity is being performed.
Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-3ysn01mpspfrbsy56gznzqqz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v3.1, v3.1-rc10, v3.1-rc9 |
|
#
fbe96f29 |
| 30-Sep-2011 |
Stephane Eranian <eranian@google.com> |
perf tools: Make perf.data more self-descriptive (v8)
The goal of this patch is to include more information about the host environment into the perf.data so it is more self-descriptive. Overtime, pr
perf tools: Make perf.data more self-descriptive (v8)
The goal of this patch is to include more information about the host environment into the perf.data so it is more self-descriptive. Overtime, profiles are captured on various machines and it becomes hard to track what was recorded, on what machine and when.
This patch provides a way to solve this by extending the perf.data file with basic information about the host machine. To add those extensions, we leverage the feature bits capabilities of the perf.data format. The change is backward compatible with existing perf.data files.
We define the following useful new extensions: - HEADER_HOSTNAME: the hostname - HEADER_OSRELEASE: the kernel release number - HEADER_ARCH: the hw architecture - HEADER_CPUDESC: generic CPU description - HEADER_NRCPUS: number of online/avail cpus - HEADER_CMDLINE: perf command line - HEADER_VERSION: perf version - HEADER_TOPOLOGY: cpu topology - HEADER_EVENT_DESC: full event description (attrs) - HEADER_CPUID: easy-to-parse low level CPU identication
The small granularity for the entries is to make it easier to extend without breaking backward compatiblity. Many entries are provided as ASCII strings.
Perf report/script have been modified to print the basic information as easy-to-parse ASCII strings. Extended information about CPU and NUMA topology may be requested with the -I option.
Thanks to David Ahern for reviewing and testing the many versions of this patch.
$ perf report --stdio # ======== # captured on : Mon Sep 26 15:22:14 2011 # hostname : quad # os release : 3.1.0-rc4-tip # perf version : 3.1.0-rc4 # arch : x86_64 # nrcpus online : 4 # nrcpus avail : 4 # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz # cpuid : GenuineIntel,6,15,11 # total memory : 8105360 kB # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31, # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # ...
$ perf report --stdio -I # ======== # captured on : Mon Sep 26 15:22:14 2011 # hostname : quad # os release : 3.1.0-rc4-tip # perf version : 3.1.0-rc4 # arch : x86_64 # nrcpus online : 4 # nrcpus avail : 4 # cpudesc : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz # cpuid : GenuineIntel,6,15,11 # total memory : 8105360 kB # cmdline : /home/eranian/perfmon/official/tip/build/tools/perf/perf record date # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 29, 30, 31, # sibling cores : 0-3 # sibling threads : 0 # sibling threads : 1 # sibling threads : 2 # sibling threads : 3 # node0 meminfo : total = 8320608 kB, free = 7571024 kB # node0 cpu list : 0-3 # ======== # ...
Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/20110930134040.GA5575@quad Signed-off-by: Stephane Eranian <eranian@google.com> [ committer notes: Use --show-info in the tools as was in the docs, rename perf_header_fprintf_info to perf_file_section__fprintf_info, fixup conflict with f69b64f7 "perf: Support setting the disassembler style" ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v3.1-rc8, v3.1-rc7, v3.1-rc6 |
|
#
936be503 |
| 06-Sep-2011 |
David Ahern <dsahern@gmail.com> |
perf tool: Fix endianness handling of u32 data in samples
Currently, analyzing PPC data files on x86 the cpu field is always 0 and the tid and pid are backwards. For example, analyzing a PPC file on
perf tool: Fix endianness handling of u32 data in samples
Currently, analyzing PPC data files on x86 the cpu field is always 0 and the tid and pid are backwards. For example, analyzing a PPC file on PPC the pid/tid fields show:
rsyslogd 1210/1212
and analyzing the same PPC file using an x86 perf binary shows:
rsyslogd 1212/1210
The problem is that the swap_op method for samples is perf_event__all64_swap which assumes all elements in the sample_data struct are u64s. cpu, tid and pid are u32s and need to be handled individually. Given that the swap is done before the sample is parsed, the simplest solution is to undo the 64-bit swap of those elements when the sample is parsed and do the proper swap.
The RAW data field is generic and perf cannot have programmatic knowledge of how to treat that data. Instead a warning is given to the user.
Thanks to Anton Blanchard for providing a data file for a mult-CPU PPC system so I could verify the fix for the CPU fields.
v3 -> v4: - fixed use of WARN_ONCE
v2 -> v3: - used WARN_ONCE for message regarding raw data - removed struct wrapper around union - fixed whitespace issues
v1 -> v2: - added a union for undoing the byte-swap on u64 and redoing swap on u32's to address compiler errors (see git commit 65014ab3)
Cc: Anton Blanchard <anton@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v3.1-rc5, v3.1-rc4, v3.1-rc3, v3.1-rc2, v3.1-rc1, v3.0 |
|
#
eda3913b |
| 15-Jul-2011 |
David Ahern <dsahern@gmail.com> |
perf tools: Fix endian conversion reading event attr from file header
The perf_event_attr struct has two __u32's at the top and they need to be swapped individually.
With this change I was able to
perf tools: Fix endian conversion reading event attr from file header
The perf_event_attr struct has two __u32's at the top and they need to be swapped individually.
With this change I was able to analyze a perf.data collected in a 32-bit PPC VM on an x86 system. I tested both 32-bit and 64-bit binaries for the Intel analysis side; both read the PPC perf.data file correctly.
-v2: - changed the existing perf_event__attr_swap() to swap only elements of perf_event_attr and exported it for use in swapping the attributes in the file header - updated swap_ops used for processing events
Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: acme@ghostprotocols.net Cc: peterz@infradead.org Cc: paulus@samba.org Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/1310754849-12474-1-git-send-email-dsahern@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
show more ...
|
Revision tags: v3.0-rc7, v3.0-rc6 |
|
#
5d67be97 |
| 04-Jul-2011 |
Anton Blanchard <anton@samba.org> |
perf report/annotate/script: Add option to specify a CPU range
Add an option to perf report/annotate/script to specify which CPUs to operate on. This enables us to take a single system wide profile
perf report/annotate/script: Add option to specify a CPU range
Add an option to perf report/annotate/script to specify which CPUs to operate on. This enables us to take a single system wide profile and analyse each CPU (or group of CPUs) in isolation.
This was useful when profiling a multiprocess workload where the bottleneck was on one CPU but this was hidden in the overall profile. Per process and per thread breakdowns didn't help because multiple processes were running on each CPU and no single process consumed an entire CPU.
The patch converts the list of CPUs returned by cpu_map__new into a bitmap for fast lookup. I wanted to use -C to be consistent with perf top/record/stat, but unfortunately perf report already uses -C <comms>.
v2: Incorporate suggestions from David Ahern: - Added -c to perf script - Check that SAMPLE_CPU is set when -c is used - Update documentation
v3: Create perf_session__cpu_bitmap()
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Link: http://lkml.kernel.org/r/20110704215750.11647eb9@kryten Signed-off-by: Ingo Molnar <mingo@elte.hu>
show more ...
|
Revision tags: v3.0-rc5, v3.0-rc4, v3.0-rc3, v3.0-rc2, v3.0-rc1 |
|
#
610723f2 |
| 27-May-2011 |
David Ahern <dsahern@gmail.com> |
perf script: Make printing of dso a separate field option
The 'sym' option displays both the function name and the DSO it comes from. Split the display of the dso into a separate option. This allow
perf script: Make printing of dso a separate field option
The 'sym' option displays both the function name and the DSO it comes from. Split the display of the dso into a separate option. This allows display of the ip address and symbol without the dso, thus shortening line lengths - and decluttering the output a bit.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306528124-25861-3-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
787bef17 |
| 27-May-2011 |
David Ahern <dsahern@gmail.com> |
perf script: "sym" field really means show IP data
Currently the "sym" output field is used to dump instruction pointers and callchain stack. Sample addresses can also be converted to symbols, so th
perf script: "sym" field really means show IP data
Currently the "sym" output field is used to dump instruction pointers and callchain stack. Sample addresses can also be converted to symbols, so the meaning of "sym" needs to be fixed. This patch adds an "ip" option and if it is selected the user can also opt to dump symbols for them. If the user opts to dump IP without syms only the address is shown.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306528124-25861-2-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
a2854124 |
| 21-May-2011 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf tools: Pre-check sample size before parsing
Check that the total size of the sample fields having a fixed size do not exceed the one of the whole event. This robustifies the sample parsing.
Si
perf tools: Pre-check sample size before parsing
Check that the total size of the sample fields having a fixed size do not exceed the one of the whole event. This robustifies the sample parsing.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com>
show more ...
|
Revision tags: v2.6.39, v2.6.39-rc7, v2.6.39-rc6, v2.6.39-rc5, v2.6.39-rc4, v2.6.39-rc3 |
|
#
9cbdb702 |
| 06-Apr-2011 |
David Ahern <daahern@cisco.com> |
perf script: improve validation of sample attributes for output fields
Check for required sample attributes using evsel rather than sample_type in the session header. If the attribute for a default
perf script: improve validation of sample attributes for output fields
Check for required sample attributes using evsel rather than sample_type in the session header. If the attribute for a default field is not present for the event type (e.g., new command operating on file from older kernel) the field is removed from the output list.
Expected event types must exist. For example, if a user specifies
-f trace:time,trace -f sw:time,cpu,sym
the perf.data file must contain both tracepoints and software events (ie., it is an error if either does not exist in the file).
Attribute checking is done once at the beginning of perf-script rather than for each sample.
v1 -> v2: - addressed comments from acme
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1302148460-570-1-git-send-email-daahern@cisco.com Signed-off-by: David Ahern <daahern@cisco.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.39-rc2, v2.6.39-rc1 |
|
#
9e69c210 |
| 15-Mar-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Pass evsel in event_ops->sample()
Resolving the sample->id to an evsel since the most advanced tools, report and annotate, and the others will too when they evolve to properly support
perf session: Pass evsel in event_ops->sample()
Resolving the sample->id to an evsel since the most advanced tools, report and annotate, and the others will too when they evolve to properly support multi-event perf.data files.
Good also because it does an extra validation, checking that the ID is valid when present. When that is not the case, the overhead is just a branch + function call (perf_evlist__id2evsel).
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.38 |
|
#
c0230b2b |
| 09-Mar-2011 |
David Ahern <daahern@cisco.com> |
perf script: Add support for dumping symbols
Add option to dump symbols found in events.
e.g., perf script -f comm,pid,tid,time,trace,sym
swapper 0/0 537.037184: prev_comm=swapper prev_p
perf script: Add support for dumping symbols
Add option to dump symbols found in events.
e.g., perf script -f comm,pid,tid,time,trace,sym
swapper 0/0 537.037184: prev_comm=swapper prev_pid=0 prev_prio=120... ffffffff81030350 perf_trace_sched_switch ([kernel.kallsyms]) ffffffff81382ac5 schedule ([kernel.kallsyms]) ffffffff8100134a cpu_idle ([kernel.kallsyms]) ffffffff81370b39 rest_init ([kernel.kallsyms]) ffffffff81696c23 start_kernel ([kernel.kallsyms].init.text) ffffffff816962af x86_64_start_reservations ([kernel.kallsyms].init.text) ffffffff816963b9 x86_64_start_kernel ([kernel.kallsyms].init.text)
sshd 1675/1675 537.037309: prev_comm=sshd prev_pid=1675 prev_prio=120... ffffffff81030350 perf_trace_sched_switch ([kernel.kallsyms]) ffffffff81382ac5 schedule ([kernel.kallsyms]) ffffffff813837aa schedule_hrtimeout_range_clock ([kernel.kallsyms]) ffffffff81383886 schedule_hrtimeout_range ([kernel.kallsyms]) ffffffff8110c4f9 poll_schedule_timeout ([kernel.kallsyms]) ffffffff8110cd20 do_select ([kernel.kallsyms]) ffffffff8110ced8 core_sys_select ([kernel.kallsyms]) ffffffff8110d00d sys_select ([kernel.kallsyms]) ffffffff81002bc2 system_call ([kernel.kallsyms]) 7f1647e56e93 __GI_select (/lib64/libc-2.12.90.so)
netstat 1692/1692 537.038664: prev_comm=netstat prev_pid=1692 prev_prio=... ffffffff81030350 perf_trace_sched_switch ([kernel.kallsyms]) ffffffff81382ac5 schedule ([kernel.kallsyms]) ffffffff81002c3a sysret_careful ([kernel.kallsyms]) 7f7a6cd1b210 __GI___libc_read (/lib64/libc-2.12.90.so)
Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1299734608-5223-6-git-send-email-daahern@cisco.com> Signed-off-by: David Ahern <daahern@cisco.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
a91e5431 |
| 10-Mar-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Use evlist/evsel for managing perf.data attributes
So that we can reuse things like the id to attr lookup routine (perf_evlist__id2evsel) that uses a hash table instead of the linear l
perf session: Use evlist/evsel for managing perf.data attributes
So that we can reuse things like the id to attr lookup routine (perf_evlist__id2evsel) that uses a hash table instead of the linear lookup done in the older perf_header_attr routines, etc.
Also to make evsels/evlist more pervasive an API, simplyfing using the emerging perf lib.
cc: Arun Sharma <arun@sharma-home.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.38-rc8 |
|
#
e248de33 |
| 05-Mar-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Improve support for sessions with multiple events
By creating an perf_evlist out of the attributes in the perf.data file header, so that we can use evlists and evsels when reading record
perf tools: Improve support for sessions with multiple events
By creating an perf_evlist out of the attributes in the perf.data file header, so that we can use evlists and evsels when reading recorded sessions in addition to when we record sessions.
More work is needed to allow tools to allow the user to select which events are wanted when browsing sessions, be it just one or a subset of them, aggregated or showed at the same time but with different indications on the UI to allow seeing workloads thru different views at the same time.
But the overall goal/trend is to more uniformly use evsels and evlists.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.38-rc7, v2.6.38-rc6, v2.6.38-rc5, v2.6.38-rc4, v2.6.38-rc3 |
|
#
8115d60c |
| 29-Jan-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Kill event_t typedef, use 'union perf_event' instead
And move the event_t methods to the perf_event__ too.
No code changes, just namespace consistency.
Cc: Frederic Weisbecker <fweisbe
perf tools: Kill event_t typedef, use 'union perf_event' instead
And move the event_t methods to the perf_event__ too.
No code changes, just namespace consistency.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
8d50e5b4 |
| 29-Jan-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Rename 'struct sample_data' to 'struct perf_sample'
Making the namespace more uniform.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <e
perf tools: Rename 'struct sample_data' to 'struct perf_sample'
Making the namespace more uniform.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.38-rc2 |
|
#
d0dd74e8 |
| 21-Jan-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Move event__parse_sample to evsel.c
To avoid linking more stuff in the python binding I'm working on, future csets will make the sample type be taken from the evsel itself, but for that
perf tools: Move event__parse_sample to evsel.c
To avoid linking more stuff in the python binding I'm working on, future csets will make the sample type be taken from the evsel itself, but for that we need to first have one file per cpu and per sample_type, not a single perf.data file.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.38-rc1 |
|
#
1b3a0e95 |
| 13-Jan-2011 |
Frederic Weisbecker <fweisbec@gmail.com> |
perf callchain: Feed callchains into a cursor
The callchains are fed with an array of a fixed size. As a result we iterate over each callchains three times:
- 1st to resolve symbols - 2nd to filter
perf callchain: Feed callchains into a cursor
The callchains are fed with an array of a fixed size. As a result we iterate over each callchains three times:
- 1st to resolve symbols - 2nd to filter out context boundaries - 3rd for the insertion into the tree
This also involves some pairs of memory allocation/deallocation everytime we insert a callchain, for the filtered out array of addresses and for the array of symbols that comes along.
Instead, feed the callchains through a linked list with persistent allocations. It brings several pros like:
- Merge the 1st and 2nd iterations in one. That was possible before but in a way that would involve allocating an array slightly taller than necessary because we don't know in advance the number of context boundaries to filter out.
- Much lesser allocations/deallocations. The linked list keeps persistent empty entries for the next usages and is extendable at will.
- Makes it easier for multiple sources of callchains to feed a stacktrace together. This is deemed to pave the way for cfi based callchains wherein traditional frame pointer based kernel stacktraces will precede cfi based user ones, producing an overall callchain which size is hardly predictable. This requirement makes the static array obsolete and makes a linked list based iterator a much more flexible fit.
Basic testing on a big perf file containing callchains (~ 176 MB) has shown a throughput gain of about 11% with perf report.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294977121-5700-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.37 |
|
#
1e7972cc |
| 03-Jan-2011 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf util: Move do_read from session to util
Not really something to be exported from session.c. Rename it to 'readn' as others did in the past.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ing
perf util: Move do_read from session to util
Not really something to be exported from session.c. Rename it to 'readn' as others did in the past.
Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.37-rc8, v2.6.37-rc7, v2.6.37-rc6 |
|
#
21ef97f0 |
| 09-Dec-2010 |
Ian Munsie <imunsie@au1.ibm.com> |
perf session: Fallback to unordered processing if no sample_id_all
If we are running the new perf on an old kernel without support for sample_id_all, we should fall back to the old unordered process
perf session: Fallback to unordered processing if no sample_id_all
If we are running the new perf on an old kernel without support for sample_id_all, we should fall back to the old unordered processing of events. If we didn't than we would *always* process events without timestamps out of order, whether or not we hit a reordering race. In other words, instead of there being a chance of not attributing samples correctly, we would guarantee that samples would not be attributed.
While processing all events without timestamps before events with timestamps may seem like an intuitive solution, it falls down as PERF_RECORD_EXIT events would also be processed before any samples. Even with a workaround for that case, samples before/after an exec would not be attributed correctly.
This patch allows commands to indicate whether they need to fall back to unordered processing, so that commands that do not care about timestamps on every event will not be affected. If we do fallback, this will print out a warning if report -D was invoked.
This patch adds the test in perf_session__new so that we only need to test once per session. Commands that do not use an event_ops (such as record and top) can simply pass NULL in it's place.
Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1291951882-sup-6069@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v2.6.37-rc5 |
|
#
9c90a61c |
| 02-Dec-2010 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf tools: Ask for ID PERF_SAMPLE_ info on all PERF_RECORD_ events
So that we can use -T == --timestamp, asking for PERF_SAMPLE_TIME:
$ perf record -aT $ perf report -D | grep PERF_RECORD_ <
perf tools: Ask for ID PERF_SAMPLE_ info on all PERF_RECORD_ events
So that we can use -T == --timestamp, asking for PERF_SAMPLE_TIME:
$ perf record -aT $ perf report -D | grep PERF_RECORD_ <SNIP> 3 5951915425 0x47530 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff8138c1a2 period: 215979 cpu:3 3 5952026879 0x47588 [0x90]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff810cb480 period: 215979 cpu:3 3 5952059959 0x47618 [0x38]: PERF_RECORD_FORK(6853:6853):(16811:16811) 3 5952138878 0x47650 [0x78]: PERF_RECORD_SAMPLE(IP, 1): 16811/16811: 0xffffffff811bac35 period: 431478 cpu:3 3 5952375068 0x476c8 [0x30]: PERF_RECORD_COMM: find:6853 3 5952395923 0x476f8 [0x50]: PERF_RECORD_MMAP 6853/6853: [0x400000(0x25000) @ 0]: /usr/bin/find 3 5952413756 0x47748 [0xa0]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff810d080f period: 859332 cpu:3 3 5952419837 0x477e8 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44600000(0x21d000) @ 0]: /lib64/ld-2.5.so 3 5952437929 0x47840 [0x48]: PERF_RECORD_MMAP 6853/6853: [0x7fff7e1c9000(0x1000) @ 0x7fff7e1c9000]: [vdso] 3 5952570127 0x47888 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f46200000(0x218000) @ 0]: /lib64/libselinux.so.1 3 5952623637 0x478e0 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44a00000(0x356000) @ 0]: /lib64/libc-2.5.so 3 5952675720 0x47938 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f44e00000(0x204000) @ 0]: /lib64/libdl-2.5.so 3 5952710080 0x47990 [0x58]: PERF_RECORD_MMAP 6853/6853: [0x3f45a00000(0x246000) @ 0]: /lib64/libsepol.so.1 3 5952847802 0x479e8 [0x58]: PERF_RECORD_SAMPLE(IP, 1): 6853/6853: 0xffffffff813897f0 period: 1142536 cpu:3 <SNIP>
First column is the cpu and the second the timestamp.
That way we can investigate problems in the event stream.
If the new perf binary is run on an older kernel, it will disable this feature automatically.
Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1291318772-30880-5-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
640c03ce |
| 02-Dec-2010 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
perf session: Parse sample earlier
At perf_session__process_event, so that we reduce the number of lines in eache tool sample processing routine that now receives a sample_data pointer already parse
perf session: Parse sample earlier
At perf_session__process_event, so that we reduce the number of lines in eache tool sample processing routine that now receives a sample_data pointer already parsed.
This will also be useful in the next patch, where we'll allow sample the identity fields in MMAP, FORK, EXIT, etc, when it will be possible to see (cpu, timestamp) just after before every event.
Also validate callchains in perf_session__process_event, i.e. as early as possible, and keep a counter of the number of events discarded due to invalid callchains, warning the user about it if it happens.
There is an assumption that was kept that all events have the same sample_type, that will be dealt with in the future, when this preexisting limitation will be removed.
Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1291318772-30880-4-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
5c891f38 |
| 30-Nov-2010 |
Thomas Gleixner <tglx@linutronix.de> |
perf session: Allocate chunks of sample objects
The ordered sample code allocates singular reference objects struct sample_queue which have 48byte size on 64bit and 20 bytes on 32bit. That's silly.
perf session: Allocate chunks of sample objects
The ordered sample code allocates singular reference objects struct sample_queue which have 48byte size on 64bit and 20 bytes on 32bit. That's silly. Allocate ~64k sized chunks and hand them out.
Performance gain: ~ 15%
Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20101130163820.398713983@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
020bb75a |
| 30-Nov-2010 |
Thomas Gleixner <tglx@linutronix.de> |
perf session: Cache sample objects
When the sample queue is flushed we free the sample reference objects. Though we need to malloc new objects when we process further. Stop the malloc/free orgy and
perf session: Cache sample objects
When the sample queue is flushed we free the sample reference objects. Though we need to malloc new objects when we process further. Stop the malloc/free orgy and cache the already allocated object for resuage. Only allocate when the cache is empty.
Performance gain: ~ 10%
Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20101130163820.338488630@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
a1225dec |
| 30-Nov-2010 |
Thomas Gleixner <tglx@linutronix.de> |
perf session: Fix list sort algorithm
The homebrewn sort algorithm fails to sort in time order. One of the problem spots is that it fails to deal with equal timestamps correctly.
My first gut react
perf session: Fix list sort algorithm
The homebrewn sort algorithm fails to sort in time order. One of the problem spots is that it fails to deal with equal timestamps correctly.
My first gut reaction was to replace the fancy list with an rbtree, but the performance is 3 times worse.
Rewrite it so it works.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20101130163819.908482530@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|