1 perf-stat(1)
2 ============
4 NAME
5 ----
6 perf-stat - Run a command and gather performance counter statistics
8 SYNOPSIS
9 --------
11 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf stat' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
13 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] \-- <command> [<options>]
14 'perf stat' report [-i file]
16 DESCRIPTION
17 -----------
22 OPTIONS
23 -------
33 -e::
34 --event=::
37 - a symbolic event name (use 'perf list' to list all events)
39 - a raw PMU event in the form of rN where N is a hexadecimal value
44 - a symbolic or raw PMU event followed by an optional colon
45 and a list of event modifiers, e.g., cpu-cycles:p. See the
46 linkperf:perf-list[1] man page for details on event modifiers.
48 - a symbolically formed event like 'pmu/param1=0x3,param2/' where
54 perf stat -A -a -e cpu/event,percore=1/,otherevent ...
56 - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
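As an illustration of the event forms listed above, a hedged sketch of one invocation combining them; the PMU name 'cpu' and the raw encoding 'r01ad' are placeholders that vary by system:

  perf stat -e cycles:u,instructions,r01ad,cpu/cache-misses/ \-- sleep 1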
69 -i::
70 --no-inherit::
72 -p::
73 --pid=<pid>::
76 -t::
77 --tid=<tid>::
80 -b::
81 --bpf-prog::
83 requiring root rights. bpftool-prog could be used to find program
86 # bpftool prog | head -n 1
89 # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
96 1.102235068 seconds time elapsed
98 --bpf-counters::
100 allows multiple perf-stat sessions that are counting the same metric (cycles,
103 "perf config stat.bpf-counter-events=<list_of_events>".
105 --bpf-attr-map::
106 With option "--bpf-counters", different perf-stat sessions share
108 Use "--bpf-attr-map" to specify the path of this pinned hashmap.
112 --pfm-events events::
114 including support for event filters. For example '--pfm-events
117 events cannot be mixed together. The latter must be used with the -e
118 option. The -e option and this one can be mixed and matched. Events
119 can be grouped using the {} notation.
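As an illustration, a hedged sketch mixing the two option styles; it assumes perf was built with libpfm4 support and that libpfm4 knows the (Intel-specific, placeholder) event name used here:

  perf stat --pfm-events inst_retired:any_p:u -e cycles \-- sleep 1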
122 -a::
123 --all-cpus::
124 system-wide collection from all CPUs (default if no target is specified)
126 --no-scale::
129 -d::
130 --detailed::
133 -d: detailed events, L1 and LLC data cache
134 -d -d: more detailed events, dTLB and iTLB events
135 -d -d -d: very detailed events, adding prefetch events
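For example, a short run at the second level of detail:

  perf stat -d -d \-- sleep 1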
137 -r::
138 --repeat=<n>::
141 -B::
142 --big-num::
144 Enabled by default. Use "--no-big-num" to disable.
145 Default setting can be changed with "perf config stat.big-num=false".
147 -C::
148 --cpu=::
150 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
151 In per-thread mode, this option is ignored. The -a option is still necessary
152 to activate system-wide monitoring. Default is to count on all CPUs.
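For instance, a sketch that counts cycles system-wide but only on CPUs 0-2:

  perf stat -C 0-2 -a -e cycles \-- sleep 1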
154 -A::
155 --no-aggr::
158 -n::
159 --null::
160 null run - Don't start any counters.
162 This can be useful to measure just elapsed wall-clock time - or to assess the
165 -v::
166 --verbose::
169 -x SEP::
170 --field-separator SEP::
171 print counts using a CSV-style output to make it easy to import directly into
174 --table:: Display time for each run (-r option), in a table format, e.g.:
176 $ perf stat --null -r 5 --table perf bench sched pipe
181 5.189 (-0.293) #
182 5.189 (-0.294) #
183 5.186 (-0.296) #
188 5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
190 -G name::
191 --cgroup name::
193 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
197 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
200 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
203 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
205 --for-each-cgroup name::
208 effect as repeating the -e and -G options for each event x name. This option
209 cannot be used with -G/--cgroup option.
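A sketch, assuming cgroups named 'foo' and 'bar' exist and the cgroup filesystem is mounted:

  perf stat -e cycles,instructions --for-each-cgroup foo,bar -a \-- sleep 1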
211 -o file::
212 --output file::
215 --append::
216 Append to the output file designated with the -o option. Ignored if -o is not specified.
218 --log-fd::
220 Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
221 with it. --append may be used here. Examples:
222 3>results perf stat --log-fd 3 \-- $cmd
223 3>>results perf stat --log-fd 3 --append \-- $cmd
225 --control=fifo:ctl-fifo[,ack-fifo]::
226 --control=fd:ctl-fd[,ack-fd]::
227 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
228 Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
230 --delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
239 test -p ${ctl_fifo} && unlink ${ctl_fifo}
244 test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
248 perf stat -D -1 -e cpu-cycles -a -I 1000 \
249 --control fd:${ctl_fd},${ctl_fd_ack} \
250 \-- sleep 30 &
253 sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
254 sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
256 exec {ctl_fd_ack}>&-
259 exec {ctl_fd}>&-
262 wait -n ${perf_pid}
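The listing above is abridged; a minimal self-contained bash sketch of the same FIFO control pattern (the /tmp paths and the 30-second workload are arbitrary choices) looks like:

  #!/bin/bash
  ctl_fifo=/tmp/perf_ctl.fifo
  ack_fifo=/tmp/perf_ctl_ack.fifo

  # create the control and acknowledgement FIFOs and open them read/write
  test -p ${ctl_fifo} && unlink ${ctl_fifo}
  mkfifo ${ctl_fifo}
  exec {ctl_fd}<>${ctl_fifo}
  test -p ${ack_fifo} && unlink ${ack_fifo}
  mkfifo ${ack_fifo}
  exec {ctl_fd_ack}<>${ack_fifo}

  # start counting with events disabled (-D -1), printing every second
  perf stat -D -1 -e cpu-cycles -a -I 1000 \
        --control fd:${ctl_fd},${ctl_fd_ack} \
        \-- sleep 30 &
  perf_pid=$!

  # enable after 5s, disable 10s later, waiting for the 'ack' each time
  sleep 5  && echo 'enable'  >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
  sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"

  # close descriptors, remove the FIFOs and wait for perf to finish
  exec {ctl_fd_ack}>&-
  unlink ${ack_fifo}
  exec {ctl_fd}>&-
  unlink ${ctl_fifo}
  wait ${perf_pid}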
266 --pre::
267 --post::
270 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \-- make -s -j64 O=defc…
272 -I msecs::
273 --interval-print msecs::
274 Print count deltas every N milliseconds (minimum: 1ms)
275 The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals. …
276 example: 'perf stat -I 1000 -e cycles -a sleep 5'
280 --interval-count times::
282 This option should be used together with the "-I" option.
283 example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
285 --interval-clear::
288 --timeout msecs::
289 Stop the 'perf stat' session and print count deltas after N milliseconds (minimum: 10 ms).
290 This option is not supported with the "-I" option.
291 example: 'perf stat --timeout 2000 -e cycles -a'
293 --metric-only::
295 Don't show any raw values. Not supported with --per-thread.
297 --per-socket::
298 Aggregate counts per processor socket for system-wide mode measurements. This
300 use --per-socket in addition to -a (system-wide). The output includes the
304 --per-die::
305 Aggregate counts per processor die for system-wide mode measurements. This
307 use --per-die in addition to -a (system-wide). The output includes the
311 --per-cache::
312 Aggregate counts per cache instance for system-wide mode measurements. By
313 default, the aggregation happens for the cache level at the highest index
314 in the system. To specify a particular level, mention the cache level
315 alongside the option in the format [Ll][1-9][0-9]*. For example:
316 Using option "--per-cache=l3" or "--per-cache=L3" will aggregate the
317 information at the boundary of the level 3 cache in the system.
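For example, a sketch that aggregates cycles at the L3 boundary (assumes the system exposes its cache topology):

  perf stat --per-cache=L3 -e cycles -a \-- sleep 1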
319 --per-core::
320 Aggregate counts per physical processor for system-wide mode measurements. This
322 use --per-core in addition to -a (system-wide). The output includes the
325 --per-thread::
326 Aggregate counts per monitored thread, when monitoring threads (-t option)
327 or processes (-p option).
329 --per-node::
330 Aggregate counts per NUMA node for system-wide mode measurements. This
332 mode, use --per-node in addition to -a (system-wide).
334 -D msecs::
335 --delay msecs::
336 After starting the program, wait msecs before measuring (-1: start with events
340 -T::
341 --transaction::
345 --metric-no-group::
348 --metric-no-group option places events outside of groups and may
349 increase the chance of the event being scheduled - leading to more
351 for metrics like instructions per cycle can be lower - as both metrics
352 may no longer be measured at the same time.
354 --metric-no-merge::
364 --metric-no-threshold::
373 --quiet::
377 STAT RECORD
378 -----------
381 -o file::
382 --output file::
385 STAT REPORT
386 -----------
389 -i file::
390 --input file::
393 --per-socket::
394 Aggregate counts per processor socket for system-wide mode measurements.
396 --per-die::
397 Aggregate counts per processor die for system-wide mode measurements.
399 --per-cache::
400 Aggregate counts per cache instance for system-wide mode measurements. By
401 default, the aggregation happens for the cache level at the highest index
402 in the system. To specify a particular level, mention the cache level
403 alongside the option in the format [Ll][1-9][0-9]*. For example: Using
404 option "--per-cache=l3" or "--per-cache=L3" will aggregate the
405 information at the boundary of the level 3 cache in the system.
407 --per-core::
408 Aggregate counts per physical processor for system-wide mode measurements.
410 -M::
411 --metrics::
423 -A::
424 --no-aggr::
427 --topdown::
428 Print top-down metrics supported by the CPU. This allows you to determine
441 mode like -I 1000, as the bottleneck of workloads can change often.
443 This enables --metric-only, unless overridden with --no-metric-only.
450 and -a (global monitoring) is needed, requiring root rights or
451 perf.perf_event_paranoid=-1.
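A sketch of a top-down run in interval mode; it assumes a CPU with top-down event support and sufficient privileges:

  perf stat --topdown -a -I 1000 \-- sleep 10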
463 --td-level::
464 Print the top-down statistics that equal the input level. It allows
465 users to print the top-down metrics at the level of interest instead of the
466 level 1 top-down metrics.
474 'perf stat -M tma_frontend_bound_group...'.
478 --no-merge::
491 --hybrid-merge::
499 For non-hybrid events, this option has no effect.
501 --smi-cost::
507 The cost of SMI can be measured by (aperf - unhalted core cycles).
510 oriented analysis. --metric-only will be applied by default.
511 The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf
513 Users who want to get the actual value can apply --no-metric-only.
515 --all-kernel::
518 --all-user::
521 --percore-show-thread::
530 --summary::
531 Print summary for interval mode (-I).
533 --no-csv-summary::
535 This option must be used with -x and --summary.
538 'stat.no-csv-summary'.
540 $ perf config stat.no-csv-summary=true
542 --cputype::
546 EXAMPLES
547 --------
549 $ perf stat \-- make
553 83723.452481 task-clock:u (msec) # 1.004 CPUs utilized
554 0 context-switches:u # 0.000 K/sec
555 0 cpu-migrations:u # 0.000 K/sec
556 3,228,188 page-faults:u # 0.039 M/sec
560 2,078,861,393 branch-misses:u # 2.98% of all branches
562 83.409183620 seconds time elapsed
567 TIMINGS
568 -------
570 We always display the time the counters were enabled/alive:
572 83.409183620 seconds time elapsed
574 For workload sessions we also display time the workloads spent in
580 Those times are the very same as displayed by the 'time' tool.
582 CSV FORMAT
583 ----------
585 With -x, perf stat is able to print out a not-quite-CSV format output
587 it is recommended to use a different character like -x \;
591 - optional usec time stamp in fractions of second (with -I xxx)
592 - optional CPU, core, or socket identifier
593 - optional number of logical CPUs aggregated
594 - counter value
595 - unit of the counter value or empty
596 - event name
597 - run time of counter
598 - percentage of measurement time the counter was running
599 - optional variance if multiple values are collected with -r
600 - optional metric value
601 - optional unit of metric
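As an illustration, a small sketch that pulls the counter value and event name out of a simple CSV session. It assumes no -I, -r or per-X aggregation is used, so the optional leading and trailing columns above are absent and the value and event name are fields 1 and 3:

  perf stat -x, -e cycles,instructions \-- sleep 1 2>&1 | \
      awk -F, '!/^#/ && NF {printf "%-15s %s\n", $3, $1}'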
605 include::intel-hybrid.txt[]
607 JSON FORMAT
608 -----------
610 With -j, perf stat is able to print out a JSON format output
613 - timestamp : optional usec time stamp in fractions of second (with -I)
614 - optional aggregate options:
615 - core : core identifier (with --per-core)
616 - die : die identifier (with --per-die)
617 - socket : socket identifier (with --per-socket)
618 - node : node identifier (with --per-node)
619 - thread : thread identifier (with --per-thread)
620 - counter-value : counter value
621 - unit : unit of the counter value or empty
622 - event : event name
623 - variance : optional variance if multiple values are collected (with -r)
624 - runtime : run time of counter
625 - metric-value : optional metric value
626 - metric-unit : optional unit of metric
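As an illustration, a sketch that extracts the event name and counter value with jq (assumed to be installed); each line of the JSON output is treated as a separate object, and the grep keeps only JSON records in case warnings are mixed into the stream:

  perf stat -j -e cycles,instructions \-- sleep 1 2>&1 | grep '^{' | \
      jq -r '[.event, .["counter-value"]] | @tsv'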
628 SEE ALSO
629 --------
630 linkperf:perf-top[1], linkperf:perf-list[1]