perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
		170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------
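
Because the 'simple' style prints only a number, output from several runs can be combined with standard tools. The following is a minimal sketch, not part of perf: `mean_time` is a hypothetical helper, and it assumes each run prints a single value on standard output, as in the example above.

```shell
# Hypothetical helper (not part of perf): average the numbers read from
# standard input, one value per line, and print the mean to 3 decimals.
mean_time() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Possible usage, assuming each run prints a single time value:
#   for i in 1 2 3; do perf bench --format=simple sched pipe; done | mean_time
```

With no input the helper prints nothing, so a failed run shows up as empty output rather than as a misleading zero.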

SUBSYSTEM
---------

'sched'::
	Scheduler and IPC mechanisms.

'syscall'::
	System call performance (throughput).

'mem'::
	Memory access performance.

'numa'::
	NUMA scheduling and MM benchmarks.

'futex'::
	Futex stressing benchmarks.

'epoll'::
	Eventpoll (epoll) stressing benchmarks.

'internals'::
	Benchmark internal perf functionality.

'uprobe'::
	Benchmark overhead of uprobe + BPF.

'all'::
	All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair()

-t::
--thread::
Use threads instead of processes.

-g::
--group=::
Specify number of groups

-l::
--nr_loops=::
Specify number of loops

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------
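
The task counts in the example output follow from the group count: each group has 20 senders and 20 receivers, so 10 groups give 400 tasks and 20 groups give 800. A small sketch of that arithmetic follows. `messaging_tasks` is a hypothetical name, and the figure of 20 per side is taken from the example output above, not from the perf source.

```shell
# Hypothetical helper (not part of perf): number of tasks started by
# 'perf bench sched messaging' for a given number of groups.
messaging_tasks() {
    groups=$1
    per_side=20   # senders (and receivers) per group, as in the example output
    echo $((groups * per_side * 2))
}

# messaging_tasks 10 prints 400; messaging_tasks 20 prints 800
```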

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec
---------------------
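
The three figures in the output above are related: usecs/op is the total time divided by the number of operations, and ops/sec is its inverse. The sketch below reproduces the last two lines from a loop count and a total time. `pipe_metrics` is a hypothetical helper, not part of perf, and it assumes that ops/sec is truncated to an integer, which matches the examples shown here. The total time line is rounded to milliseconds, so the unrounded value is taken from the usecs/op line.

```shell
# Hypothetical helper (not part of perf): derive the per-operation figures
# from a loop count and a total time in seconds.
pipe_metrics() {
    awk -v loops="$1" -v secs="$2" 'BEGIN {
        printf "%.6f usecs/op\n", secs * 1000000 / loops
        printf "%d ops/sec\n", int(loops / secs)
    }'
}

# Example, using the values from the first run above:
#   pipe_metrics 1000000 8.091833
```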

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating core system call throughput, reported as both usecs/op and ops/sec.
It runs a single thread that repeatedly calls getppid(2), a simple system call whose result is not
cached by glibc.


SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify the function used for copying (default: default).
Available functions depend on the architecture.
On x86-64, the x86-64-unrolled, x86-64-movsq and x86-64-movsb functions are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.
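
The value given to `--size` combines a number with one of the unit suffixes listed above. As an illustration of how such a value maps to a byte count, the sketch below performs the conversion in shell. `size_to_bytes` is a hypothetical helper, not part of perf, and it assumes the units are binary multiples (1KB = 1024 bytes), which this page does not state.

```shell
# Hypothetical helper (not part of perf): convert "<number><unit>" to bytes.
# Assumption: units are binary multiples (1KB = 1024 bytes).
size_to_bytes() {
    # Split the argument into its numeric part and its unit suffix;
    # upper-case the unit so that matching is case insensitive.
    num=$(printf '%s' "$1" | sed 's/[A-Za-z]*$//')
    unit=$(printf '%s' "$1" | sed 's/^[0-9]*//' | tr '[:lower:]' '[:upper:]')
    [ -n "$num" ] || { echo "missing number: $1" >&2; return 1; }
    case "$unit" in
        B|"") mult=1 ;;
        KB)   mult=1024 ;;
        MB)   mult=$((1024 * 1024)) ;;
        GB)   mult=$((1024 * 1024 * 1024)) ;;
        TB)   mult=$((1024 * 1024 * 1024 * 1024)) ;;
        *)    echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    echo $((num * mult))
}

# Example: size_to_bytes 1MB   (the documented default size)
```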

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify the function used for setting (default: default).
Available functions depend on the architecture.
On x86-64, the x86-64-unrolled, x86-64-stosq and x86-64-stosb functions are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]