Performance Counters for Linux
------------------------------

Performance counters are special hardware registers available on most modern
CPUs. These registers count the number of certain types of hw events, such
as instructions executed, cache misses suffered, or branches mispredicted -
without slowing down the kernel or applications. These registers can also
trigger interrupts when a threshold number of events have passed - and can
thus be used to profile the code that runs on that CPU.

The Linux Performance Counter subsystem provides an abstraction of these
hardware capabilities. It provides per-task and per-CPU counters and counter
groups, with event capabilities on top of those.  It also provides
"virtual" 64-bit counters, regardless of the width of the underlying
hardware counters.

Performance counters are accessed via special file descriptors.
There's one file descriptor per virtual counter used.

The special file descriptor is opened via the sys_perf_event_open()
system call:

   int sys_perf_event_open(struct perf_event_attr *hw_event_uptr,
			     pid_t pid, int cpu, int group_fd,
			     unsigned long flags);

The syscall returns the new fd. The fd can be used via the normal
VFS system calls: read() can be used to read the counter, fcntl()
can be used to set the blocking mode, etc.

Multiple counters can be kept open at a time, and the counters
can be poll()ed.

When creating a new counter fd, 'perf_event_attr' is:

struct perf_event_attr {
        /*
         * The MSB of the config word signifies if the rest contains cpu
         * specific (raw) counter configuration data, if unset, the next
         * 7 bits are an event type and the rest of the bits are the event
         * identifier.
         */
        __u64                   config;

        __u64                   irq_period;
        __u32                   record_type;
        __u32                   read_format;

        __u64                   disabled       :  1, /* off by default        */
                                inherit        :  1, /* children inherit it   */
                                pinned         :  1, /* must always be on PMU */
                                exclusive      :  1, /* only group on PMU     */
                                exclude_user   :  1, /* don't count user      */
                                exclude_kernel :  1, /* ditto kernel          */
                                exclude_hv     :  1, /* ditto hypervisor      */
                                exclude_idle   :  1, /* don't count when idle */
                                mmap           :  1, /* include mmap data     */
                                munmap         :  1, /* include munmap data   */
                                comm           :  1, /* include comm data     */

                                __reserved_1   : 52;

        __u32                   extra_config_len;
        __u32                   wakeup_events;  /* wakeup every n events */

        __u64                   __reserved_2;
        __u64                   __reserved_3;
};

The 'config' field specifies what the counter should count.  It
is divided into 3 bit-fields:

raw_type: 1 bit   (most significant bit)	0x8000_0000_0000_0000
type:	  7 bits  (next most significant)	0x7f00_0000_0000_0000
event_id: 56 bits (least significant)		0x00ff_ffff_ffff_ffff

If 'raw_type' is 1, then the counter will count a hardware event
specified by the remaining 63 bits of 'config'.  The encoding is
machine-specific.

If 'raw_type' is 0, then the 'type' field says what kind of counter
this is, with the following encoding:

enum perf_type_id {
	PERF_TYPE_HARDWARE		= 0,
	PERF_TYPE_SOFTWARE		= 1,
	PERF_TYPE_TRACEPOINT		= 2,
};

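As an illustration only, the three bit-fields of 'config' described above
can be packed and unpacked as in the following sketch.  The masks are taken
from the table above; the helper names (config_generic, config_raw, and so
on) are hypothetical and do not appear in any kernel header.  The sketch
follows the layout given in this document, so check the headers of the
kernel you are targeting before relying on it.

```c
#include <stdint.h>

/* Masks copied from the bit-field table in this document. */
#define CFG_RAW_MASK	0x8000000000000000ULL	/* raw_type:  1 bit  */
#define CFG_TYPE_MASK	0x7f00000000000000ULL	/* type:      7 bits */
#define CFG_EVENT_MASK	0x00ffffffffffffffULL	/* event_id: 56 bits */
#define CFG_TYPE_SHIFT	56

/* Hypothetical helper: build a generic (raw_type == 0) config word. */
static inline uint64_t config_generic(uint64_t type, uint64_t event_id)
{
	return ((type << CFG_TYPE_SHIFT) & CFG_TYPE_MASK) |
	       (event_id & CFG_EVENT_MASK);
}

/* Hypothetical helper: build a raw config word.  The low 63 bits are
 * machine-specific and are passed through unchanged. */
static inline uint64_t config_raw(uint64_t raw_bits)
{
	return CFG_RAW_MASK | (raw_bits & ~CFG_RAW_MASK);
}

/* Hypothetical accessors for the individual bit-fields. */
static inline int config_is_raw(uint64_t config)
{
	return (config & CFG_RAW_MASK) != 0;
}

static inline unsigned int config_type(uint64_t config)
{
	return (unsigned int)((config & CFG_TYPE_MASK) >> CFG_TYPE_SHIFT);
}

static inline uint64_t config_event_id(uint64_t config)
{
	return config & CFG_EVENT_MASK;
}
```

Under these assumptions, config_generic(2, id) would describe a
PERF_TYPE_TRACEPOINT counter with event_id 'id', while config_raw(bits)
sets only the most significant bit on top of a machine-specific encoding.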
A counter of PERF_TYPE_HARDWARE will count the hardware event
specified by 'event_id':

/*
 * Generalized performance counter event types, used by the hw_event.event_id
 * parameter of the sys_perf_event_open() syscall:
 */
enum perf_hw_id {
	/*
	 * Common hardware events, generalized by the kernel:
	 */
	PERF_COUNT_HW_CPU_CYCLES		= 0,
	PERF_COUNT_HW_INSTRUCTIONS		= 1,
	PERF_COUNT_HW_CACHE_REFERENCES		= 2,
	PERF_COUNT_HW_CACHE_MISSES		= 3,
	PERF_COUNT_HW_BRANCH_INSTRUCTIONS	= 4,
	PERF_COUNT_HW_BRANCH_MISSES		= 5,
	PERF_COUNT_HW_BUS_CYCLES		= 6,
	PERF_COUNT_HW_STALLED_CYCLES_FRONTEND	= 7,
	PERF_COUNT_HW_STALLED_CYCLES_BACKEND	= 8,
	PERF_COUNT_HW_REF_CPU_CYCLES		= 9,
};

These are standardized types of events that work relatively uniformly
on all CPUs that implement Performance Counters support under Linux,
although there may be variations (e.g., different CPUs might count
cache references and misses at different levels of the cache hierarchy).
If a CPU is not able to count the selected event, then the system call
will return -EINVAL.

More hw_event_types are supported as well, but they are CPU-specific
and accessed as raw events.  For example, to count "External bus
cycles while bus lock signal asserted" events on Intel Core CPUs, pass
in a 0x4064 event_id value and set hw_event.raw_type to 1.

A counter of type PERF_TYPE_SOFTWARE will count one of the available
software events, selected by 'event_id':

/*
 * Special "software" counters provided by the kernel, even if the hardware
 * does not support performance counters. These counters measure various
 * physical and sw events of the kernel (and allow the profiling of them as
 * well):
 */
enum perf_sw_ids {
	PERF_COUNT_SW_CPU_CLOCK		= 0,
	PERF_COUNT_SW_TASK_CLOCK	= 1,
	PERF_COUNT_SW_PAGE_FAULTS	= 2,
	PERF_COUNT_SW_CONTEXT_SWITCHES	= 3,
	PERF_COUNT_SW_CPU_MIGRATIONS	= 4,
	PERF_COUNT_SW_PAGE_FAULTS_MIN	= 5,
	PERF_COUNT_SW_PAGE_FAULTS_MAJ	= 6,
	PERF_COUNT_SW_ALIGNMENT_FAULTS	= 7,
	PERF_COUNT_SW_EMULATION_FAULTS	= 8,
};

Counters of the type PERF_TYPE_TRACEPOINT are available when the ftrace event
tracer is available, and event_id values can be obtained from
/debug/tracing/events/*/*/id


Counters come in two flavours: counting counters and sampling
counters.  A "counting" counter is one that is used for counting the
number of events that occur, and is characterised by having
irq_period = 0.


A read() on a counter returns the current value of the counter and possible
additional values as specified by 'read_format'; each value is a u64 (8 bytes)
in size.

/*
 * Bits that can be set in hw_event.read_format to request that
 * reads on the counter should return the indicated quantities,
 * in increasing order of bit value, after the counter value.
 */
enum perf_event_read_format {
        PERF_FORMAT_TOTAL_TIME_ENABLED  =  1,
        PERF_FORMAT_TOTAL_TIME_RUNNING  =  2,
};

Using these additional values one can establish the overcommit ratio for a
particular counter, allowing one to take the round-robin scheduling effect
into account.


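One common way to use the two time values is to extrapolate the raw count
to the full period during which the counter was enabled.  The sketch below
assumes both PERF_FORMAT_TOTAL_TIME_* bits are set, so that a read() yields
three u64 values in the order described above.  The struct and function
names are hypothetical, and the linear extrapolation is an assumption about
how one might apply the ratio, not something this document prescribes.

```c
#include <stdint.h>

/* Assumed layout of one read() with both read_format bits set:
 * the counter value first, then the extra values in increasing
 * order of bit value. */
struct counter_reading {
	uint64_t value;		/* raw count */
	uint64_t time_enabled;	/* PERF_FORMAT_TOTAL_TIME_ENABLED */
	uint64_t time_running;	/* PERF_FORMAT_TOTAL_TIME_RUNNING */
};

/* Hypothetical helper: estimate what the count would have been if the
 * counter had been on the PMU for the whole time it was enabled.
 * Returns 0.0 if the counter never ran, since no estimate is possible. */
static inline double scaled_count(const struct counter_reading *r)
{
	if (r->time_running == 0)
		return 0.0;
	return (double)r->value * (double)r->time_enabled /
	       (double)r->time_running;
}
```

A counter that was enabled for 200 time units but only ran for 100 of them
has its raw count doubled by this estimate.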
A "sampling" counter is one that is set up to generate an interrupt
every N events, where N is given by 'irq_period'.  A sampling counter
has irq_period > 0. The record_type controls what data is recorded on each
interrupt:

/*
 * Bits that can be set in hw_event.record_type to request information
 * in the overflow packets.
 */
enum perf_event_record_format {
        PERF_RECORD_IP          = 1U << 0,
        PERF_RECORD_TID         = 1U << 1,
        PERF_RECORD_TIME        = 1U << 2,
        PERF_RECORD_ADDR        = 1U << 3,
        PERF_RECORD_GROUP       = 1U << 4,
        PERF_RECORD_CALLCHAIN   = 1U << 5,
};

Such (and other) events will be recorded in a ring-buffer, which is
available to user-space using mmap() (see below).

The 'disabled' bit specifies whether the counter starts out disabled
or enabled.  If it is initially disabled, it can be enabled by ioctl
or prctl (see below).

The 'inherit' bit, if set, specifies that this counter should count
events on descendant tasks as well as the task specified.  This only
applies to new descendants, not to any existing descendants at the
time the counter is created (nor to any new descendants of existing
descendants).

The 'pinned' bit, if set, specifies that the counter should always be
on the CPU if at all possible.  It only applies to hardware counters
and only to group leaders.  If a pinned counter cannot be put onto the
CPU (e.g. because there are not enough hardware counters or because of
a conflict with some other event), then the counter goes into an
'error' state, where reads return end-of-file (i.e. read() returns 0)
until the counter is subsequently enabled or disabled.

The 'exclusive' bit, if set, specifies that when this counter's group
is on the CPU, it should be the only group using the CPU's counters.
In future, this will allow sophisticated monitoring programs to supply
extra configuration information via 'extra_config_len' to exploit
advanced features of the CPU's Performance Monitor Unit (PMU) that are
not otherwise accessible and that might disrupt other hardware
counters.

The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
way to request that counting of events be restricted to times when the
CPU is in user, kernel and/or hypervisor mode.

Furthermore, the 'exclude_host' and 'exclude_guest' bits provide a way
to request that counting of events be restricted to guest and host
contexts, respectively, when using Linux as the hypervisor.

The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
operations. These can be used to relate userspace IP addresses to actual
code, even after the mapping (or even the whole process) is gone.
These events are recorded in the ring-buffer (see below).

The 'comm' bit allows tracking of process comm data on process creation.
This too is recorded in the ring-buffer (see below).

The 'pid' parameter to the sys_perf_event_open() system call allows the
counter to be specific to a task:

 pid == 0: if the pid parameter is zero, the counter is attached to the
 current task.

 pid > 0: the counter is attached to a specific task (if the current task
 has sufficient privilege to do so)

 pid < 0: all tasks are counted (per cpu counters)

The 'cpu' parameter allows a counter to be made specific to a CPU:

 cpu >= 0: the counter is restricted to a specific CPU
 cpu == -1: the counter counts on all CPUs

(Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.)

A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts
events of that task and 'follows' that task to whatever CPU the task
gets scheduled to. Per task counters can be created by any user, for
their own tasks.

A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
all events on CPU-x. Per CPU counters need CAP_PERFMON or CAP_SYS_ADMIN
privilege.

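The pid/cpu rules above can be summarised as a small classification
function.  This is only a sketch: the enum and function names are
hypothetical, and treating 'cpu < -1' as invalid is an assumption, since
this document does not define that case.

```c
#include <sys/types.h>

/* Hypothetical names for the combinations described in the text. */
enum counter_scope {
	SCOPE_INVALID,		/* pid == -1 together with cpu == -1 */
	SCOPE_TASK_ANY_CPU,	/* per task counter, follows the task */
	SCOPE_TASK_ONE_CPU,	/* one task, counted only on one CPU */
	SCOPE_CPU_WIDE		/* per CPU counter, all tasks */
};

static inline enum counter_scope classify_scope(pid_t pid, int cpu)
{
	if (cpu < -1)		/* assumption: not covered by the text */
		return SCOPE_INVALID;
	if (pid < 0)		/* all tasks: a specific CPU is required */
		return cpu == -1 ? SCOPE_INVALID : SCOPE_CPU_WIDE;
	/* pid == 0 (current task) or pid > 0 (specific task) */
	return cpu == -1 ? SCOPE_TASK_ANY_CPU : SCOPE_TASK_ONE_CPU;
}
```

Only the SCOPE_CPU_WIDE case corresponds to the per CPU counters that
need CAP_PERFMON or CAP_SYS_ADMIN, as stated above.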
The 'flags' parameter is currently unused and must be zero.

The 'group_fd' parameter allows counter "groups" to be set up.  A
counter group has one counter which is the group "leader".  The leader
is created first, with group_fd = -1 in the sys_perf_event_open call
that creates it.  The rest of the group members are created
subsequently, with group_fd giving the fd of the group leader.
(A single counter on its own is created with group_fd = -1 and is
considered to be a group with only 1 member.)

A counter group is scheduled onto the CPU as a unit, that is, it will
only be put onto the CPU if all of the counters in the group can be
put onto the CPU.  This means that the values of the member counters
can be meaningfully compared, added, divided (to get ratios), etc.,
with each other, since they have counted events for the same set of
executed instructions.


As stated, asynchronous events, such as counter overflow or PROT_EXEC mmap
tracking, are logged into a ring-buffer. This ring-buffer is created and
accessed through mmap().

The mmap size should be 1+2^n pages, where the first page is a meta-data page
(struct perf_event_mmap_page) that contains various bits of information such
as where the ring-buffer head is.

/*
 * Structure of the page that can be mapped via mmap
 */
struct perf_event_mmap_page {
        __u32   version;                /* version number of this structure */
        __u32   compat_version;         /* lowest version this is compat with */

        /*
         * Bits needed to read the hw counters in user-space.
         *
         *   u32 seq;
         *   s64 count;
         *
         *   do {
         *     seq = pc->lock;
         *
         *     barrier();
         *     if (pc->index) {
         *       count = pmc_read(pc->index - 1);
         *       count += pc->offset;
         *     } else
         *       goto regular_read;
         *
         *     barrier();
         *   } while (pc->lock != seq);
         *
         * NOTE: for obvious reasons this only works on self-monitoring
         *       processes.
         */
        __u32   lock;                   /* seqlock for synchronization */
        __u32   index;                  /* hardware counter identifier */
        __s64   offset;                 /* add to hardware counter value */

        /*
         * Control data for the mmap() data buffer.
         *
         * User-space reading this value should issue an rmb(), on SMP capable
         * platforms, after reading this value -- see perf_event_wakeup().
         */
        __u32   data_head;              /* head in the data section */
};

NOTE: the hw-counter userspace bits are arch specific and are currently only
      implemented on powerpc.

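The 1+2^n sizing rule for the mapping can be sketched as follows.  Both
helpers are hypothetical names introduced for illustration; the caller is
assumed to supply the system page size (for example from
sysconf(_SC_PAGESIZE)).

```c
#include <stddef.h>

/* Total mmap() length: one meta-data page plus 2^order data pages. */
static inline size_t perf_mmap_len(unsigned int order, size_t page_size)
{
	return ((size_t)1 + ((size_t)1 << order)) * page_size;
}

/* Returns 1 if 'len' is a whole number of pages of the form 1+2^n,
 * and 0 otherwise. */
static inline int perf_mmap_len_valid(size_t len, size_t page_size)
{
	size_t pages, data_pages;

	if (page_size == 0 || len % page_size != 0)
		return 0;
	pages = len / page_size;
	if (pages < 2)		/* need the meta-data page and >= 1 data page */
		return 0;
	data_pages = pages - 1;
	/* a power of two has exactly one bit set */
	return (data_pages & (data_pages - 1)) == 0;
}
```

With 4096-byte pages and order 3, the mapping is 9 pages long: the
meta-data page followed by 8 ring-buffer pages.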
The following 2^n pages are the ring-buffer which contains events of the form:

#define PERF_RECORD_MISC_KERNEL          (1 << 0)
#define PERF_RECORD_MISC_USER            (1 << 1)
#define PERF_RECORD_MISC_OVERFLOW        (1 << 2)

struct perf_event_header {
        __u32   type;
        __u16   misc;
        __u16   size;
};

enum perf_event_type {

        /*
         * The MMAP events record the PROT_EXEC mappings so that we can
         * correlate userspace IPs to code. They have the following structure:
         *
         * struct {
         *      struct perf_event_header        header;
         *
         *      u32                             pid, tid;
         *      u64                             addr;
         *      u64                             len;
         *      u64                             pgoff;
         *      char                            filename[];
         * };
         */
        PERF_RECORD_MMAP                 = 1,
        PERF_RECORD_MUNMAP               = 2,

        /*
         * struct {
         *      struct perf_event_header        header;
         *
         *      u32                             pid, tid;
         *      char                            comm[];
         * };
         */
        PERF_RECORD_COMM                 = 3,

        /*
         * When header.misc & PERF_RECORD_MISC_OVERFLOW the event_type field
         * will be PERF_RECORD_*
         *
         * struct {
         *      struct perf_event_header        header;
         *
         *      { u64                   ip;       } && PERF_RECORD_IP
         *      { u32                   pid, tid; } && PERF_RECORD_TID
         *      { u64                   time;     } && PERF_RECORD_TIME
         *      { u64                   addr;     } && PERF_RECORD_ADDR
         *
         *      { u64                   nr;
         *        { u64 event, val; }   cnt[nr];  } && PERF_RECORD_GROUP
         *
         *      { u16                   nr,
         *                              hv,
         *                              kernel,
         *                              user;
         *        u64                   ips[nr];  } && PERF_RECORD_CALLCHAIN
         * };
         */
};

NOTE: PERF_RECORD_CALLCHAIN is arch specific and currently only implemented
      on x86.

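Because the fixed-size fields of an overflow record are present only when
the matching record_type bit is set, a reader has to compute where each
field lives.  The sketch below does this for the four fixed-size fields
listed above.  The REC_* constants are local mirrors of the documented bit
values and field_offset() is a hypothetical helper; the variable-length
GROUP and CALLCHAIN parts are deliberately left out.

```c
#include <stdint.h>

/* Local mirrors of the record_type bits documented above. */
enum {
	REC_IP   = 1U << 0,
	REC_TID  = 1U << 1,
	REC_TIME = 1U << 2,
	REC_ADDR = 1U << 3
};

/* Byte offset, from the start of the record, of the field selected by
 * 'bit' (one of the REC_* values above), or -1 if that field is not
 * present for this record_type. */
static inline long field_offset(uint32_t record_type, uint32_t bit)
{
	long off = 8;	/* skip the 8-byte struct perf_event_header */
	uint32_t b;

	if (!(record_type & bit))
		return -1;
	for (b = REC_IP; b < bit; b <<= 1)
		if (record_type & b)
			off += 8;	/* each field is one u64 or two u32s */
	return off;
}
```

For example, with only IP and TIME requested, the time field directly
follows the ip field, at byte offset 16.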
Notification of new events is possible through poll()/select()/epoll() and
fcntl() managing signals.

Normally a notification is generated for every page filled; however, one can
additionally set perf_event_attr.wakeup_events to generate one every
so many counter overflow events.

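Once a notification arrives, a reader typically walks the records that
have been written so far.  The following sketch walks records in a plain
linear buffer, using a local copy of the header layout shown earlier.  It
assumes that 'size' is the total length of a record including its header,
and it ignores ring-buffer wrap-around and memory barriers; the names
rec_header and count_records are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the struct perf_event_header layout from this document. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* assumed: whole record, header included */
};

/* Count the records of type 'want_type' in buf[0..len).  Walking stops
 * at the first truncated or malformed record. */
static inline size_t count_records(const unsigned char *buf, size_t len,
				   uint32_t want_type)
{
	size_t off = 0, n = 0;

	while (len - off >= sizeof(struct rec_header)) {
		struct rec_header h;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&h, buf + off, sizeof(h));
		if (h.size < sizeof(h) || h.size > len - off)
			break;
		if (h.type == want_type)
			n++;
		off += h.size;
	}
	return n;
}
```

A real reader would additionally copy out records that straddle the end
of the ring-buffer before parsing them.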
Future work will include a splice() interface to the ring-buffer.


Counters can be enabled and disabled in two ways: via ioctl and via
prctl.  When a counter is disabled, it doesn't count or generate
events but does continue to exist and maintain its count value.

An individual counter can be enabled with

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

or disabled with

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

For a counter group, pass PERF_IOC_FLAG_GROUP as the third argument.
Enabling or disabling the leader of a group enables or disables the
whole group; that is, while the group leader is disabled, none of the
counters in the group will count.  Enabling or disabling a member of a
group other than the leader only affects that counter - disabling a
non-leader stops that counter from counting but doesn't affect any
other counter.

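As a small sketch of the group form of these ioctls, the wrapper below
enables or disables a counter group through one fd.  It assumes a Linux
system with the <linux/perf_event.h> UAPI header installed; the function
name group_set_enabled is hypothetical.

```c
#include <linux/perf_event.h>
#include <sys/ioctl.h>

/* Enable (on != 0) or disable (on == 0) the counter group that 'fd'
 * belongs to.  Returns 0 on success, or -1 with errno set by ioctl(). */
static inline int group_set_enabled(int fd, int on)
{
	unsigned long req = on ? PERF_EVENT_IOC_ENABLE
			       : PERF_EVENT_IOC_DISABLE;

	return ioctl(fd, req, (unsigned long)PERF_IOC_FLAG_GROUP);
}
```

A caller would typically pass the fd of the group leader and check the
return value, since the call fails if the fd is not a valid counter fd.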
Additionally, non-inherited overflow counters can use

	ioctl(fd, PERF_EVENT_IOC_REFRESH, nr);

to enable a counter for 'nr' events, after which it gets disabled again.

A process can enable or disable all the counter groups that are
attached to it, using prctl:

	prctl(PR_TASK_PERF_EVENTS_ENABLE);

	prctl(PR_TASK_PERF_EVENTS_DISABLE);

This applies to all counters on the current process, whether created
by this process or by another, and doesn't affect any counters that
this process has created on other processes.  It only enables or
disables the group leaders, not any other members in the groups.


Arch requirements
-----------------

If your architecture does not have hardware performance metrics, you can
still use the generic software counters based on hrtimers for sampling.

So to start with, in order to add HAVE_PERF_EVENTS to your Kconfig, you
will need at least this:
	- asm/perf_event.h - a basic stub will suffice at first
	- support for atomic64 types (and associated helper functions)

If your architecture does have hardware capabilities, you can override the
weak stub hw_perf_event_init() to register hardware counters.

Architectures that have d-cache aliasing issues, such as Sparc and ARM,
should select PERF_USE_VMALLOC in order to avoid these for perf mmap().