Performance Counters for Linux
------------------------------

Performance counters are special hardware registers available on most modern
CPUs. These registers count the number of certain types of hw events: such
as instructions executed, cache misses suffered, or branches mis-predicted -
without slowing down the kernel or applications. These registers can also
trigger interrupts when a threshold number of events have passed - and can
thus be used to profile the code that runs on that CPU.

The Linux Performance Counter subsystem provides an abstraction of these
hardware capabilities. It provides per task and per CPU counters, counter
groups, and it provides event capabilities on top of those. It
provides "virtual" 64-bit counters, regardless of the width of the
underlying hardware counters.

Performance counters are accessed via special file descriptors.
There's one file descriptor per virtual counter used.

The special file descriptor is opened via the sys_perf_event_open()
system call:

   int sys_perf_event_open(struct perf_event_attr *hw_event_uptr,
                           pid_t pid, int cpu, int group_fd,
                           unsigned long flags);

The syscall returns the new fd. The fd can be used via the normal
VFS system calls: read() can be used to read the counter, fcntl()
can be used to set the blocking mode, etc.
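
As a minimal sketch of the call sequence above (not part of the original
design notes): glibc provides no wrapper for this system call, so the
example goes through syscall(2). It assumes the current
<linux/perf_event.h>, in which 'type' and 'config' are separate fields and
'size' must be filled in; that differs from the single 'config' word
described in this document. The helper names init_counting_attr(),
open_self_counter() and read_counter() are made up for illustration.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for this syscall, so call it through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
        return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Fill 'attr' for a counting (non-sampling) counter that starts enabled.
 * Assumption: the current uapi layout with separate 'type' and 'config'
 * fields, not the combined 'config' word described in this document.
 */
static void init_counting_attr(struct perf_event_attr *attr,
                               __u32 type, __u64 config)
{
        memset(attr, 0, sizeof(*attr));
        attr->size = sizeof(*attr);
        attr->type = type;
        attr->config = config;
        attr->exclude_kernel = 1;       /* count user mode only */
        attr->exclude_hv = 1;
}

/*
 * Open a counter on the current task (pid == 0), following it to any CPU
 * (cpu == -1), as a group of its own (group_fd == -1).
 * Returns the new fd, or -1 with errno set.
 */
static int open_self_counter(__u32 type, __u64 config)
{
        struct perf_event_attr attr;

        init_counting_attr(&attr, type, config);
        return (int)perf_event_open(&attr, 0, -1, -1, 0);
}

/* Read the 64-bit counter value; returns 0 on success, -1 otherwise. */
static int read_counter(int fd, __u64 *value)
{
        return read(fd, value, sizeof(*value)) == (ssize_t)sizeof(*value)
                ? 0 : -1;
}
```

A caller might open a counter with
open_self_counter(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS), run the
code of interest, and then call read_counter(). The open can fail (for
example, if the system's perf_event_paranoid setting forbids it), so the
returned fd should be checked before use.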

Multiple counters can be kept open at a time, and the counters
can be poll()ed.

When creating a new counter fd, 'perf_event_attr' is:

struct perf_event_attr {
        /*
         * The MSB of the config word signifies if the rest contains cpu
         * specific (raw) counter configuration data; if unset, the next
         * 7 bits are an event type and the rest of the bits are the event
         * identifier.
         */
        __u64                   config;

        __u64                   irq_period;
        __u32                   record_type;
        __u32                   read_format;

        __u64                   disabled       :  1, /* off by default        */
                                inherit        :  1, /* children inherit it   */
                                pinned         :  1, /* must always be on PMU */
                                exclusive      :  1, /* only group on PMU     */
                                exclude_user   :  1, /* don't count user      */
                                exclude_kernel :  1, /* ditto kernel          */
                                exclude_hv     :  1, /* ditto hypervisor      */
                                exclude_idle   :  1, /* don't count when idle */
                                mmap           :  1, /* include mmap data     */
                                munmap         :  1, /* include munmap data   */
                                comm           :  1, /* include comm data     */

                                __reserved_1   : 52;

        __u32                   extra_config_len;
        __u32                   wakeup_events;  /* wakeup every n events */

        __u64                   __reserved_2;
        __u64                   __reserved_3;
};

The 'config' field specifies what the counter should count. It
is divided into 3 bit-fields:

raw_type: 1 bit   (most significant bit)        0x8000_0000_0000_0000
type:     7 bits  (next most significant)       0x7f00_0000_0000_0000
event_id: 56 bits (least significant)           0x00ff_ffff_ffff_ffff

If 'raw_type' is 1, then the counter will count a hardware event
specified by the remaining 63 bits of 'config'. The encoding is
machine-specific.

If 'raw_type' is 0, then the 'type' field says what kind of counter
this is, with the following encoding:

enum perf_type_id {
        PERF_TYPE_HARDWARE              = 0,
        PERF_TYPE_SOFTWARE              = 1,
        PERF_TYPE_TRACEPOINT            = 2,
};

A counter of PERF_TYPE_HARDWARE will count the hardware event
specified by 'event_id':

/*
 * Generalized performance counter event types, used by the hw_event.event_id
 * parameter of the sys_perf_event_open() syscall:
 */
enum perf_hw_id {
        /*
         * Common hardware events, generalized by the kernel:
         */
        PERF_COUNT_HW_CPU_CYCLES                = 0,
        PERF_COUNT_HW_INSTRUCTIONS              = 1,
        PERF_COUNT_HW_CACHE_REFERENCES          = 2,
        PERF_COUNT_HW_CACHE_MISSES              = 3,
        PERF_COUNT_HW_BRANCH_INSTRUCTIONS       = 4,
        PERF_COUNT_HW_BRANCH_MISSES             = 5,
        PERF_COUNT_HW_BUS_CYCLES                = 6,
        PERF_COUNT_HW_STALLED_CYCLES_FRONTEND   = 7,
        PERF_COUNT_HW_STALLED_CYCLES_BACKEND    = 8,
        PERF_COUNT_HW_REF_CPU_CYCLES            = 9,
};

These are standardized types of events that work relatively uniformly
on all CPUs that implement Performance Counters support under Linux,
although there may be variations (e.g., different CPUs might count
cache references and misses at different levels of the cache hierarchy).
If a CPU is not able to count the selected event, then the system call
will return -EINVAL.

More hw_event_types are supported as well, but they are CPU-specific
and accessed as raw events. For example, to count "External bus
cycles while bus lock signal asserted" events on Intel Core CPUs, pass
in a 0x4064 event_id value and set hw_event.raw_type to 1.

A counter of type PERF_TYPE_SOFTWARE will count one of the available
software events, selected by 'event_id':

/*
 * Special "software" counters provided by the kernel, even if the hardware
 * does not support performance counters. These counters measure various
 * physical and sw events of the kernel (and allow the profiling of them as
 * well):
 */
enum perf_sw_ids {
        PERF_COUNT_SW_CPU_CLOCK         = 0,
        PERF_COUNT_SW_TASK_CLOCK        = 1,
        PERF_COUNT_SW_PAGE_FAULTS       = 2,
        PERF_COUNT_SW_CONTEXT_SWITCHES  = 3,
        PERF_COUNT_SW_CPU_MIGRATIONS    = 4,
        PERF_COUNT_SW_PAGE_FAULTS_MIN   = 5,
        PERF_COUNT_SW_PAGE_FAULTS_MAJ   = 6,
        PERF_COUNT_SW_ALIGNMENT_FAULTS  = 7,
        PERF_COUNT_SW_EMULATION_FAULTS  = 8,
};

Counters of the type PERF_TYPE_TRACEPOINT are available when the ftrace event
tracer is available, and event_id values can be obtained from
/debug/tracing/events/*/*/id (where /debug is the debugfs mount point,
usually /sys/kernel/debug).


Counters come in two flavours: counting counters and sampling
counters. A "counting" counter is one that is used for counting the
number of events that occur, and is characterised by having
irq_period = 0.


A read() on a counter returns the current value of the counter and possible
additional values as specified by 'read_format'; each value is a u64 (8 bytes)
in size.

/*
 * Bits that can be set in hw_event.read_format to request that
 * reads on the counter should return the indicated quantities,
 * in increasing order of bit value, after the counter value.
 */
enum perf_event_read_format {
        PERF_FORMAT_TOTAL_TIME_ENABLED  =  1,
        PERF_FORMAT_TOTAL_TIME_RUNNING  =  2,
};

Using these additional values one can establish the overcommit ratio for a
particular counter allowing one to take the round-robin scheduling effect
into account.


A "sampling" counter is one that is set up to generate an interrupt
every N events, where N is given by 'irq_period'. A sampling counter
has irq_period > 0. The record_type controls what data is recorded on each
interrupt:

/*
 * Bits that can be set in hw_event.record_type to request information
 * in the overflow packets.
 */
enum perf_event_record_format {
        PERF_RECORD_IP          = 1U << 0,
        PERF_RECORD_TID         = 1U << 1,
        PERF_RECORD_TIME        = 1U << 2,
        PERF_RECORD_ADDR        = 1U << 3,
        PERF_RECORD_GROUP       = 1U << 4,
        PERF_RECORD_CALLCHAIN   = 1U << 5,
};

Such (and other) events will be recorded in a ring-buffer, which is
available to user-space using mmap() (see below).
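
The overcommit ratio mentioned above for 'read_format' can be estimated as
in the following sketch. It assumes both PERF_FORMAT_TOTAL_TIME_ENABLED and
PERF_FORMAT_TOTAL_TIME_RUNNING were requested, so that a read() returns
three u64 values in the order given above. The struct and function names
are hypothetical, and scaling by enabled/running is only an estimate that
assumes events arrive at a steady rate.

```c
#include <stdint.h>

/*
 * Layout of one read() when both read_format bits are set: the counter
 * value first, then the extra values in increasing order of bit value.
 */
struct counter_reading {
        uint64_t value;         /* raw event count */
        uint64_t time_enabled;  /* PERF_FORMAT_TOTAL_TIME_ENABLED */
        uint64_t time_running;  /* PERF_FORMAT_TOTAL_TIME_RUNNING */
};

/* Fraction of the enabled time during which the counter was on the PMU. */
static double running_fraction(const struct counter_reading *r)
{
        if (r->time_enabled == 0)
                return 0.0;
        return (double)r->time_running / (double)r->time_enabled;
}

/*
 * Estimate what the count would have been had the counter been running
 * for the whole time it was enabled (rounded to the nearest integer).
 */
static uint64_t scaled_count(const struct counter_reading *r)
{
        if (r->time_running == 0)
                return 0;       /* never scheduled: nothing to scale */
        return (uint64_t)((double)r->value * (double)r->time_enabled /
                          (double)r->time_running + 0.5);
}
```

For instance, a counter that saw 1000 events while running for half of its
enabled time would be estimated at about 2000 events.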

The 'disabled' bit specifies whether the counter starts out disabled
or enabled. If it is initially disabled, it can be enabled by ioctl
or prctl (see below).

The 'inherit' bit, if set, specifies that this counter should count
events on descendant tasks as well as the task specified. This only
applies to new descendants, not to any existing descendants at the
time the counter is created (nor to any new descendants of existing
descendants).

The 'pinned' bit, if set, specifies that the counter should always be
on the CPU if at all possible. It only applies to hardware counters
and only to group leaders. If a pinned counter cannot be put onto the
CPU (e.g. because there are not enough hardware counters or because of
a conflict with some other event), then the counter goes into an
'error' state, where reads return end-of-file (i.e. read() returns 0)
until the counter is subsequently enabled or disabled.

The 'exclusive' bit, if set, specifies that when this counter's group
is on the CPU, it should be the only group using the CPU's counters.
In future, this will allow sophisticated monitoring programs to supply
extra configuration information via 'extra_config_len' to exploit
advanced features of the CPU's Performance Monitor Unit (PMU) that are
not otherwise accessible and that might disrupt other hardware
counters.
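
The 'error' state of a pinned counter described above is visible to
user-space only as a zero-length read, so a reader has to tell it apart
from an ordinary failure. A minimal sketch follows, assuming a plain
counter whose read() returns a single u64; the enum and function names are
hypothetical.

```c
#include <stdint.h>
#include <unistd.h>

enum counter_read_status {
        COUNTER_READ_OK = 0,            /* *value holds the count */
        COUNTER_READ_ERROR_STATE,       /* read() returned 0 (end-of-file) */
        COUNTER_READ_FAILED,            /* read() failed or was short */
};

/* Read one u64 from 'fd' and classify the outcome. */
static enum counter_read_status read_pinned_counter(int fd, uint64_t *value)
{
        ssize_t n = read(fd, value, sizeof(*value));

        if (n == 0)
                return COUNTER_READ_ERROR_STATE;
        if (n != (ssize_t)sizeof(*value))
                return COUNTER_READ_FAILED;
        return COUNTER_READ_OK;
}
```

On COUNTER_READ_ERROR_STATE a monitoring program could disable and
re-enable the counter, as the text above suggests, before reading again.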

The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
way to request that counting of events be restricted to times when the
CPU is in user, kernel and/or hypervisor mode.

Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
to request counting of events restricted to guest and host contexts when
using Linux as the hypervisor.

The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
operations. These can be used to relate userspace IP addresses to actual
code, even after the mapping (or even the whole process) is gone.
These events are recorded in the ring-buffer (see below).

The 'comm' bit allows tracking of process comm data on process creation.
This too is recorded in the ring-buffer (see below).

The 'pid' parameter to the sys_perf_event_open() system call allows the
counter to be specific to a task:

 pid == 0: if the pid parameter is zero, the counter is attached to the
           current task.

 pid > 0: the counter is attached to a specific task (if the current task
          has sufficient privilege to do so)

 pid < 0: all tasks are counted (per cpu counters)

The 'cpu' parameter allows a counter to be made specific to a CPU:

 cpu >= 0: the counter is restricted to a specific CPU
 cpu == -1: the counter counts on all CPUs

(Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.)

A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts
events of that task and 'follows' that task to whatever CPU the task
gets scheduled to. Per task counters can be created by any user, for
their own tasks.

A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
all events on CPU-x. Per CPU counters need CAP_PERFMON or CAP_SYS_ADMIN
privilege.

The 'flags' parameter is currently unused and must be zero.

The 'group_fd' parameter allows counter "groups" to be set up. A
counter group has one counter which is the group "leader". The leader
is created first, with group_fd = -1 in the sys_perf_event_open call
that creates it. The rest of the group members are created
subsequently, with group_fd giving the fd of the group leader.
(A single counter on its own is created with group_fd = -1 and is
considered to be a group with only 1 member.)

A counter group is scheduled onto the CPU as a unit, that is, it will
only be put onto the CPU if all of the counters in the group can be
put onto the CPU. This means that the values of the member counters
can be meaningfully compared, added, divided (to get ratios), etc.,
with each other, since they have counted events for the same set of
executed instructions.


As stated, asynchronous events, such as counter overflow or PROT_EXEC mmap
tracking, are logged into a ring-buffer. This ring-buffer is created and
accessed through mmap().

The mmap size should be 1+2^n pages, where the first page is a meta-data page
(struct perf_event_mmap_page) that contains various bits of information such
as where the ring-buffer head is.

/*
 * Structure of the page that can be mapped via mmap
 */
struct perf_event_mmap_page {
        __u32   version;                /* version number of this structure */
        __u32   compat_version;         /* lowest version this is compat with */

        /*
         * Bits needed to read the hw counters in user-space.
         *
         *   u32 seq;
         *   s64 count;
         *
         *   do {
         *     seq = pc->lock;
         *
         *     barrier();
         *     if (pc->index) {
         *       count = pmc_read(pc->index - 1);
         *       count += pc->offset;
         *     } else
         *       goto regular_read;
         *
         *     barrier();
         *   } while (pc->lock != seq);
         *
         * NOTE: for obvious reasons this only works on self-monitoring
         *       processes.
         */
        __u32   lock;                   /* seqlock for synchronization */
        __u32   index;                  /* hardware counter identifier */
        __s64   offset;                 /* add to hardware counter value */

        /*
         * Control data for the mmap() data buffer.
         *
         * User-space reading this value should issue an rmb(), on SMP capable
         * platforms, after reading this value -- see perf_event_wakeup().
         */
        __u32   data_head;              /* head in the data section */
};

NOTE: the hw-counter userspace bits are arch specific and are currently only
      implemented on powerpc.
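
The 1+2^n sizing rule above can be sketched as follows. The helper names
perf_mmap_len() and perf_mmap() are hypothetical, and the choice of
PROT_READ | PROT_WRITE with MAP_SHARED is an assumption about how a
monitoring program would typically map the buffer, not something this
document specifies.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bytes to map: one meta-data page plus 2^n ring-buffer pages. */
static size_t perf_mmap_len(unsigned int n, size_t page_size)
{
        return ((size_t)1 + ((size_t)1 << n)) * page_size;
}

/*
 * Map the meta-data page and the ring-buffer of counter 'fd'.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *perf_mmap(int fd, unsigned int n)
{
        size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

        return mmap(NULL, perf_mmap_len(n, page_size),
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

With 4096-byte pages and n = 3, the mapping covers 9 pages (36864 bytes):
the first holds struct perf_event_mmap_page and the remaining 8 hold the
ring-buffer.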

The following 2^n pages are the ring-buffer which contains events of the form:

#define PERF_RECORD_MISC_KERNEL          (1 << 0)
#define PERF_RECORD_MISC_USER            (1 << 1)
#define PERF_RECORD_MISC_OVERFLOW        (1 << 2)

struct perf_event_header {
        __u32   type;
        __u16   misc;
        __u16   size;
};

enum perf_event_type {

        /*
         * The MMAP events record the PROT_EXEC mappings so that we can
         * correlate userspace IPs to code. They have the following structure:
         *
         * struct {
         *      struct perf_event_header        header;
         *
         *      u32                             pid, tid;
         *      u64                             addr;
         *      u64                             len;
         *      u64                             pgoff;
         *      char                            filename[];
         * };
         */
        PERF_RECORD_MMAP                 = 1,
        PERF_RECORD_MUNMAP               = 2,

        /*
         * struct {
         *      struct perf_event_header        header;
         *
         *      u32                             pid, tid;
         *      char                            comm[];
         * };
         */
        PERF_RECORD_COMM                 = 3,

        /*
         * When header.misc & PERF_RECORD_MISC_OVERFLOW the event_type field
         * will be PERF_RECORD_*
         *
         * struct {
         *      struct perf_event_header        header;
         *
         *      { u64                   ip;       } && PERF_RECORD_IP
         *      { u32                   pid, tid; } && PERF_RECORD_TID
         *      { u64                   time;     } && PERF_RECORD_TIME
         *      { u64                   addr;     } && PERF_RECORD_ADDR
         *
         *      { u64                   nr;
         *        { u64 event, val; }   cnt[nr];  } && PERF_RECORD_GROUP
         *
         *      { u16                   nr,
         *                              hv,
         *                              kernel,
         *                              user;
         *        u64                   ips[nr];  } && PERF_RECORD_CALLCHAIN
         * };
         */
};

NOTE: PERF_RECORD_CALLCHAIN is arch specific and currently only implemented
      on x86.

Notification of new events is possible through poll()/select()/epoll() and
fcntl() managing signals.

Normally a notification is generated for every page filled, however one can
additionally set perf_event_attr.wakeup_events to generate one every
so many counter overflow events.

Future work will include a splice() interface to the ring-buffer.


Counters can be enabled and disabled in two ways: via ioctl and via
prctl. When a counter is disabled, it doesn't count or generate
events but does continue to exist and maintain its count value.

An individual counter can be enabled with

        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

or disabled with

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

For a counter group, pass PERF_IOC_FLAG_GROUP as the third argument.
Enabling or disabling the leader of a group enables or disables the
whole group; that is, while the group leader is disabled, none of the
counters in the group will count. Enabling or disabling a member of a
group other than the leader only affects that counter - disabling a
non-leader stops that counter from counting but doesn't affect any
other counter.

Additionally, non-inherited overflow counters can use

        ioctl(fd, PERF_EVENT_IOC_REFRESH, nr);

to enable a counter for 'nr' events, after which it gets disabled again.

A process can enable or disable all the counter groups that are
attached to it, using prctl:

        prctl(PR_TASK_PERF_EVENTS_ENABLE);

        prctl(PR_TASK_PERF_EVENTS_DISABLE);

This applies to all counters on the current process, whether created
by this process or by another, and doesn't affect any counters that
this process has created on other processes. It only enables or
disables the group leaders, not any other members in the groups.
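
The ioctl and prctl calls above can be wrapped in small helpers so that a
measured region is bracketed by an enable and a disable. This is only a
sketch: the function names are hypothetical, and the constants are assumed
to come from <linux/perf_event.h> and <sys/prctl.h>.

```c
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>

/* Enable or disable one counter; returns 0 on success, -1 with errno set. */
static int counter_set_enabled(int fd, int enable)
{
        return ioctl(fd, enable ? PERF_EVENT_IOC_ENABLE
                                : PERF_EVENT_IOC_DISABLE, 0);
}

/* Same request, applied to the whole group that 'fd' belongs to. */
static int group_set_enabled(int fd, int enable)
{
        return ioctl(fd, enable ? PERF_EVENT_IOC_ENABLE
                                : PERF_EVENT_IOC_DISABLE,
                     PERF_IOC_FLAG_GROUP);
}

/* Enable or disable every counter group attached to the calling task. */
static int task_counters_set_enabled(int enable)
{
        return prctl(enable ? PR_TASK_PERF_EVENTS_ENABLE
                            : PR_TASK_PERF_EVENTS_DISABLE);
}
```

A counter created with the 'disabled' bit set could then be measured with
counter_set_enabled(fd, 1), the code of interest, counter_set_enabled(fd, 0)
and a final read() of the fd.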


Arch requirements
-----------------

If your architecture does not have hardware performance metrics, you can
still use the generic software counters based on hrtimers for sampling.

So to start with, in order to add HAVE_PERF_EVENTS to your Kconfig, you
will need at least this:
        - asm/perf_event.h - a basic stub will suffice at first
        - support for atomic64 types (and associated helper functions)

If your architecture does have hardware capabilities, you can override the
weak stub hw_perf_event_init() to register hardware counters.

Architectures that have d-cache aliasing issues, such as Sparc and ARM,
should select PERF_USE_VMALLOC in order to avoid these for perf mmap().