perf-c2c(1)
===========

NAME
----
perf-c2c - Shared Data C2C/HITM Analyzer.

SYNOPSIS
--------
[verse]
'perf c2c record' [<options>] <command>
'perf c2c record' [<options>] -- [<record command options>] <command>
'perf c2c report' [<options>]

DESCRIPTION
-----------
C2C stands for Cache To Cache.

The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
you to track down cacheline contention.

The tool is based on x86's load latency and precise store facility events
provided by Intel CPUs. These events provide:
  - memory address of the access
  - type of the access (load and store details)
  - latency (in cycles) of the load access

The c2c tool provides means to record this data and report back access details
for the cachelines with the highest contention, that is, the highest number of
HITM accesses.

The basic workflow with this tool follows the standard record/report phase.
The user runs the record command to record event data and the report command
to display it.

RECORD OPTIONS
--------------
-e::
--event=::
        Select the PMU event. Use 'perf mem record -e list'
        to list available events.

-v::
--verbose::
        Be more verbose (show counter open errors, etc).

-l::
--ldlat::
        Configure mem-loads latency.

-k::
--all-kernel::
        Configure all used events to run in kernel space.

-u::
--all-user::
        Configure all used events to run in user space.

REPORT OPTIONS
--------------
-k::
--vmlinux=<file>::
        vmlinux pathname

-v::
--verbose::
        Be more verbose (show counter open errors, etc).

-i::
--input::
        Specify the input file to process.

-N::
--node-info::
        Show extra node info in report (see NODE INFO section)

-c::
--coalesce::
        Specify sorting fields for single cacheline display.
        The following fields are available: tid,pid,iaddr,dso
        (see COALESCE)

-g::
--call-graph::
        Set up callchain parameters.
        Please refer to the perf-report man page for details.

--stdio::
        Force the stdio output (see STDIO OUTPUT)

--stats::
        Display only the statistics tables and force stdio mode.

--full-symbols::
        Display full length of symbols.

--no-source::
        Do not display Source:Line column.

C2C RECORD
----------
The perf c2c record command sets up options related to HITM cacheline analysis
and calls the standard perf record command.

The following perf record options are configured by default:
(check the perf record man page for details)

  -W,-d,--sample-cpu

Unless specified otherwise with the '-e' option, the following events are
monitored by default:

  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P

The user can pass any 'perf record' option after the '--' mark, for example to
enable callchains and system-wide monitoring:

  $ perf c2c record -- -g -a

Please check the RECORD OPTIONS section for specific c2c record options.

C2C REPORT
----------
The perf c2c report command displays shared data analysis. It comes in two
display modes: stdio and tui (default).
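
As a rough illustration, a record/report session built only from the options
documented on this page might look like the sketch below. The workload name
`./my_app`, the latency value 50, and the `step` and `c2c_session` helpers are
assumptions made for the example, not part of perf. By default the script only
prints the commands, since recording needs a CPU that provides the required
events and sufficient perf permissions.

```shell
#!/bin/sh
# Hypothetical workload; replace with the program to analyze.
WORKLOAD=${WORKLOAD:-./my_app}

# Helper for this sketch only: print a command, and execute it only
# when RUN=1 is set in the environment.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

c2c_session() {
    # Record user-space accesses only (-u) and configure the mem-loads
    # latency (-l). Options after '--' are passed to perf record;
    # -g enables callchains.
    step perf c2c record -u -l 50 -- -g "$WORKLOAD" || return
    # Show only the statistics tables (--stats forces stdio mode).
    step perf c2c report --stats || return
    # Full stdio report with extra node info, coalescing cacheline
    # offsets by pid and code address.
    step perf c2c report --stdio -N -c pid,iaddr
}

c2c_session
```

Running the script with RUN=1 executes the three commands instead of only
printing them. Dropping --stdio from the last command opens the default TUI
mode.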

The report command workflow is as follows:
  - sort all the data based on the cacheline address
  - store access details for each cacheline
  - sort all cachelines based on user settings
  - display data

In general, the perf c2c report output consists of 2 basic views:
  1) most expensive cachelines list
  2) offsets details for each cacheline

For each cacheline in the 1) list we display the following data:
(Both stdio and TUI modes follow the same fields output)

  Index
  - zero based index to identify the cacheline

  Cacheline
  - cacheline address (hex number)

  Total records
  - sum of all cacheline accesses

  Rmt/Lcl Hitm
  - cacheline percentage of all Remote/Local HITM accesses

  LLC Load Hitm - Total, Lcl, Rmt
  - count of Total/Local/Remote load HITMs

  Store Reference - Total, L1Hit, L1Miss
    Total - all store accesses
    L1Hit - store accesses that hit L1
    L1Miss - store accesses that missed L1

  Load Dram
  - count of local and remote DRAM accesses

  LLC Ld Miss
  - count of all accesses that missed LLC

  Total Loads
  - sum of all load accesses

  Core Load Hit - FB, L1, L2
  - count of load hits in FB (Fill Buffer), L1 and L2 cache

  LLC Load Hit - Llc, Rmt
  - count of LLC and Remote load hits

For each offset in the 2) list we display the following data:

  HITM - Rmt, Lcl
  - % of Remote/Local HITM accesses for given offset within cacheline

  Store Refs - L1 Hit, L1 Miss
  - % of store accesses that hit/missed L1 for given offset within cacheline

  Data address - Offset
  - offset address

  Pid
  - pid of the process responsible for the accesses

  Tid
  - tid of the thread responsible for the accesses

  Code address
  - code address responsible for the accesses

  cycles - rmt hitm, lcl hitm, load
  - sum of cycles for given accesses - Remote/Local HITM and generic load

  cpu cnt
  - number of cpus that participated in the access

  Symbol
  - code symbol related to the 'Code address' value

  Shared Object
  - shared object name related to the 'Code address' value

  Source:Line
  - source information related to the 'Code address' value

  Node
  - nodes participating in the access (see NODE INFO section)

NODE INFO
---------
The 'Node' field displays the nodes that access the given cacheline
offset. Its output comes in 3 flavors:
  - node IDs separated by ','
  - node IDs with stats for each ID, in the following format:
      Node{cpus %hitms %stores}
  - node IDs with list of affected CPUs in the following format:
      Node{cpu list}

The user can switch between the above flavors with the -N option, or
use the 'n' key to switch interactively in TUI mode.

COALESCE
--------
The user can specify how to sort offsets for a cacheline.

The following fields are available and govern the final
set of output fields for the cacheline offsets output:

  tid   - coalesced by process TIDs
  pid   - coalesced by process PIDs
  iaddr - coalesced by code address, following fields are displayed:
             Code address, Code symbol, Shared Object, Source line
  dso   - coalesced by shared object

By default the coalescing is set up with 'pid,tid,iaddr'.

STDIO OUTPUT
------------
The stdio output displays data on standard output.

The following tables are displayed:
  Trace Event Information
  - overall statistics of memory accesses

  Global Shared Cache Line Event Information
  - overall statistics on shared cachelines

  Shared Data Cache Line Table
  - list of most expensive cachelines

  Shared Cache Line Distribution Pareto
  - list of all accessed offsets for each cacheline

TUI OUTPUT
----------
The TUI output provides an interactive interface to navigate
through the cachelines list and to display offset details.

For details please refer to the help window by pressing the '?' key.

CREDITS
-------
Although Don Zickus, Dick Fowles and Joe Mario worked together
to get this implemented, we got lots of early help from Arnaldo
Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.

C2C BLOG
--------
Check Joe's blog on the c2c tool for a detailed use case explanation:
  https://joemario.github.io/blog/2016/09/01/c2c-blog/

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-mem[1]