perf-c2c(1)
===========

NAME
----
perf-c2c - Shared Data C2C/HITM Analyzer.

SYNOPSIS
--------
[verse]
'perf c2c record' [<options>] <command>
'perf c2c record' [<options>] -- [<record command options>] <command>
'perf c2c report' [<options>]

DESCRIPTION
-----------
C2C stands for Cache To Cache.

The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
you to track down cacheline contention.

The tool is based on x86's load latency and precise store facility events
provided by Intel CPUs. These events provide:
  - memory address of the access
  - type of the access (load and store details)
  - latency (in cycles) of the load access

The c2c tool provides means to record this data and report back access details
for the cachelines with the highest contention, that is, the highest number of
HITM accesses.

The basic workflow with this tool follows the standard record/report phases.
Use the record command to record event data and the report command to
display it.


RECORD OPTIONS
--------------
-e::
--event=::
        Select the PMU event. Use 'perf mem record -e list'
        to list available events.

-v::
--verbose::
        Be more verbose (show counter open errors, etc).

-l::
--ldlat::
        Configure mem-loads latency.

-k::
--all-kernel::
        Configure all used events to run in kernel space.

-u::
--all-user::
        Configure all used events to run in user space.

REPORT OPTIONS
--------------
-k::
--vmlinux=<file>::
        vmlinux pathname

-v::
--verbose::
        Be more verbose (show counter open errors, etc).

-i::
--input::
        Specify the input file to process.

-N::
--node-info::
        Show extra node info in report (see NODE INFO section)

-c::
--coalesce::
        Specify sorting fields for single cacheline display.
        The following fields are available: tid,pid,iaddr,dso
        (see COALESCE)

-g::
--call-graph::
        Set up callchain parameters.
        Please refer to the perf-report man page for details.

--stdio::
        Force the stdio output (see STDIO OUTPUT)

--stats::
        Display only statistic tables and force stdio mode.

--full-symbols::
        Display full length of symbols.

--no-source::
        Do not display Source:Line column.

--show-all::
        Show all captured HITM lines, regardless of the HITM % 0.0005 limit.

C2C RECORD
----------
The perf c2c record command sets up options related to HITM cacheline analysis
and calls the standard perf record command.

The following perf record options are configured by default:
(check the perf record man page for details)

  -W,-d,--sample-cpu

Unless specified otherwise with the '-e' option, the following events are
monitored by default:

  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P

Any 'perf record' option can be passed after the '--' mark, for example (to
enable callchains and system wide monitoring):

  $ perf c2c record -- -g -a

Please check the RECORD OPTIONS section for specific c2c record options.

C2C REPORT
----------
The perf c2c report command displays shared data analysis. It comes in two
display modes: stdio and tui (default).
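As a minimal sketch of the record/report workflow, the two phases can be wrapped in small shell helpers. The function names and the PERF variable are hypothetical, introduced only so the commands can be dry-run with PERF=echo. The input file name perf.data is an assumption based on the default output of perf record. All flags come from the option sections of this page.

```shell
# PERF is a hypothetical override; set PERF=echo to print the commands
# instead of running them.
PERF="${PERF:-perf}"

# Record phase: sample loads and stores system wide (-a) with callchains (-g)
# for as long as the given command runs.
c2c_record() {
    "$PERF" c2c record -- -g -a "$@"
}

# Report phase: force stdio output and read the given input file
# (perf.data is assumed when no argument is passed).
c2c_report_stdio() {
    "$PERF" c2c report --stdio -i "${1:-perf.data}"
}
```

For example, running c2c_record sleep 10 and then c2c_report_stdio would record for about ten seconds and print the report on standard output.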

The report command workflow is as follows:
  - sort all the data based on the cacheline address
  - store access details for each cacheline
  - sort all cachelines based on user settings
  - display data

In general, the report output consists of 2 basic views:
  1) most expensive cachelines list
  2) offsets details for each cacheline

For each cacheline in the 1) list we display the following data:
(Both stdio and TUI modes follow the same fields output)

  Index
  - zero based index to identify the cacheline

  Cacheline
  - cacheline address (hex number)

  Total records
  - sum of all cacheline accesses

  Rmt/Lcl Hitm
  - cacheline percentage of all Remote/Local HITM accesses

  LLC Load Hitm - Total, Lcl, Rmt
  - count of Total/Local/Remote load HITMs

  Store Reference - Total, L1Hit, L1Miss
    Total - all store accesses
    L1Hit - store accesses that hit L1
    L1Miss - store accesses that missed L1

  Load Dram
  - count of local and remote DRAM accesses

  LLC Ld Miss
  - count of all accesses that missed LLC

  Total Loads
  - sum of all load accesses

  Core Load Hit - FB, L1, L2
  - count of load hits in FB (Fill Buffer), L1 and L2 cache

  LLC Load Hit - Llc, Rmt
  - count of LLC and Remote load hits

For each offset in the 2) list we display the following data:

  HITM - Rmt, Lcl
  - % of Remote/Local HITM accesses for given offset within cacheline

  Store Refs - L1 Hit, L1 Miss
  - % of store accesses that hit/missed L1 for given offset within cacheline

  Data address - Offset
  - offset address

  Pid
  - pid of the process responsible for the accesses

  Tid
  - tid of the thread responsible for the accesses

  Code address
  - code address responsible for the accesses

  cycles - rmt hitm, lcl hitm, load
  - sum of cycles for given accesses - Remote/Local HITM and generic load

  cpu cnt
  - number of cpus that participated in the access

  Symbol
  - code symbol related to the 'Code address' value

  Shared Object
  - shared object name related to the 'Code address' value

  Source:Line
  - source information related to the 'Code address' value

  Node
  - nodes participating in the access (see NODE INFO section)

NODE INFO
---------
The 'Node' field displays the nodes that access the given cacheline
offset. Its output comes in 3 flavors:
  - node IDs separated by ','
  - node IDs with stats for each ID, in the following format:
      Node{cpus %hitms %stores}
  - node IDs with list of affected CPUs in the following format:
      Node{cpu list}

You can switch between the above flavors with the -N option or
use the 'n' key to switch interactively in TUI mode.

COALESCE
--------
You can specify how to sort offsets for a cacheline.

The following fields are available and govern the final
set of output fields for the cacheline offsets output:

  tid   - coalesced by process TIDs
  pid   - coalesced by process PIDs
  iaddr - coalesced by code address, following fields are displayed:
             Code address, Code symbol, Shared Object, Source line
  dso   - coalesced by shared object

By default the coalescing is set up with 'pid,tid,iaddr'.

STDIO OUTPUT
------------
The stdio output displays data on standard output.
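The coalescing and node info options combine with stdio output, as the sketch below illustrates. The function name and the PERF variable are hypothetical (PERF=echo gives a dry run), and perf.data is an assumed default input file. The flags themselves are the ones listed under REPORT OPTIONS.

```shell
# PERF is a hypothetical override; set PERF=echo to print the command
# instead of running it.
PERF="${PERF:-perf}"

# Coalesce cacheline offsets by pid and code address, show extra node info,
# and hide the Source:Line column in the stdio report.
c2c_report_by_code() {
    "$PERF" c2c report --stdio -N -c pid,iaddr --no-source -i "${1:-perf.data}"
}
```

Replacing 'pid,iaddr' with another combination of tid, pid, iaddr and dso changes which fields appear in the offsets output.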

The following tables are displayed:
  Trace Event Information
  - overall statistics of memory accesses

  Global Shared Cache Line Event Information
  - overall statistics on shared cachelines

  Shared Data Cache Line Table
  - list of most expensive cachelines

  Shared Cache Line Distribution Pareto
  - list of all accessed offsets for each cacheline

TUI OUTPUT
----------
The TUI output provides an interactive interface to navigate
through the cachelines list and to display offset details.

For details please refer to the help window by pressing the '?' key.

CREDITS
-------
Although Don Zickus, Dick Fowles and Joe Mario worked together
to get this implemented, we got lots of early help from Arnaldo
Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.

C2C BLOG
--------
Check Joe's blog on the c2c tool for a detailed use case explanation:
  https://joemario.github.io/blog/2016/09/01/c2c-blog/

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-mem[1]