perf-c2c(1)
===========

NAME
----
perf-c2c - Shared Data C2C/HITM Analyzer.

SYNOPSIS
--------
[verse]
'perf c2c record' [<options>] <command>
'perf c2c record' [<options>] -- [<record command options>] <command>
'perf c2c report' [<options>]

DESCRIPTION
-----------
C2C stands for Cache To Cache.

The perf c2c tool provides the means for Shared Data C2C/HITM analysis. It
allows you to track down cacheline contention.

The tool is based on x86's load latency and precise store facility events
provided by Intel CPUs. These events provide:
  - memory address of the access
  - type of the access (load and store details)
  - latency (in cycles) of the load access

The c2c tool provides the means to record this data and report back access
details for the cachelines with the highest contention, i.e. the highest
number of HITM accesses (a HITM is a load that hits a cacheline held in
Modified state in another core's cache).

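To make cacheline contention concrete, here is a minimal C sketch (not part of perf) of the kind of data layout that tends to show up as HITMs: two counters, assumed to be written by different threads, that share one cacheline. It assumes 64-byte cachelines, as on current Intel x86 CPUs; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines, as on current Intel x86 CPUs. */
#define CACHELINE_SIZE 64

/* Hypothetical layout: thread 1 writes 'a', thread 2 writes 'b'. Both live
 * in one cacheline, so each write pulls the line away from the other core
 * and the other thread's next load is likely to be sampled as a HITM. */
struct shared_counters {
	alignas(CACHELINE_SIZE) long a;
	long b;
};

/* Fix: give each counter its own cacheline. */
struct padded_counters {
	alignas(CACHELINE_SIZE) long a;
	alignas(CACHELINE_SIZE) long b;
};

static_assert(offsetof(struct padded_counters, b) == CACHELINE_SIZE,
	      "each padded counter starts a new cacheline");

/* True when both addresses fall into the same cacheline. */
static inline bool same_cacheline(const void *x, const void *y)
{
	return (uintptr_t)x / CACHELINE_SIZE == (uintptr_t)y / CACHELINE_SIZE;
}
```

A program whose threads update 'a' and 'b' of one shared struct shared_counters would be expected to show that cacheline near the top of the report, with accesses at two offsets; the padded layout places the counters on separate cachelines, so the contention should disappear.
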
The basic workflow with this tool follows the standard record/report phase.
The user runs the record command to record event data and the report command
to display it.

RECORD OPTIONS
--------------
-e::
--event=::
	Select the PMU event. Use 'perf mem record -e list'
	to list available events.

-v::
--verbose::
	Be more verbose (show counter open errors, etc.).

-l::
--ldlat::
	Configure the mem-loads latency.

-k::
--all-kernel::
	Configure all used events to run in kernel space.

-u::
--all-user::
	Configure all used events to run in user space.

REPORT OPTIONS
--------------
-k::
--vmlinux=<file>::
	vmlinux pathname

-v::
--verbose::
	Be more verbose (show counter open errors, etc.).

-i::
--input::
	Specify the input file to process.

-N::
--node-info::
	Show extra node info in report (see NODE INFO section).

-c::
--coalesce::
	Specify sorting fields for single cacheline display.
	The following fields are available: tid,pid,iaddr,dso
	(see COALESCE).

-g::
--call-graph::
	Set up callchain parameters.
	Please refer to the perf-report man page for details.

--stdio::
	Force the stdio output (see STDIO OUTPUT).

--stats::
	Display only statistics tables and force stdio mode.

--full-symbols::
	Display the full length of symbols.

--no-source::
	Do not display the Source:Line column.

--show-all::
	Show all captured HITM lines, regardless of the HITM % 0.0005 limit.

C2C RECORD
----------
The perf c2c record command sets up options related to HITM cacheline analysis
and calls the standard perf record command.

The following perf record options are configured by default
(check the perf record man page for details):

  -W,-d,--sample-cpu

Unless specified otherwise with the '-e' option, the following events are
monitored by default:

  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P

The user can pass any 'perf record' option after the '--' mark, for example
(to enable callchains and system-wide monitoring):

  $ perf c2c record -- -g -a

Please check the RECORD OPTIONS section for specific c2c record options.

C2C REPORT
----------
The perf c2c report command displays shared data analysis. It comes in two
display modes: stdio and tui (default).

The report command workflow is as follows:
  - sort all the data based on the cacheline address
  - store access details for each cacheline
  - sort all cachelines based on user settings
  - display data

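The first three steps of that workflow can be sketched in C as follows. This is a simplified illustration, not perf's implementation: the struct sample and struct cacheline types and group_by_cacheline() are hypothetical, each sample is reduced to a data address plus a HITM flag, 64-byte cachelines are assumed, and sorting cachelines by HITM count stands in for one possible user setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cachelines (must be a power of two for the mask). */
#define CACHELINE_SIZE 64
static_assert((CACHELINE_SIZE & (CACHELINE_SIZE - 1)) == 0,
	      "cacheline size must be a power of two");

/* Hypothetical, simplified sample: data address plus a HITM flag. */
struct sample {
	uint64_t addr;
	int hitm;
};

/* Hypothetical per-cacheline aggregate. */
struct cacheline {
	uint64_t addr;     /* cacheline address */
	unsigned records;  /* all samples on this cacheline */
	unsigned hitm;     /* HITM samples on this cacheline */
};

static uint64_t cacheline_of(uint64_t addr)
{
	return addr & ~(uint64_t)(CACHELINE_SIZE - 1);
}

static int by_cacheline(const void *x, const void *y)
{
	uint64_t a = cacheline_of(((const struct sample *)x)->addr);
	uint64_t b = cacheline_of(((const struct sample *)y)->addr);

	return (a > b) - (a < b);
}

static int by_hitm_desc(const void *x, const void *y)
{
	unsigned a = ((const struct cacheline *)x)->hitm;
	unsigned b = ((const struct cacheline *)y)->hitm;

	return (a < b) - (a > b);   /* reversed: highest HITM count first */
}

/* Writes one entry per distinct cacheline to 'out' (room for n entries)
 * and returns how many were written. Reorders 's' in place. */
size_t group_by_cacheline(struct sample *s, size_t n, struct cacheline *out)
{
	size_t lines = 0;

	/* 1) sort all samples by cacheline address */
	qsort(s, n, sizeof(*s), by_cacheline);

	/* 2) accumulate access details for each cacheline in one pass */
	for (size_t i = 0; i < n; i++) {
		uint64_t cl = cacheline_of(s[i].addr);

		if (lines == 0 || out[lines - 1].addr != cl)
			out[lines++] = (struct cacheline){ .addr = cl };
		out[lines - 1].records++;
		out[lines - 1].hitm += s[i].hitm != 0;
	}

	/* 3) sort cachelines, here by HITM count as an example setting */
	qsort(out, lines, sizeof(*out), by_hitm_desc);
	return lines;
}
```

Displaying the data (the last step) is left to the caller, which walks the returned entries. Sorting the samples by cacheline address first is what lets a single linear pass collect each cacheline's counters.
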
In general, the perf c2c report output consists of 2 basic views:
  1) most expensive cachelines list
  2) offset details for each cacheline

For each cacheline in the 1) list we display the following data:
(Both stdio and TUI modes follow the same fields output)

  Index
  - zero based index to identify the cacheline

  Cacheline
  - cacheline address (hex number)

  Total records
  - sum of all accesses to the cacheline

  Rmt/Lcl Hitm
  - cacheline percentage of all Remote/Local HITM accesses

  LLC Load Hitm - Total, Lcl, Rmt
  - count of Total/Local/Remote load HITMs

  Store Reference - Total, L1Hit, L1Miss
    Total  - all store accesses
    L1Hit  - store accesses that hit L1
    L1Miss - store accesses that missed L1

  Load Dram
  - count of local and remote DRAM accesses

  LLC Ld Miss
  - count of all accesses that missed LLC

  Total Loads
  - sum of all load accesses

  Core Load Hit - FB, L1, L2
  - count of load hits in FB (Fill Buffer), L1 and L2 cache

  LLC Load Hit - Llc, Rmt
  - count of LLC and Remote load hits

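The 'Rmt/Lcl Hitm' percentages relate one cacheline's HITM counts to the HITM totals over all cachelines. A minimal C sketch of that arithmetic, assuming a hypothetical struct cl_hitm that holds the Lcl and Rmt counts of the 'LLC Load Hitm' column (this is an illustration, not perf's code):

```c
#include <stddef.h>

/* Hypothetical per-cacheline HITM counts ('LLC Load Hitm' Lcl and Rmt). */
struct cl_hitm {
	unsigned long lcl;
	unsigned long rmt;
};

/* Percentage of 'all' that falls on one cacheline; 0 if nothing recorded. */
static double hitm_share(unsigned long line, unsigned long all)
{
	return all ? 100.0 * (double)line / (double)all : 0.0;
}

/* Fill rmt_pct[i] and lcl_pct[i] for each of the n cachelines. */
void hitm_percentages(const struct cl_hitm *cl, size_t n,
		      double *rmt_pct, double *lcl_pct)
{
	unsigned long all_rmt = 0, all_lcl = 0;

	/* The totals over all cachelines are the denominators. */
	for (size_t i = 0; i < n; i++) {
		all_rmt += cl[i].rmt;
		all_lcl += cl[i].lcl;
	}
	for (size_t i = 0; i < n; i++) {
		rmt_pct[i] = hitm_share(cl[i].rmt, all_rmt);
		lcl_pct[i] = hitm_share(cl[i].lcl, all_lcl);
	}
}
```

For example, a cacheline with 1 of 4 recorded remote HITMs gets 25% in the Rmt column, independently of how many local HITMs it has.
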
For each offset in the 2) list we display the following data:

  HITM - Rmt, Lcl
  - % of Remote/Local HITM accesses for given offset within cacheline

  Store Refs - L1 Hit, L1 Miss
  - % of store accesses that hit/missed L1 for given offset within cacheline

  Data address - Offset
  - offset address

  Pid
  - pid of the process responsible for the accesses

  Tid
  - tid of the thread responsible for the accesses

  Code address
  - code address responsible for the accesses

  cycles - rmt hitm, lcl hitm, load
    - sum of cycles for the given accesses: Remote/Local HITM and generic load

  cpu cnt
    - number of CPUs that participated in the access

  Symbol
    - code symbol related to the 'Code address' value

  Shared Object
    - shared object name related to the 'Code address' value

  Source:Line
    - source information related to the 'Code address' value

  Node
    - nodes participating in the access (see NODE INFO section)

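Two of these columns reduce to simple per-sample arithmetic, sketched below in C under the same 64-byte cacheline assumption: the 'Offset' is the data address modulo the cacheline size, and 'cpu cnt' is the number of distinct CPUs among the samples for that offset (the CPU is recorded because --sample-cpu is enabled by default). The function names are hypothetical, and the CPU set is limited to CPUs 0-63 to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. */
#define CACHELINE_SIZE 64

/* 'Offset' column: position of the data address inside its cacheline. */
unsigned cacheline_offset(uint64_t addr)
{
	return (unsigned)(addr % CACHELINE_SIZE);
}

/* 'cpu cnt' column: number of distinct CPUs in the samples of one offset.
 * Sketch only: CPUs are tracked in a 64-bit mask, so IDs must be 0..63. */
unsigned cpu_count(const int *cpus, size_t n)
{
	uint64_t mask = 0;
	unsigned count = 0;

	for (size_t i = 0; i < n; i++) {
		assert(cpus[i] >= 0 && cpus[i] < 64);
		mask |= UINT64_C(1) << cpus[i];
	}
	/* Count set bits by clearing the lowest one on each iteration. */
	for (; mask; mask &= mask - 1)
		count++;
	return count;
}
```

Repeated samples from the same CPU set the same bit, which is why the count reflects distinct CPUs rather than the number of accesses.
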
NODE INFO
---------
The 'Node' field displays the nodes that access the given cacheline
offset. Its output comes in 3 flavors:
  - node IDs separated by ','
  - node IDs with stats for each ID, in the following format:
      Node{cpus %hitms %stores}
  - node IDs with a list of affected CPUs, in the following format:
      Node{cpu list}

The user can switch between the above flavors with the -N option or
use the 'n' key to switch interactively in TUI mode.

COALESCE
--------
The user can specify how to sort offsets for a cacheline.

The following fields are available and govern the final
set of output fields for cacheline offsets:

  tid   - coalesced by process TIDs
  pid   - coalesced by process PIDs
  iaddr - coalesced by code address, the following fields are displayed:
             Code address, Code symbol, Shared Object, Source line
  dso   - coalesced by shared object

By default, coalescing is set up with 'pid,tid,iaddr'.

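Coalescing can be thought of as a multi-key comparison: accesses that compare equal on every selected field, in the order given, end up merged into one output row. A minimal C sketch of that idea, using hypothetical struct access and enum coalesce_key types (this is not perf's internal representation):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-sample record for one cacheline offset. */
struct access {
	int pid, tid;
	uint64_t iaddr;     /* code address */
	const char *dso;    /* shared object */
};

/* Hypothetical names for the fields accepted by -c. */
enum coalesce_key { KEY_PID, KEY_TID, KEY_IADDR, KEY_DSO };

static int cmp_key(const struct access *a, const struct access *b,
		   enum coalesce_key k)
{
	switch (k) {
	case KEY_PID:   return (a->pid > b->pid) - (a->pid < b->pid);
	case KEY_TID:   return (a->tid > b->tid) - (a->tid < b->tid);
	case KEY_IADDR: return (a->iaddr > b->iaddr) - (a->iaddr < b->iaddr);
	case KEY_DSO:   return strcmp(a->dso, b->dso);
	}
	return 0;
}

/* Compare on each selected key in order; 0 means "coalesce into one row". */
int coalesce_cmp(const struct access *a, const struct access *b,
		 const enum coalesce_key *keys, size_t nkeys)
{
	for (size_t i = 0; i < nkeys; i++) {
		int r = cmp_key(a, b, keys[i]);

		if (r)
			return r;
	}
	return 0;
}
```

With the default 'pid,tid,iaddr' keys, accesses from two threads of the same process at the same code address stay in separate rows; with '-c pid' alone they would be merged into one.
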
STDIO OUTPUT
------------
The stdio output displays data on standard output.

The following tables are displayed:
  Trace Event Information
  - overall statistics of memory accesses

  Global Shared Cache Line Event Information
  - overall statistics on shared cachelines

  Shared Data Cache Line Table
  - list of most expensive cachelines

  Shared Cache Line Distribution Pareto
  - list of all accessed offsets for each cacheline

TUI OUTPUT
----------
The TUI output provides an interactive interface to navigate
through the cachelines list and to display offset details.

For details, please refer to the help window by pressing the '?' key.

CREDITS
-------
Although Don Zickus, Dick Fowles and Joe Mario worked together
to get this implemented, we got lots of early help from Arnaldo
Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.

C2C BLOG
--------
Check Joe's blog on the c2c tool for a detailed use case explanation:
  https://joemario.github.io/blog/2016/09/01/c2c-blog/

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-mem[1]