perf-c2c(1)
===========

NAME
----
perf-c2c - Shared Data C2C/HITM Analyzer.

SYNOPSIS
--------
[verse]
'perf c2c record' [<options>] <command>
'perf c2c record' [<options>] -- [<record command options>] <command>
'perf c2c report' [<options>]

DESCRIPTION
-----------
C2C stands for Cache To Cache.

The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
you to track down cacheline contention.

The tool is based on x86's load latency and precise store facility events
provided by Intel CPUs. These events provide:
  - memory address of the access
  - type of the access (load and store details)
  - latency (in cycles) of the load access

The c2c tool provides means to record this data and report back access details
for cachelines with the highest contention, that is, the highest number of
HITM accesses.

The basic workflow with this tool follows the standard record/report phases.
Use the record command to record event data and the report command to
display it.


RECORD OPTIONS
--------------
-e::
--event=::
	Select the PMU event. Use 'perf mem record -e list'
	to list available events.

-v::
--verbose::
	Be more verbose (show counter open errors, etc).

-l::
--ldlat::
	Configure mem-loads latency.

-k::
--all-kernel::
	Configure all used events to run in kernel space.

-u::
--all-user::
	Configure all used events to run in user space.

REPORT OPTIONS
--------------
-k::
--vmlinux=<file>::
	vmlinux pathname

-v::
--verbose::
	Be more verbose (show counter open errors, etc).

-i::
--input::
	Specify the input file to process.

-N::
--node-info::
	Show extra node info in report (see NODE INFO section)

-c::
--coalesce::
	Specify sorting fields for single cacheline display.
	The following fields are available: tid,pid,iaddr,dso
	(see COALESCE)

-g::
--call-graph::
	Set up callchain parameters.
	Please refer to the perf-report man page for details.

--stdio::
	Force the stdio output (see STDIO OUTPUT)

--stats::
	Display only statistic tables and force stdio mode.

--full-symbols::
	Display full length of symbols.

--no-source::
	Do not display Source:Line column.

C2C RECORD
----------
The perf c2c record command sets up options related to HITM cacheline analysis
and calls the standard perf record command.

The following perf record options are configured by default
(check the perf record man page for details):

  -W,-d,--sample-cpu

Unless specified otherwise with the '-e' option, the following events are
monitored by default:

  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P

The user can pass any 'perf record' option after the '--' mark, for example
(to enable callchains and system wide monitoring):

  $ perf c2c record -- -g -a

Please check the RECORD OPTIONS section for specific c2c record options.
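
As a rough illustration of how the two option groups are laid out on the
command line, the following Python sketch assembles an argument list in the
order given in the SYNOPSIS. The helper name 'build_c2c_record_argv' and the
'./workload' command are hypothetical and only serve the example; the sketch
builds the list and does not run perf.

```python
def build_c2c_record_argv(command, c2c_options=(), record_options=()):
    """Assemble an argv list for 'perf c2c record' (illustrative helper).

    command        -- the workload to profile, as a list of strings
    c2c_options    -- options handled by 'perf c2c record' itself (e.g. "-u")
    record_options -- plain 'perf record' options, placed behind '--'
    """
    argv = ["perf", "c2c", "record", *c2c_options]
    if record_options:
        # Everything behind '--' is handed over to 'perf record'.
        argv += ["--", *record_options]
    argv += list(command)
    return argv


# './workload' is a placeholder for the profiled command.
print(" ".join(build_c2c_record_argv(["./workload"], ["-u"], ["-g"])))
```

Keeping the c2c specific options in front of '--' and the generic
'perf record' options behind it mirrors the second SYNOPSIS form.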

C2C REPORT
----------
The perf c2c report command displays shared data analysis.  It comes in two
display modes: stdio and tui (default).

The report command workflow is as follows:
  - sort all the data based on the cacheline address
  - store access details for each cacheline
  - sort all cachelines based on user settings
  - display data
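
The workflow above can be sketched in a few lines of Python. This is a
simplified model and not the perf implementation: it assumes 64-byte
cachelines, a hypothetical sample layout of (data address, is HITM) tuples,
and sorting by HITM count as the "user setting".

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def cacheline_of(addr):
    """Return the cacheline address that contains the data address."""
    return addr & ~(CACHELINE_SIZE - 1)


def build_cacheline_table(samples):
    """Group samples by cacheline and order cachelines by HITM count.

    samples -- iterable of (data_addr, is_hitm) tuples (hypothetical layout)
    """
    table = defaultdict(lambda: {"records": 0, "hitm": 0})
    for addr, is_hitm in samples:
        entry = table[cacheline_of(addr)]   # bucket by cacheline address
        entry["records"] += 1               # store access details
        entry["hitm"] += int(is_hitm)
    # Most contended cachelines (highest HITM count) come first.
    return sorted(table.items(), key=lambda kv: kv[1]["hitm"], reverse=True)


samples = [(0x1008, True), (0x1030, True), (0x2000, False), (0x2010, True)]
for index, (line, stats) in enumerate(build_cacheline_table(samples)):
    print(index, hex(line), stats["records"], stats["hitm"])
```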

In general, the perf c2c report output consists of 2 basic views:
  1) most expensive cachelines list
  2) offsets details for each cacheline

For each cacheline in the 1) list we display the following data
(both stdio and TUI modes follow the same fields output):

  Index
  - zero based index to identify the cacheline

  Cacheline
  - cacheline address (hex number)

  Total records
  - sum of all cacheline accesses

  Rmt/Lcl Hitm
  - cacheline percentage of all Remote/Local HITM accesses

  LLC Load Hitm - Total, Lcl, Rmt
  - count of Total/Local/Remote load HITMs

  Store Reference - Total, L1Hit, L1Miss
    Total - all store accesses
    L1Hit - store accesses that hit L1
    L1Miss - store accesses that missed L1

  Load Dram
  - count of local and remote DRAM accesses

  LLC Ld Miss
  - count of all accesses that missed LLC

  Total Loads
  - sum of all load accesses

  Core Load Hit - FB, L1, L2
  - count of load hits in FB (Fill Buffer), L1 and L2 cache

  LLC Load Hit - Llc, Rmt
  - count of LLC and Remote load hits

For each offset in the 2) list we display the following data:

  HITM - Rmt, Lcl
  - % of Remote/Local HITM accesses for given offset within cacheline

  Store Refs - L1 Hit, L1 Miss
  - % of store accesses that hit/missed L1 for given offset within cacheline

  Data address - Offset
  - offset address

  Pid
  - pid of the process responsible for the accesses

  Tid
  - tid of the thread responsible for the accesses

  Code address
  - code address responsible for the accesses

  cycles - rmt hitm, lcl hitm, load
  - sum of cycles for given accesses - Remote/Local HITM and generic load

  cpu cnt
  - number of cpus that participated in the access

  Symbol
  - code symbol related to the 'Code address' value

  Shared Object
  - shared object name related to the 'Code address' value

  Source:Line
  - source information related to the 'Code address' value

  Node
  - nodes participating in the access (see NODE INFO section)
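
To make the per-offset 'HITM - Rmt, Lcl' percentages more concrete, here is a
small Python sketch. It reads the field description as "share of the
cacheline's Remote (or Local) HITM accesses that landed on this offset"; that
reading, the 64-byte cacheline size and the (data address, kind) input layout
are assumptions made for the example only.

```python
CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def offset_hitm_percentages(accesses):
    """Compute per-offset HITM shares for the accesses of one cacheline.

    accesses -- list of (data_addr, kind) tuples, where kind is "rmt" or
                "lcl" for Remote/Local HITM; other kinds are ignored
                (hypothetical layout)
    """
    totals = {"rmt": 0, "lcl": 0}
    per_offset = {}
    for data_addr, kind in accesses:
        if kind not in totals:
            continue  # not a HITM access
        offset = data_addr % CACHELINE_SIZE
        counts = per_offset.setdefault(offset, {"rmt": 0, "lcl": 0})
        counts[kind] += 1
        totals[kind] += 1

    result = {}
    for offset in sorted(per_offset):
        result[offset] = {
            kind: 100.0 * per_offset[offset][kind] / totals[kind]
            if totals[kind] else 0.0
            for kind in totals
        }
    return result


accesses = [(0x1000, "rmt")] * 3 + [(0x1020, "rmt"), (0x1020, "lcl")]
for offset, pct in offset_hitm_percentages(accesses).items():
    print(hex(offset), pct["rmt"], pct["lcl"])
```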

NODE INFO
---------
The 'Node' field displays the nodes that access the given cacheline
offset. Its output comes in 3 flavors:
  - node IDs separated by ','
  - node IDs with stats for each ID, in the following format:
      Node{cpus %hitms %stores}
  - node IDs with a list of affected CPUs, in the following format:
      Node{cpu list}

The user can switch between the above flavors with the -N option or
use the 'n' key to switch interactively in TUI mode.
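
The three flavors can be mimicked with the Python sketch below. The flavor
names ("ids", "stats", "cpus"), the input dictionary and the exact spacing
and number formatting are assumptions for illustration; the real column
layout printed by perf may differ.

```python
def format_node_field(nodes, flavor="ids"):
    """Render a 'Node' field in one of the three documented flavors.

    nodes  -- dict: node id -> {"cpus": [...], "hitm_pct": float,
              "store_pct": float} (hypothetical layout)
    flavor -- "ids", "stats" or "cpus" (names invented for this sketch)
    """
    parts = []
    for node_id in sorted(nodes):
        info = nodes[node_id]
        if flavor == "ids":
            parts.append(str(node_id))
        elif flavor == "stats":
            # Node{cpus %hitms %stores}
            parts.append("%d{%d %.1f%% %.1f%%}" % (
                node_id, len(info["cpus"]),
                info["hitm_pct"], info["store_pct"]))
        elif flavor == "cpus":
            # Node{cpu list}
            cpu_list = ",".join(str(cpu) for cpu in sorted(info["cpus"]))
            parts.append("%d{%s}" % (node_id, cpu_list))
        else:
            raise ValueError("unknown flavor: %r" % flavor)
    return (",".join(parts) if flavor == "ids" else " ".join(parts))


nodes = {
    0: {"cpus": [0, 2], "hitm_pct": 75.0, "store_pct": 40.0},
    1: {"cpus": [5], "hitm_pct": 25.0, "store_pct": 60.0},
}
for flavor in ("ids", "stats", "cpus"):
    print(format_node_field(nodes, flavor))
```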

COALESCE
--------
The user can specify how to sort offsets for a cacheline.

The following fields are available and govern the final
set of output fields for the cacheline offsets output:

  tid   - coalesced by process TIDs
  pid   - coalesced by process PIDs
  iaddr - coalesced by code address, following fields are displayed:
             Code address, Code symbol, Shared Object, Source line
  dso   - coalesced by shared object

By default the coalescing is set up with 'pid,tid,iaddr'.
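
The effect of the coalescing fields can be pictured as choosing the grouping
key for the per-offset rows. The Python sketch below is a simplified model:
it assumes that rows are always keyed by offset first and that each access is
a dictionary with 'offset', 'pid', 'tid', 'iaddr' and 'dso' entries, which is
a layout invented for the example.

```python
from collections import Counter


def coalesce(accesses, fields=("pid", "tid", "iaddr")):
    """Count accesses per (offset, <coalescing fields>) row.

    accesses -- list of dicts with 'offset', 'pid', 'tid', 'iaddr', 'dso'
                keys (hypothetical layout)
    fields   -- coalescing fields, as passed to -c/--coalesce
    """
    rows = Counter()
    for access in accesses:
        key = (access["offset"],) + tuple(access[f] for f in fields)
        rows[key] += 1
    return rows


accesses = [
    {"offset": 0x8, "pid": 100, "tid": 101, "iaddr": 0x400500, "dso": "app"},
    {"offset": 0x8, "pid": 100, "tid": 102, "iaddr": 0x400500, "dso": "app"},
]
print(len(coalesce(accesses)))                    # one row per thread
print(len(coalesce(accesses, fields=("iaddr",)))) # threads merged by code address
```

With the default fields the two threads stay on separate rows; coalescing by
'iaddr' alone folds them into a single row because both hit the same code
address.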

STDIO OUTPUT
------------
The stdio output displays data on standard output.

The following tables are displayed:
  Trace Event Information
  - overall statistics of memory accesses

  Global Shared Cache Line Event Information
  - overall statistics on shared cachelines

  Shared Data Cache Line Table
  - list of most expensive cachelines

  Shared Cache Line Distribution Pareto
  - list of all accessed offsets for each cacheline

TUI OUTPUT
----------
The TUI output provides an interactive interface to navigate
through the cachelines list and to display offset details.

For details please refer to the help window by pressing the '?' key.

CREDITS
-------
Although Don Zickus, Dick Fowles and Joe Mario worked together
to get this implemented, we got lots of early help from Arnaldo
Carvalho de Melo, Stephane Eranian, Jiri Olsa and Andi Kleen.

C2C BLOG
--------
Check Joe's blog on c2c tool for detailed use case explanation:
  https://joemario.github.io/blog/2016/09/01/c2c-blog/

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-mem[1]
280