Documentation/trace/tracepoint-analysis.rst

14 taken in conjunction with other tracepoints to build a "Big Picture" of
15 what is going on within the system. There are a large number of methods for
27 ----------------------
32   $ find /sys/kernel/tracing/events -type d
34 will give a fair indication of the number of events available.
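To turn the raw directory listing into a per-subsystem count, a small helper along the following lines could be used. This is only a sketch: ``count_events`` is a hypothetical name, and it assumes the usual ``events/<subsystem>/<event>/`` layout and a find that supports ``-mindepth`` and ``-maxdepth``.

```shell
#!/bin/sh
# count_events ROOT  (hypothetical helper, not part of the kernel tree)
# Print "<count> <subsystem>" for each subsystem directory under ROOT,
# assuming ROOT is laid out as ROOT/<subsystem>/<event>/.
count_events() {
	root=$1
	for subsys in "$root"/*/; do
		[ -d "$subsys" ] || continue
		name=$(basename "$subsys")
		# Each event is a directory directly below its subsystem;
		# control files such as "enable" are skipped by -type d.
		n=$(find "$subsys" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
		printf '%s %s\n' "$n" "$name"
	done
}

# Example (reading tracefs usually requires root):
#   count_events /sys/kernel/tracing/events
```

Taking the root directory as an argument keeps the helper usable against a copy of the tree as well as the live tracefs mount.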
37 ----------------------------------------
40 are available with the perf tool. Getting a list of available events is a
55 3.1 System-Wide Event Enabling
56 ------------------------------
58 See Documentation/trace/events.rst for a proper description of how events
59 can be enabled system-wide. A short example of enabling all events related
62   $ for i in $(find /sys/kernel/tracing/events -name "enable" | grep mm_); do echo 1 > "$i"; done
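The one-line loop can also be wrapped in a function so that the same code both enables the events and disables them again afterwards. The following is a minimal sketch; ``set_events`` and its arguments are hypothetical names introduced here, and the pattern is matched against the whole path in the same way as the grep above.

```shell
#!/bin/sh
# set_events ROOT PATTERN VALUE  (hypothetical helper)
# Write VALUE (0 or 1) to every "enable" file under ROOT whose path
# contains PATTERN.
set_events() {
	root=$1 pattern=$2 value=$3
	find "$root" -name enable -path "*${pattern}*" |
	while IFS= read -r f; do
		echo "$value" > "$f"
	done
}

# Example (requires root):
#   set_events /sys/kernel/tracing/events mm_ 1   # enable
#   ... run the workload ...
#   set_events /sys/kernel/tracing/events mm_ 0   # disable again
```

Disabling the events once the measurement is finished avoids leaving tracing overhead behind on the system.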
64 3.2 System-Wide Event Enabling with SystemTap
65 ---------------------------------------------
79   	printf ("%-25s %-s\n", "#Pages Allocated", "Process Name")
80   	foreach (proc in page_allocs-)
81   		printf("%-25d %s\n", page_allocs[proc], proc)
90 3.3 System-Wide Event Enabling with PCL
91 ---------------------------------------
93 By specifying the -a switch and analysing sleep, the system-wide events
94 for a duration of time can be examined.
97  $ perf stat -a \
98 	-e kmem:mm_page_alloc -e kmem:mm_page_free \
99 	-e kmem:mm_page_free_batched \
109 Similarly, one could execute a shell and exit it as desired to get a report
113 ------------------------
115 Documentation/trace/ftrace.rst describes how to enable events on a per-thread
119 -----------------------------------
121 Events can be activated and tracked for the duration of a process on a local
125   $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
126 		 -e kmem:mm_page_free_batched ./hackbench 10
140 Documentation/trace/ftrace.rst covers in-depth how to filter events in
153   $ perf stat --repeat 5 -e kmem:mm_page_alloc -e kmem:mm_page_free \
154 			-e kmem:mm_page_free_batched ./hackbench 10
163           16630  kmem:mm_page_alloc         ( +-   3.542% )
164           11486  kmem:mm_page_free          ( +-   4.771% )
165            4730  kmem:mm_page_free_batched  ( +-   2.325% )
167     0.982653002  seconds time elapsed   ( +-   1.448% )
169 In the event that some higher-level event is required that depends on some
170 aggregation of discrete events, then a script would need to be developed.
172 Using --repeat, it is also possible to view how events are fluctuating over
173 time on a system-wide basis using -a and sleep.
176   $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
177 		-e kmem:mm_page_free_batched \
178 		-a --repeat 10 \
182            1066  kmem:mm_page_alloc         ( +-  26.148% )
183             182  kmem:mm_page_free          ( +-   5.464% )
184             890  kmem:mm_page_free_batched  ( +-  30.079% )
186     1.002251757  seconds time elapsed   ( +-   0.005% )
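When many events are tracked, the noisiest ones can be picked out of the ``--repeat`` output mechanically. The sketch below assumes the output format shown in this document (count, event name, then ``( +- N% )``); other perf versions may format these lines differently. ``noisy_events`` is a hypothetical name.

```shell
#!/bin/sh
# noisy_events THRESHOLD  (hypothetical helper)
# Read "perf stat --repeat" text on stdin and print each event whose
# reported deviation is above THRESHOLD percent.
noisy_events() {
	awk -v limit="$1" '
	# Lines of interest look like:
	#    1066  kmem:mm_page_alloc   ( +-  26.148% )
	# The "seconds time elapsed" line has no ":" in field 2 and is skipped.
	$2 ~ /:/ && /\+-/ {
		pct = $(NF - 1)		# e.g. "26.148%"
		sub(/%/, "", pct)
		if (pct + 0 > limit + 0)
			printf "%s %s%%\n", $2, pct
	}'
}

# Example: perf stat prints its summary on stderr by default, hence 2>&1
#   perf stat -a --repeat 10 -e kmem:mm_page_alloc sleep 1 2>&1 | noisy_events 10
```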
188 6. Higher-Level Analysis with Helper Scripts
192 /sys/kernel/tracing/trace_pipe in human-readable format although binary
193 options exist as well. By post-processing the output, further information can
194 be gathered on-line as appropriate. Examples of post-processing might include
196   - Reading information from /proc for the PID that triggered the event
197   - Deriving a higher-level event from a series of lower-level events
198   - Calculating latencies between two events
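As an illustration of the last item, the sketch below pairs two events per PID and prints the time between them. It is not the helper script discussed in this section. It assumes the default human-readable trace format, where the first field is ``comm-pid`` with no spaces in the command name and the event name follows a ``seconds.microseconds:`` timestamp. ``event_latency`` is a hypothetical name, and the event names in the example are placeholders.

```shell
#!/bin/sh
# event_latency START_EVENT END_EVENT  (hypothetical helper)
# Read trace text on stdin and print "<pid> <seconds>" for each
# START_EVENT that is later followed by END_EVENT from the same PID.
event_latency() {
	awk -v evstart="$1" -v evend="$2" '
	{
		# Locate the "seconds.microseconds:" timestamp field; the
		# event name is assumed to be the field right after it.
		for (i = 2; i < NF; i++)
			if ($i ~ /^[0-9]+\.[0-9]+:$/)
				break
		if (i >= NF)
			next		# header or unrecognised line
		ts = $i
		sub(/:$/, "", ts)
		ev = $(i + 1)
		sub(/:$/, "", ev)
		# The first field is "comm-pid"; keep what follows the last "-".
		pid = $1
		sub(/^.*-/, "", pid)
		if (ev == evstart)
			began[pid] = ts
		else if (ev == evend && (pid in began)) {
			printf "%s %.6f\n", pid, ts - began[pid]
			delete began[pid]
		}
	}'
}

# Example (requires root and the two events to be enabled):
#   cat /sys/kernel/tracing/trace_pipe | event_latency mm_page_alloc mm_page_free
```

Pairing on PID alone is a simplification. A real analysis would normally also match on an identifier carried in the event payload, such as a page or request address.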
200 Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example
201 script that can read trace_pipe from STDIN or a copy of a trace. When used
202 on-line, it can be interrupted once to generate a report without exiting
208   - Derive high-level events from many low-level events. If a number of pages
209     are freed to the main allocator from the per-CPU lists, it recognises
210     that as one per-CPU drain even though there is no specific tracepoint
212   - It can aggregate based on process name or individual PID
213   - In the event memory is getting externally fragmented, it reports
215   - When receiving an event about a PID, it can record who the parent was so
216     that if large numbers of events are coming from very short-lived
220 7. Lower-Level Analysis with PCL
223 There may also be a requirement to identify what functions within a program
228   $ perf record -c 1 \
229 	-e kmem:mm_page_alloc -e kmem:mm_page_free \
230 	-e kmem:mm_page_free_batched \
235 Note the use of '-c 1' to set the event period to sample. The default sample
237 very coarse as a result.
239 This recorded the samples in a file called perf.data, which can be analysed using
250        6.85%  hackbench  /lib/i686/cmov/libc-2.9.so
251        2.62%  hackbench  /lib/ld-2.9.so
255        0.02%       perf  /lib/i686/cmov/libc-2.9.so
257        0.01%       perf  /lib/ld-2.9.so
258        0.00%  hackbench  /lib/i686/cmov/libpthread-2.9.so
260   # (For more details, try: perf report --sort comm,dso,symbol)
265 take a slightly different example. In the course of writing this, it was
270   $ perf record -c 1 -f \
271 		-e kmem:mm_page_alloc -e kmem:mm_page_free \
272 		-e kmem:mm_page_free_batched \
273 		-p `pidof X`
275 This was interrupted after a few seconds and
285       47.95%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1
286        0.09%     Xorg  /lib/i686/cmov/libc-2.9.so
289   # (For more details, try: perf report --sort comm,dso,symbol)
292 So, almost half of the events are occurring in a library. To get an idea of which
296   $ perf report --sort comm,dso,symbol
302       51.95%     Xorg  [vdso]                                   [.] 0x000000ffffe424
303       47.93%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] pixmanFillsse2
304        0.09%     Xorg  /lib/i686/cmov/libc-2.9.so               [.] _int_malloc
305        0.01%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] pixman_region32_copy_f
307        0.01%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] get_fast_path
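The per-symbol report can be folded back into per-library totals as a cross-check against the earlier per-DSO report. The sketch below assumes the column layout shown above (percentage, command, DSO, then symbol) and DSO paths without spaces. ``dso_totals`` is a hypothetical name.

```shell
#!/bin/sh
# dso_totals  (hypothetical helper)
# Read "perf report --sort comm,dso,symbol" text on stdin and print the
# summed percentage for each DSO, largest first.
dso_totals() {
	awk '
	# Data lines start with a percentage; comment and blank lines do not.
	$1 ~ /^[0-9.]+%$/ {
		pct = $1
		sub(/%/, "", pct)
		total[$3] += pct
	}
	END {
		for (dso in total)
			printf "%.2f%% %s\n", total[dso], dso
	}' | LC_ALL=C sort -rn
}

# Example:
#   perf report --sort comm,dso,symbol | dso_totals
```

On the lines shown above, the libpixman entries add up to 47.95%, which agrees with the figure reported for that library when sorting by DSO alone.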
315     0.00 :         34eeb:       0f 18 08                prefetcht0 (%eax)
321    12.40 :         34eee:       66 0f 7f 80 40 ff ff    movdqa %xmm0,-0xc0(%eax)
323    12.40 :         34ef6:       66 0f 7f 80 50 ff ff    movdqa %xmm0,-0xb0(%eax)
325    12.39 :         34efe:       66 0f 7f 80 60 ff ff    movdqa %xmm0,-0xa0(%eax)
327    12.67 :         34f06:       66 0f 7f 80 70 ff ff    movdqa %xmm0,-0x90(%eax)
329    12.58 :         34f0e:       66 0f 7f 40 80          movdqa %xmm0,-0x80(%eax)
330    12.31 :         34f13:       66 0f 7f 40 90          movdqa %xmm0,-0x70(%eax)
331    12.40 :         34f18:       66 0f 7f 40 a0          movdqa %xmm0,-0x60(%eax)
332    12.31 :         34f1d:       66 0f 7f 40 b0          movdqa %xmm0,-0x50(%eax)
334 At a glance, it looks like the time is being spent copying pixmaps to
336 are being copied around so much but a starting point would be to take an