==================================
Using the Linux Kernel Tracepoints
==================================

:Author: Mathieu Desnoyers


This document introduces Linux Kernel Tracepoints and their use. It
shows how to insert tracepoints in the kernel, how to connect probe
functions to them, and gives some examples of probe functions.


Purpose of tracepoints
----------------------
A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for a tiny time penalty (checking a
condition for a branch) and a space penalty (a few bytes for the
function call at the end of the instrumented function, plus a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed,
in the execution context of the caller. When the provided function
returns, execution continues in the caller from the tracepoint site.
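
The on/off behaviour described above can be modelled in plain user-space C.
This is only a sketch under stated assumptions: struct model_tracepoint and
the model_*() functions are hypothetical names invented for this
illustration, and the kernel's real implementation is considerably more
involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; NOT the kernel's struct tracepoint. */
struct model_tracepoint {
	bool enabled;                       /* "on" when a probe is attached */
	void (*probe)(void *data, int arg); /* function supplied at runtime */
	void *data;                         /* private pointer passed to the probe */
};

/* Attach a probe: the tracepoint turns "on". */
void model_attach(struct model_tracepoint *tp,
		  void (*probe)(void *data, int arg), void *data)
{
	assert(probe != NULL);
	tp->probe = probe;
	tp->data = data;
	tp->enabled = true;
}

/* Detach the probe: the tracepoint turns "off" again. */
void model_detach(struct model_tracepoint *tp)
{
	tp->enabled = false;
	tp->probe = NULL;
	tp->data = NULL;
}

/* The instrumented site: one branch when "off", a probe call when "on". */
void model_trace(struct model_tracepoint *tp, int arg)
{
	if (tp->enabled)
		tp->probe(tp->data, arg);
	/* Execution continues here, in the caller's context. */
}

/* Example probe: accumulates the traced values into the int behind data. */
void model_sum_probe(void *data, int arg)
{
	*(int *)data += arg;
}
```

With no probe attached, model_trace() costs a single flag test. After
model_attach(&tp, model_sum_probe, &total), each model_trace(&tp, n) adds n
to total until model_detach(&tp) is called.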

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
whose prototypes are described in a tracepoint declaration placed in a
header file.

They can be used for tracing and performance accounting.


Usage
-----
Two elements are required for tracepoints:

- A tracepoint definition, placed in a header file.
- The tracepoint statement, in C code.

In order to use tracepoints, you should include linux/tracepoint.h.

In include/trace/events/subsys.h::

	#undef TRACE_SYSTEM
	#define TRACE_SYSTEM subsys

	#if !defined(_TRACE_SUBSYS_H) || defined(TRACE_HEADER_MULTI_READ)
	#define _TRACE_SUBSYS_H

	#include <linux/tracepoint.h>

	DECLARE_TRACE(subsys_eventname,
		TP_PROTO(int firstarg, struct task_struct *p),
		TP_ARGS(firstarg, p));

	#endif /* _TRACE_SUBSYS_H */

	/* This part must be outside protection */
	#include <trace/define_trace.h>

In subsys/file.c (where the tracing statement must be added)::

	#include <trace/events/subsys.h>

	#define CREATE_TRACE_POINTS
	DEFINE_TRACE(subsys_eventname);

	void somefct(void)
	{
		...
		trace_subsys_eventname(arg, task);
		...
	}

Where:
  - subsys_eventname is an identifier unique to your event

    - subsys is the name of your subsystem.
    - eventname is the name of the event to trace.

  - `TP_PROTO(int firstarg, struct task_struct *p)` is the prototype of the
    function called by this tracepoint.

  - `TP_ARGS(firstarg, p)` are the parameter names, same as found in the
    prototype.

  - If you use the header in multiple source files, `#define CREATE_TRACE_POINTS`
    should appear only in one source file.

Connecting a function (probe) to a tracepoint is done by providing a
probe (function to call) for the specific tracepoint through
register_trace_subsys_eventname(). Removing a probe is done through
unregister_trace_subsys_eventname().

tracepoint_synchronize_unregister() must be called before the end of
the module exit function to make sure there is no caller left using
the probe. This, and the fact that preemption is disabled around the
probe call, make sure that probe removal and module unload are safe.

The tracepoint mechanism supports inserting multiple instances of the
same tracepoint, but a given tracepoint name must have a single
definition across the whole kernel to make sure no type conflict will
occur. Name mangling of the tracepoints is done using the prototypes
to make sure typing is correct. Verification of probe type correctness
is done at the registration site by the compiler. Tracepoints can be
put in inline functions, inlined static functions, and unrolled loops
as well as regular functions.
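
How a single declaration can yield type-checked trace and registration
functions is sketched below in user-space C. This is a model built on
assumptions, not the kernel's code: the MODEL_* macros and model_*()
functions are hypothetical, the model keeps one probe per tracepoint whereas
the kernel supports several, and `const char *comm` stands in for
`struct task_struct *p`, which is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: pass a comma-separated list through as one argument. */
#define MODEL_PROTO(...) __VA_ARGS__
#define MODEL_ARGS(...)  __VA_ARGS__

/*
 * Hypothetical model of a tracepoint declaration. From one prototype it
 * generates a probe typedef, a register/unregister pair and the trace
 * function, so the compiler checks every probe against that prototype.
 * This model holds a single probe and rejects a second registration.
 */
#define MODEL_DECLARE_TRACE(name, proto, args)				\
	typedef void (*model_probe_##name##_t)(void *data, proto);	\
	static model_probe_##name##_t model_probe_##name;		\
	static void *model_data_##name;					\
	int model_register_trace_##name(model_probe_##name##_t probe,	\
					void *data)			\
	{								\
		assert(probe != NULL);					\
		if (model_probe_##name)					\
			return -1;					\
		model_probe_##name = probe;				\
		model_data_##name = data;				\
		return 0;						\
	}								\
	void model_unregister_trace_##name(void)			\
	{								\
		model_probe_##name = NULL;				\
		model_data_##name = NULL;				\
	}								\
	void model_trace_##name(proto)					\
	{								\
		if (model_probe_##name)					\
			model_probe_##name(model_data_##name, args);	\
	}

/* No trailing semicolon: the expansion ends with a function body. */
MODEL_DECLARE_TRACE(subsys_eventname,
	MODEL_PROTO(int firstarg, const char *comm),
	MODEL_ARGS(firstarg, comm))

/* Example probe; its parameters must match the generated typedef. */
void count_probe(void *data, int firstarg, const char *comm)
{
	(void)comm;			/* unused in this example */
	*(int *)data += firstarg;
}
```

Passing a probe whose parameters do not match the declared prototype to
model_register_trace_subsys_eventname() draws a compiler diagnostic about
incompatible pointer types, which is the kind of registration-site check the
paragraph above refers to.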

The naming scheme "subsys_event" is suggested here as a convention
intended to limit collisions. Tracepoint names are global to the
kernel: they are considered as being the same whether they are in the
core kernel image or in modules.

If the tracepoint has to be used in kernel modules,
EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
used to export the defined tracepoints.

If you need to do a bit of work for a tracepoint parameter, and
that work is only used for the tracepoint, that work can be encapsulated
within an if statement with the following::

	if (trace_foo_bar_enabled()) {
		int i;
		int tot = 0;

		for (i = 0; i < count; i++)
			tot += calculate_nuggets();

		trace_foo_bar(tot);
	}

All trace_<tracepoint>() calls have a matching trace_<tracepoint>_enabled()
function defined that returns true if the tracepoint is enabled and
false otherwise. The trace_<tracepoint>() should always be within the
block of the if (trace_<tracepoint>_enabled()) to prevent races between
the tracepoint being enabled and the check being seen.

The advantage of using the trace_<tracepoint>_enabled() is that it uses
the static_key of the tracepoint to allow the if statement to be implemented
with jump labels and avoid conditional branches.

.. note:: The convenience macro TRACE_EVENT provides an alternative way to
      define tracepoints. Check http://lwn.net/Articles/379903,
      http://lwn.net/Articles/381064 and http://lwn.net/Articles/383362
      for a series of articles with more details.

If you need to call a tracepoint from a header file, it is not
recommended to call one directly or to use the trace_<tracepoint>_enabled()
function call. Tracepoints in header files can have side effects if the
header is included from a file that has CREATE_TRACE_POINTS set, and
trace_<tracepoint>() is not that small of an inline, so it can bloat the
kernel if used by other inlined functions. Instead, include
tracepoint-defs.h and use tracepoint_enabled().

In a C file::

	void do_trace_foo_bar_wrapper(args)
	{
		trace_foo_bar(args);
	}

In the header file::

	DECLARE_TRACEPOINT(foo_bar);

	static inline void some_inline_function(void)
	{
		[..]
		if (tracepoint_enabled(foo_bar))
			do_trace_foo_bar_wrapper(args);
		[..]
	}