==================================
Using the Linux Kernel Tracepoints
==================================

:Author: Mathieu Desnoyers


This document introduces Linux Kernel Tracepoints and their use. It
shows how to insert tracepoints in the kernel and connect probe
functions to them, and provides some examples of probe functions.


Purpose of tracepoints
----------------------
A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty
(checking a condition for a branch) and space penalty (adding a few
bytes for the function call at the end of the instrumented function
and adding a data structure in a separate section). When a tracepoint
is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function
provided ends its execution, it returns to the caller (continuing from
the tracepoint site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
whose prototypes are described in a tracepoint declaration placed in a
header file.

They can be used for tracing and performance accounting.


Usage
-----
Two elements are required for tracepoints:

- A tracepoint definition, placed in a header file.
- The tracepoint statement, in C code.

In order to use tracepoints, you should include linux/tracepoint.h.

In include/trace/events/subsys.h::

	#undef TRACE_SYSTEM
	#define TRACE_SYSTEM subsys

	#if !defined(_TRACE_SUBSYS_H) || defined(TRACE_HEADER_MULTI_READ)
	#define _TRACE_SUBSYS_H

	#include <linux/tracepoint.h>

	DECLARE_TRACE(subsys_eventname,
		TP_PROTO(int firstarg, struct task_struct *p),
		TP_ARGS(firstarg, p));

	#endif /* _TRACE_SUBSYS_H */

	/* This part must be outside protection */
	#include <trace/define_trace.h>

In subsys/file.c (where the tracing statement must be added)::

	#include <trace/events/subsys.h>

	#define CREATE_TRACE_POINTS
	DEFINE_TRACE(subsys_eventname);

	void somefct(void)
	{
		...
		trace_subsys_eventname(arg, task);
		...
	}

Where:
  - subsys_eventname is an identifier unique to your event

    - subsys is the name of your subsystem.
    - eventname is the name of the event to trace.

  - `TP_PROTO(int firstarg, struct task_struct *p)` is the prototype of the
    function called by this tracepoint.

  - `TP_ARGS(firstarg, p)` are the parameter names, same as found in the
    prototype.

  - if you use the header in multiple source files, `#define CREATE_TRACE_POINTS`
    should appear only in one source file.

Connecting a function (probe) to a tracepoint is done by providing a
probe (function to call) for the specific tracepoint through
register_trace_subsys_eventname(). A probe is removed through
unregister_trace_subsys_eventname().

tracepoint_synchronize_unregister() must be called before the end of
the module exit function to make sure there is no caller left using
the probe. This, and the fact that preemption is disabled around the
probe call, make sure that probe removal and module unload are safe.

The tracepoint mechanism supports inserting multiple instances of the
same tracepoint, but a single definition must be made of a given
tracepoint name over all the kernel to make sure no type conflict will
occur.
Name mangling of the tracepoints is done using the prototypes
to make sure typing is correct. Verification of probe type correctness
is done at the registration site by the compiler. Tracepoints can be
put in inline functions, inlined static functions, and unrolled loops
as well as regular functions.

The naming scheme "subsys_event" is suggested here as a convention
intended to limit collisions. Tracepoint names are global to the
kernel: they are considered as being the same whether they are in the
core kernel image or in modules.

If the tracepoint has to be used in kernel modules, an
EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
used to export the defined tracepoints.

If you need to do a bit of work for a tracepoint parameter, and
that work is only used for the tracepoint, that work can be encapsulated
within an if statement with the following::

	if (trace_foo_bar_enabled()) {
		int i;
		int tot = 0;

		for (i = 0; i < count; i++)
			tot += calculate_nuggets();

		trace_foo_bar(tot);
	}

All trace_<tracepoint>() calls have a matching trace_<tracepoint>_enabled()
function defined that returns true if the tracepoint is enabled and
false otherwise.
The trace_<tracepoint>() should always be within the
block of the if (trace_<tracepoint>_enabled()) to prevent races between
the tracepoint being enabled and the check being seen.

The advantage of using the trace_<tracepoint>_enabled() is that it uses
the static_key of the tracepoint to allow the if statement to be implemented
with jump labels and avoid conditional branches.

.. note:: The convenience macro TRACE_EVENT provides an alternative way to
      define tracepoints. Check http://lwn.net/Articles/379903,
      http://lwn.net/Articles/381064 and http://lwn.net/Articles/383362
      for a series of articles with more details.

If you need to call a tracepoint from a header file, calling it
directly or using the trace_<tracepoint>_enabled() function is not
recommended. Tracepoints in header files can have side effects if the
header is included from a file that has CREATE_TRACE_POINTS set, and
trace_<tracepoint>() is not a small inline, so it can bloat the kernel
if used by other inlined functions. Instead,
include tracepoint-defs.h and use tracepoint_enabled().

In a C file::

	void do_trace_foo_bar_wrapper(args)
	{
		trace_foo_bar(args);
	}

In the header file::

	DECLARE_TRACEPOINT(foo_bar);

	static inline void some_inline_function()
	{
		[..]
		if (tracepoint_enabled(foo_bar))
			do_trace_foo_bar_wrapper(args);
		[..]
	}
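
The "on"/"off" behaviour described under "Purpose of tracepoints", and the
register/unregister calls described under "Usage", can be illustrated with a
small userspace model. This is only a sketch and is not kernel code: every
name in it (struct model_tracepoint and the model_*() functions) is
hypothetical, it supports a single probe per tracepoint whereas the text
above does not impose such a limit, and it uses an ordinary conditional
branch rather than the static_key and jump labels used by the kernel.

```c
/*
 * Userspace model of a tracepoint. NOT kernel code.
 * All names below are hypothetical and exist only for illustration.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Probe prototype; plays the role of TP_PROTO(int firstarg). */
typedef void (*model_probe_t)(void *data, int firstarg);

struct model_tracepoint {
	model_probe_t probe;	/* NULL means the tracepoint is "off" */
	void *data;		/* private pointer handed back to the probe */
};

/* Connect a probe; returns -1 if one is already attached (model limit). */
int model_register(struct model_tracepoint *tp, model_probe_t probe,
		   void *data)
{
	assert(probe != NULL);
	if (tp->probe)
		return -1;
	tp->data = data;
	tp->probe = probe;
	return 0;
}

/* Disconnect the probe; the tracepoint is "off" again afterwards. */
void model_unregister(struct model_tracepoint *tp)
{
	tp->probe = NULL;
	tp->data = NULL;
}

/* Counterpart of trace_<tracepoint>_enabled() in this model. */
bool model_enabled(const struct model_tracepoint *tp)
{
	return tp->probe != NULL;
}

/* Counterpart of trace_<tracepoint>(): a cheap check, then the call. */
void model_trace(struct model_tracepoint *tp, int firstarg)
{
	if (model_enabled(tp))
		tp->probe(tp->data, firstarg);
}

/* Example probe: adds each traced value to the int that data points to. */
void model_sum_probe(void *data, int firstarg)
{
	*(int *)data += firstarg;
}

/* Instrumented function: the probe runs in this caller's context. */
void model_somefct(struct model_tracepoint *tp, int arg)
{
	model_trace(tp, arg);
}
```

In this model, calling model_somefct() before model_register() does nothing
beyond the check. After registering model_sum_probe() with a pointer to an
int, each call adds its argument to that int, and after model_unregister()
the calls are no-ops again.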