xref: /openbmc/linux/Documentation/trace/fprobetrace.rst (revision c900529f3d9161bfde5cca0754f83b4d3c3e0220)
1590e7b28SMasami Hiramatsu (Google).. SPDX-License-Identifier: GPL-2.0
2590e7b28SMasami Hiramatsu (Google)
3590e7b28SMasami Hiramatsu (Google)==========================
4590e7b28SMasami Hiramatsu (Google)Fprobe-based Event Tracing
5590e7b28SMasami Hiramatsu (Google)==========================
6590e7b28SMasami Hiramatsu (Google)
7590e7b28SMasami Hiramatsu (Google).. Author: Masami Hiramatsu <mhiramat@kernel.org>
8590e7b28SMasami Hiramatsu (Google)
9590e7b28SMasami Hiramatsu (Google)Overview
10590e7b28SMasami Hiramatsu (Google)--------
11590e7b28SMasami Hiramatsu (Google)
12590e7b28SMasami Hiramatsu (Google)Fprobe event is similar to the kprobe event, but limited to probe on
13590e7b28SMasami Hiramatsu (Google)the function entry and exit only. It is good enough for many use cases
14590e7b28SMasami Hiramatsu (Google)which only traces some specific functions.
15590e7b28SMasami Hiramatsu (Google)
16590e7b28SMasami Hiramatsu (Google)This document also covers tracepoint probe events (tprobe) since this
17590e7b28SMasami Hiramatsu (Google)is also works only on the tracepoint entry. User can trace a part of
18590e7b28SMasami Hiramatsu (Google)tracepoint argument, or the tracepoint without trace-event, which is
19590e7b28SMasami Hiramatsu (Google)not exposed on tracefs.
20590e7b28SMasami Hiramatsu (Google)
21590e7b28SMasami Hiramatsu (Google)As same as other dynamic events, fprobe events and tracepoint probe
22590e7b28SMasami Hiramatsu (Google)events are defined via `dynamic_events` interface file on tracefs.
23590e7b28SMasami Hiramatsu (Google)
24590e7b28SMasami Hiramatsu (Google)Synopsis of fprobe-events
25590e7b28SMasami Hiramatsu (Google)-------------------------
26590e7b28SMasami Hiramatsu (Google)::
27590e7b28SMasami Hiramatsu (Google)
28590e7b28SMasami Hiramatsu (Google)  f[:[GRP1/][EVENT1]] SYM [FETCHARGS]                       : Probe on function entry
29590e7b28SMasami Hiramatsu (Google)  f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS]     : Probe on function exit
30590e7b28SMasami Hiramatsu (Google)  t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS]                : Probe on tracepoint
31590e7b28SMasami Hiramatsu (Google)
32590e7b28SMasami Hiramatsu (Google) GRP1           : Group name for fprobe. If omitted, use "fprobes" for it.
33590e7b28SMasami Hiramatsu (Google) GRP2           : Group name for tprobe. If omitted, use "tracepoints" for it.
34590e7b28SMasami Hiramatsu (Google) EVENT1         : Event name for fprobe. If omitted, the event name is
35590e7b28SMasami Hiramatsu (Google)                  "SYM__entry" or "SYM__exit".
36590e7b28SMasami Hiramatsu (Google) EVENT2         : Event name for tprobe. If omitted, the event name is
37590e7b28SMasami Hiramatsu (Google)                  the same as "TRACEPOINT", but if the "TRACEPOINT" starts
38590e7b28SMasami Hiramatsu (Google)                  with a digit character, "_TRACEPOINT" is used.
39590e7b28SMasami Hiramatsu (Google) MAXACTIVE      : Maximum number of instances of the specified function that
40590e7b28SMasami Hiramatsu (Google)                  can be probed simultaneously, or 0 for the default value
41a2bd0c08SMasami Hiramatsu (Google)                  as defined in Documentation/trace/fprobe.rst
42590e7b28SMasami Hiramatsu (Google)
43590e7b28SMasami Hiramatsu (Google) FETCHARGS      : Arguments. Each probe can have up to 128 args.
44590e7b28SMasami Hiramatsu (Google)  ARG           : Fetch "ARG" function argument using BTF (only for function
45590e7b28SMasami Hiramatsu (Google)                  entry or tracepoint.) (\*1)
46590e7b28SMasami Hiramatsu (Google)  @ADDR         : Fetch memory at ADDR (ADDR should be in kernel)
47590e7b28SMasami Hiramatsu (Google)  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
48590e7b28SMasami Hiramatsu (Google)  $stackN       : Fetch Nth entry of stack (N >= 0)
49590e7b28SMasami Hiramatsu (Google)  $stack        : Fetch stack address.
50590e7b28SMasami Hiramatsu (Google)  $argN         : Fetch the Nth function argument. (N >= 1) (\*2)
51590e7b28SMasami Hiramatsu (Google)  $retval       : Fetch return value.(\*3)
52590e7b28SMasami Hiramatsu (Google)  $comm         : Fetch current task comm.
53590e7b28SMasami Hiramatsu (Google)  +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
54590e7b28SMasami Hiramatsu (Google)  \IMM          : Store an immediate value to the argument.
55590e7b28SMasami Hiramatsu (Google)  NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
56590e7b28SMasami Hiramatsu (Google)  FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
57590e7b28SMasami Hiramatsu (Google)                  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
58590e7b28SMasami Hiramatsu (Google)                  (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
59590e7b28SMasami Hiramatsu (Google)                  and bitfield are supported.
60590e7b28SMasami Hiramatsu (Google)
61590e7b28SMasami Hiramatsu (Google)  (\*1) This is available only when BTF is enabled.
62590e7b28SMasami Hiramatsu (Google)  (\*2) only for the probe on function entry (offs == 0).
63590e7b28SMasami Hiramatsu (Google)  (\*3) only for return probe.
64590e7b28SMasami Hiramatsu (Google)  (\*4) this is useful for fetching a field of data structures.
65590e7b28SMasami Hiramatsu (Google)  (\*5) "u" means user-space dereference.
66590e7b28SMasami Hiramatsu (Google)
67590e7b28SMasami Hiramatsu (Google)For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`.
68590e7b28SMasami Hiramatsu (Google)
69590e7b28SMasami Hiramatsu (Google)BTF arguments
70590e7b28SMasami Hiramatsu (Google)-------------
71590e7b28SMasami Hiramatsu (Google)BTF (BPF Type Format) argument allows user to trace function and tracepoint
72590e7b28SMasami Hiramatsu (Google)parameters by its name instead of ``$argN``. This feature is available if the
73590e7b28SMasami Hiramatsu (Google)kernel is configured with CONFIG_BPF_SYSCALL and CONFIG_DEBUG_INFO_BTF.
74590e7b28SMasami Hiramatsu (Google)If user only specify the BTF argument, the event's argument name is also
75590e7b28SMasami Hiramatsu (Google)automatically set by the given name. ::
76590e7b28SMasami Hiramatsu (Google)
77590e7b28SMasami Hiramatsu (Google) # echo 'f:myprobe vfs_read count pos' >> dynamic_events
78590e7b28SMasami Hiramatsu (Google) # cat dynamic_events
79590e7b28SMasami Hiramatsu (Google) f:fprobes/myprobe vfs_read count=count pos=pos
80590e7b28SMasami Hiramatsu (Google)
81590e7b28SMasami Hiramatsu (Google)It also chooses the fetch type from BTF information. For example, in the above
82*a2439a4cSMasami Hiramatsu (Google)example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus,
83*a2439a4cSMasami Hiramatsu (Google)both are converted to 64bit unsigned long, but only ``pos`` has "%Lx"
84*a2439a4cSMasami Hiramatsu (Google)print-format as below ::
85590e7b28SMasami Hiramatsu (Google)
86590e7b28SMasami Hiramatsu (Google) # cat events/fprobes/myprobe/format
87590e7b28SMasami Hiramatsu (Google) name: myprobe
88590e7b28SMasami Hiramatsu (Google) ID: 1313
89590e7b28SMasami Hiramatsu (Google) format:
90590e7b28SMasami Hiramatsu (Google)	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
91590e7b28SMasami Hiramatsu (Google)	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
92590e7b28SMasami Hiramatsu (Google)	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
93590e7b28SMasami Hiramatsu (Google)	field:int common_pid;	offset:4;	size:4;	signed:1;
94590e7b28SMasami Hiramatsu (Google)
95590e7b28SMasami Hiramatsu (Google)	field:unsigned long __probe_ip;	offset:8;	size:8;	signed:0;
96590e7b28SMasami Hiramatsu (Google)	field:u64 count;	offset:16;	size:8;	signed:0;
97590e7b28SMasami Hiramatsu (Google)	field:u64 pos;	offset:24;	size:8;	signed:0;
98590e7b28SMasami Hiramatsu (Google)
99590e7b28SMasami Hiramatsu (Google) print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos
100590e7b28SMasami Hiramatsu (Google)
101590e7b28SMasami Hiramatsu (Google)If user unsures the name of arguments, ``$arg*`` will be helpful. The ``$arg*``
102590e7b28SMasami Hiramatsu (Google)is expanded to all function arguments of the function or the tracepoint. ::
103590e7b28SMasami Hiramatsu (Google)
104590e7b28SMasami Hiramatsu (Google) # echo 'f:myprobe vfs_read $arg*' >> dynamic_events
105590e7b28SMasami Hiramatsu (Google) # cat dynamic_events
106590e7b28SMasami Hiramatsu (Google) f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos
107590e7b28SMasami Hiramatsu (Google)
108*a2439a4cSMasami Hiramatsu (Google)BTF also affects the ``$retval``. If user doesn't set any type, the retval
109*a2439a4cSMasami Hiramatsu (Google)type is automatically picked from the BTF. If the function returns ``void``,
110*a2439a4cSMasami Hiramatsu (Google)``$retval`` is rejected.
111*a2439a4cSMasami Hiramatsu (Google)
112*a2439a4cSMasami Hiramatsu (Google)You can access the data fields of a data structure using allow operator ``->``
113*a2439a4cSMasami Hiramatsu (Google)(for pointer type) and dot operator ``.`` (for data structure type.)::
114*a2439a4cSMasami Hiramatsu (Google)
115*a2439a4cSMasami Hiramatsu (Google)# echo 't sched_switch preempt prev_pid=prev->pid next_pid=next->pid' >> dynamic_events
116*a2439a4cSMasami Hiramatsu (Google)
117*a2439a4cSMasami Hiramatsu (Google)The field access operators, ``->`` and ``.`` can be combined for accessing deeper
118*a2439a4cSMasami Hiramatsu (Google)members and other structure members pointed by the member. e.g. ``foo->bar.baz->qux``
119*a2439a4cSMasami Hiramatsu (Google)If there is non-name union member, you can directly access it as the C code does.
120*a2439a4cSMasami Hiramatsu (Google)For example::
121*a2439a4cSMasami Hiramatsu (Google)
122*a2439a4cSMasami Hiramatsu (Google) struct {
123*a2439a4cSMasami Hiramatsu (Google)	union {
124*a2439a4cSMasami Hiramatsu (Google)	int a;
125*a2439a4cSMasami Hiramatsu (Google)	int b;
126*a2439a4cSMasami Hiramatsu (Google)	};
127*a2439a4cSMasami Hiramatsu (Google) } *foo;
128*a2439a4cSMasami Hiramatsu (Google)
129*a2439a4cSMasami Hiramatsu (Google)To access ``a`` and ``b``, use ``foo->a`` and ``foo->b`` in this case.
130*a2439a4cSMasami Hiramatsu (Google)
131*a2439a4cSMasami Hiramatsu (Google)This data field access is available for the return value via ``$retval``,
132*a2439a4cSMasami Hiramatsu (Google)e.g. ``$retval->name``.
133*a2439a4cSMasami Hiramatsu (Google)
134*a2439a4cSMasami Hiramatsu (Google)For these BTF arguments and fields, ``:string`` and ``:ustring`` change the
135*a2439a4cSMasami Hiramatsu (Google)behavior. If these are used for BTF argument or field, it checks whether
136*a2439a4cSMasami Hiramatsu (Google)the BTF type of the argument or the data field is ``char *`` or ``char []``,
137*a2439a4cSMasami Hiramatsu (Google)or not.  If not, it rejects applying the string types. Also, with the BTF
138*a2439a4cSMasami Hiramatsu (Google)support, you don't need a memory dereference operator (``+0(PTR)``) for
139*a2439a4cSMasami Hiramatsu (Google)accessing the string pointed by a ``PTR``. It automatically adds the memory
140*a2439a4cSMasami Hiramatsu (Google)dereference operator according to the BTF type. e.g. ::
141*a2439a4cSMasami Hiramatsu (Google)
142*a2439a4cSMasami Hiramatsu (Google)# echo 't sched_switch prev->comm:string' >> dynamic_events
143*a2439a4cSMasami Hiramatsu (Google)# echo 'f getname_flags%return $retval->name:string' >> dynamic_events
144*a2439a4cSMasami Hiramatsu (Google)
145*a2439a4cSMasami Hiramatsu (Google)The ``prev->comm`` is an embedded char array in the data structure, and
146*a2439a4cSMasami Hiramatsu (Google)``$retval->name`` is a char pointer in the data structure. But in both
147*a2439a4cSMasami Hiramatsu (Google)cases, you can use ``:string`` type to get the string.
148*a2439a4cSMasami Hiramatsu (Google)
149590e7b28SMasami Hiramatsu (Google)
150590e7b28SMasami Hiramatsu (Google)Usage examples
151590e7b28SMasami Hiramatsu (Google)--------------
152590e7b28SMasami Hiramatsu (Google)Here is an example to add fprobe events on ``vfs_read()`` function entry
153590e7b28SMasami Hiramatsu (Google)and exit, with BTF arguments.
154590e7b28SMasami Hiramatsu (Google)::
155590e7b28SMasami Hiramatsu (Google)
156590e7b28SMasami Hiramatsu (Google)  # echo 'f vfs_read $arg*' >> dynamic_events
157590e7b28SMasami Hiramatsu (Google)  # echo 'f vfs_read%return $retval' >> dynamic_events
158590e7b28SMasami Hiramatsu (Google)  # cat dynamic_events
159590e7b28SMasami Hiramatsu (Google) f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos
160590e7b28SMasami Hiramatsu (Google) f:fprobes/vfs_read__exit vfs_read%return arg1=$retval
161590e7b28SMasami Hiramatsu (Google)  # echo 1 > events/fprobes/enable
162590e7b28SMasami Hiramatsu (Google)  # head -n 20 trace | tail
163590e7b28SMasami Hiramatsu (Google) #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
164590e7b28SMasami Hiramatsu (Google) #              | |         |   |||||     |         |
165590e7b28SMasami Hiramatsu (Google)               sh-70      [000] ...1.   335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
166590e7b28SMasami Hiramatsu (Google)               sh-70      [000] .....   335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
167590e7b28SMasami Hiramatsu (Google)               sh-70      [000] ...1.   335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
168590e7b28SMasami Hiramatsu (Google)               sh-70      [000] .....   335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
169590e7b28SMasami Hiramatsu (Google)               sh-70      [000] ...1.   335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08
170590e7b28SMasami Hiramatsu (Google)               sh-70      [000] .....   335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
171590e7b28SMasami Hiramatsu (Google)               sh-70      [000] ...1.   336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
172590e7b28SMasami Hiramatsu (Google)               sh-70      [000] .....   336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
173590e7b28SMasami Hiramatsu (Google)
174590e7b28SMasami Hiramatsu (Google)You can see all function arguments and return values are recorded as signed int.
175590e7b28SMasami Hiramatsu (Google)
176590e7b28SMasami Hiramatsu (Google)Also, here is an example of tracepoint events on ``sched_switch`` tracepoint.
177590e7b28SMasami Hiramatsu (Google)To compare the result, this also enables the ``sched_switch`` traceevent too.
178590e7b28SMasami Hiramatsu (Google)::
179590e7b28SMasami Hiramatsu (Google)
180590e7b28SMasami Hiramatsu (Google)  # echo 't sched_switch $arg*' >> dynamic_events
181590e7b28SMasami Hiramatsu (Google)  # echo 1 > events/sched/sched_switch/enable
182590e7b28SMasami Hiramatsu (Google)  # echo 1 > events/tracepoints/sched_switch/enable
183590e7b28SMasami Hiramatsu (Google)  # echo > trace
184590e7b28SMasami Hiramatsu (Google)  # head -n 20 trace | tail
185590e7b28SMasami Hiramatsu (Google) #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
186590e7b28SMasami Hiramatsu (Google) #              | |         |   |||||     |         |
187590e7b28SMasami Hiramatsu (Google)               sh-70      [000] d..2.  3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
188590e7b28SMasami Hiramatsu (Google)               sh-70      [000] d..3.  3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1
189590e7b28SMasami Hiramatsu (Google)           <idle>-0       [000] d..2.  3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
190590e7b28SMasami Hiramatsu (Google)           <idle>-0       [000] d..3.  3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
191590e7b28SMasami Hiramatsu (Google)      rcu_preempt-16      [000] d..2.  3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
192590e7b28SMasami Hiramatsu (Google)      rcu_preempt-16      [000] d..3.  3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026
193590e7b28SMasami Hiramatsu (Google)           <idle>-0       [000] d..2.  3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
194590e7b28SMasami Hiramatsu (Google)           <idle>-0       [000] d..3.  3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
195590e7b28SMasami Hiramatsu (Google)
196590e7b28SMasami Hiramatsu (Google)As you can see, the ``sched_switch`` trace-event shows *cooked* parameters, on
197590e7b28SMasami Hiramatsu (Google)the other hand, the ``sched_switch`` tracepoint probe event shows *raw*
198590e7b28SMasami Hiramatsu (Google)parameters. This means you can access any field values in the task
199590e7b28SMasami Hiramatsu (Google)structure pointed by the ``prev`` and ``next`` arguments.
200590e7b28SMasami Hiramatsu (Google)
201590e7b28SMasami Hiramatsu (Google)For example, usually ``task_struct::start_time`` is not traced, but with this
202*a2439a4cSMasami Hiramatsu (Google)traceprobe event, you can trace that field as below.
203590e7b28SMasami Hiramatsu (Google)::
204590e7b28SMasami Hiramatsu (Google)
205*a2439a4cSMasami Hiramatsu (Google)  # echo 't sched_switch comm=next->comm:string next->start_time' > dynamic_events
206590e7b28SMasami Hiramatsu (Google)  # head -n 20 trace | tail
207590e7b28SMasami Hiramatsu (Google) #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
208590e7b28SMasami Hiramatsu (Google) #              | |         |   |||||     |         |
209590e7b28SMasami Hiramatsu (Google)               sh-70      [000] d..3.  5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
210590e7b28SMasami Hiramatsu (Google)      rcu_preempt-16      [000] d..3.  5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526
211590e7b28SMasami Hiramatsu (Google)               sh-70      [000] d..3.  5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
212590e7b28SMasami Hiramatsu (Google)           <idle>-0       [000] d..3.  5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
213590e7b28SMasami Hiramatsu (Google)      rcu_preempt-16      [000] d..3.  5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
214590e7b28SMasami Hiramatsu (Google)           <idle>-0       [000] d..3.  5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
215590e7b28SMasami Hiramatsu (Google)      kworker/0:1-14      [000] d..3.  5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
216590e7b28SMasami Hiramatsu (Google)           <idle>-0       [000] d..3.  5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
217