1.. SPDX-License-Identifier: GPL-2.0 2 3========================== 4Fprobe-based Event Tracing 5========================== 6 7.. Author: Masami Hiramatsu <mhiramat@kernel.org> 8 9Overview 10-------- 11 12Fprobe event is similar to the kprobe event, but limited to probe on 13the function entry and exit only. It is good enough for many use cases 14which only traces some specific functions. 15 16This document also covers tracepoint probe events (tprobe) since this 17is also works only on the tracepoint entry. User can trace a part of 18tracepoint argument, or the tracepoint without trace-event, which is 19not exposed on tracefs. 20 21As same as other dynamic events, fprobe events and tracepoint probe 22events are defined via `dynamic_events` interface file on tracefs. 23 24Synopsis of fprobe-events 25------------------------- 26:: 27 28 f[:[GRP1/][EVENT1]] SYM [FETCHARGS] : Probe on function entry 29 f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS] : Probe on function exit 30 t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS] : Probe on tracepoint 31 32 GRP1 : Group name for fprobe. If omitted, use "fprobes" for it. 33 GRP2 : Group name for tprobe. If omitted, use "tracepoints" for it. 34 EVENT1 : Event name for fprobe. If omitted, the event name is 35 "SYM__entry" or "SYM__exit". 36 EVENT2 : Event name for tprobe. If omitted, the event name is 37 the same as "TRACEPOINT", but if the "TRACEPOINT" starts 38 with a digit character, "_TRACEPOINT" is used. 39 MAXACTIVE : Maximum number of instances of the specified function that 40 can be probed simultaneously, or 0 for the default value 41 as defined in Documentation/trace/fprobe.rst 42 43 FETCHARGS : Arguments. Each probe can have up to 128 args. 44 ARG : Fetch "ARG" function argument using BTF (only for function 45 entry or tracepoint.) (\*1) 46 @ADDR : Fetch memory at ADDR (ADDR should be in kernel) 47 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) 48 $stackN : Fetch Nth entry of stack (N >= 0) 49 $stack : Fetch stack address. 50 $argN : Fetch the Nth function argument. (N >= 1) (\*2) 51 $retval : Fetch return value.(\*3) 52 $comm : Fetch current task comm. 53 +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5) 54 \IMM : Store an immediate value to the argument. 55 NAME=FETCHARG : Set NAME as the argument name of FETCHARG. 56 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types 57 (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types 58 (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr" 59 and bitfield are supported. 60 61 (\*1) This is available only when BTF is enabled. 62 (\*2) only for the probe on function entry (offs == 0). 63 (\*3) only for return probe. 64 (\*4) this is useful for fetching a field of data structures. 65 (\*5) "u" means user-space dereference. 66 67For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`. 68 69BTF arguments 70------------- 71BTF (BPF Type Format) argument allows user to trace function and tracepoint 72parameters by its name instead of ``$argN``. This feature is available if the 73kernel is configured with CONFIG_BPF_SYSCALL and CONFIG_DEBUG_INFO_BTF. 74If user only specify the BTF argument, the event's argument name is also 75automatically set by the given name. :: 76 77 # echo 'f:myprobe vfs_read count pos' >> dynamic_events 78 # cat dynamic_events 79 f:fprobes/myprobe vfs_read count=count pos=pos 80 81It also chooses the fetch type from BTF information. For example, in the above 82example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus, both 83are converted to 64bit unsigned long, but only ``pos`` has "%Lx" print-format as 84below :: 85 86 # cat events/fprobes/myprobe/format 87 name: myprobe 88 ID: 1313 89 format: 90 field:unsigned short common_type; offset:0; size:2; signed:0; 91 field:unsigned char common_flags; offset:2; size:1; signed:0; 92 field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 93 field:int common_pid; offset:4; size:4; signed:1; 94 95 field:unsigned long __probe_ip; offset:8; size:8; signed:0; 96 field:u64 count; offset:16; size:8; signed:0; 97 field:u64 pos; offset:24; size:8; signed:0; 98 99 print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos 100 101If user unsures the name of arguments, ``$arg*`` will be helpful. The ``$arg*`` 102is expanded to all function arguments of the function or the tracepoint. :: 103 104 # echo 'f:myprobe vfs_read $arg*' >> dynamic_events 105 # cat dynamic_events 106 f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos 107 108BTF also affects the ``$retval``. If user doesn't set any type, the retval type is 109automatically picked from the BTF. If the function returns ``void``, ``$retval`` 110is rejected. 111 112Usage examples 113-------------- 114Here is an example to add fprobe events on ``vfs_read()`` function entry 115and exit, with BTF arguments. 116:: 117 118 # echo 'f vfs_read $arg*' >> dynamic_events 119 # echo 'f vfs_read%return $retval' >> dynamic_events 120 # cat dynamic_events 121 f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos 122 f:fprobes/vfs_read__exit vfs_read%return arg1=$retval 123 # echo 1 > events/fprobes/enable 124 # head -n 20 trace | tail 125 # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 126 # | | | ||||| | | 127 sh-70 [000] ...1. 335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 128 sh-70 [000] ..... 335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 129 sh-70 [000] ...1. 335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 130 sh-70 [000] ..... 335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 131 sh-70 [000] ...1. 335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08 132 sh-70 [000] ..... 335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 133 sh-70 [000] ...1. 336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 134 sh-70 [000] ..... 336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 135 136You can see all function arguments and return values are recorded as signed int. 137 138Also, here is an example of tracepoint events on ``sched_switch`` tracepoint. 139To compare the result, this also enables the ``sched_switch`` traceevent too. 140:: 141 142 # echo 't sched_switch $arg*' >> dynamic_events 143 # echo 1 > events/sched/sched_switch/enable 144 # echo 1 > events/tracepoints/sched_switch/enable 145 # echo > trace 146 # head -n 20 trace | tail 147 # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 148 # | | | ||||| | | 149 sh-70 [000] d..2. 3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 150 sh-70 [000] d..3. 3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1 151 <idle>-0 [000] d..2. 3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120 152 <idle>-0 [000] d..3. 3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0 153 rcu_preempt-16 [000] d..2. 3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120 154 rcu_preempt-16 [000] d..3. 3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026 155 <idle>-0 [000] d..2. 3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120 156 <idle>-0 [000] d..3. 3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0 157 158As you can see, the ``sched_switch`` trace-event shows *cooked* parameters, on 159the other hand, the ``sched_switch`` tracepoint probe event shows *raw* 160parameters. This means you can access any field values in the task 161structure pointed by the ``prev`` and ``next`` arguments. 162 163For example, usually ``task_struct::start_time`` is not traced, but with this 164traceprobe event, you can trace it as below. 165:: 166 167 # echo 't sched_switch comm=+1896(next):string start_time=+1728(next):u64' > dynamic_events 168 # head -n 20 trace | tail 169 # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 170 # | | | ||||| | | 171 sh-70 [000] d..3. 5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000 172 rcu_preempt-16 [000] d..3. 5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526 173 sh-70 [000] d..3. 5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 174 <idle>-0 [000] d..3. 5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000 175 rcu_preempt-16 [000] d..3. 5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 176 <idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000 177 kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 178 <idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000 179 180Currently, to find the offset of a specific field in the data structure, 181you need to build kernel with debuginfo and run `perf probe` command with 182`-D` option. e.g. 183:: 184 185 # perf probe -D "__probestub_sched_switch next->comm:string next->start_time" 186 p:probe/__probestub_sched_switch __probestub_sched_switch+0 comm=+1896(%cx):string start_time=+1728(%cx):u64 187 188And replace the ``%cx`` with the ``next``. 189