1590e7b28SMasami Hiramatsu (Google).. SPDX-License-Identifier: GPL-2.0 2590e7b28SMasami Hiramatsu (Google) 3590e7b28SMasami Hiramatsu (Google)========================== 4590e7b28SMasami Hiramatsu (Google)Fprobe-based Event Tracing 5590e7b28SMasami Hiramatsu (Google)========================== 6590e7b28SMasami Hiramatsu (Google) 7590e7b28SMasami Hiramatsu (Google).. Author: Masami Hiramatsu <mhiramat@kernel.org> 8590e7b28SMasami Hiramatsu (Google) 9590e7b28SMasami Hiramatsu (Google)Overview 10590e7b28SMasami Hiramatsu (Google)-------- 11590e7b28SMasami Hiramatsu (Google) 12590e7b28SMasami Hiramatsu (Google)Fprobe event is similar to the kprobe event, but limited to probe on 13590e7b28SMasami Hiramatsu (Google)the function entry and exit only. It is good enough for many use cases 14590e7b28SMasami Hiramatsu (Google)which only traces some specific functions. 15590e7b28SMasami Hiramatsu (Google) 16590e7b28SMasami Hiramatsu (Google)This document also covers tracepoint probe events (tprobe) since this 17590e7b28SMasami Hiramatsu (Google)is also works only on the tracepoint entry. User can trace a part of 18590e7b28SMasami Hiramatsu (Google)tracepoint argument, or the tracepoint without trace-event, which is 19590e7b28SMasami Hiramatsu (Google)not exposed on tracefs. 20590e7b28SMasami Hiramatsu (Google) 21590e7b28SMasami Hiramatsu (Google)As same as other dynamic events, fprobe events and tracepoint probe 22590e7b28SMasami Hiramatsu (Google)events are defined via `dynamic_events` interface file on tracefs. 23590e7b28SMasami Hiramatsu (Google) 24590e7b28SMasami Hiramatsu (Google)Synopsis of fprobe-events 25590e7b28SMasami Hiramatsu (Google)------------------------- 26590e7b28SMasami Hiramatsu (Google):: 27590e7b28SMasami Hiramatsu (Google) 28590e7b28SMasami Hiramatsu (Google) f[:[GRP1/][EVENT1]] SYM [FETCHARGS] : Probe on function entry 29590e7b28SMasami Hiramatsu (Google) f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS] : Probe on function exit 30590e7b28SMasami Hiramatsu (Google) t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS] : Probe on tracepoint 31590e7b28SMasami Hiramatsu (Google) 32590e7b28SMasami Hiramatsu (Google) GRP1 : Group name for fprobe. If omitted, use "fprobes" for it. 33590e7b28SMasami Hiramatsu (Google) GRP2 : Group name for tprobe. If omitted, use "tracepoints" for it. 34590e7b28SMasami Hiramatsu (Google) EVENT1 : Event name for fprobe. If omitted, the event name is 35590e7b28SMasami Hiramatsu (Google) "SYM__entry" or "SYM__exit". 36590e7b28SMasami Hiramatsu (Google) EVENT2 : Event name for tprobe. If omitted, the event name is 37590e7b28SMasami Hiramatsu (Google) the same as "TRACEPOINT", but if the "TRACEPOINT" starts 38590e7b28SMasami Hiramatsu (Google) with a digit character, "_TRACEPOINT" is used. 39590e7b28SMasami Hiramatsu (Google) MAXACTIVE : Maximum number of instances of the specified function that 40590e7b28SMasami Hiramatsu (Google) can be probed simultaneously, or 0 for the default value 41*a2bd0c08SMasami Hiramatsu (Google) as defined in Documentation/trace/fprobe.rst 42590e7b28SMasami Hiramatsu (Google) 43590e7b28SMasami Hiramatsu (Google) FETCHARGS : Arguments. Each probe can have up to 128 args. 44590e7b28SMasami Hiramatsu (Google) ARG : Fetch "ARG" function argument using BTF (only for function 45590e7b28SMasami Hiramatsu (Google) entry or tracepoint.) (\*1) 46590e7b28SMasami Hiramatsu (Google) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) 47590e7b28SMasami Hiramatsu (Google) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) 48590e7b28SMasami Hiramatsu (Google) $stackN : Fetch Nth entry of stack (N >= 0) 49590e7b28SMasami Hiramatsu (Google) $stack : Fetch stack address. 50590e7b28SMasami Hiramatsu (Google) $argN : Fetch the Nth function argument. (N >= 1) (\*2) 51590e7b28SMasami Hiramatsu (Google) $retval : Fetch return value.(\*3) 52590e7b28SMasami Hiramatsu (Google) $comm : Fetch current task comm. 53590e7b28SMasami Hiramatsu (Google) +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5) 54590e7b28SMasami Hiramatsu (Google) \IMM : Store an immediate value to the argument. 55590e7b28SMasami Hiramatsu (Google) NAME=FETCHARG : Set NAME as the argument name of FETCHARG. 56590e7b28SMasami Hiramatsu (Google) FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types 57590e7b28SMasami Hiramatsu (Google) (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types 58590e7b28SMasami Hiramatsu (Google) (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr" 59590e7b28SMasami Hiramatsu (Google) and bitfield are supported. 60590e7b28SMasami Hiramatsu (Google) 61590e7b28SMasami Hiramatsu (Google) (\*1) This is available only when BTF is enabled. 62590e7b28SMasami Hiramatsu (Google) (\*2) only for the probe on function entry (offs == 0). 63590e7b28SMasami Hiramatsu (Google) (\*3) only for return probe. 64590e7b28SMasami Hiramatsu (Google) (\*4) this is useful for fetching a field of data structures. 65590e7b28SMasami Hiramatsu (Google) (\*5) "u" means user-space dereference. 66590e7b28SMasami Hiramatsu (Google) 67590e7b28SMasami Hiramatsu (Google)For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`. 68590e7b28SMasami Hiramatsu (Google) 69590e7b28SMasami Hiramatsu (Google)BTF arguments 70590e7b28SMasami Hiramatsu (Google)------------- 71590e7b28SMasami Hiramatsu (Google)BTF (BPF Type Format) argument allows user to trace function and tracepoint 72590e7b28SMasami Hiramatsu (Google)parameters by its name instead of ``$argN``. This feature is available if the 73590e7b28SMasami Hiramatsu (Google)kernel is configured with CONFIG_BPF_SYSCALL and CONFIG_DEBUG_INFO_BTF. 74590e7b28SMasami Hiramatsu (Google)If user only specify the BTF argument, the event's argument name is also 75590e7b28SMasami Hiramatsu (Google)automatically set by the given name. :: 76590e7b28SMasami Hiramatsu (Google) 77590e7b28SMasami Hiramatsu (Google) # echo 'f:myprobe vfs_read count pos' >> dynamic_events 78590e7b28SMasami Hiramatsu (Google) # cat dynamic_events 79590e7b28SMasami Hiramatsu (Google) f:fprobes/myprobe vfs_read count=count pos=pos 80590e7b28SMasami Hiramatsu (Google) 81590e7b28SMasami Hiramatsu (Google)It also chooses the fetch type from BTF information. For example, in the above 82590e7b28SMasami Hiramatsu (Google)example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus, both 83590e7b28SMasami Hiramatsu (Google)are converted to 64bit unsigned long, but only ``pos`` has "%Lx" print-format as 84590e7b28SMasami Hiramatsu (Google)below :: 85590e7b28SMasami Hiramatsu (Google) 86590e7b28SMasami Hiramatsu (Google) # cat events/fprobes/myprobe/format 87590e7b28SMasami Hiramatsu (Google) name: myprobe 88590e7b28SMasami Hiramatsu (Google) ID: 1313 89590e7b28SMasami Hiramatsu (Google) format: 90590e7b28SMasami Hiramatsu (Google) field:unsigned short common_type; offset:0; size:2; signed:0; 91590e7b28SMasami Hiramatsu (Google) field:unsigned char common_flags; offset:2; size:1; signed:0; 92590e7b28SMasami Hiramatsu (Google) field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 93590e7b28SMasami Hiramatsu (Google) field:int common_pid; offset:4; size:4; signed:1; 94590e7b28SMasami Hiramatsu (Google) 95590e7b28SMasami Hiramatsu (Google) field:unsigned long __probe_ip; offset:8; size:8; signed:0; 96590e7b28SMasami Hiramatsu (Google) field:u64 count; offset:16; size:8; signed:0; 97590e7b28SMasami Hiramatsu (Google) field:u64 pos; offset:24; size:8; signed:0; 98590e7b28SMasami Hiramatsu (Google) 99590e7b28SMasami Hiramatsu (Google) print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos 100590e7b28SMasami Hiramatsu (Google) 101590e7b28SMasami Hiramatsu (Google)If user unsures the name of arguments, ``$arg*`` will be helpful. The ``$arg*`` 102590e7b28SMasami Hiramatsu (Google)is expanded to all function arguments of the function or the tracepoint. :: 103590e7b28SMasami Hiramatsu (Google) 104590e7b28SMasami Hiramatsu (Google) # echo 'f:myprobe vfs_read $arg*' >> dynamic_events 105590e7b28SMasami Hiramatsu (Google) # cat dynamic_events 106590e7b28SMasami Hiramatsu (Google) f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos 107590e7b28SMasami Hiramatsu (Google) 108590e7b28SMasami Hiramatsu (Google)BTF also affects the ``$retval``. If user doesn't set any type, the retval type is 109590e7b28SMasami Hiramatsu (Google)automatically picked from the BTF. If the function returns ``void``, ``$retval`` 110590e7b28SMasami Hiramatsu (Google)is rejected. 111590e7b28SMasami Hiramatsu (Google) 112590e7b28SMasami Hiramatsu (Google)Usage examples 113590e7b28SMasami Hiramatsu (Google)-------------- 114590e7b28SMasami Hiramatsu (Google)Here is an example to add fprobe events on ``vfs_read()`` function entry 115590e7b28SMasami Hiramatsu (Google)and exit, with BTF arguments. 116590e7b28SMasami Hiramatsu (Google):: 117590e7b28SMasami Hiramatsu (Google) 118590e7b28SMasami Hiramatsu (Google) # echo 'f vfs_read $arg*' >> dynamic_events 119590e7b28SMasami Hiramatsu (Google) # echo 'f vfs_read%return $retval' >> dynamic_events 120590e7b28SMasami Hiramatsu (Google) # cat dynamic_events 121590e7b28SMasami Hiramatsu (Google) f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos 122590e7b28SMasami Hiramatsu (Google) f:fprobes/vfs_read__exit vfs_read%return arg1=$retval 123590e7b28SMasami Hiramatsu (Google) # echo 1 > events/fprobes/enable 124590e7b28SMasami Hiramatsu (Google) # head -n 20 trace | tail 125590e7b28SMasami Hiramatsu (Google) # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 126590e7b28SMasami Hiramatsu (Google) # | | | ||||| | | 127590e7b28SMasami Hiramatsu (Google) sh-70 [000] ...1. 335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 128590e7b28SMasami Hiramatsu (Google) sh-70 [000] ..... 335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 129590e7b28SMasami Hiramatsu (Google) sh-70 [000] ...1. 335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 130590e7b28SMasami Hiramatsu (Google) sh-70 [000] ..... 335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 131590e7b28SMasami Hiramatsu (Google) sh-70 [000] ...1. 335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08 132590e7b28SMasami Hiramatsu (Google) sh-70 [000] ..... 335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 133590e7b28SMasami Hiramatsu (Google) sh-70 [000] ...1. 336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 134590e7b28SMasami Hiramatsu (Google) sh-70 [000] ..... 336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 135590e7b28SMasami Hiramatsu (Google) 136590e7b28SMasami Hiramatsu (Google)You can see all function arguments and return values are recorded as signed int. 137590e7b28SMasami Hiramatsu (Google) 138590e7b28SMasami Hiramatsu (Google)Also, here is an example of tracepoint events on ``sched_switch`` tracepoint. 139590e7b28SMasami Hiramatsu (Google)To compare the result, this also enables the ``sched_switch`` traceevent too. 140590e7b28SMasami Hiramatsu (Google):: 141590e7b28SMasami Hiramatsu (Google) 142590e7b28SMasami Hiramatsu (Google) # echo 't sched_switch $arg*' >> dynamic_events 143590e7b28SMasami Hiramatsu (Google) # echo 1 > events/sched/sched_switch/enable 144590e7b28SMasami Hiramatsu (Google) # echo 1 > events/tracepoints/sched_switch/enable 145590e7b28SMasami Hiramatsu (Google) # echo > trace 146590e7b28SMasami Hiramatsu (Google) # head -n 20 trace | tail 147590e7b28SMasami Hiramatsu (Google) # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 148590e7b28SMasami Hiramatsu (Google) # | | | ||||| | | 149590e7b28SMasami Hiramatsu (Google) sh-70 [000] d..2. 3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 150590e7b28SMasami Hiramatsu (Google) sh-70 [000] d..3. 3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1 151590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..2. 3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120 152590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0 153590e7b28SMasami Hiramatsu (Google) rcu_preempt-16 [000] d..2. 3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120 154590e7b28SMasami Hiramatsu (Google) rcu_preempt-16 [000] d..3. 3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026 155590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..2. 3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120 156590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0 157590e7b28SMasami Hiramatsu (Google) 158590e7b28SMasami Hiramatsu (Google)As you can see, the ``sched_switch`` trace-event shows *cooked* parameters, on 159590e7b28SMasami Hiramatsu (Google)the other hand, the ``sched_switch`` tracepoint probe event shows *raw* 160590e7b28SMasami Hiramatsu (Google)parameters. This means you can access any field values in the task 161590e7b28SMasami Hiramatsu (Google)structure pointed by the ``prev`` and ``next`` arguments. 162590e7b28SMasami Hiramatsu (Google) 163590e7b28SMasami Hiramatsu (Google)For example, usually ``task_struct::start_time`` is not traced, but with this 164590e7b28SMasami Hiramatsu (Google)traceprobe event, you can trace it as below. 165590e7b28SMasami Hiramatsu (Google):: 166590e7b28SMasami Hiramatsu (Google) 167590e7b28SMasami Hiramatsu (Google) # echo 't sched_switch comm=+1896(next):string start_time=+1728(next):u64' > dynamic_events 168590e7b28SMasami Hiramatsu (Google) # head -n 20 trace | tail 169590e7b28SMasami Hiramatsu (Google) # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 170590e7b28SMasami Hiramatsu (Google) # | | | ||||| | | 171590e7b28SMasami Hiramatsu (Google) sh-70 [000] d..3. 5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000 172590e7b28SMasami Hiramatsu (Google) rcu_preempt-16 [000] d..3. 5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526 173590e7b28SMasami Hiramatsu (Google) sh-70 [000] d..3. 5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 174590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000 175590e7b28SMasami Hiramatsu (Google) rcu_preempt-16 [000] d..3. 5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 176590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000 177590e7b28SMasami Hiramatsu (Google) kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 178590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000 179590e7b28SMasami Hiramatsu (Google) 180590e7b28SMasami Hiramatsu (Google)Currently, to find the offset of a specific field in the data structure, 181590e7b28SMasami Hiramatsu (Google)you need to build kernel with debuginfo and run `perf probe` command with 182590e7b28SMasami Hiramatsu (Google)`-D` option. e.g. 183590e7b28SMasami Hiramatsu (Google):: 184590e7b28SMasami Hiramatsu (Google) 185590e7b28SMasami Hiramatsu (Google) # perf probe -D "__probestub_sched_switch next->comm:string next->start_time" 186590e7b28SMasami Hiramatsu (Google) p:probe/__probestub_sched_switch __probestub_sched_switch+0 comm=+1896(%cx):string start_time=+1728(%cx):u64 187590e7b28SMasami Hiramatsu (Google) 188590e7b28SMasami Hiramatsu (Google)And replace the ``%cx`` with the ``next``. 189