1590e7b28SMasami Hiramatsu (Google).. SPDX-License-Identifier: GPL-2.0 2590e7b28SMasami Hiramatsu (Google) 3590e7b28SMasami Hiramatsu (Google)========================== 4590e7b28SMasami Hiramatsu (Google)Fprobe-based Event Tracing 5590e7b28SMasami Hiramatsu (Google)========================== 6590e7b28SMasami Hiramatsu (Google) 7590e7b28SMasami Hiramatsu (Google).. Author: Masami Hiramatsu <mhiramat@kernel.org> 8590e7b28SMasami Hiramatsu (Google) 9590e7b28SMasami Hiramatsu (Google)Overview 10590e7b28SMasami Hiramatsu (Google)-------- 11590e7b28SMasami Hiramatsu (Google) 12590e7b28SMasami Hiramatsu (Google)Fprobe event is similar to the kprobe event, but limited to probe on 13590e7b28SMasami Hiramatsu (Google)the function entry and exit only. It is good enough for many use cases 14590e7b28SMasami Hiramatsu (Google)which only traces some specific functions. 15590e7b28SMasami Hiramatsu (Google) 16590e7b28SMasami Hiramatsu (Google)This document also covers tracepoint probe events (tprobe) since this 17590e7b28SMasami Hiramatsu (Google)is also works only on the tracepoint entry. User can trace a part of 18590e7b28SMasami Hiramatsu (Google)tracepoint argument, or the tracepoint without trace-event, which is 19590e7b28SMasami Hiramatsu (Google)not exposed on tracefs. 20590e7b28SMasami Hiramatsu (Google) 21590e7b28SMasami Hiramatsu (Google)As same as other dynamic events, fprobe events and tracepoint probe 22590e7b28SMasami Hiramatsu (Google)events are defined via `dynamic_events` interface file on tracefs. 23590e7b28SMasami Hiramatsu (Google) 24590e7b28SMasami Hiramatsu (Google)Synopsis of fprobe-events 25590e7b28SMasami Hiramatsu (Google)------------------------- 26590e7b28SMasami Hiramatsu (Google):: 27590e7b28SMasami Hiramatsu (Google) 28590e7b28SMasami Hiramatsu (Google) f[:[GRP1/][EVENT1]] SYM [FETCHARGS] : Probe on function entry 29590e7b28SMasami Hiramatsu (Google) f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS] : Probe on function exit 30590e7b28SMasami Hiramatsu (Google) t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS] : Probe on tracepoint 31590e7b28SMasami Hiramatsu (Google) 32590e7b28SMasami Hiramatsu (Google) GRP1 : Group name for fprobe. If omitted, use "fprobes" for it. 33590e7b28SMasami Hiramatsu (Google) GRP2 : Group name for tprobe. If omitted, use "tracepoints" for it. 34590e7b28SMasami Hiramatsu (Google) EVENT1 : Event name for fprobe. If omitted, the event name is 35590e7b28SMasami Hiramatsu (Google) "SYM__entry" or "SYM__exit". 36590e7b28SMasami Hiramatsu (Google) EVENT2 : Event name for tprobe. If omitted, the event name is 37590e7b28SMasami Hiramatsu (Google) the same as "TRACEPOINT", but if the "TRACEPOINT" starts 38590e7b28SMasami Hiramatsu (Google) with a digit character, "_TRACEPOINT" is used. 39590e7b28SMasami Hiramatsu (Google) MAXACTIVE : Maximum number of instances of the specified function that 40590e7b28SMasami Hiramatsu (Google) can be probed simultaneously, or 0 for the default value 41a2bd0c08SMasami Hiramatsu (Google) as defined in Documentation/trace/fprobe.rst 42590e7b28SMasami Hiramatsu (Google) 43590e7b28SMasami Hiramatsu (Google) FETCHARGS : Arguments. Each probe can have up to 128 args. 44590e7b28SMasami Hiramatsu (Google) ARG : Fetch "ARG" function argument using BTF (only for function 45590e7b28SMasami Hiramatsu (Google) entry or tracepoint.) (\*1) 46590e7b28SMasami Hiramatsu (Google) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) 47590e7b28SMasami Hiramatsu (Google) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) 48590e7b28SMasami Hiramatsu (Google) $stackN : Fetch Nth entry of stack (N >= 0) 49590e7b28SMasami Hiramatsu (Google) $stack : Fetch stack address. 50590e7b28SMasami Hiramatsu (Google) $argN : Fetch the Nth function argument. (N >= 1) (\*2) 51590e7b28SMasami Hiramatsu (Google) $retval : Fetch return value.(\*3) 52590e7b28SMasami Hiramatsu (Google) $comm : Fetch current task comm. 53590e7b28SMasami Hiramatsu (Google) +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5) 54590e7b28SMasami Hiramatsu (Google) \IMM : Store an immediate value to the argument. 55590e7b28SMasami Hiramatsu (Google) NAME=FETCHARG : Set NAME as the argument name of FETCHARG. 56590e7b28SMasami Hiramatsu (Google) FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types 57590e7b28SMasami Hiramatsu (Google) (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types 58590e7b28SMasami Hiramatsu (Google) (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr" 59590e7b28SMasami Hiramatsu (Google) and bitfield are supported. 60590e7b28SMasami Hiramatsu (Google) 61590e7b28SMasami Hiramatsu (Google) (\*1) This is available only when BTF is enabled. 62590e7b28SMasami Hiramatsu (Google) (\*2) only for the probe on function entry (offs == 0). 63590e7b28SMasami Hiramatsu (Google) (\*3) only for return probe. 64590e7b28SMasami Hiramatsu (Google) (\*4) this is useful for fetching a field of data structures. 65590e7b28SMasami Hiramatsu (Google) (\*5) "u" means user-space dereference. 66590e7b28SMasami Hiramatsu (Google) 67590e7b28SMasami Hiramatsu (Google)For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`. 68590e7b28SMasami Hiramatsu (Google) 69590e7b28SMasami Hiramatsu (Google)BTF arguments 70590e7b28SMasami Hiramatsu (Google)------------- 71590e7b28SMasami Hiramatsu (Google)BTF (BPF Type Format) argument allows user to trace function and tracepoint 72590e7b28SMasami Hiramatsu (Google)parameters by its name instead of ``$argN``. This feature is available if the 73590e7b28SMasami Hiramatsu (Google)kernel is configured with CONFIG_BPF_SYSCALL and CONFIG_DEBUG_INFO_BTF. 74590e7b28SMasami Hiramatsu (Google)If user only specify the BTF argument, the event's argument name is also 75590e7b28SMasami Hiramatsu (Google)automatically set by the given name. :: 76590e7b28SMasami Hiramatsu (Google) 77590e7b28SMasami Hiramatsu (Google) # echo 'f:myprobe vfs_read count pos' >> dynamic_events 78590e7b28SMasami Hiramatsu (Google) # cat dynamic_events 79590e7b28SMasami Hiramatsu (Google) f:fprobes/myprobe vfs_read count=count pos=pos 80590e7b28SMasami Hiramatsu (Google) 81590e7b28SMasami Hiramatsu (Google)It also chooses the fetch type from BTF information. For example, in the above 82*a2439a4cSMasami Hiramatsu (Google)example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus, 83*a2439a4cSMasami Hiramatsu (Google)both are converted to 64bit unsigned long, but only ``pos`` has "%Lx" 84*a2439a4cSMasami Hiramatsu (Google)print-format as below :: 85590e7b28SMasami Hiramatsu (Google) 86590e7b28SMasami Hiramatsu (Google) # cat events/fprobes/myprobe/format 87590e7b28SMasami Hiramatsu (Google) name: myprobe 88590e7b28SMasami Hiramatsu (Google) ID: 1313 89590e7b28SMasami Hiramatsu (Google) format: 90590e7b28SMasami Hiramatsu (Google) field:unsigned short common_type; offset:0; size:2; signed:0; 91590e7b28SMasami Hiramatsu (Google) field:unsigned char common_flags; offset:2; size:1; signed:0; 92590e7b28SMasami Hiramatsu (Google) field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 93590e7b28SMasami Hiramatsu (Google) field:int common_pid; offset:4; size:4; signed:1; 94590e7b28SMasami Hiramatsu (Google) 95590e7b28SMasami Hiramatsu (Google) field:unsigned long __probe_ip; offset:8; size:8; signed:0; 96590e7b28SMasami Hiramatsu (Google) field:u64 count; offset:16; size:8; signed:0; 97590e7b28SMasami Hiramatsu (Google) field:u64 pos; offset:24; size:8; signed:0; 98590e7b28SMasami Hiramatsu (Google) 99590e7b28SMasami Hiramatsu (Google) print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos 100590e7b28SMasami Hiramatsu (Google) 101590e7b28SMasami Hiramatsu (Google)If user unsures the name of arguments, ``$arg*`` will be helpful. The ``$arg*`` 102590e7b28SMasami Hiramatsu (Google)is expanded to all function arguments of the function or the tracepoint. :: 103590e7b28SMasami Hiramatsu (Google) 104590e7b28SMasami Hiramatsu (Google) # echo 'f:myprobe vfs_read $arg*' >> dynamic_events 105590e7b28SMasami Hiramatsu (Google) # cat dynamic_events 106590e7b28SMasami Hiramatsu (Google) f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos 107590e7b28SMasami Hiramatsu (Google) 108*a2439a4cSMasami Hiramatsu (Google)BTF also affects the ``$retval``. If user doesn't set any type, the retval 109*a2439a4cSMasami Hiramatsu (Google)type is automatically picked from the BTF. If the function returns ``void``, 110*a2439a4cSMasami Hiramatsu (Google)``$retval`` is rejected. 111*a2439a4cSMasami Hiramatsu (Google) 112*a2439a4cSMasami Hiramatsu (Google)You can access the data fields of a data structure using allow operator ``->`` 113*a2439a4cSMasami Hiramatsu (Google)(for pointer type) and dot operator ``.`` (for data structure type.):: 114*a2439a4cSMasami Hiramatsu (Google) 115*a2439a4cSMasami Hiramatsu (Google)# echo 't sched_switch preempt prev_pid=prev->pid next_pid=next->pid' >> dynamic_events 116*a2439a4cSMasami Hiramatsu (Google) 117*a2439a4cSMasami Hiramatsu (Google)The field access operators, ``->`` and ``.`` can be combined for accessing deeper 118*a2439a4cSMasami Hiramatsu (Google)members and other structure members pointed by the member. e.g. ``foo->bar.baz->qux`` 119*a2439a4cSMasami Hiramatsu (Google)If there is non-name union member, you can directly access it as the C code does. 120*a2439a4cSMasami Hiramatsu (Google)For example:: 121*a2439a4cSMasami Hiramatsu (Google) 122*a2439a4cSMasami Hiramatsu (Google) struct { 123*a2439a4cSMasami Hiramatsu (Google) union { 124*a2439a4cSMasami Hiramatsu (Google) int a; 125*a2439a4cSMasami Hiramatsu (Google) int b; 126*a2439a4cSMasami Hiramatsu (Google) }; 127*a2439a4cSMasami Hiramatsu (Google) } *foo; 128*a2439a4cSMasami Hiramatsu (Google) 129*a2439a4cSMasami Hiramatsu (Google)To access ``a`` and ``b``, use ``foo->a`` and ``foo->b`` in this case. 130*a2439a4cSMasami Hiramatsu (Google) 131*a2439a4cSMasami Hiramatsu (Google)This data field access is available for the return value via ``$retval``, 132*a2439a4cSMasami Hiramatsu (Google)e.g. ``$retval->name``. 133*a2439a4cSMasami Hiramatsu (Google) 134*a2439a4cSMasami Hiramatsu (Google)For these BTF arguments and fields, ``:string`` and ``:ustring`` change the 135*a2439a4cSMasami Hiramatsu (Google)behavior. If these are used for BTF argument or field, it checks whether 136*a2439a4cSMasami Hiramatsu (Google)the BTF type of the argument or the data field is ``char *`` or ``char []``, 137*a2439a4cSMasami Hiramatsu (Google)or not. If not, it rejects applying the string types. Also, with the BTF 138*a2439a4cSMasami Hiramatsu (Google)support, you don't need a memory dereference operator (``+0(PTR)``) for 139*a2439a4cSMasami Hiramatsu (Google)accessing the string pointed by a ``PTR``. It automatically adds the memory 140*a2439a4cSMasami Hiramatsu (Google)dereference operator according to the BTF type. e.g. :: 141*a2439a4cSMasami Hiramatsu (Google) 142*a2439a4cSMasami Hiramatsu (Google)# echo 't sched_switch prev->comm:string' >> dynamic_events 143*a2439a4cSMasami Hiramatsu (Google)# echo 'f getname_flags%return $retval->name:string' >> dynamic_events 144*a2439a4cSMasami Hiramatsu (Google) 145*a2439a4cSMasami Hiramatsu (Google)The ``prev->comm`` is an embedded char array in the data structure, and 146*a2439a4cSMasami Hiramatsu (Google)``$retval->name`` is a char pointer in the data structure. But in both 147*a2439a4cSMasami Hiramatsu (Google)cases, you can use ``:string`` type to get the string. 148*a2439a4cSMasami Hiramatsu (Google) 149590e7b28SMasami Hiramatsu (Google) 150590e7b28SMasami Hiramatsu (Google)Usage examples 151590e7b28SMasami Hiramatsu (Google)-------------- 152590e7b28SMasami Hiramatsu (Google)Here is an example to add fprobe events on ``vfs_read()`` function entry 153590e7b28SMasami Hiramatsu (Google)and exit, with BTF arguments. 154590e7b28SMasami Hiramatsu (Google):: 155590e7b28SMasami Hiramatsu (Google) 156590e7b28SMasami Hiramatsu (Google) # echo 'f vfs_read $arg*' >> dynamic_events 157590e7b28SMasami Hiramatsu (Google) # echo 'f vfs_read%return $retval' >> dynamic_events 158590e7b28SMasami Hiramatsu (Google) # cat dynamic_events 159590e7b28SMasami Hiramatsu (Google) f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos 160590e7b28SMasami Hiramatsu (Google) f:fprobes/vfs_read__exit vfs_read%return arg1=$retval 161590e7b28SMasami Hiramatsu (Google) # echo 1 > events/fprobes/enable 162590e7b28SMasami Hiramatsu (Google) # head -n 20 trace | tail 163590e7b28SMasami Hiramatsu (Google) # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 164590e7b28SMasami Hiramatsu (Google) # | | | ||||| | | 165590e7b28SMasami Hiramatsu (Google) sh-70 [000] ...1. 335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 166590e7b28SMasami Hiramatsu (Google) sh-70 [000] ..... 335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 167590e7b28SMasami Hiramatsu (Google) sh-70 [000] ...1. 335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 168590e7b28SMasami Hiramatsu (Google) sh-70 [000] ..... 335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 169590e7b28SMasami Hiramatsu (Google) sh-70 [000] ...1. 335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08 170590e7b28SMasami Hiramatsu (Google) sh-70 [000] ..... 335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 171590e7b28SMasami Hiramatsu (Google) sh-70 [000] ...1. 336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08 172590e7b28SMasami Hiramatsu (Google) sh-70 [000] ..... 336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1 173590e7b28SMasami Hiramatsu (Google) 174590e7b28SMasami Hiramatsu (Google)You can see all function arguments and return values are recorded as signed int. 175590e7b28SMasami Hiramatsu (Google) 176590e7b28SMasami Hiramatsu (Google)Also, here is an example of tracepoint events on ``sched_switch`` tracepoint. 177590e7b28SMasami Hiramatsu (Google)To compare the result, this also enables the ``sched_switch`` traceevent too. 178590e7b28SMasami Hiramatsu (Google):: 179590e7b28SMasami Hiramatsu (Google) 180590e7b28SMasami Hiramatsu (Google) # echo 't sched_switch $arg*' >> dynamic_events 181590e7b28SMasami Hiramatsu (Google) # echo 1 > events/sched/sched_switch/enable 182590e7b28SMasami Hiramatsu (Google) # echo 1 > events/tracepoints/sched_switch/enable 183590e7b28SMasami Hiramatsu (Google) # echo > trace 184590e7b28SMasami Hiramatsu (Google) # head -n 20 trace | tail 185590e7b28SMasami Hiramatsu (Google) # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 186590e7b28SMasami Hiramatsu (Google) # | | | ||||| | | 187590e7b28SMasami Hiramatsu (Google) sh-70 [000] d..2. 3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 188590e7b28SMasami Hiramatsu (Google) sh-70 [000] d..3. 3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1 189590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..2. 3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120 190590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0 191590e7b28SMasami Hiramatsu (Google) rcu_preempt-16 [000] d..2. 3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120 192590e7b28SMasami Hiramatsu (Google) rcu_preempt-16 [000] d..3. 3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026 193590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..2. 3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120 194590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0 195590e7b28SMasami Hiramatsu (Google) 196590e7b28SMasami Hiramatsu (Google)As you can see, the ``sched_switch`` trace-event shows *cooked* parameters, on 197590e7b28SMasami Hiramatsu (Google)the other hand, the ``sched_switch`` tracepoint probe event shows *raw* 198590e7b28SMasami Hiramatsu (Google)parameters. This means you can access any field values in the task 199590e7b28SMasami Hiramatsu (Google)structure pointed by the ``prev`` and ``next`` arguments. 200590e7b28SMasami Hiramatsu (Google) 201590e7b28SMasami Hiramatsu (Google)For example, usually ``task_struct::start_time`` is not traced, but with this 202*a2439a4cSMasami Hiramatsu (Google)traceprobe event, you can trace that field as below. 203590e7b28SMasami Hiramatsu (Google):: 204590e7b28SMasami Hiramatsu (Google) 205*a2439a4cSMasami Hiramatsu (Google) # echo 't sched_switch comm=next->comm:string next->start_time' > dynamic_events 206590e7b28SMasami Hiramatsu (Google) # head -n 20 trace | tail 207590e7b28SMasami Hiramatsu (Google) # TASK-PID CPU# ||||| TIMESTAMP FUNCTION 208590e7b28SMasami Hiramatsu (Google) # | | | ||||| | | 209590e7b28SMasami Hiramatsu (Google) sh-70 [000] d..3. 5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000 210590e7b28SMasami Hiramatsu (Google) rcu_preempt-16 [000] d..3. 5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526 211590e7b28SMasami Hiramatsu (Google) sh-70 [000] d..3. 5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 212590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000 213590e7b28SMasami Hiramatsu (Google) rcu_preempt-16 [000] d..3. 5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 214590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000 215590e7b28SMasami Hiramatsu (Google) kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0 216590e7b28SMasami Hiramatsu (Google) <idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000 217