=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
.. Author: Steven Rostedt <srostedt@goodmis.org>
.. License: The GNU Free Documentation License, Version 1.2
.. (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================

.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or going to user space.
  This requires extra care about what can be done inside a callback. A callback
  can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursions and RCU
but one must still be very careful how they use the callbacks.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback
as well as what protections the callback will perform and not require
ftrace to handle.

There is only one field that needs to be set when registering
an ftrace_ops with ftrace:

.. code-block:: c

    struct ftrace_ops ops = {
        .func    = my_callback_func,
        .flags   = MY_FTRACE_FLAGS,
        .private = any_private_data_structure,
    };

Both .flags and .private are optional. Only .func is required.

To enable tracing call::

    register_ftrace_function(&ops);

To disable tracing call::

    unregister_ftrace_function(&ops);

The above are declared in the header::

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called depends on the architecture and on the
scheduling of services. The callback itself will have to handle any
synchronization if it must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is
no longer being called by functions after unregister_ftrace_function()
returns. Note that to provide this guarantee, unregister_ftrace_function()
may take some time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

    void callback_func(unsigned long ip, unsigned long parent_ip,
                       struct ftrace_ops *op, struct pt_regs *regs);

@ip
    This is the instruction pointer of the function that is being traced
    (where the fentry or mcount is within the function).

@parent_ip
    This is the instruction pointer of the function that called the
    function being traced (where the call of the function occurred).

@op
    This is a pointer to the ftrace_ops that was used to register the callback.
    This can be used to pass data to the callback via the private pointer.

@regs
    If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
    flags are set in the ftrace_ops structure, then this will be pointing
    to the pt_regs structure as it would be if a breakpoint were placed
    at the start of the function where ftrace was tracing. Otherwise it
    is either NULL or contains garbage.
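
The way a callback typically consumes @op and @regs can be illustrated
outside the kernel. The following is a userspace sketch only, not kernel
code: ``struct toy_ops``, ``struct toy_regs``, ``struct toy_stats``,
``toy_callback()`` and ``toy_fire()`` are hypothetical names that merely
mirror the shape of the prototype above. It shows per-ops data being reached
through the private pointer, and a callback that tolerates a NULL regs
argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs; only an ip field is modeled. */
struct toy_regs {
	unsigned long ip;
};

/* Hypothetical stand-in for struct ftrace_ops: just .func and .private. */
struct toy_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip,
		     struct toy_ops *op, struct toy_regs *regs);
	void *private;
};

/* Per-ops state reached through op->private (an assumption of this sketch). */
struct toy_stats {
	unsigned long hits;
	unsigned long hits_with_regs;
	unsigned long last_ip;
};

/* Same parameter order as the callback prototype described above. */
static void toy_callback(unsigned long ip, unsigned long parent_ip,
			 struct toy_ops *op, struct toy_regs *regs)
{
	struct toy_stats *stats = op->private;

	(void)parent_ip;	/* not used by this sketch */
	assert(stats != NULL);

	stats->hits++;
	stats->last_ip = ip;

	/* regs may be NULL, so it is checked before any use. */
	if (regs)
		stats->hits_with_regs++;
}

/* Simulate one traced function entry invoking the registered callback. */
static void toy_fire(struct toy_ops *op, unsigned long ip,
		     unsigned long parent_ip, struct toy_regs *regs)
{
	op->func(ip, parent_ip, op, regs);
}
```

Because the state hangs off the ops rather than a global, two ops that share
the same callback function can keep separate counters.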


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
    If the callback requires reading or modifying the pt_regs
    passed to the callback, then it must set this flag. Registering
    an ftrace_ops with this flag set on an architecture that does not
    support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
    Similar to SAVE_REGS but the registering of an
    ftrace_ops on an architecture that does not support passing of regs
    will not fail with this flag set. But the callback must check if
    regs is NULL or not to determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
    By default, a wrapper is added around the callback to
    make sure that recursion of the function does not occur. That is,
    if a function that is called as a result of the callback's execution
    is also traced, ftrace will prevent the callback from being called
    again. But this wrapper adds some overhead, and if the callback is
    safe from recursion, it can set this flag to disable the ftrace
    protection.

    Note, if this flag is set, and recursion does occur, it could cause
    the system to crash, and possibly reboot via a triple fault.

    It is OK if another callback traces a function that is called by a
    callback that is marked recursion safe. Recursion safe callbacks
    must never trace any functions that are called by the callback
    itself or any nested functions that those functions call.

    If this flag is set, it is possible that the callback will also
    be called with preemption enabled (when CONFIG_PREEMPTION is set),
    but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
    Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
    the traced function (have another function called instead of the
    traced function), it requires setting this flag. This is what live
    kernel patching uses. Without this flag pt_regs->ip cannot be
    modified.

    Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
    registered to any given function at a time.

FTRACE_OPS_FL_RCU
    If this is set, then the callback will only be called by functions
    where RCU is "watching". This is required if the callback function
    performs any rcu_read_lock() operation.

    RCU stops watching when the system goes idle, while a CPU
    is taken down and comes back online, and when entering from kernel
    to user space and back to kernel space. During these transitions,
    a callback may be executed and RCU synchronization will not protect
    it.

FTRACE_OPS_FL_PERMANENT
    If this is set on any ftrace ops, then tracing cannot be disabled by
    writing 0 to the proc sysctl ftrace_enabled. Equally, a callback with
    the flag set cannot be registered if ftrace_enabled is 0.

    Livepatch uses it so that the function redirection is not lost and
    the system stays protected.


Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. Filters are added by name, or by ip if it is known.

.. code-block:: c

    int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

@ops
    The ops to set the filter with.

@buf
    The string that holds the function filter text.

@len
    The length of the string.

@reset
    Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

    ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

    ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

    ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

    int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                           int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to the list of functions that are not traced. This is a
separate list from the filter list, and this function does not modify the
filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

    ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch will happen during the
ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, NULL, 0, 1);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window in which all functions will call
the callback, between the time of the reset and the time the
new filter is set.
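
The interaction between the two lists and @reset can be made concrete with a
small userspace model. This is a sketch only and not how ftrace implements
its lists: ``struct toy_list``, ``struct toy_filters``, ``toy_list_set()``
and ``toy_should_trace()`` are hypothetical names, only exact function names
are matched (no globs), and each list holds at most ``TOY_MAX_ENTRIES``
entries. It encodes the rules stated in this section: an empty filter list
means every function is traced, the "notrace" list takes precedence, and a
non-zero reset clears a list before anything is added.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8

/* Hypothetical model of one list (filter or notrace); exact names only. */
struct toy_list {
	const char *names[TOY_MAX_ENTRIES];
	int count;
};

/* Hypothetical model of the two per-ops lists described in this section. */
struct toy_filters {
	struct toy_list filter;
	struct toy_list notrace;
};

/* Return 1 if name is on the list, 0 otherwise. */
static int toy_list_has(const struct toy_list *list, const char *name)
{
	int i;

	assert(list != NULL && name != NULL);
	for (i = 0; i < list->count; i++)
		if (strcmp(list->names[i], name) == 0)
			return 1;
	return 0;
}

/* Model of @reset: clear the list first if reset, then add name if given. */
static int toy_list_set(struct toy_list *list, const char *name, int reset)
{
	if (reset)
		list->count = 0;
	if (!name || toy_list_has(list, name))
		return 0;
	if (list->count == TOY_MAX_ENTRIES)
		return -1;
	list->names[list->count++] = name;
	return 0;
}

/* Would the callback be called on entry to the function called name? */
static int toy_should_trace(const struct toy_filters *f, const char *name)
{
	/* The notrace list takes precedence over the filter list. */
	if (toy_list_has(&f->notrace, name))
		return 0;
	/* An empty filter list means all functions are traced. */
	if (f->filter.count == 0)
		return 1;
	return toy_list_has(&f->filter, name);
}
```

In this model, replacing the filter with a single call that has reset set
never passes through the empty state, whereas resetting with a NULL name and
then adding the new name does, which corresponds to the short window in which
all functions call the callback.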