=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================

.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or going to user space.
  This requires extra care about what can be done inside a callback. A callback
  can be called outside the protective scope of RCU.

There are helper functions to protect against recursion and to make sure
RCU is watching. These are explained below.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback
as well as what protections the callback will perform and not require
ftrace to handle.

Only one field must be set when registering an ftrace_ops with ftrace:

.. code-block:: c

	struct ftrace_ops ops = {
		.func    = my_callback_func,
		.flags   = MY_FTRACE_FLAGS,
		.private = any_private_data_structure,
	};

Both .flags and .private are optional. Only .func is required.


To enable tracing call::

	register_ftrace_function(&ops);

To disable tracing call::

	unregister_ftrace_function(&ops);

The above is defined by including the header::

	#include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called is dependent upon architecture and scheduling
of services. The callback itself will have to handle any synchronization if it
must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is no longer
being called by functions after unregister_ftrace_function() returns.
Note that to provide this guarantee, unregister_ftrace_function()
may take some time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

	void callback_func(unsigned long ip, unsigned long parent_ip,
			   struct ftrace_ops *op, struct pt_regs *regs);

@ip
	This is the instruction pointer of the function that is being traced
	(where the fentry or mcount is within the function).

@parent_ip
	This is the instruction pointer of the function that called the
	function being traced (where the call of the function occurred).

@op
	This is a pointer to the ftrace_ops that was used to register the callback.
	This can be used to pass data to the callback via the private pointer.

@regs
	If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
	flags are set in the ftrace_ops structure, then this will be pointing
	to the pt_regs structure like it would be if a breakpoint was placed
	at the start of the function where ftrace was tracing. Otherwise it
	either contains garbage or is NULL.
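
As a minimal sketch of how the structure, the registration calls and the
callback fit together, the module below counts how often its callback runs.
The names my_hook_stats, my_stats, my_callback_func and my_ops are
hypothetical. The callback uses the prototype shown above; kernels that define
FTRACE_OPS_FL_RECURSION may declare the fourth argument with a different type,
so it should be adjusted to match ftrace_func_t in include/linux/ftrace.h of
the tree being built.

.. code-block:: c

	#include <linux/atomic.h>
	#include <linux/ftrace.h>
	#include <linux/kernel.h>
	#include <linux/module.h>

	/* Hypothetical private data, reached in the callback via op->private. */
	struct my_hook_stats {
		atomic_long_t hits;
	};

	static struct my_hook_stats my_stats;

	/*
	 * Assumption: the fourth parameter follows the prototype in this
	 * document. Adjust it if ftrace_func_t differs in your kernel.
	 */
	static void notrace my_callback_func(unsigned long ip,
					     unsigned long parent_ip,
					     struct ftrace_ops *op,
					     struct pt_regs *regs)
	{
		struct my_hook_stats *stats = op->private;

		/* Keep the body small: it may run in any context, even NMI. */
		atomic_long_inc(&stats->hits);
	}

	static struct ftrace_ops my_ops = {
		.func    = my_callback_func,
		/* Let ftrace provide the recursion protection. */
		.flags   = FTRACE_OPS_FL_RECURSION,
		.private = &my_stats,
	};

	static int __init my_hook_init(void)
	{
		/* No filter is set, so every traceable function calls back. */
		return register_ftrace_function(&my_ops);
	}

	static void __exit my_hook_exit(void)
	{
		/* After this returns the callback is no longer being called. */
		unregister_ftrace_function(&my_ops);
		pr_info("my_hook: %ld calls seen\n",
			atomic_long_read(&my_stats.hits));
	}

	module_init(my_hook_init);
	module_exit(my_hook_exit);
	MODULE_LICENSE("GPL");

Reading the counter only after unregister_ftrace_function() returns relies on
the guarantee described above. Hooking every function is expensive, so a real
user would normally also set a filter.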

Protect your callback
=====================

As functions can be called from anywhere, and it is possible that a function
called by a callback may also be traced, and call that same callback,
recursion protection must be used. There are two helper functions that
can help in this regard. If you start your code with:

.. code-block:: c

	int bit;

	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;

and end it with:

.. code-block:: c

	ftrace_test_recursion_unlock(bit);

then the code in between will be safe to use, even if it ends up calling a
function that the callback is tracing. Note, on success,
ftrace_test_recursion_trylock() will disable preemption, and
ftrace_test_recursion_unlock() will enable it again (if it was previously
enabled). The instruction pointer (ip) and its parent (parent_ip) are passed to
ftrace_test_recursion_trylock() to record where the recursion happened
(if CONFIG_FTRACE_RECORD_RECURSION is set).

Alternatively, if the FTRACE_OPS_FL_RECURSION flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for recursion for the callback and no recursion test needs to be done.
But this is at the expense of slightly more overhead from an extra
function call.

If your callback accesses any data or critical section that requires RCU
protection, it is best to make sure that RCU is "watching", otherwise
that data or critical section will not be protected as expected. In this
case add:

.. code-block:: c

	if (!rcu_is_watching())
		return;

Alternatively, if the FTRACE_OPS_FL_RCU flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for rcu_is_watching() for the callback and no other test needs to be done.
But this is at the expense of slightly more overhead from an extra
function call.
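
The two checks can be combined in one callback. The sketch below is only one
possible arrangement: my_protected_callback is a hypothetical name, the
prototype is the one given in this document (adjust the fourth argument if
ftrace_func_t differs in your kernel), and the RCU read-side section stands in
for whatever protected data a real callback would touch. It assumes that
linux/ftrace.h makes the recursion helpers available.

.. code-block:: c

	#include <linux/ftrace.h>
	#include <linux/rcupdate.h>

	static void notrace my_protected_callback(unsigned long ip,
						  unsigned long parent_ip,
						  struct ftrace_ops *op,
						  struct pt_regs *regs)
	{
		int bit;

		/* Give up if the callback is already running in this context. */
		bit = ftrace_test_recursion_trylock(ip, parent_ip);
		if (bit < 0)
			return;

		/* Preemption is disabled from here until the unlock below. */
		if (rcu_is_watching()) {
			rcu_read_lock();
			/* Hypothetical: access RCU-protected data here. */
			rcu_read_unlock();
		}

		/* Always pair a successful trylock with an unlock. */
		ftrace_test_recursion_unlock(bit);
	}

The RCU test is written as a block rather than an early return so that no path
leaves the function without releasing the recursion lock.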


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
	If the callback requires reading or modifying the pt_regs
	passed to the callback, then it must set this flag. Registering
	an ftrace_ops with this flag set on an architecture that does not
	support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
	Similar to SAVE_REGS, but registering an ftrace_ops on an
	architecture that does not support passing of regs will not fail
	with this flag set. The callback must check whether regs is NULL
	to determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION
	By default, it is expected that the callback can handle recursion.
	But if the callback is not that worried about overhead, then
	setting this bit will add the recursion protection around the
	callback by calling a helper function that will do the recursion
	protection and only call the callback if it did not recurse.

	Note, if this flag is not set, and recursion does occur, it could
	cause the system to crash, and possibly reboot via a triple fault.

	Note, if this flag is set, then the callback will always be called
	with preemption disabled. If it is not set, then it is possible
	(but not guaranteed) that the callback will be called in
	preemptible context.

FTRACE_OPS_FL_IPMODIFY
	Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
	the traced function (have another function called instead of the
	traced function), it requires setting this flag. This is what live
	kernel patching uses. Without this flag the pt_regs->ip cannot be
	modified.

	Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
	registered to any given function at a time.

FTRACE_OPS_FL_RCU
	If this is set, then the callback will only be called by functions
	where RCU is "watching". This is required if the callback function
	performs any rcu_read_lock() operation.

	RCU stops watching when the system goes idle, while a CPU is taken
	down and brought back online, and when transitioning from kernel
	to user space and back to kernel space. During these transitions,
	a callback may be executed and RCU synchronization will not protect
	it.

FTRACE_OPS_FL_PERMANENT
	If this is set on any ftrace ops, then the tracing cannot be disabled by
	writing 0 to the proc sysctl ftrace_enabled. Equally, a callback with
	the flag set cannot be registered if ftrace_enabled is 0.

	Livepatch uses it so as not to lose the function redirection, so the
	system stays protected.


Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or by ip if it is known.

.. code-block:: c

	int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
			      int len, int reset);

@ops
	The ops to set the filter with.

@buf
	The string that holds the function filter text.

@len
	The length of the string.

@reset
	Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

	ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

	ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

	ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

	int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
			       int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to the list of functions not to be traced. This is a
separate list from the filter list, and this function does not modify the
filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

	ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, and @reset is non-zero, and @buf contains a
glob that matches functions, the switch will happen during the
ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

	ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

	register_ftrace_function(&ops);

	msleep(10);

	ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

	ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

	register_ftrace_function(&ops);

	msleep(10);

	ftrace_set_filter(&ops, NULL, 0, 1);

	ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

as the latter will have a short time where all functions will call
the callback, between the time of the reset and the time of the
new setting of the filter.
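
Putting the filter and "notrace" lists together, a setup routine might look
like the sketch below. The helper name my_hook_setup and the patterns "sched*"
and "schedule_timeout" are purely illustrative; whether they match traceable
functions on a given kernel is an assumption. Both lists are populated before
registration, as recommended above, and each return value is checked.

.. code-block:: c

	#include <linux/ftrace.h>
	#include <linux/string.h>

	/* Hypothetical helper: restrict ops to a glob, minus one function. */
	static int my_hook_setup(struct ftrace_ops *ops)
	{
		int ret;

		/* Replace any existing filter with everything matching the glob. */
		ret = ftrace_set_filter(ops, "sched*", strlen("sched*"), 1);
		if (ret)
			return ret;

		/* The "notrace" list takes precedence over the filter list. */
		ret = ftrace_set_notrace(ops, "schedule_timeout",
					 strlen("schedule_timeout"), 1);
		if (ret)
			return ret;

		/* Only now let the callback start being called. */
		return register_ftrace_function(ops);
	}

Because the lists are in place before register_ftrace_function() is called,
there is no window in which functions outside the filter call the callback.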