=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function have other use cases as well, such
as live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================

WARNING: The ability to add a callback to almost any function within the
kernel comes with risks. A callback can be called from any context
(normal, softirq, irq, and NMI). Callbacks can also be called just before
going to idle, during CPU bring up and takedown, or going to user space.
This requires extra care about what can be done inside a callback. A callback
can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursion and RCU,
but one must still be very careful how the callbacks are used.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace which function should be called as the callback,
as well as which protections the callback performs itself and therefore
does not require ftrace to handle.

There is only one field that needs to be set when registering
an ftrace_ops with ftrace:

.. code-block:: c

    struct ftrace_ops ops = {
          .func    = my_callback_func,
          .flags   = MY_FTRACE_FLAGS,
          .private = any_private_data_structure,
    };

Both .flags and .private are optional. Only .func is required.

To enable tracing call::

    register_ftrace_function(&ops);

To disable tracing call::

    unregister_ftrace_function(&ops);

The above is defined by including the header::

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called depends on the architecture and on the
scheduling of services. The callback itself will have to handle any
synchronization if it must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is no longer
being called by functions after unregister_ftrace_function() returns.
Note that to provide this guarantee, unregister_ftrace_function() may
take some time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

    void callback_func(unsigned long ip, unsigned long parent_ip,
                       struct ftrace_ops *op, struct pt_regs *regs);

@ip
    This is the instruction pointer of the function that is being traced
    (where the fentry or mcount is within the function).

@parent_ip
    This is the instruction pointer of the function that called the
    function being traced (where the call of the function occurred).

@op
    This is a pointer to the ftrace_ops that was used to register the
    callback. This can be used to pass data to the callback via the
    private pointer.

@regs
    If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
    flags are set in the ftrace_ops structure, then this will point
    to the pt_regs structure as it would be if a breakpoint were placed
    at the start of the function that ftrace is tracing. Otherwise it
    either contains garbage or is NULL.
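
To make the @op and @regs contract concrete, here is a small userspace
model of it. It is a sketch, not kernel code: struct model_ops,
struct model_regs, MODEL_FL_SAVE_REGS and model_trace() are hypothetical
stand-ins for struct ftrace_ops, struct pt_regs, FTRACE_OPS_FL_SAVE_REGS
and the ftrace trampoline, chosen so the example builds as ordinary C. It
shows two habits a real callback likely needs: reaching its own data
through op->private, and only looking at regs when it asked for them.

```c
#include <stddef.h>

/*
 * Userspace model of the ftrace callback contract. Every type, flag and
 * function here is a hypothetical stand-in, NOT part of <linux/ftrace.h>.
 */
struct model_regs {
	unsigned long ip;		/* stands in for struct pt_regs */
};

struct model_ops;

typedef void (*model_func_t)(unsigned long ip, unsigned long parent_ip,
			     struct model_ops *op, struct model_regs *regs);

#define MODEL_FL_SAVE_REGS 0x1UL	/* stands in for FTRACE_OPS_FL_SAVE_REGS */

struct model_ops {			/* stands in for struct ftrace_ops */
	model_func_t func;
	unsigned long flags;
	void *private;
};

/* Data owned by the callback, reached through op->private. */
struct hit_stats {
	unsigned long hits;
	unsigned long last_parent;
	unsigned long regs_seen;
};

/* Same argument order as the documented callback prototype. */
static void count_hits(unsigned long ip, unsigned long parent_ip,
		       struct model_ops *op, struct model_regs *regs)
{
	struct hit_stats *stats = op->private;

	(void)ip;
	stats->hits++;
	stats->last_parent = parent_ip;

	/*
	 * Only look at regs if this ops asked for them: in the kernel, regs
	 * may be garbage rather than NULL otherwise. The NULL check covers
	 * the SAVE_REGS_IF_SUPPORTED case on architectures without support.
	 */
	if ((op->flags & MODEL_FL_SAVE_REGS) && regs != NULL)
		stats->regs_seen++;
}

/*
 * Hypothetical dispatcher playing the role of the ftrace trampoline:
 * it hands over a register snapshot only when the ops requested one.
 */
static void model_trace(struct model_ops *op, unsigned long ip,
			unsigned long parent_ip)
{
	struct model_regs regs = { .ip = ip };

	op->func(ip, parent_ip, op,
		 (op->flags & MODEL_FL_SAVE_REGS) ? &regs : NULL);
}
```

Calling model_trace() once with .flags set to 0 and once with
MODEL_FL_SAVE_REGS leaves stats.hits at 2 and stats.regs_seen at 1, and
stats.last_parent holds the parent_ip of the second call.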


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
    If the callback requires reading or modifying the pt_regs
    passed to the callback, then it must set this flag. Registering
    an ftrace_ops with this flag set on an architecture that does not
    support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
    Similar to SAVE_REGS, but registering an ftrace_ops with this flag
    set on an architecture that does not support passing of regs will
    not fail. Instead, the callback must check whether regs is NULL to
    determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
    By default, a wrapper is added around the callback to
    make sure that recursion of the function does not occur. That is,
    if a function that is called as a result of the callback's execution
    is also traced, ftrace will prevent the callback from being called
    again. But this wrapper adds some overhead, and if the callback is
    safe from recursion, it can set this flag to disable the ftrace
    protection.

    Note, if this flag is set, and recursion does occur, it could cause
    the system to crash, and possibly reboot via a triple fault.

    It is OK if another callback traces a function that is called by a
    callback that is marked recursion safe. Recursion safe callbacks
    must never trace any function that is called by the callback
    itself or any nested functions that those functions call.

    If this flag is set, it is possible that the callback will also
    be called with preemption enabled (when CONFIG_PREEMPT is set),
    but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
    Requires FTRACE_OPS_FL_SAVE_REGS set.
    If the callback is to "hijack" the traced function (have another
    function called instead of the traced function), it requires setting
    this flag. This is what live kernel patches use. Without this flag
    the pt_regs->ip cannot be modified.

    Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
    registered to any given function at a time.

FTRACE_OPS_FL_RCU
    If this is set, then the callback will only be called by functions
    where RCU is "watching". This is required if the callback function
    performs any rcu_read_lock() operation.

    RCU stops watching when the system goes idle, while a CPU is being
    taken down and brought back online, and when switching from kernel
    to user space and back to kernel space. During these transitions,
    a callback may be executed and RCU synchronization will not protect
    it.


Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or by ip if it is known.

.. code-block:: c

    int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

@ops
    The ops to set the filter with.

@buf
    The string that holds the function filter text.

@len
    The length of the string.

@reset
    Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.txt`.

To just trace the schedule function:

.. code-block:: c

    ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

    ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

    ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

    int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                           int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to not be traced. This is a separate list from the
filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

The "notrace" list is cleared the same way as the filter list:

.. code-block:: c

    ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch happens during the ftrace_set_filter() call.
At no time will all functions call the callback.

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, NULL, 0, 1);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window, between the reset and the setting of the
new filter, during which all functions will call the callback.
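
The selection rules above (the "notrace" list wins over the filter list,
and an empty filter list means every function) can be modeled in a few
lines of userspace C. This is a sketch under assumptions, not ftrace's
implementation: struct model_list, model_list_matches() and
model_should_trace() are hypothetical, and POSIX fnmatch(3) is used only
as an approximation of the kernel's own glob matching.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the filter/notrace selection rules. NOT the kernel
 * implementation: fnmatch(3) only approximates ftrace's glob matching.
 */
struct model_list {
	const char *const *globs;	/* patterns, as would be passed in @buf */
	size_t count;			/* 0 means the list is empty */
};

/* True if any pattern in the list matches the function name. */
static bool model_list_matches(const struct model_list *list,
			       const char *func)
{
	for (size_t i = 0; i < list->count; i++)
		if (fnmatch(list->globs[i], func, 0) == 0)
			return true;
	return false;
}

/* Would the callback be called when @func is hit? */
static bool model_should_trace(const struct model_list *filter,
			       const struct model_list *notrace,
			       const char *func)
{
	/* The "notrace" list takes precedence over the filter list. */
	if (model_list_matches(notrace, func))
		return false;
	/* An empty filter list means every function is traced. */
	if (filter->count == 0)
		return true;
	return model_list_matches(filter, func);
}
```

With a filter of "sched*" and an empty "notrace" list, both "schedule"
and "scheduler_tick" are traced while "try_to_wake_up" is not; adding
"schedule" to the "notrace" list excludes just that function. With both
lists empty, model_should_trace() returns true for every name, which is
the state the reset-then-set sequence above passes through.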