=========================================
user_events: User-based Event Tracing
=========================================

:Author: Beau Belgrave

Overview
--------
User based trace events allow user processes to create events and trace data
that can be viewed via existing tools, such as ftrace and perf.
To enable this feature, build your kernel with CONFIG_USER_EVENTS=y.

Programs can view status of the events via
/sys/kernel/tracing/user_events_status and can both register and write
data out via /sys/kernel/tracing/user_events_data.

Programs can also use /sys/kernel/tracing/dynamic_events to register and
delete user based events via the u: prefix. The format of the command to
dynamic_events is the same as the ioctl with the u: prefix applied.

Typically programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
process tells the kernel which address and bit to reflect if any tool has
enabled the event and data should be written. The registration will give back
a write index which describes the data when a write() or writev() is called
on the /sys/kernel/tracing/user_events_data file.

The structures referenced in this document are contained within the
/include/uapi/linux/user_events.h file in the source tree.

**NOTE:** *Both user_events_status and user_events_data are under the tracefs
filesystem and may be mounted at different paths than above.*

Registering
-----------
Registering within a user process is done via ioctl() out to the
/sys/kernel/tracing/user_events_data file. The command to issue is
DIAG_IOCSREG.

This command takes a packed struct user_reg as an argument::

  struct user_reg {
        /* Input: Size of the user_reg structure being used */
        __u32 size;

        /* Input: Bit in enable address to use */
        __u8 enable_bit;

        /* Input: Enable size in bytes at address */
        __u8 enable_size;

        /* Input: Flags for future use, set to 0 */
        __u16 flags;

        /* Input: Address to update when enabled */
        __u64 enable_addr;

        /* Input: Pointer to string with event name, description and flags */
        __u64 name_args;

        /* Output: Index of the event to use when writing data */
        __u32 write_index;
  } __attribute__((__packed__));

The struct user_reg requires all the above inputs to be set appropriately.

+ size: This must be set to sizeof(struct user_reg).

+ enable_bit: The bit to reflect the event status at the address specified by
  enable_addr.

+ enable_size: The size of the value specified by enable_addr.
  This must be 4 (32-bit) or 8 (64-bit). 64-bit values are only allowed on
  64-bit kernels; 32-bit values can be used on all kernels.

+ flags: The flags to use, if any. For the initial version this must be 0.
  Callers should first attempt to use flags and retry without flags to ensure
  support for lower versions of the kernel. If a flag is not supported, -EINVAL
  is returned.

+ enable_addr: The address of the value to use to reflect event status. This
  must be naturally aligned and write accessible within the user program.

+ name_args: The name and arguments to describe the event, see command format
  for details.

Upon successful registration the following is set.

+ write_index: The index to use for this file descriptor that represents this
  event when writing out data. The index is unique to this instance of the file
  descriptor that was used for the registration. See writing data for details.
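
The inputs listed above can be filled in with a small helper. The following is
a minimal sketch, not taken from the kernel sources: the struct is a local copy
of the layout shown above using <stdint.h> types so that it builds without the
uapi header, and the helper name user_reg_init() and the example event string
are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Local copy of the struct user_reg layout from this document, for
 * illustration only. Real programs would include <linux/user_events.h>.
 */
struct user_reg {
	uint32_t size;
	uint8_t  enable_bit;
	uint8_t  enable_size;
	uint16_t flags;
	uint64_t enable_addr;
	uint64_t name_args;
	uint32_t write_index;
} __attribute__((__packed__));

/*
 * Hypothetical helper: fill the input fields for an event whose status is
 * reflected in a 32-bit value owned by the caller. write_index is left at 0
 * because it is an output set by the kernel.
 */
static void user_reg_init(struct user_reg *reg, uint32_t *enable_value,
			  uint8_t enable_bit, const char *name_args)
{
	memset(reg, 0, sizeof(*reg));
	reg->size = sizeof(*reg);
	reg->enable_bit = enable_bit;
	reg->enable_size = sizeof(*enable_value);	/* 4 bytes (32-bit) */
	reg->flags = 0;					/* must be 0 for now */
	reg->enable_addr = (uint64_t)(uintptr_t)enable_value;
	reg->name_args = (uint64_t)(uintptr_t)name_args;
}
```

After a call such as user_reg_init(&reg, &enabled, 31, "test u32 count"), the
struct would be passed to ioctl() with the DIAG_IOCSREG command on a file
descriptor opened on user_events_data, and reg.write_index read back on
success.
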

User based events show up under tracefs like any other event under the
subsystem named "user_events". This means tools that wish to attach to the
events need to use /sys/kernel/tracing/events/user_events/[name]/enable
or perf record -e user_events:[name] when attaching/recording.

**NOTE:** The event subsystem name by default is "user_events". Callers should
not assume it will always be "user_events". Operators reserve the right in the
future to change the subsystem name per-process to accommodate event isolation.

Command Format
^^^^^^^^^^^^^^
The command string format is as follows::

  name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]

Supported Flags
^^^^^^^^^^^^^^^
None yet

Field Format
^^^^^^^^^^^^
::

  type name [size]

Basic types are supported (__data_loc, u32, u64, int, char, char[20], etc).
User programs are encouraged to use clearly sized types like u32.

**NOTE:** *Long is not supported since size can vary between user and kernel.*

The size is only valid for types that start with a struct prefix.
This allows user programs to describe custom structs out to tools, if required.

For example, a struct in C that looks like this::

  struct mytype {
    char data[20];
  };

Would be represented by the following field::

  struct mytype myname 20

Deleting
--------
Deleting an event from within a user process is done via ioctl() out to the
/sys/kernel/tracing/user_events_data file. The command to issue is
DIAG_IOCSDEL.

This command only requires a single string specifying the event to delete by
its name. Delete will only succeed if there are no references left to the
event (in both user and kernel space). Because of this, user programs should
request deletes through a separate file descriptor from the one used for
registration.
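
The advice to delete through a separate file can be sketched as follows. This
is only an illustration: the helper name user_event_delete() is hypothetical,
and the ioctl request is passed in as a parameter (callers would pass
DIAG_IOCSDEL from the uapi header) so that the sketch builds without that
header.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a fresh descriptor on the data file, issue the
 * delete request for the named event, then close the descriptor again.
 * Returns the ioctl() result, or -1 if the file could not be opened.
 *
 * Assumption: 'request' is DIAG_IOCSDEL and 'data_path' is the tracefs
 * user_events_data file, which may be mounted at a different path.
 */
static int user_event_delete(const char *data_path, unsigned long request,
			     const char *name)
{
	int fd = open(data_path, O_RDWR);
	int ret;

	if (fd < 0)
		return -1;

	ret = ioctl(fd, request, name);
	close(fd);

	return ret;
}
```

Because the descriptor is opened and closed inside the helper, it never holds a
registration of its own. The delete can still fail while other references to
the event exist.
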

Unregistering
-------------
If a registered event no longer needs to be updated, it can be disabled via
ioctl() out to the /sys/kernel/tracing/user_events_data file.
The command to issue is DIAG_IOCSUNREG. This is different from deleting, which
actually removes the event from the system. Unregistering simply tells
the kernel your process is no longer interested in updates to the event.

This command takes a packed struct user_unreg as an argument::

  struct user_unreg {
        /* Input: Size of the user_unreg structure being used */
        __u32 size;

        /* Input: Bit to unregister */
        __u8 disable_bit;

        /* Input: Reserved, set to 0 */
        __u8 __reserved;

        /* Input: Reserved, set to 0 */
        __u16 __reserved2;

        /* Input: Address to unregister */
        __u64 disable_addr;
  } __attribute__((__packed__));

The struct user_unreg requires all the above inputs to be set appropriately.

+ size: This must be set to sizeof(struct user_unreg).

+ disable_bit: This must be set to the bit to disable (same bit that was
  previously registered via enable_bit).

+ disable_addr: This must be set to the address to disable (same address that was
  previously registered via enable_addr).

**NOTE:** Events are automatically unregistered when execve() is invoked. During
fork() the registered events will be retained and must be unregistered manually
in each process if wanted.

Status
------
When tools attach/record user based events the status of the event is updated
in realtime. This allows user programs to only incur the cost of the write() or
writev() calls when something is actively attached to the event.

The kernel will update the specified bit that was registered for the event as
tools attach/detach from the event. User programs simply check if the bit is set
to see if something is attached or not.
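
The bit check described above can be sketched as follows, assuming the event
was registered with a 32-bit enable value (enable_size of 4). The helper name
user_event_enabled() is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns non-zero when the registered bit is set in
 * the enable value, meaning some tool is attached to the event. The value
 * is read through a volatile pointer because the kernel updates it outside
 * the program's normal control flow.
 */
static int user_event_enabled(const volatile uint32_t *enable_value,
			      uint8_t enable_bit)
{
	return (*enable_value & (UINT32_C(1) << enable_bit)) != 0;
}
```

A program would typically guard its write() or writev() call with this check,
so the cost of the call is only paid while something is attached.
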

Administrators can easily check the status of all registered events by reading
the user_events_status file directly via a terminal. The output is as follows::

  Name [# Comments]
  ...

  Active: ActiveCount
  Busy: BusyCount

For example, on a system that has a single event the output looks like this::

  test

  Active: 1
  Busy: 0

If a user enables the user event via ftrace, the output would change to this::

  test # Used by ftrace

  Active: 1
  Busy: 1

Writing Data
------------
After registering an event the same fd that was used to register can be used
to write an entry for that event. The write_index returned must be at the start
of the data, then the remaining data is treated as the payload of the event.

For example, if the write_index returned was 1 and the payload of the event is
a single int, then the data must be 8 bytes (2 ints) in size, with the first
4 bytes being equal to 1 and the last 4 bytes being equal to the payload value.

In memory this would look like this::

  int index;
  int payload;

User programs might have well known structs that they wish to emit as
payloads. In those cases writev() can be used, with the first vector being
the index and the following vector(s) being the actual event payload.

For example, given a struct like this::

  struct payload {
        int src;
        int dst;
        int flags;
  } __attribute__((__packed__));

User programs are advised to do the following::

  struct iovec io[2];
  struct payload e;

  /* First vector: the write_index returned by registration */
  io[0].iov_base = &write_index;
  io[0].iov_len = sizeof(write_index);

  /* Second vector: the event payload */
  io[1].iov_base = &e;
  io[1].iov_len = sizeof(e);

  writev(fd, (const struct iovec*)io, 2);

**NOTE:** *The write_index is not emitted out into the trace being recorded.*

Example Code
------------
See sample code in samples/user_events.
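
As a smaller companion to the Writing Data section, the single buffer layout
used with write() (index first, payload after) can be sketched as follows.
This is not taken from samples/user_events; the helper name user_event_pack()
is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy write_index to the start of buf, followed by
 * the payload bytes. Returns the total number of bytes laid out, or 0 if
 * buf is too small to hold them.
 */
static size_t user_event_pack(void *buf, size_t buf_len, uint32_t write_index,
			      const void *payload, size_t payload_len)
{
	size_t total = sizeof(write_index) + payload_len;

	if (buf_len < total)
		return 0;

	memcpy(buf, &write_index, sizeof(write_index));
	memcpy((char *)buf + sizeof(write_index), payload, payload_len);

	return total;
}
```

For a write_index of 1 and an int payload this produces the 8 byte layout
described under Writing Data. The resulting buffer and length would then be
passed to write() on the fd used for registration.
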