=========================================
user_events: User-based Event Tracing
=========================================

:Author: Beau Belgrave

Overview
--------
User-based trace events allow user processes to create events and trace data
that can be viewed via existing tools, such as ftrace and perf.
To enable this feature, build your kernel with CONFIG_USER_EVENTS=y.

Programs can view the status of the events via
/sys/kernel/tracing/user_events_status and can both register and write
data out via /sys/kernel/tracing/user_events_data.

Typically programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
process tells the kernel which address and bit to update to reflect whether any
tool has enabled the event and data should be written. The registration gives
back a write index which describes the data when a write() or writev() is
called on the /sys/kernel/tracing/user_events_data file.

The structures referenced in this document are contained within the
/include/uapi/linux/user_events.h file in the source tree.

**NOTE:** *Both user_events_status and user_events_data are under the tracefs
filesystem and may be mounted at different paths than above.*

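Because tracefs may be mounted somewhere other than /sys/kernel/tracing, a
program may want to look the mount point up rather than hard-code it. The
following is a minimal sketch that scans /proc/mounts for a filesystem of type
tracefs. The helper names ``tracefs_mount_from_line`` and ``find_tracefs`` are
hypothetical and are not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/mounts line ("device mountpoint fstype options ...").
 * If the filesystem type is tracefs, copy the mount point into out and
 * return 1; otherwise return 0.
 */
int tracefs_mount_from_line(const char *line, char *out, size_t len)
{
    char mnt[256], type[64];

    if (sscanf(line, "%*s %255s %63s", mnt, type) != 2)
        return 0;
    if (strcmp(type, "tracefs") != 0)
        return 0;
    snprintf(out, len, "%s", mnt);
    return 1;
}

/* Find the first tracefs mount point; returns 0 on success, -1 otherwise. */
int find_tracefs(char *out, size_t len)
{
    char line[512];
    int found = 0;
    FILE *f = fopen("/proc/mounts", "r");

    if (!f)
        return -1;
    while (!found && fgets(line, sizeof(line), f))
        found = tracefs_mount_from_line(line, out, len);
    fclose(f);
    return found ? 0 : -1;
}
```

The returned mount point can then be joined with ``user_events_data`` or
``user_events_status`` to form the paths used in the rest of this document.
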
Registering
-----------
Registering within a user process is done via ioctl() out to the
/sys/kernel/tracing/user_events_data file. The command to issue is
DIAG_IOCSREG.

This command takes a packed struct user_reg as an argument::

  struct user_reg {
        /* Input: Size of the user_reg structure being used */
        __u32 size;

        /* Input: Bit in enable address to use */
        __u8 enable_bit;

        /* Input: Enable size in bytes at address */
        __u8 enable_size;

        /* Input: Flags for future use, set to 0 */
        __u16 flags;

        /* Input: Address to update when enabled */
        __u64 enable_addr;

        /* Input: Pointer to string with event name, description and flags */
        __u64 name_args;

        /* Output: Index of the event to use when writing data */
        __u32 write_index;
  } __attribute__((__packed__));

The struct user_reg requires all the above inputs to be set appropriately.

+ size: This must be set to sizeof(struct user_reg).

+ enable_bit: The bit to reflect the event status at the address specified by
  enable_addr.

+ enable_size: The size of the value specified by enable_addr.
  This must be 4 (32-bit) or 8 (64-bit). 64-bit values are only allowed to be
  used on 64-bit kernels, however, 32-bit can be used on all kernels.

+ flags: The flags to use, if any. For the initial version this must be 0.
  Callers should first attempt to use flags and retry without flags to ensure
  support for older versions of the kernel. If a flag is not supported -EINVAL
  is returned.

+ enable_addr: The address of the value to use to reflect event status. This
  must be naturally aligned and write accessible within the user program.

+ name_args: The name and arguments to describe the event, see command format
  for details.

Upon successful registration the following is set.

+ write_index: The index to use for this file descriptor that represents this
  event when writing out data. The index is unique to this instance of the file
  descriptor that was used for the registration. See writing data for details.

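The registration inputs described above can be sketched as follows. This is an
illustration under stated assumptions, not the reference implementation: the
struct is a local copy of the one shown in this document, the DIAG_IOCSREG
definition is assumed to match the uapi header (real programs should include
``<linux/user_events.h>`` instead), and ``fill_user_reg`` and
``register_event`` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Local copy of the struct from this document (assumption: same layout). */
struct user_reg {
    __u32 size;
    __u8 enable_bit;
    __u8 enable_size;
    __u16 flags;
    __u64 enable_addr;
    __u64 name_args;
    __u32 write_index;
} __attribute__((__packed__));

_Static_assert(sizeof(struct user_reg) == 28, "unexpected user_reg size");

/* Assumed to match the definition in <linux/user_events.h>. */
#define DIAG_IOCSREG _IOWR('*', 0, struct user_reg *)

/* Fill every input field; enabled is a naturally aligned 32-bit value. */
void fill_user_reg(struct user_reg *reg, const char *name_args,
                   uint32_t *enabled, uint8_t bit)
{
    memset(reg, 0, sizeof(*reg));
    reg->size = sizeof(*reg);
    reg->enable_bit = bit;
    reg->enable_size = sizeof(*enabled);    /* 4: usable on all kernels */
    reg->flags = 0;                         /* must be 0 for now */
    reg->enable_addr = (__u64)(uintptr_t)enabled;
    reg->name_args = (__u64)(uintptr_t)name_args;
}

/* Register on an open user_events_data fd; returns 0 and sets *write_index. */
int register_event(int data_fd, const char *name_args, uint32_t *enabled,
                   uint8_t bit, uint32_t *write_index)
{
    struct user_reg reg;

    fill_user_reg(&reg, name_args, enabled, bit);
    if (ioctl(data_fd, DIAG_IOCSREG, &reg) == -1)
        return -1;
    *write_index = reg.write_index;
    return 0;
}
```

The ``enabled`` value must stay valid for as long as the event is registered,
since the kernel updates it when tools attach or detach.
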
User-based events show up under tracefs like any other event under the
subsystem named "user_events". This means tools that wish to attach to the
events need to use /sys/kernel/tracing/events/user_events/[name]/enable
or perf record -e user_events:[name] when attaching/recording.

**NOTE:** The event subsystem name by default is "user_events". Callers should
not assume it will always be "user_events". Operators reserve the right in the
future to change the subsystem name per-process to accommodate event isolation.

Command Format
^^^^^^^^^^^^^^
The command string format is as follows::

  name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]

Supported Flags
^^^^^^^^^^^^^^^
None yet

Field Format
^^^^^^^^^^^^
::

  type name [size]

Basic types are supported (__data_loc, u32, u64, int, char, char[20], etc).
User programs are encouraged to use clearly sized types like u32.

**NOTE:** *Long is not supported since size can vary between user and kernel.*

The size is only valid for types that start with a struct prefix.
This allows user programs to describe custom structs out to tools, if required.

For example, a struct in C that looks like this::

  struct mytype {
    char data[20];
  };

Would be represented by the following field::

  struct mytype myname 20

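As an illustration of the command and field formats, the sketch below
assembles a command string from an event name and a list of field
descriptions, separating the name from the first field with a space and the
fields from each other with ``;``. The helper name ``build_event_command`` is
hypothetical, and flags are left out since none are supported yet.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "name field1;field2;..." into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int build_event_command(char *buf, size_t len, const char *name,
                        const char *const *fields, size_t nfields)
{
    size_t used;
    int n = snprintf(buf, len, "%s", name);

    if (n < 0 || (size_t)n >= len)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < nfields; i++) {
        /* A space precedes the first field, ';' separates the rest. */
        n = snprintf(buf + used, len - used, "%s%s",
                     i == 0 ? " " : ";", fields[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

For an event named ``test`` with the fields ``u32 count`` and
``struct mytype myname 20``, this produces
``test u32 count;struct mytype myname 20``, which could then be passed through
name_args during registration.
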
Deleting
--------
Deleting an event from within a user process is done via ioctl() out to the
/sys/kernel/tracing/user_events_data file. The command to issue is
DIAG_IOCSDEL.

This command only requires a single string specifying the event to delete by
its name. Delete will only succeed if there are no references left to the
event (in both user and kernel space). For this reason, user programs should
request deletes through a separate file from the one used for registration.

**NOTE:** By default events will auto-delete when there are no references left
to the event. Flags in the future may change this logic.

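A delete request might look like the sketch below, which opens the data file
a second time so that the request does not go through the file used for
registration. The DIAG_IOCSDEL definition is assumed to match the uapi header
(real programs should include ``<linux/user_events.h>``), and ``delete_event``
is a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to match the definition in <linux/user_events.h>. */
#define DIAG_IOCSDEL _IOW('*', 1, char *)

/*
 * Ask the kernel to delete the named event using a freshly opened
 * user_events_data file. Returns 0 on success, -1 with errno set otherwise.
 */
int delete_event(const char *data_path, const char *name)
{
    int ret, saved_errno;
    int fd = open(data_path, O_RDWR);

    if (fd == -1)
        return -1;

    ret = ioctl(fd, DIAG_IOCSDEL, name);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;    /* keep the ioctl() error for the caller */

    return ret == -1 ? -1 : 0;
}
```

The call fails while any reference to the event remains, so callers should be
prepared to handle an error here rather than treat it as fatal.
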
Unregistering
-------------
If a process no longer wants an event it registered to be updated, it can
disable the updates via ioctl() out to the /sys/kernel/tracing/user_events_data
file. The command to issue is DIAG_IOCSUNREG. This is different from deleting,
where deleting actually removes the event from the system. Unregistering simply
tells the kernel your process is no longer interested in updates to the event.

This command takes a packed struct user_unreg as an argument::

  struct user_unreg {
        /* Input: Size of the user_unreg structure being used */
        __u32 size;

        /* Input: Bit to unregister */
        __u8 disable_bit;

        /* Input: Reserved, set to 0 */
        __u8 __reserved;

        /* Input: Reserved, set to 0 */
        __u16 __reserved2;

        /* Input: Address to unregister */
        __u64 disable_addr;
  } __attribute__((__packed__));

The struct user_unreg requires all the above inputs to be set appropriately.

+ size: This must be set to sizeof(struct user_unreg).

+ disable_bit: This must be set to the bit to disable (same bit that was
  previously registered via enable_bit).

+ disable_addr: This must be set to the address to disable (same address that was
  previously registered via enable_addr).

**NOTE:** Events are automatically unregistered when execve() is invoked. During
fork() the registered events will be retained and must be unregistered manually
in each process if wanted.

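Unregistering mirrors registration: the same address and bit are passed back
to the kernel. The sketch below uses a local copy of the struct from this
document and a DIAG_IOCSUNREG definition that is assumed to match the uapi
header (real programs should include ``<linux/user_events.h>``).
``fill_user_unreg`` and ``unregister_event`` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Local copy of the struct from this document (assumption: same layout). */
struct user_unreg {
    __u32 size;
    __u8 disable_bit;
    __u8 __reserved;
    __u16 __reserved2;
    __u64 disable_addr;
} __attribute__((__packed__));

_Static_assert(sizeof(struct user_unreg) == 16, "unexpected user_unreg size");

/* Assumed to match the definition in <linux/user_events.h>. */
#define DIAG_IOCSUNREG _IOW('*', 2, struct user_unreg *)

/* Fill the inputs; memset() also zeroes the reserved fields. */
void fill_user_unreg(struct user_unreg *unreg, uint32_t *enabled, uint8_t bit)
{
    memset(unreg, 0, sizeof(*unreg));
    unreg->size = sizeof(*unreg);
    unreg->disable_bit = bit;
    unreg->disable_addr = (__u64)(uintptr_t)enabled;
}

/* Stop updates to (enabled, bit) on an open user_events_data fd. */
int unregister_event(int data_fd, uint32_t *enabled, uint8_t bit)
{
    struct user_unreg unreg;

    fill_user_unreg(&unreg, enabled, bit);
    return ioctl(data_fd, DIAG_IOCSUNREG, &unreg) == -1 ? -1 : 0;
}
```

After fork(), a child that does not want the inherited registrations would
make this call itself for each address and bit pair.
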
Status
------
When tools attach/record user-based events the status of the event is updated
in real time. This allows user programs to only incur the cost of the write() or
writev() calls when something is actively attached to the event.

The kernel will update the specified bit that was registered for the event as
tools attach/detach from the event. User programs simply check if the bit is set
to see if something is attached or not.

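The check itself can be a single bit test, as in the sketch below. It assumes
the event was registered with a 32-bit value (enable_size of 4); the helper
name ``event_enabled`` is hypothetical. The value is read through a volatile
pointer because the kernel changes it outside the program's normal control
flow.

```c
#include <stdint.h>

/*
 * Return 1 if the registered bit is currently set, 0 otherwise.
 * The volatile read keeps the compiler from caching an old value.
 */
int event_enabled(const volatile uint32_t *enabled, unsigned int bit)
{
    return (int)((*enabled >> bit) & 1u);
}
```

A program would typically guard each write() or writev() call with this
test, so that building the payload is skipped when nothing is attached.
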
Administrators can easily check the status of all registered events by reading
the user_events_status file directly via a terminal. The output is as follows::

  Name [# Comments]
  ...

  Active: ActiveCount
  Busy: BusyCount

For example, on a system that has a single event the output looks like this::

  test

  Active: 1
  Busy: 0

If a user enables the user event via ftrace, the output would change to this::

  test # Used by ftrace

  Active: 1
  Busy: 1

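A monitoring program could pull the two counters out of this output with a
small parser. The sketch below works on text that has already been read from
user_events_status and assumes the ``Active:`` and ``Busy:`` lines appear
exactly as shown above; ``struct status_counts`` and ``parse_status_counts``
are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

struct status_counts {
    int active;
    int busy;
};

/*
 * Extract the Active and Busy counters from user_events_status text.
 * Returns 0 on success, -1 if either summary line is missing.
 */
int parse_status_counts(const char *text, struct status_counts *out)
{
    /* Match at the start of a line so event names are not mistaken. */
    const char *a = strstr(text, "\nActive: ");
    const char *b = strstr(text, "\nBusy: ");

    if (!a || !b)
        return -1;
    if (sscanf(a + 1, "Active: %d", &out->active) != 1)
        return -1;
    if (sscanf(b + 1, "Busy: %d", &out->busy) != 1)
        return -1;
    return 0;
}
```
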
Writing Data
------------
After registering an event the same fd that was used to register can be used
to write an entry for that event. The write_index returned must be at the start
of the data, then the remaining data is treated as the payload of the event.

For example, if the write_index returned was 1 and the event has an int
payload, the data would have to be 8 bytes (2 ints) in size, with the first
4 bytes being equal to 1 and the last 4 bytes being equal to the value to
record as the payload.

In memory this would look like this::

  int index;
  int payload;

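The 8-byte layout described above can be produced and written in one call, as
in this sketch. It assumes a 4-byte write_index and a 4-byte int payload;
``pack_int_event`` and ``write_int_event`` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Place the write_index first, then the payload; returns bytes used (8). */
size_t pack_int_event(unsigned char buf[8], uint32_t write_index,
                      int32_t payload)
{
    memcpy(buf, &write_index, sizeof(write_index));
    memcpy(buf + sizeof(write_index), &payload, sizeof(payload));
    return sizeof(write_index) + sizeof(payload);
}

/* Write one event on the fd that was used to register it. */
int write_int_event(int data_fd, uint32_t write_index, int32_t payload)
{
    unsigned char buf[8];
    size_t len = pack_int_event(buf, write_index, payload);

    return write(data_fd, buf, len) < 0 ? -1 : 0;
}
```
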
User programs might have well-known structs that they wish to emit as
payloads. In those cases writev() can be used, with the first vector being
the index and the following vector(s) being the actual event payload.

For example, given a struct like this::

  struct payload {
        int src;
        int dst;
        int flags;
  } __attribute__((__packed__));

It's advised for user programs to do the following::

  struct iovec io[2];
  struct payload e;

  io[0].iov_base = &write_index;
  io[0].iov_len = sizeof(write_index);
  io[1].iov_base = &e;
  io[1].iov_len = sizeof(e);

  writev(fd, (const struct iovec*)io, 2);

**NOTE:** *The write_index is not emitted out into the trace being recorded.*

Example Code
------------
See sample code in samples/user_events.