Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
ebeed8d4 |
| 22-Aug-2023 |
Masami Hiramatsu (Google) <mhiramat@kernel.org> |
tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
Move generic function-proto find API and getting function parameter API to BTF library code from trace_probe.c. Th
tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
Move generic function-proto find API and getting function parameter API to BTF library code from trace_probe.c. This will avoid redundant efforts on different feature.
Link: https://lore.kernel.org/all/169272155255.160970.719426926348706349.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
show more ...
|
Revision tags: v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33 |
|
#
334e5519 |
| 06-Jun-2023 |
Masami Hiramatsu (Google) <mhiramat@kernel.org> |
tracing/probes: Add fprobe events for tracing function entry and exit.
Add fprobe events for tracing function entry and exit instead of kprobe events. With this change, we can continue to trace func
tracing/probes: Add fprobe events for tracing function entry and exit.
Add fprobe events for tracing function entry and exit instead of kprobe events. With this change, we can continue to trace function entry/exit even if the CONFIG_KPROBES_ON_FTRACE is not available. Since CONFIG_KPROBES_ON_FTRACE requires the CONFIG_DYNAMIC_FTRACE_WITH_REGS, it is not available if the architecture only supports CONFIG_DYNAMIC_FTRACE_WITH_ARGS. And that means kprobe events can not probe function entry/exit effectively on such architecture. But this can be solved if the dynamic events supports fprobe events.
The fprobe event is a new dynamic events which is only for the function (symbol) entry and exit. This event accepts non register fetch arguments so that user can trace the function arguments and return values.
The fprobe events syntax is here;
f[:[GRP/][EVENT]] FUNCTION [FETCHARGS] f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION%return [FETCHARGS]
E.g.
# echo 'f vfs_read $arg1' >> dynamic_events # echo 'f vfs_read%return $retval' >> dynamic_events # cat dynamic_events f:fprobes/vfs_read__entry vfs_read arg1=$arg1 f:fprobes/vfs_read__exit vfs_read%return arg1=$retval # echo 1 > events/fprobes/enable # head -n 20 trace | tail # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | sh-142 [005] ...1. 448.386420: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386436: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.386451: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386458: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.386469: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386476: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.602073: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.602089: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
Link: https://lore.kernel.org/all/168507469754.913472.6112857614708350210.stgit@mhiramat.roam.corp.google.com/
Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/all/202302011530.7vm4O8Ro-lkp@intel.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
show more ...
|
Revision tags: v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58 |
|
#
102227b9 |
| 29-Jul-2022 |
Daniel Bristot de Oliveira <bristot@kernel.org> |
rv: Add Runtime Verification (RV) interface
RV is a lightweight (yet rigorous) method that complements classical exhaustive verification techniques (such as model checking and theorem proving) with
rv: Add Runtime Verification (RV) interface
RV is a lightweight (yet rigorous) method that complements classical exhaustive verification techniques (such as model checking and theorem proving) with a more practical approach to complex systems.
RV works by analyzing the trace of the system's actual execution, comparing it against a formal specification of the system behavior. RV can give precise information on the runtime behavior of the monitored system while enabling the reaction for unexpected events, avoiding, for example, the propagation of a failure on safety-critical systems.
The development of this interface roots in the development of the paper:
De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo Silva. Efficient formal verification for the Linux kernel. In: International Conference on Software Engineering and Formal Methods. Springer, Cham, 2019. p. 315-332.
And:
De Oliveira, Daniel Bristot. Automata-based formal analysis and verification of the real-time Linux kernel. PhD Thesis, 2020.
The RV interface resembles the tracing/ interface on purpose. The current path for the RV interface is /sys/kernel/tracing/rv/.
It presents these files:
"available_monitors" - List the available monitors, one per line.
For example: # cat available_monitors wip wwnr
"enabled_monitors" - Lists the enabled monitors, one per line; - Writing to it enables a given monitor; - Writing a monitor name with a '!' prefix disables it; - Truncating the file disables all enabled monitors.
For example: # cat enabled_monitors # echo wip > enabled_monitors # echo wwnr >> enabled_monitors # cat enabled_monitors wip wwnr # echo '!wip' >> enabled_monitors # cat enabled_monitors wwnr # echo > enabled_monitors # cat enabled_monitors #
Note that more than one monitor can be enabled concurrently.
"monitoring_on" - It is an on/off general switcher for monitoring. Note that it does not disable enabled monitors or detach events, but stop the per-entity monitors of monitoring the events received from the system. It resembles the "tracing_on" switcher.
"monitors/" Each monitor will have its one directory inside "monitors/". There the monitor specific files will be presented. The "monitors/" directory resembles the "events" directory on tracefs.
For example: # cd monitors/wip/ # ls desc enable # cat desc wakeup in preemptive per-cpu testing monitor. # cat enable 0
For further information, see the comments in the header of kernel/trace/rv/rv.c from this patch.
Link: https://lkml.kernel.org/r/a4bfe038f50cb047bfb343ad0e12b0e646ab308b.1659052063.git.bristot@kernel.org
Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
show more ...
|
Revision tags: v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42 |
|
#
bb5eb8f3 |
| 23-May-2022 |
Congyu Liu <liu3101@purdue.edu> |
tracing: Disable kcov on trace_preemptirq.c
Functions in trace_preemptirq.c could be invoked from early interrupt code that bypasses kcov trace function's in_task() check. Disable kcov on this file
tracing: Disable kcov on trace_preemptirq.c
Functions in trace_preemptirq.c could be invoked from early interrupt code that bypasses kcov trace function's in_task() check. Disable kcov on this file to reduce random code coverage.
Link: https://lkml.kernel.org/r/20220523063033.1778974-1-liu3101@purdue.edu
Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Congyu Liu <liu3101@purdue.edu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
show more ...
|
Revision tags: v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29 |
|
#
54ecbe6f |
| 15-Mar-2022 |
Masami Hiramatsu <mhiramat@kernel.org> |
rethook: Add a generic return hook
Add a return hook framework which hooks the function return. Most of the logic came from the kretprobe, but this is independent from kretprobe.
Note that this is
rethook: Add a generic return hook
Add a return hook framework which hooks the function return. Most of the logic came from the kretprobe, but this is independent from kretprobe.
Note that this is expected to be used with other function entry hooking feature, like ftrace, fprobe, adn kprobes. Eventually this will replace the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment, this is just an additional hook.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
show more ...
|
#
cad9931f |
| 15-Mar-2022 |
Masami Hiramatsu <mhiramat@kernel.org> |
fprobe: Add ftrace based probe APIs
The fprobe is a wrapper API for ftrace function tracer. Unlike kprobes, this probes only supports the function entry, but this can probe multiple functions by one
fprobe: Add ftrace based probe APIs
The fprobe is a wrapper API for ftrace function tracer. Unlike kprobes, this probes only supports the function entry, but this can probe multiple functions by one fprobe. The usage is similar, user will set their callback to fprobe::entry_handler and call register_fprobe*() with probed functions. There are 3 registration interfaces,
- register_fprobe() takes filtering patterns of the functin names. - register_fprobe_ips() takes an array of ftrace-location addresses. - register_fprobe_syms() takes an array of function names.
The registered fprobes can be unregistered with unregister_fprobe(). e.g.
struct fprobe fp = { .entry_handler = user_handler }; const char *targets[] = { "func1", "func2", "func3"}; ...
ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets));
...
unregister_fprobe(&fp);
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735283857.1084943.1154436951479395551.stgit@devnote2
show more ...
|
Revision tags: v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16 |
|
#
7f5a08c7 |
| 18-Jan-2022 |
Beau Belgrave <beaub@linux.microsoft.com> |
user_events: Add minimal support for trace_event into ftrace
Minimal support for interacting with dynamic events, trace_event and ftrace. Core outline of flow between user process, ioctl and trace_e
user_events: Add minimal support for trace_event into ftrace
Minimal support for interacting with dynamic events, trace_event and ftrace. Core outline of flow between user process, ioctl and trace_event APIs.
User mode processes that wish to use trace events to get data into ftrace, perf, eBPF, etc are limited to uprobes today. The user events features enables an ABI for user mode processes to create and write to trace events that are isolated from kernel level trace events. This enables a faster path for tracing from user mode data as well as opens managed code to participate in trace events, where stub locations are dynamic.
User processes often want to trace only when it's useful. To enable this a set of pages are mapped into the user process space that indicate the current state of the user events that have been registered. User processes can check if their event is hooked to a trace/probe, and if it is, emit the event data out via the write() syscall.
Two new files are introduced into tracefs to accomplish this: user_events_status - This file is mmap'd into participating user mode processes to indicate event status.
user_events_data - This file is opened and register/delete ioctl's are issued to create/open/delete trace events that can be used for tracing.
The typical scenario is on process start to mmap user_events_status. Processes then register the events they plan to use via the REG ioctl. The ioctl reads and updates the passed in user_reg struct. The status_index of the struct is used to know the byte in the status page to check for that event. The write_index of the struct is used to describe that event when writing out to the fd that was used for the ioctl call. The data must always include this index first when writing out data for an event. Data can be written either by write() or by writev().
For example, in memory: int index; char data[];
Psuedo code example of typical usage: struct user_reg reg;
int page_fd = open("user_events_status", O_RDWR); char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0); close(page_fd);
int data_fd = open("user_events_data", O_RDWR);
reg.size = sizeof(reg); reg.name_args = (__u64)"test";
ioctl(data_fd, DIAG_IOCSREG, ®); int status_id = reg.status_index; int write_id = reg.write_index;
struct iovec io[2]; io[0].iov_base = &write_id; io[0].iov_len = sizeof(write_id); io[1].iov_base = payload; io[1].iov_len = sizeof(payload);
if (page_data[status_id]) writev(data_fd, io, 2);
User events are also exposed via the dynamic_events tracefs file for both create and delete. Current status is exposed via the user_events_status tracefs file.
Simple example to register a user event via dynamic_events: echo u:test >> dynamic_events cat dynamic_events u:test
If an event is hooked to a probe, the probe hooked shows up: echo 1 > events/user_events/test/enable cat user_events_status 1:test # Used by ftrace
Active: 1 Busy: 1 Max: 4096
If an event is not hooked to a probe, no probe status shows up: echo 0 > events/user_events/test/enable cat user_events_status 1:test
Active: 1 Busy: 0 Max: 4096
Users can describe the trace event format via the following format: name[:FLAG1[,FLAG2...] [field1[;field2...]]
Each field has the following format: type name
Example for char array with a size of 20 named msg: echo 'u:detailed char[20] msg' >> dynamic_events cat dynamic_events u:detailed char[20] msg
Data offsets are based on the data written out via write() and will be updated to reflect the correct offset in the trace_event fields. For dynamic data it is recommended to use the new __rel_loc data type. This type will be the same as __data_loc, but the offset is relative to this entry. This allows user_events to not worry about what common fields are being inserted before the data.
The above format is valid for both the ioctl and the dynamic_events file.
Link: https://lkml.kernel.org/r/20220118204326.2169-2-beaub@linux.microsoft.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
show more ...
|
Revision tags: v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8 |
|
#
6954e415 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic tha
tracing: Place trace_pid_list logic into abstract functions
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
d9777061 |
| 23-Sep-2021 |
Steven Rostedt (VMware) <rostedt@goodmis.org> |
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in
tracing: Place trace_pid_list logic into abstract functions
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ]
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users.
Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening.
The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
Revision tags: v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61 |
|
#
7491e2c4 |
| 19-Aug-2021 |
Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> |
tracing: Add a probe that attaches to trace events
A new dynamic event is introduced: event probe. The event is attached to an existing tracepoint and uses its fields as arguments. The user can spec
tracing: Add a probe that attaches to trace events
A new dynamic event is introduced: event probe. The event is attached to an existing tracepoint and uses its fields as arguments. The user can specify custom format string of the new event, select what tracepoint arguments will be printed and how to print them. An event probe is created by writing configuration string in 'dynamic_events' ftrace file: e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS] - Set an event probe -:SNAME/ENAME - Delete an event probe
Where: SNAME - System name, if omitted 'eprobes' is used. ENAME - Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used. SYSTEM - Name of the system, where the tracepoint is defined, mandatory. EVENT - Name of the tracepoint event in SYSTEM, mandatory. FETCHARGS - Arguments: <name>=$<field>[:TYPE] - Fetch given filed of the tracepoint and print it as given TYPE with given name. Supported types are: (u8/u16/u32/u64/s8/s16/s32/s64), basic type (x8/x16/x32/x64), hexadecimal types "string", "ustring" and bitfield.
Example, attach an event probe on openat system call and print name of the file that will be opened: echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events A new dynamic event is created in events/esys/eopen/ directory. It can be deleted with: echo "-:esys/eopen" >> dynamic_events
Filters, triggers and histograms can be attached to the new event, it can be matched in synthetic events. There is one limitation - an event probe can not be attached to kprobe, uprobe or another event probe.
Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
show more ...
|
Revision tags: v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46 |
|
#
bce29ac9 |
| 22-Jun-2021 |
Daniel Bristot de Oliveira <bristot@redhat.com> |
trace: Add osnoise tracer
In the context of high-performance computing (HPC), the Operating System Noise (*osnoise*) refers to the interference experienced by an application due to activities inside
trace: Add osnoise tracer
In the context of high-performance computing (HPC), the Operating System Noise (*osnoise*) refers to the interference experienced by an application due to activities inside the operating system. In the context of Linux, NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the system. Moreover, hardware-related jobs can also cause noise, for example, via SMIs.
The osnoise tracer leverages the hwlat_detector by running a similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing all the sources of *osnoise* during its execution. Using the same approach of hwlat, osnoise takes note of the entry and exit point of any source of interferences, increasing a per-cpu interference counter. The osnoise tracer also saves an interference counter for each source of interference. The interference counter for NMI, IRQs, SoftIRQs, and threads is increased anytime the tool observes these interferences' entry events. When a noise happens without any interference from the operating system level, the hardware noise counter increases, pointing to a hardware-related noise. In this way, osnoise can account for any source of interference. At the end of the period, the osnoise tracer prints the sum of all noise, the max single noise, the percentage of CPU available for the thread, and the counters for the noise sources.
Usage
Write the ASCII text "osnoise" into the current_tracer file of the tracing system (generally mounted at /sys/kernel/tracing).
For example::
[root@f32 ~]# cd /sys/kernel/tracing/ [root@f32 tracing]# echo osnoise > current_tracer
It is possible to follow the trace by reading the trace trace file::
[root@f32 tracing]# cat trace # tracer: osnoise # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth MAX # || / SINGLE Interference counters: # |||| RUNTIME NOISE % OF CPU NOISE +-----------------------------+ # TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD # | | | |||| | | | | | | | | | | <...>-859 [000] .... 81.637220: 1000000 190 99.98100 9 18 0 1007 18 1 <...>-860 [001] .... 81.638154: 1000000 656 99.93440 74 23 0 1006 16 3 <...>-861 [002] .... 81.638193: 1000000 5675 99.43250 202 6 0 1013 25 21 <...>-862 [003] .... 81.638242: 1000000 125 99.98750 45 1 0 1011 23 0 <...>-863 [004] .... 81.638260: 1000000 1721 99.82790 168 7 0 1002 49 41 <...>-864 [005] .... 81.638286: 1000000 263 99.97370 57 6 0 1006 26 2 <...>-865 [006] .... 81.638302: 1000000 109 99.98910 21 3 0 1006 18 1 <...>-866 [007] .... 81.638326: 1000000 7816 99.21840 107 8 0 1016 39 19
In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the tracer prints a message at the end of each period for each CPU that is running an osnoise/CPU thread. The osnoise specific fields report:
- The RUNTIME IN USE reports the amount of time in microseconds that the osnoise thread kept looping reading the time. - The NOISE IN US reports the sum of noise in microseconds observed by the osnoise tracer during the associated runtime. - The % OF CPU AVAILABLE reports the percentage of CPU available for the osnoise thread during the runtime window. - The MAX SINGLE NOISE IN US reports the maximum single noise observed during the runtime window. - The Interference counters display how many each of the respective interference happened during the runtime window.
Note that the example above shows a high number of HW noise samples. The reason being is that this sample was taken on a virtual machine, and the host interference is detected as a hardware interference.
Tracer options
The tracer has a set of options inside the osnoise directory, they are:
- osnoise/cpus: CPUs at which a osnoise thread will execute. - osnoise/period_us: the period of the osnoise thread. - osnoise/runtime_us: how long an osnoise thread will look for noise. - osnoise/stop_tracing_us: stop the system tracing if a single noise higher than the configured value happens. Writing 0 disables this option. - osnoise/stop_tracing_total_us: stop the system tracing if total noise higher than the configured value happens. Writing 0 disables this option. - tracing_threshold: the minimum delta between two time() reads to be considered as noise, in us. When set to 0, the default value will be used, which is currently 5 us.
Additional Tracing
In addition to the tracer, a set of tracepoints were added to facilitate the identification of the osnoise source.
- osnoise:sample_threshold: printed anytime a noise is higher than the configurable tolerance_ns. - osnoise:nmi_noise: noise from NMI, including the duration. - osnoise:irq_noise: noise from an IRQ, including the duration. - osnoise:softirq_noise: noise from a SoftIRQ, including the duration. - osnoise:thread_noise: noise from a thread, including the duration.
Note that all the values are *net values*. For example, if while osnoise is running, another thread preempts the osnoise thread, it will start a thread_noise duration at the start. Then, an IRQ takes place, preempting the thread_noise, starting a irq_noise. When the IRQ ends its execution, it will compute its duration, and this duration will be subtracted from the thread_noise, in such a way as to avoid the double accounting of the IRQ execution. This logic is valid for all sources of noise.
Here is one example of the usage of these tracepoints::
osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8723 ns interferences 2
In this example, a noise sample of 8 microseconds was reported in the last line, pointing to two interferences. Looking backward in the trace, the two previous entries were about the migration thread running after a timer IRQ execution. The first event is not part of the noise because it took place one millisecond before.
It is worth noticing that the sum of the duration reported in the tracepoints is smaller than eight us reported in the sample_threshold. The reason roots in the overhead of the entry and exit code that happens before and after any interference execution. This justifies the dual approach: measuring thread and tracing.
Link: https://lkml.kernel.org/r/e649467042d60e7b62714c9c6751a56299d15119.1624372313.git.bristot@redhat.com
Cc: Phil Auld <pauld@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Kate Carcia <kcarcia@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alexandre Chartre <alexandre.chartre@oracle.com> Cc: Clark Willaims <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> [ Made the following functions static: trace_irqentry_callback() trace_irqexit_callback() trace_intel_irqentry_callback() trace_intel_irqexit_callback()
Added to include/trace.h: osnoise_arch_register() osnoise_arch_unregister()
Fixed define logic for LATENCY_FS_NOTIFY
Reported-by: kernel test robot <lkp@intel.com> ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
show more ...
|